Summary
Explore and implement context engineering techniques within the Response API to optimize conversation context management, reduce token usage, and improve response quality.
Background
As conversations grow longer, naive context management (appending the entire history to every request) leads to:
- Token limit exhaustion
- Increased latency and cost
- Irrelevant context diluting important information
- Context window overflow
Context engineering aims to intelligently manage what context is passed to the LLM.
Research Areas
1. Context Compression
- Summarization: Automatically summarize older conversation turns
- Key extraction: Extract and retain only key facts/entities
- Sliding window: Keep only N most recent turns with compressed history
- Hierarchical summarization: Multi-level summaries (turn → topic → session)
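
As one concrete example of the sliding-window strategy above, the sketch below keeps the last N turns verbatim and collapses everything older into a single summary turn. The `Turn` type, the `Summarizer` signature, and the package name are illustrative assumptions, not existing Response API types:

```go
package contextengine

// Turn is an illustrative stand-in for the router's stored conversation turn.
type Turn struct {
	Role    string
	Content string
}

// Summarizer compresses older turns into a single string. In practice this
// would likely call a cheap LLM; it is left abstract here.
type Summarizer func(turns []Turn) (string, error)

// SlidingWindow keeps the last `window` turns verbatim and replaces
// everything older with one synthetic summary turn.
func SlidingWindow(history []Turn, window int, summarize Summarizer) ([]Turn, error) {
	if len(history) <= window {
		return history, nil
	}
	older, recent := history[:len(history)-window], history[len(history)-window:]
	summary, err := summarize(older)
	if err != nil {
		return nil, err
	}
	out := make([]Turn, 0, len(recent)+1)
	out = append(out, Turn{Role: "system", Content: "Summary of earlier conversation: " + summary})
	return append(out, recent...), nil
}
```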
2. Context Relevance Scoring
- Semantic similarity: Score each historical message against current input
- Attention-based selection: Use lightweight models to predict relevance
- Topic modeling: Group and select context by topic relevance
- Recency weighting: Balance relevance with recency
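
A minimal sketch of semantic similarity scoring blended with recency weighting, roughly matching the `ScoreRelevance` method proposed below. The `Embedder` signature and the exponential decay are assumptions for illustration:

```go
package contextengine

import "math"

// Embedder maps text to a vector; a real implementation would call an
// embedding model. The signature is an assumption for this sketch.
type Embedder func(text string) ([]float64, error)

// cosine computes cosine similarity; it assumes equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// ScoreRelevance blends semantic similarity to the current input with an
// exponential recency weight (the newest message has age 0, so no decay).
func ScoreRelevance(history []string, currentInput string, embed Embedder, decay float64) ([]float64, error) {
	q, err := embed(currentInput)
	if err != nil {
		return nil, err
	}
	scores := make([]float64, len(history))
	for i, msg := range history {
		v, err := embed(msg)
		if err != nil {
			return nil, err
		}
		age := float64(len(history) - 1 - i)
		scores[i] = cosine(q, v) * math.Exp(-decay*age)
	}
	return scores, nil
}
```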
3. Dynamic Context Window
- Adaptive sizing: Adjust context size based on query complexity
- Budget allocation: Allocate token budget across context components
- Overflow strategies: Graceful degradation when context exceeds limits
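
One possible shape for budget allocation with graceful degradation is sketched below; the component split and the 512-token floor are illustrative assumptions, not proposed defaults:

```go
package contextengine

// BudgetShares is the fraction of the total token budget reserved for each
// context component; the fields and split are illustrative, and the
// fractions are expected to sum to at most 1.0.
type BudgetShares struct {
	System    float64 // system prompt
	Summary   float64 // compressed history
	Recent    float64 // verbatim recent turns
	Retrieved float64 // RAG / tool results
}

// Allocation is the resulting per-component token budget.
type Allocation struct {
	System, Summary, Recent, Retrieved int
}

// minRecent is an assumed floor below which recent turns become unusable.
const minRecent = 512

// Allocate splits tokenBudget across components. When the budget is too
// small, lower-priority components are dropped first (retrieved context,
// then the summary) and their tokens are reassigned to recent turns: a
// simple graceful-degradation policy.
func Allocate(tokenBudget int, s BudgetShares) Allocation {
	a := Allocation{
		System:    int(s.System * float64(tokenBudget)),
		Summary:   int(s.Summary * float64(tokenBudget)),
		Recent:    int(s.Recent * float64(tokenBudget)),
		Retrieved: int(s.Retrieved * float64(tokenBudget)),
	}
	if a.Recent < minRecent {
		a.Recent += a.Retrieved
		a.Retrieved = 0
	}
	if a.Recent < minRecent {
		a.Recent += a.Summary
		a.Summary = 0
	}
	return a
}
```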
4. Context Augmentation
- RAG integration: Retrieve relevant external knowledge
- Tool results caching: Reuse previous tool call results
- Cross-session context: Share relevant context across sessions
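
For the tool-results caching idea above, a keyed memoization layer could back both in-session reuse and cross-session sharing. A hypothetical sketch (the key scheme and types are assumptions; a persistent backend such as the ones proposed in #803/#804 would replace the in-memory map for cross-session use):

```go
package contextengine

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

// ToolResultCache memoizes tool call results so repeated calls can be
// answered from the cache instead of being re-executed.
type ToolResultCache struct {
	mu      sync.RWMutex
	results map[string]string
}

func NewToolResultCache() *ToolResultCache {
	return &ToolResultCache{results: make(map[string]string)}
}

// cacheKey derives a stable key from the tool name and its raw arguments;
// the scheme is an assumption for this sketch.
func cacheKey(tool, args string) string {
	sum := sha256.Sum256([]byte(tool + "\x00" + args))
	return hex.EncodeToString(sum[:])
}

func (c *ToolResultCache) Get(tool, args string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	r, ok := c.results[cacheKey(tool, args)]
	return r, ok
}

func (c *ToolResultCache) Put(tool, args, result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.results[cacheKey(tool, args)] = result
}
```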
Potential Implementation
```go
type ContextEngineer interface {
	// Optimize context before sending to LLM
	OptimizeContext(ctx context.Context, history []*StoredResponse, currentInput string, tokenBudget int) (*OptimizedContext, error)

	// Score relevance of each historical message
	ScoreRelevance(ctx context.Context, history []*StoredResponse, currentInput string) ([]float64, error)

	// Summarize conversation history
	SummarizeHistory(ctx context.Context, history []*StoredResponse) (*Summary, error)
}

type OptimizedContext struct {
	Messages     []Message
	TokenCount   int
	DroppedCount int
	Summary      *string
	Metadata     map[string]any
}
```

Success Metrics
- Token reduction ratio while maintaining response quality
- Response latency improvement
- User satisfaction scores (A/B testing)
- Context relevance precision/recall
References
- Lost in the Middle - Position bias in long contexts
- LongLLMLingua - Prompt compression
- MemGPT - Tiered memory management
Related
- Parent PR: [Feat][Memory] Add OpenAI Response API support #802
- Depends on: [Feat][Router] Support Milvus as Response API storage backend #803, [Feat][Router] Support Redis as Response API storage backend #804 (persistent storage for context metadata)