Prototype: model_router_v0 with remote classifier service #2400
base: main
Changes from 5 commits
331245d
62a6dd2
d475601
f60395d
e4356c4
1646190
257bc5a
| Diff line change |
|---|
| @@ -2,7 +2,7 @@ |
| "name": "copilot-chat", |
| "displayName": "GitHub Copilot Chat", |
| "description": "AI chat features powered by Copilot", |
| - "version": "0.34.0", |
| + "version": "0.34.0-model-router-v0", |
| "version": "0.34.0-model-router-v0", | |
| "version": "0.34.0", |
| Diff line change |
|---|
| @@ -12,10 +12,29 @@ import { IInstantiationService } from '../../../util/vs/platform/instantiation/c |
| import { ChatLocation } from '../../../vscodeTypes'; |
| import { IAuthenticationService } from '../../authentication/common/authentication'; |
| import { ILogService } from '../../log/common/logService'; |
| import { IFetcherService } from '../../networking/common/fetcherService'; |
| import { IChatEndpoint } from '../../networking/common/networking'; |
| import { IExperimentationService } from '../../telemetry/common/nullExperimentationService'; |
| import { ICAPIClientService } from '../common/capiClient'; |
| import { AutoChatEndpoint } from './autoChatEndpoint'; |
| import { ReasoningClassifier } from './reasoningClassifier'; |
| |
| // Exact model names for reasoning-capable models (more capable, expensive models) |
| const REASONING_MODELS = [ |
| 'claude-sonnet-4.5', |
| 'gpt-5-codex', |
| 'gpt-5', |
| 'gemini-3-pro-preview' |
| ] as const; |
| |
| // Exact model names for low/no reasoning models (fast, cheaper models) |
| const LOW_REASONING_MODELS = [ |
| 'claude-haiku-4.5', |
| 'gpt-5-mini', |
| 'gpt-4.1', |
| 'gpt-5-nano', |
| 'grok-code-fast-1' |
| ] as const; |
Comment on lines +22 to 38
| // Exact model names for reasoning-capable models (more capable, expensive models) | |
| const REASONING_MODELS = [ | |
| 'claude-sonnet-4.5', | |
| 'gpt-5-codex', | |
| 'gpt-5', | |
| 'gemini-3-pro-preview' | |
| ] as const; | |
| // Exact model names for low/no reasoning models (fast, cheaper models) | |
| const LOW_REASONING_MODELS = [ | |
| 'claude-haiku-4.5', | |
| 'gpt-5-mini', | |
| 'gpt-4.1', | |
| 'gpt-5-nano', | |
| 'grok-code-fast-1' | |
| ] as const; | |
| // Model names for reasoning-capable models (more capable, expensive models) | |
| // Please keep these lists up to date and document the status of each model. | |
| // Production models: currently available in the API | |
| const PRODUCTION_REASONING_MODELS = [ | |
| // Add actual production models here, e.g.: | |
| // 'gpt-4.1', | |
| ] as const; | |
| // Planned models: announced but not yet available | |
| const PLANNED_REASONING_MODELS = [ | |
| 'claude-sonnet-4.5', // planned | |
| 'gemini-3-pro-preview', // planned | |
| ] as const; | |
| // Hypothetical/test models: not available, used for testing or future-proofing | |
| const HYPOTHETICAL_REASONING_MODELS = [ | |
| 'gpt-5-codex', // hypothetical | |
| 'gpt-5', // hypothetical | |
| ] as const; | |
| // Model names for low/no reasoning models (fast, cheaper models) | |
| // Production models: currently available in the API | |
| const PRODUCTION_LOW_REASONING_MODELS = [ | |
| 'gpt-4.1', // production | |
| ] as const; | |
| // Planned models: announced but not yet available | |
| const PLANNED_LOW_REASONING_MODELS = [ | |
| 'claude-haiku-4.5', // planned | |
| ] as const; | |
| // Hypothetical/test models: not available, used for testing or future-proofing | |
| const HYPOTHETICAL_LOW_REASONING_MODELS = [ | |
| 'gpt-5-mini', // hypothetical | |
| 'gpt-5-nano', // hypothetical | |
| 'grok-code-fast-1', // hypothetical | |
| ] as const; |
Copilot
AI
Dec 4, 2025
There's a logic error in the control flow. When entry exists (line 190), the code checks if the current model matches requirements and returns early if it does (line 198). However, if the entry exists but the model doesn't match requirements (line 201), the code continues to line 204 to select a new model. The problem is that after selecting the new model and creating a new endpoint (line 210), the code updates the cache with the new endpoint AND the reserveTokenBank (line 211). But reserveTokenBank was just created and assigned to the reserve (line 179), not promoted from the existing entry. This means the existing entry's token bank is discarded. The previous logic (removed in this diff around lines 175-182) handled token refresh for existing entries, but the new implementation doesn't properly update the token bank when an existing entry's model needs to change.
| this._autoModelCache.set(conversationId, { endpoint: autoEndpoint, tokenBank: reserveTokenBank }); | |
| this._autoModelCache.set(conversationId, { endpoint: autoEndpoint, tokenBank: entry ? entry.tokenBank : reserveTokenBank }); |
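The suggestion above keeps the token bank of an existing entry when only the model changes. A minimal sketch of that cache update follows, assuming hypothetical `Endpoint`, `TokenBank`, and `CacheEntry` shapes and a hypothetical `updateCacheEntry` helper; the real types are not shown in this diff.

```typescript
// Hypothetical stand-ins for the real endpoint and token bank types.
interface Endpoint { model: string }
interface TokenBank { id: string }
interface CacheEntry { endpoint: Endpoint; tokenBank: TokenBank }

// Swap the endpoint for a conversation while keeping the token bank that the
// existing entry already holds. The reserve bank is used only when the
// conversation has no entry yet.
function updateCacheEntry(
  cache: Map<string, CacheEntry>,
  conversationId: string,
  newEndpoint: Endpoint,
  reserveTokenBank: TokenBank
): CacheEntry {
  const existing = cache.get(conversationId);
  const updated: CacheEntry = {
    endpoint: newEndpoint,
    tokenBank: existing ? existing.tokenBank : reserveTokenBank,
  };
  cache.set(conversationId, updated);
  return updated;
}
```

With this shape, a model switch on a cached conversation replaces only the endpoint, so the token bank the conversation started with is not discarded.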
Copilot
AI
Dec 4, 2025
The comment on line 244 mentions "ModernBERT classifier" but this is misleading - the ReasoningClassifier class actually calls a remote API endpoint (as indicated by the class name and implementation). The comment should accurately describe that this uses a remote classifier service, not a locally-run ModernBERT model. This could confuse future maintainers about the architecture.
| // Use ModernBERT classifier to determine if query needs reasoning | |
| // Use remote classifier service to determine if query needs reasoning |
Copilot
AI
Dec 4, 2025
The error handling in _shouldUseLowReasoningModel catches all errors and defaults to returning false (use reasoning model). However, this fallback behavior is questionable - if the classifier service is unavailable, it might be better to default to true (use low reasoning model) to reduce costs and latency, or to propagate the error rather than silently failing. Consider documenting why false is the appropriate fallback, or reconsider the fallback strategy.
| return false; | |
| // Fallback: If classifier is unavailable, default to low reasoning model to reduce costs and latency. | |
| return true; |
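Whichever default is chosen, it may help to make the fallback an explicit, logged policy rather than a bare return value inside a catch block. The sketch below assumes a hypothetical `Classifier` signature and an `onError` callback standing in for the log service; neither is taken from the actual implementation.

```typescript
// Hypothetical classifier signature: resolves to true when the query can be
// served by a low reasoning model.
type Classifier = (query: string) => Promise<boolean>;

// Wrap the classifier so the fallback value is a named parameter and every
// failure is reported to the caller-supplied handler.
async function shouldUseLowReasoningModel(
  classify: Classifier,
  query: string,
  fallbackToLowReasoning: boolean,
  onError: (err: unknown) => void = () => { }
): Promise<boolean> {
  try {
    return await classify(query);
  } catch (err) {
    onError(err); // report the failure instead of swallowing it silently
    return fallbackToLowReasoning;
  }
}
```

Passing the fallback as a parameter keeps the cost-versus-quality decision visible at the call site, where it can be documented or driven by configuration.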
Copilot
AI
Dec 4, 2025
The new model selection logic in automodeService.ts that uses the reasoning classifier is not covered by unit tests. The changes to resolveAutoModeEndpoint, _shouldUseLowReasoningModel, and _selectModelBasedOnReasoning methods introduce complex branching logic for model selection based on reasoning requirements, but there are no corresponding test files for automodeService. This makes it difficult to verify the correctness of the new routing logic and increases the risk of regressions. Consider adding unit tests that cover:
- Model selection when classifier returns reasoning required vs. not required
- Handling of existing cached entries with model switching
- Fallback behavior when classifier fails
- Model selection when target models are/aren't in available_models
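One way to make the cases above easy to test is to isolate model selection as a pure function. The sketch below copies the two model lists from the diff; the `selectModel` function and its signature are hypothetical and are not the PR's `_selectModelBasedOnReasoning`.

```typescript
// Model lists copied from the diff under review.
const REASONING_MODELS = ['claude-sonnet-4.5', 'gpt-5-codex', 'gpt-5', 'gemini-3-pro-preview'] as const;
const LOW_REASONING_MODELS = ['claude-haiku-4.5', 'gpt-5-mini', 'gpt-4.1', 'gpt-5-nano', 'grok-code-fast-1'] as const;

// Hypothetical pure selection function: return the first preferred model that
// is actually available, or undefined when none is, so the caller can decide
// how to fall back.
function selectModel(availableModels: readonly string[], useLowReasoning: boolean): string | undefined {
  const preferred: readonly string[] = useLowReasoning ? LOW_REASONING_MODELS : REASONING_MODELS;
  const available = new Set(availableModels);
  return preferred.find(model => available.has(model));
}
```

Because the function takes the available models and the classifier decision as plain arguments, each bullet above maps to a one-line assertion without mocking the network or the cache.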
| Diff line change |
|---|
| @@ -0,0 +1,70 @@ |
| /*--------------------------------------------------------------------------------------------- |
| * Copyright (c) Microsoft Corporation. All rights reserved. |
| * Licensed under the MIT License. See License.txt in the project root for license information. |
| *--------------------------------------------------------------------------------------------*/ |
| |
| import { Disposable } from '../../../util/vs/base/common/lifecycle'; |
| import { ILogService } from '../../log/common/logService'; |
| import { IFetcherService } from '../../networking/common/fetcherService'; |
| |
| // Remote reasoning classifier configuration |
| export const REASONING_CLASSIFIER_API_URL = 'https://model-router-v0.yellowforest-598004f3.westus3.azurecontainerapps.io/predict'; |
| // Remote reasoning classifier configuration | |
| export const REASONING_CLASSIFIER_API_URL = 'https://model-router-v0.yellowforest-598004f3.westus3.azurecontainerapps.io/predict'; | |
| /** | |
| * Remote reasoning classifier API endpoint. | |
| * The URL can be configured via the environment variable REASONING_CLASSIFIER_API_URL. | |
| * If not set, defaults to the development endpoint below. | |
| * WARNING: Do not use the default endpoint in production. Configure appropriately. | |
| */ | |
| export const REASONING_CLASSIFIER_API_URL = | |
| process.env.REASONING_CLASSIFIER_API_URL || | |
| 'https://model-router-v0.yellowforest-598004f3.westus3.azurecontainerapps.io/predict'; |
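The suggested override can also be written as a small function that takes the environment as an argument, which makes it testable and avoids reading `process.env` at module load. This is a sketch only: `resolveClassifierUrl` and `DEFAULT_CLASSIFIER_URL` are hypothetical names, and whether `process.env` is populated in every extension host is an assumption worth checking.

```typescript
// Development endpoint from the diff, treated here as a fallback only.
const DEFAULT_CLASSIFIER_URL =
  'https://model-router-v0.yellowforest-598004f3.westus3.azurecontainerapps.io/predict';

// Hypothetical helper: resolve the classifier URL from an env-like map.
// Empty or whitespace-only values are treated as unset.
function resolveClassifierUrl(env: Record<string, string | undefined>): string {
  const configured = env['REASONING_CLASSIFIER_API_URL']?.trim();
  return configured ? configured : DEFAULT_CLASSIFIER_URL;
}
```

A caller in a Node-based host could pass `process.env`; other hosts could pass a settings-derived map instead.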
Copilot
AI
Dec 4, 2025
The fetch request to the external classifier API has no timeout configured. If the remote service becomes slow or unresponsive, this could block model selection indefinitely, degrading the user experience. Consider adding a timeout to the fetch options (e.g., 2-5 seconds) and handling timeout errors appropriately in the fallback logic.
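A minimal sketch of the timeout idea, using a generic `withTimeout` wrapper (a hypothetical helper, not part of this PR). It only stops waiting; it does not cancel the underlying request. Whether the fetcher service accepts an abort signal or a timeout option is not shown in this diff, so that is left out here.

```typescript
// Reject if `work` does not settle within `timeoutMs` milliseconds.
// The timer is cleared as soon as `work` settles, so no stray timers remain.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
    work.then(
      value => { clearTimeout(timer); resolve(value); },
      err => { clearTimeout(timer); reject(err); }
    );
  });
}
```

A timeout rejection then flows into the same catch path as any other classifier failure, so the fallback policy applies to slow responses as well as hard errors.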
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,100 @@ | ||
| {"text":"What is the syntax for a for loop in Python?","label":1} | ||
| {"text":"How do I declare a variable in JavaScript?","label":1} | ||
| {"text":"What is the difference between let and const?","label":1} | ||
| {"text":"How do I print Hello World in Java?","label":1} | ||
| {"text":"What is the file extension for TypeScript files?","label":1} | ||
| {"text":"How do I create a new array in JavaScript?","label":1} | ||
| {"text":"What is the command to install npm packages?","label":1} | ||
| {"text":"How do I comment out code in Python?","label":1} | ||
| {"text":"What is the shortcut to format code in VS Code?","label":1} | ||
| {"text":"How do I check the Node.js version?","label":1} | ||
| {"text":"What does the === operator do in JavaScript?","label":1} | ||
| {"text":"How do I import a module in Python?","label":1} | ||
| {"text":"What is the git command to check status?","label":1} | ||
| {"text":"How do I create a function in JavaScript?","label":1} | ||
| {"text":"What is the syntax for an if statement in C#?","label":1} | ||
| {"text":"How do I convert a string to integer in Python?","label":1} | ||
| {"text":"What is the command to run a Python script?","label":1} | ||
| {"text":"How do I add an element to a list in Python?","label":1} | ||
| {"text":"What is the default port for React development server?","label":1} | ||
| {"text":"How do I exit vim?","label":1} | ||
| {"text":"What is the syntax for a switch statement in Java?","label":1} | ||
| {"text":"How do I check if a key exists in a dictionary?","label":1} | ||
| {"text":"What is the git command to create a new branch?","label":1} | ||
| {"text":"How do I read a file in Python?","label":1} | ||
| {"text":"What is the keyboard shortcut to open terminal in VS Code?","label":1} | ||
| {"text":"How do I reverse a string in JavaScript?","label":1} | ||
| {"text":"What is the syntax for try-catch in Python?","label":1} | ||
| {"text":"How do I install a specific version of a package with npm?","label":1} | ||
| {"text":"What is the command to initialize a git repository?","label":1} | ||
| {"text":"How do I get the length of a list in Python?","label":1} | ||
| {"text":"What does async/await do?","label":1} | ||
| {"text":"How do I concatenate strings in Java?","label":1} | ||
| {"text":"What is the syntax for a lambda function in Python?","label":1} | ||
| {"text":"How do I sort an array in JavaScript?","label":1} | ||
| {"text":"What is the command to start a Docker container?","label":1} | ||
| {"text":"How do I remove duplicates from a list in Python?","label":1} | ||
| {"text":"What is the syntax for string interpolation in C#?","label":1} | ||
| {"text":"How do I check the type of a variable in Python?","label":1} | ||
| {"text":"What is the command to build a TypeScript project?","label":1} | ||
| {"text":"How do I use map function in JavaScript?","label":1} | ||
| {"text":"What is the syntax for destructuring in JavaScript?","label":1} | ||
| {"text":"How do I parse JSON in Python?","label":1} | ||
| {"text":"What is the git command to undo last commit?","label":1} | ||
| {"text":"How do I create a class in Python?","label":1} | ||
| {"text":"What is the difference between == and equals() in Java?","label":1} | ||
| {"text":"How do I use spread operator in JavaScript?","label":1} | ||
| {"text":"What is the command to run tests with pytest?","label":1} | ||
| {"text":"How do I handle null values in TypeScript?","label":1} | ||
| {"text":"What is the syntax for a list comprehension in Python?","label":1} | ||
| {"text":"How do I make an HTTP request in JavaScript?","label":1} | ||
| {"text":"Design a scalable microservices architecture for an e-commerce platform with high availability and fault tolerance","label":0} | ||
| {"text":"Help me architect a real-time collaborative document editing system like Google Docs","label":0} | ||
| {"text":"Create a comprehensive strategy for migrating a monolithic application to microservices without downtime","label":0} | ||
| {"text":"Design a machine learning pipeline for detecting fraudulent transactions with explainability requirements","label":0} | ||
| {"text":"Develop a caching strategy for a multi-region distributed application with consistency guarantees","label":0} | ||
| {"text":"Architect a serverless event-driven system for processing millions of IoT sensor readings per second","label":0} | ||
| {"text":"Design a secure authentication and authorization system with SSO, MFA, and role-based access control","label":0} | ||
| {"text":"Create a data lake architecture for analytics with real-time and batch processing capabilities","label":0} | ||
| {"text":"Help me design a recommendation engine that balances personalization with diversity and fairness","label":0} | ||
| {"text":"Architect a CI/CD pipeline with canary deployments, feature flags, and automated rollback capabilities","label":0} | ||
| {"text":"Design a distributed consensus algorithm for a blockchain-based voting system","label":0} | ||
| {"text":"Create an API versioning and deprecation strategy for a public API with thousands of consumers","label":0} | ||
| {"text":"Help me architect a multi-tenant SaaS platform with data isolation and customization options","label":0} | ||
| {"text":"Design a disaster recovery strategy with RTO of 15 minutes and RPO of 5 minutes","label":0} | ||
| {"text":"Develop a testing strategy for a complex distributed system with eventual consistency","label":0} | ||
| {"text":"Architect a search system that handles fuzzy matching, relevance ranking, and personalization","label":0} | ||
| {"text":"Design a rate limiting and throttling system that handles bursty traffic fairly","label":0} | ||
| {"text":"Create a monitoring and observability strategy for a Kubernetes-based microservices platform","label":0} | ||
| {"text":"Help me design a data synchronization system between mobile apps and backend with conflict resolution","label":0} | ||
| {"text":"Architect a notification system that delivers messages across multiple channels with delivery guarantees","label":0} | ||
| {"text":"Design a workflow orchestration engine for complex business processes with compensation logic","label":0} | ||
| {"text":"Create a content delivery strategy for a global video streaming platform with adaptive bitrate","label":0} | ||
| {"text":"Help me architect a financial trading system with sub-millisecond latency requirements","label":0} | ||
| {"text":"Design an A/B testing framework that handles feature interactions and statistical significance","label":0} | ||
| {"text":"Develop a schema evolution strategy for a large-scale data warehouse with backward compatibility","label":0} | ||
| {"text":"Architect a secrets management system with automatic rotation and audit logging","label":0} | ||
| {"text":"Design a queue-based system for processing long-running jobs with priority and fairness","label":0} | ||
| {"text":"Create a database sharding strategy for a social network with complex relationship queries","label":0} | ||
| {"text":"Help me design a plugin architecture that allows third-party extensions while maintaining security","label":0} | ||
| {"text":"Architect an image processing pipeline that handles millions of uploads with resizing and optimization","label":0} | ||
| {"text":"Design a compliance and audit logging system that meets GDPR and SOC2 requirements","label":0} | ||
| {"text":"Create a load balancing strategy for WebSocket connections with sticky sessions and failover","label":0} | ||
| {"text":"Help me architect a real-time analytics dashboard with sub-second query latency on petabyte data","label":0} | ||
| {"text":"Design a configuration management system for distributed applications with feature flags and gradual rollout","label":0} | ||
| {"text":"Develop a cost optimization strategy for a multi-cloud infrastructure with reserved and spot instances","label":0} | ||
| {"text":"Architect a billing and subscription management system with usage-based pricing and invoicing","label":0} | ||
| {"text":"Design a graph database schema for a knowledge graph with efficient traversal queries","label":0} | ||
| {"text":"Create a blue-green deployment strategy for stateful services with zero-downtime database migrations","label":0} | ||
| {"text":"Help me design a service mesh architecture with mTLS, traffic management, and observability","label":0} | ||
| {"text":"Architect a document storage system with versioning, access control, and full-text search","label":0} | ||
| {"text":"Design a machine learning model serving infrastructure with A/B testing and model versioning","label":0} | ||
| {"text":"Create a data governance framework for a regulated industry with data lineage and quality checks","label":0} | ||
| {"text":"Help me architect a geospatial data platform for real-time location tracking and route optimization","label":0} | ||
| {"text":"Design an event sourcing system with CQRS pattern for a banking application","label":0} | ||
| {"text":"Develop a capacity planning strategy for a rapidly growing SaaS platform with predictable scaling","label":0} | ||
| {"text":"Architect a federated identity system that integrates with multiple enterprise identity providers","label":0} | ||
| {"text":"Design a chaos engineering framework for testing system resilience in production","label":0} | ||
| {"text":"Create a data pipeline architecture that handles schema-on-read with data quality validation","label":0} | ||
| {"text":"Help me design an API gateway with request transformation, rate limiting, and circuit breaking","label":0} | ||
| {"text":"Architect a distributed tracing system for debugging requests across hundreds of microservices","label":0} |
The 'sharp' package is added to the external dependencies list with a comment "Image processing with native bindings", but there's no usage of image processing in the code changes for this PR. Similar to the 'adm-zip' issue, if this is not related to the model router prototype, it should be in a separate commit. If it is needed for this feature, the usage is not evident in the diff.