ai-chat-ui: warn when chat session token usage crosses a threshold #17387
Conversation
sgraband
left a comment
Thanks for the changes! This will make it much nicer to work with the Chat! I have some inline comments and some general remarks:
- From a UI perspective I am not sure I like that the context-used indicator now only shows the "progress" up to the warning threshold and not the full context size. It's fine that, for example, the color changes after we hit the limit, but I still want to see how much I have left until the limit imho. (Same for the tooltip: I would still like to see the real limit.)
- The warning could be something like: "You have hit 80% of the context limit. Would you like to compact?"
@inject(MessageService) @optional()
protected readonly messageService: MessageService | undefined;

@inject(CommandService) @optional()
protected readonly commandService: CommandService | undefined;
| } | ||
|
|
||
| protected isTokenUsageWarningEnabled(): boolean { | ||
| return this.preferenceService?.get<boolean>(CHAT_VIEW_TOKEN_USAGE_WARNING_ENABLED, true) ?? true; |
Fallback value does not match default value specified in the preference (false).
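One way to address this (a minimal standalone sketch, with the getter abstracted out so it runs outside the widget; the constant name is an assumption) is to share a single default constant between the preference schema and the getter, so the two cannot drift apart:

```typescript
// Shared default, to be referenced by both the preference schema registration
// and the getter below (constant name is hypothetical).
const CHAT_VIEW_TOKEN_USAGE_WARNING_ENABLED_DEFAULT = false;

function isTokenUsageWarningEnabled(
    get: (key: string) => boolean | undefined
): boolean {
    // Fall back to the same default the schema registers (false), instead of
    // a hardcoded literal that can go stale.
    return get('ai-features.chat.tokenUsageWarning.enabled')
        ?? CHAT_VIEW_TOKEN_USAGE_WARNING_ENABLED_DEFAULT;
}
```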
    this.commandService.executeCommand('ai-chat.new-with-task-context').catch(error => {
        console.error("Failed to execute 'ai-chat.new-with-task-context' from token usage warning", error);
    });
} else if (selected === newSessionAction) {
    this.commandService.executeCommand('ai-chat-ui.new-chat').catch(error => {
Can we use the exported constants for the command IDs instead?
  this.tokenUsageEnabled = this.preferenceService?.get<boolean>(CHAT_VIEW_TOKEN_USAGE_ENABLED, false) ?? false;
  if (this.preferenceService) {
      this.toDispose.push(this.preferenceService.onPreferenceChanged(change => {
-         if (change.preferenceName === CHAT_VIEW_TOKEN_USAGE_ENABLED) {
+         if (change.preferenceName === CHAT_VIEW_TOKEN_USAGE_ENABLED
+             || change.preferenceName === CHAT_VIEW_TOKEN_USAGE_WARNING_TOKEN_THRESHOLD) {
              this.tokenUsageEnabled = this.preferenceService?.get<boolean>(CHAT_VIEW_TOKEN_USAGE_ENABLED, false) ?? false;
              this.update();
          }
Two small things:
- Reassigning this.tokenUsageEnabled when the threshold changes is a no-op (the value only depends on CHAT_VIEW_TOKEN_USAGE_ENABLED). I think the intent is to trigger a re-render so the indicator's color bands update; a short comment would make that clearer, or the assignment could be moved back under the ENABLED branch only.
- CHAT_VIEW_TOKEN_USAGE_WARNING_ENABLED is not included. Toggling the warning on while a session is already over the threshold won't fire a warning until the next responseChanged. Could we include it here and call evaluateTokenUsageWarning so users get immediate feedback?
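The suggested listener could look roughly like this (a runnable sketch, not the PR's code: the preference ids for the warning come from the PR, the indicator id is assumed from the discussion, and the `actions` log stands in for the widget's `update()`/`evaluateTokenUsageWarning()` calls):

```typescript
type PreferenceChange = { preferenceName: string };

// Warning ids taken from the PR; the indicator id is an assumption.
const TOKEN_USAGE_INDICATOR_ENABLED = 'ai-features.chat.tokenUsageIndicator.enabled';
const TOKEN_USAGE_WARNING_ENABLED = 'ai-features.chat.tokenUsageWarning.enabled';
const TOKEN_USAGE_WARNING_THRESHOLD = 'ai-features.chat.tokenUsageWarning.tokenThreshold';

// Stand-in for the widget's side effects, so the handler is testable.
const actions: string[] = [];

function onPreferenceChanged(change: PreferenceChange): void {
    if (change.preferenceName === TOKEN_USAGE_INDICATOR_ENABLED
        || change.preferenceName === TOKEN_USAGE_WARNING_THRESHOLD
        || change.preferenceName === TOKEN_USAGE_WARNING_ENABLED) {
        // Re-render the indicator (visibility and color bands).
        actions.push('update');
    }
    if (change.preferenceName === TOKEN_USAGE_WARNING_ENABLED) {
        // Re-evaluate immediately so toggling the warning on gives feedback
        // without waiting for the next responseChanged event.
        actions.push('evaluateWarning');
    }
}
```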
export const CHAT_VIEW_TOKEN_USAGE_WARNING_ENABLED = 'ai-features.chat.tokenUsageWarning.enabled';
export const CHAT_VIEW_TOKEN_USAGE_WARNING_TOKEN_THRESHOLD = 'ai-features.chat.tokenUsageWarning.tokenThreshold';

export const CHAT_VIEW_TOKEN_USAGE_WARNING_TOKEN_THRESHOLD_DEFAULT = 160000; // 80% of 200k
I feel like a percentage would be cleaner here imho. If we want the user to specify an absolute token limit, that would need to live in the model settings instead imo, as a single non-percentage value for all models is not usable, I believe.
If the issue is that we currently cannot retrieve the max tokens for a model, I would still use a percentage that for now always resolves against 200k, and then in a follow-up use the real value once we can retrieve it. WDYT?
I would try not to introduce a preference that we will change later on.
Yeah, I see your point. My initial version used percentages. The problem is that I want different behavior depending on context window and model. E.g. for Opus 4.6 I want to compact around 500k of the 1M context size, for 4.7 maybe around 900k as it degrades more slowly, and for ChatGPT I want 170k as the context is 200k.
Is it already possible to set the value per model? I agree that this should be supported.
Even when that is supported, I feel a percentage is still the more usable option, especially as we mostly talk in steps of thousands: in the settings, spotting the difference between 20000 and 2000 is much harder than between 10% and 1% imho.
No, this is not possible yet; that is the biggest issue. We could of course first add per-model context size information, e.g. from the model endpoint (Anthropic and Google provide such information, OpenAI doesn't), and then finish this PR.
No, it's fine if that is out of scope for this PR imho. But I would prepare this setting the way we envision it down the line.
When we have the context size per model, would we prefer to provide a percentage or an absolute token number? And would we have a global setting, or would this be configured per model and therefore live in the model configuration?
I personally would tend towards a percentage, with a fallback value as a global setting that can be overwritten for every model. So my proposal would be to change this setting to a percentage and keep it, but maybe rename it to defaultTokenThreshold?
WDYT?
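The proposal above could be sketched as a small resolver (all names here are hypothetical, not from the PR): a global default percentage, an optional per-model override, resolved against the model's context window, with a 200k fallback while per-model context sizes are unavailable.

```typescript
// Fallback context window while per-model sizes cannot be retrieved yet.
const FALLBACK_CONTEXT_WINDOW = 200_000;

function resolveTokenThreshold(
    defaultPercent: number,       // global defaultTokenThreshold preference
    modelPercent?: number,        // optional per-model override
    modelContextWindow?: number   // real context size, once available
): number {
    const percent = modelPercent ?? defaultPercent;
    const window = modelContextWindow ?? FALLBACK_CONTEXT_WINDOW;
    return Math.round((percent / 100) * window);
}
```

With a default of 80%, `resolveTokenThreshold(80)` yields 160000 against the 200k fallback, matching the PR's current absolute default, while `resolveTokenThreshold(80, 50, 1_000_000)` would yield 500000 for a large-context model.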
'theia/ai/chat-ui/tokenUsageWarningTokenThreshold',
'Total number of tokens in the current chat session at which the token usage warning is triggered. ' +
'Choose a value appropriate for your model\'s context window (e.g. lower it for small-context models). ' +
'Only applies when the token usage warning is enabled.'
This preference now also drives the color bands and tooltip of the token usage indicator (a separate feature controlled by tokenUsageIndicator.enabled), so 'Only applies when the token usage warning is enabled.' is no longer accurate. Could you update the description to mention that the indicator's thresholds derive from this value too? Alternatively, it might be worth considering whether the indicator should have its own threshold so the two features remain independently configurable, i.e. someone using only the indicator (warning off) doesn't need to reason about a preference named after the warning.
// Evaluate the warning on attach. For an existing widget switching between
// already-notified sessions the per-instance `notifiedSessions` Set prevents
// re-notifying. When the chat view is closed and reopened, a fresh widget
// is created with an empty Set, so a session still above the threshold will
// legitimately trigger the warning again — accepted as a rare corner case.
this.evaluateTokenUsageWarning(chatModel);
notifiedSessions lives on the widget, so closing and reopening the chat view resets it. The PR description acknowledges this as a rare corner case, but with multiple sessions above the threshold, every session switch after reopening the view will re-trigger the warning (because the chatModel setter calls evaluateTokenUsageWarning). Fine to keep the current design, but it might be worth either (a) storing the notified state on the ChatSession/ChatModel so it survives widget recreation, or (b) being a bit more explicit in the comment that it's "per reopen × per session", not just "per widget lifecycle".
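Option (a) could be as small as a session-id-keyed state holder bound in the container (a sketch; the class name and wiring are assumptions, and in the PR the flag would more likely live on the ChatSession/ChatModel itself):

```typescript
// Singleton service holding notified state per session id, so it survives
// widget disposal and recreation (names hypothetical).
class TokenWarningState {
    private readonly notified = new Set<string>();

    // Record that the warning fired for this session.
    markNotified(sessionId: string): void {
        this.notified.add(sessionId);
    }

    // Re-arm after usage drops below the threshold (e.g. after compacting).
    rearm(sessionId: string): void {
        this.notified.delete(sessionId);
    }

    isNotified(sessionId: string): boolean {
        return this.notified.has(sessionId);
    }
}
```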
What it does
Adds a dismissable notification that alerts users when a chat session's total token usage reaches a user-configured absolute threshold, with quick actions to compact the current session (summarize-and-continue) or start a new chat.
The warning fires once per threshold crossing and re-arms when usage drops below the threshold and crosses again (e.g. after compacting). Closing and reopening the chat view re-creates the widget and will re-notify if the session is still above threshold — accepted as a rare corner case.
The token usage indicator's color bands and tooltip now derive from the same absolute threshold instead of a hardcoded 200k context window, so visual feedback aligns with the warning trigger regardless of the actual model's context size. The CHAT_CONTEXT_WINDOW_SIZE constant is removed.
New preferences:
- ai-features.chat.tokenUsageWarning.enabled (boolean, default: false)
- ai-features.chat.tokenUsageWarning.tokenThreshold (number, default: 160000)
Threshold detection lives on the chat input widget — it already listens to the session model's responseChanged events and the active-session semantics match the "notify for the session the user is engaged with" UX contract. A pure decideTokenUsageWarning helper keeps the threshold/notified-state logic unit-testable. The transient tree-view edit input widget opts out of emitting warnings to avoid duplicate toasts while editing a past request.
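The fire-once / re-arm behavior described above can be sketched as a pure function in the spirit of the PR's decideTokenUsageWarning helper (the actual signature in the PR may differ; the return shape here is an assumption):

```typescript
interface TokenWarningDecision {
    notify: boolean;      // show the warning now
    nowNotified: boolean; // updated notified state for this session
}

function decideTokenUsageWarning(
    totalTokens: number,
    threshold: number,
    alreadyNotified: boolean
): TokenWarningDecision {
    const above = totalTokens >= threshold;
    if (above && !alreadyNotified) {
        // First crossing: fire the warning exactly once.
        return { notify: true, nowNotified: true };
    }
    if (!above && alreadyNotified) {
        // Usage dropped below the threshold (e.g. after compacting): re-arm.
        return { notify: false, nowNotified: false };
    }
    // Steady state: stay silent, keep the current notified flag.
    return { notify: false, nowNotified: alreadyNotified };
}
```

Keeping the decision pure means the widget only supplies the current totals and stored flag, and the threshold/re-arm logic can be unit-tested without any widget or session scaffolding.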
Closes #17323
How to test
Enable the warning, set a threshold, and generate responses until the session's total token usage crosses it.
Follow-ups
Breaking changes
Attribution
Review checklist
nls service (for details, please see the Internationalization/Localization section in the Coding Guidelines)
Reminder for reviewers