
ai-chat-ui: warn when chat session token usage crosses a threshold#17387

Open
eneufeld wants to merge 1 commit into master from feat/17323

Conversation

@eneufeld (Contributor)

What it does

Adds a dismissible notification that alerts users when a chat session's total token usage reaches a user-configured absolute threshold, with quick actions to compact the current session (summarize-and-continue) or start a new chat.

The warning fires once per threshold crossing and re-arms when usage drops below the threshold and crosses again (e.g. after compacting). Closing and reopening the chat view re-creates the widget and will re-notify if the session is still above threshold — accepted as a rare corner case.

The token usage indicator's color bands and tooltip now derive from the same absolute threshold instead of a hardcoded 200k context window, so visual feedback aligns with the warning trigger regardless of the actual model's context size. The CHAT_CONTEXT_WINDOW_SIZE constant is removed.

New preferences:
  • ai-features.chat.tokenUsageWarning.enabled (boolean, default: false)
  • ai-features.chat.tokenUsageWarning.tokenThreshold (number, default: 160000)

Threshold detection lives on the chat input widget — it already listens to the session model's responseChanged events and the active-session semantics match the "notify for the session the user is engaged with" UX contract. A pure decideTokenUsageWarning helper keeps the threshold/notified-state logic unit-testable. The transient tree-view edit input widget opts out of emitting warnings to avoid duplicate toasts while editing a past request.
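The fire-once-and-re-arm behavior described above reduces to a small pure decision function. A minimal sketch, assuming a particular signature — the PR's actual decideTokenUsageWarning helper may be shaped differently:

```typescript
// Sketch of the pure threshold/notified-state decision. The input list and the
// returned fields are assumptions for illustration, not the PR's actual API.
export interface TokenUsageWarningDecision {
    /** Whether the warning notification should be shown now. */
    shouldNotify: boolean;
    /** Notified-state to carry into the next evaluation. */
    notified: boolean;
}

export function decideTokenUsageWarning(
    enabled: boolean,
    totalTokens: number,
    threshold: number,
    alreadyNotified: boolean
): TokenUsageWarningDecision {
    if (!enabled) {
        // Feature is off: never notify, keep state unchanged.
        return { shouldNotify: false, notified: alreadyNotified };
    }
    if (totalTokens < threshold) {
        // Usage dropped below the threshold (e.g. after compacting): re-arm.
        return { shouldNotify: false, notified: false };
    }
    // At or above the threshold: notify only on the first crossing.
    return { shouldNotify: !alreadyNotified, notified: true };
}
```

On each responseChanged the widget would feed the previous notified flag back in and store the returned one, which is what keeps the logic unit-testable.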

Closes #17323

How to test

Enable the warning, set a threshold, and generate enough output to go over that threshold.

Follow-ups

Breaking changes

  • This PR introduces breaking changes and requires careful review: the CHAT_CONTEXT_WINDOW_SIZE constant is removed. The breaking changes section in the changelog has been updated.

Attribution

Review checklist

Reminder for reviewers

eneufeld requested a review from sgraband on April 22, 2026 at 11:40
github-project-automation bot moved this to "Waiting on reviewers" in PR Backlog on April 22, 2026
@sgraband (Contributor) left a comment

Thanks for the changes! This will make it much nicer to work with the Chat! I have some inline comments and some general remarks:

  • From a UI perspective I am not sure I like that the context-used indicator now only shows the "progress" towards the warning threshold rather than the full context size. It's fine that, for example, the color changes after we hit the limit, but I still want to see how much I have left until the limit imho. (The same goes for the tooltip: I would still like to see the real limit.)
  • The warning could be something like: "You have hit 80% of the context limit. Would you like to compact?"

Comment on lines +179 to +183
@inject(MessageService) @optional()
protected readonly messageService: MessageService | undefined;

@inject(CommandService) @optional()
protected readonly commandService: CommandService | undefined;
I believe marking them @optional() here is unnecessary and only forces defensive null checks in showTokenUsageWarning. Could you drop @optional() on both and remove the if (!this.messageService) / if (!this.commandService) guards?

}

protected isTokenUsageWarningEnabled(): boolean {
return this.preferenceService?.get<boolean>(CHAT_VIEW_TOKEN_USAGE_WARNING_ENABLED, true) ?? true;
Fallback value does not match default value specified in the preference (false).

Comment on lines +826 to +830
this.commandService.executeCommand('ai-chat.new-with-task-context').catch(error => {
console.error("Failed to execute 'ai-chat.new-with-task-context' from token usage warning", error);
});
} else if (selected === newSessionAction) {
this.commandService.executeCommand('ai-chat-ui.new-chat').catch(error => {
Can we use the exported constants for the command ids instead?

Comment on lines 682 to 689
  this.tokenUsageEnabled = this.preferenceService?.get<boolean>(CHAT_VIEW_TOKEN_USAGE_ENABLED, false) ?? false;
  if (this.preferenceService) {
      this.toDispose.push(this.preferenceService.onPreferenceChanged(change => {
-         if (change.preferenceName === CHAT_VIEW_TOKEN_USAGE_ENABLED) {
+         if (change.preferenceName === CHAT_VIEW_TOKEN_USAGE_ENABLED
+             || change.preferenceName === CHAT_VIEW_TOKEN_USAGE_WARNING_TOKEN_THRESHOLD) {
              this.tokenUsageEnabled = this.preferenceService?.get<boolean>(CHAT_VIEW_TOKEN_USAGE_ENABLED, false) ?? false;
              this.update();
          }
Two small things:

  1. Reassigning this.tokenUsageEnabled when the threshold changes is a no-op (the value only depends on CHAT_VIEW_TOKEN_USAGE_ENABLED). I think the intent is to trigger a re-render so the indicator's color bands update; a short comment would make that clearer, or the assignment could be moved back under the ENABLED branch only.
  2. CHAT_VIEW_TOKEN_USAGE_WARNING_ENABLED is not included. Toggling the warning on while a session is already over the threshold won't fire a warning until the next responseChanged. Could we include it here and call evaluateTokenUsageWarning so users get immediate feedback?

export const CHAT_VIEW_TOKEN_USAGE_WARNING_ENABLED = 'ai-features.chat.tokenUsageWarning.enabled';
export const CHAT_VIEW_TOKEN_USAGE_WARNING_TOKEN_THRESHOLD = 'ai-features.chat.tokenUsageWarning.tokenThreshold';

export const CHAT_VIEW_TOKEN_USAGE_WARNING_TOKEN_THRESHOLD_DEFAULT = 160000; // 80% of 200k
I feel like a percentage would be cleaner here imho. If we want the user to specify a token limit, this would need to be done in the model settings instead imo, as a single absolute value shared across all models is not usable, I believe.

If the issue is that we currently cannot retrieve the max tokens for a model, I would still use a percentage here that for now always resolves against 200k, and then in a follow-up use the real value once we can retrieve it. WDYT?

I would try not to introduce a preference that we will change later on.

Contributor Author:

Yeah, I see your point. My initial version was done using percentages. The problem is that I want different behavior depending on context window and model. E.g. for Opus 4.6 I want to compact around 500k on the 1M context size, for 4.7 maybe around 900k as it degrades more slowly, and for ChatGPT I want it to be 170k as the context is 200k.

Contributor:

Is it already possible to set the value per model? I agree that this should be supported.

When this is supported, however, I feel like a percentage is still the more usable option, especially as we mostly talk in steps of thousands, meaning that in the settings, spotting the difference between 20000 and 2000 is much harder than between 10% and 1% imho.

Contributor Author:

No, this is not possible yet. That is the biggest issue. We could of course first add the context-size-per-model information, e.g. from the model endpoint (Anthropic and Google provide such information, OpenAI doesn't), and then finish this PR.

Contributor:

No, it's fine if that is out of the scope of this PR imho. But I would prepare this setting in the way we envision it down the line. When we have the context size per model, would we prefer to provide a percentage or an absolute token number? And would we have a global setting? Or would this be configured per model and therefore live in the model configuration?

I personally would tend towards a percentage, and maybe have a fallback value as a global setting that can be overwritten for every model. So my proposal would be to change this setting to a percentage and keep it, but maybe rename it to defaultTokenThreshold?

WDYT?
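The percentage-with-fallback idea could be sketched roughly as follows; every name here is hypothetical, since nothing in the PR defines a per-model context window yet:

```typescript
// Hypothetical resolution of a percentage threshold against a per-model
// context window, falling back to 200k when the model's window is unknown.
const FALLBACK_CONTEXT_WINDOW = 200_000;

export function resolveTokenThreshold(
    thresholdPercent: number,      // e.g. 80 for "warn at 80% of the window"
    modelContextWindow?: number    // per-model value, once retrievable
): number {
    const contextWindow = modelContextWindow ?? FALLBACK_CONTEXT_WINDOW;
    return Math.floor((thresholdPercent / 100) * contextWindow);
}
```

With this shape, the preference stores only the percentage, and a later follow-up can supply modelContextWindow without changing the setting's meaning.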

Comment on lines +53 to +56
'theia/ai/chat-ui/tokenUsageWarningTokenThreshold',
'Total number of tokens in the current chat session at which the token usage warning is triggered. ' +
'Choose a value appropriate for your model\'s context window (e.g. lower it for small-context models). ' +
'Only applies when the token usage warning is enabled.'

This preference now also drives the color bands and tooltip of the token usage indicator (a separate feature controlled by tokenUsageIndicator.enabled), so 'Only applies when the token usage warning is enabled.' is no longer accurate. Could you update the description to mention that the indicator's thresholds derive from this value too? Alternatively, it might be worth considering whether the indicator should have its own threshold so the two features remain independently configurable, i.e. someone using only the indicator (warning off) doesn't need to reason about a preference named after the warning.

Comment on lines +640 to +645
// Evaluate the warning on attach. For an existing widget switching between
// already-notified sessions the per-instance `notifiedSessions` Set prevents
// re-notifying. When the chat view is closed and reopened, a fresh widget
// is created with an empty Set, so a session still above the threshold will
// legitimately trigger the warning again — accepted as a rare corner case.
this.evaluateTokenUsageWarning(chatModel);

notifiedSessions lives on the widget, so closing and reopening the chat view resets it. The PR description acknowledges this as a rare corner case, but with multiple sessions above the threshold, every session switch after reopening the view will re-trigger the warning (because the chatModel setter calls evaluateTokenUsageWarning). Fine to keep the current design, but it might be worth either (a) storing the notified state on the ChatSession/ChatModel so it survives widget recreation, or (b) being a bit more explicit in the comment that it's "per reopen × per session", not just "per widget lifecycle".
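Option (a) could look like a small widget-independent registry keyed by session id, so the notified flag survives widget recreation. The names below are hypothetical, not the PR's actual design:

```typescript
// Hypothetical session-scoped notified-state registry. Bound as a singleton
// (e.g. via DI), it outlives individual chat input widgets.
export class TokenUsageWarningState {
    private readonly notifiedSessions = new Set<string>();

    markNotified(sessionId: string): void {
        this.notifiedSessions.add(sessionId);
    }

    /** Called when usage drops back below the threshold, to re-arm. */
    reset(sessionId: string): void {
        this.notifiedSessions.delete(sessionId);
    }

    hasNotified(sessionId: string): boolean {
        return this.notifiedSessions.has(sessionId);
    }
}
```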



Development

Successfully merging this pull request may close these issues.

Notify users when chat session token usage approaches a configurable threshold
