Skip to content

hotfix: narrow overly broad keywords in error diagnosis and add CherryIN context#14000

Open
SiinXu wants to merge 2 commits intomainfrom
hotfix/error-diagnosis-keywords
Open

hotfix: narrow overly broad keywords in error diagnosis and add CherryIN context#14000
SiinXu wants to merge 2 commits intomainfrom
hotfix/error-diagnosis-keywords

Conversation

@SiinXu
Copy link
Copy Markdown
Collaborator

@SiinXu SiinXu commented Apr 3, 2026

What this PR does

Before this PR:

  • Error classifier uses overly broad keywords (proxy, stream, json) that cause false positive matches on unrelated error messages
  • buildContextHint in ErrorDiagnosisService has the same broad keyword issue for proxy and mcp
  • CherryIN provider is not identified as an ecosystem partner in diagnosis prompts
  • Node.js experimental warnings (e.g., (node:12345) [UNDICI-EHPA]) from Claude Code stderr are included in error output

After this PR:

  • Proxy keywords narrowed: proxyproxy error / proxy connection / proxy refused
  • Stream keywords narrowed: streamstream error / stream interrupted / stream closed
  • Parse keywords: removed bare json match (keeps unexpected token, invalid response, parse error)
  • MCP keywords narrowed in buildContextHint: mcpmcp server / mcp connection / mcp error
  • Added isCherryInProvider() helper to add CherryIN ecosystem partner context to diagnosis prompts
  • Node.js warnings filtered from Claude Code stderr via regex check
  • Added tests for all narrowed keyword behaviors

Why we need it and why it was done in this way

These are follow-up bug fixes from the self-review of #13894 (AI error diagnosis feature). The broad keywords caused false classifications — e.g., any error message mentioning "json" would be classified as a parse error, and any message mentioning "stream" would be classified as a stream interruption.

The following tradeoffs were made:

  • Each keyword is narrowed to more specific phrases rather than using regex, keeping the classifier fast and readable

The following alternatives were considered:

  • Using regex patterns — rejected in favor of simple string matching for maintainability

Breaking changes

None

Special notes for your reviewer

This is a follow-up fix PR for #13894. The original PR was merged to main but these fix commits were on the feature branch and not included in the squash merge.

Checklist

  • PR: The PR description is expressive enough and will help future contributors
  • Code: Write code that humans can understand and Keep it simple
  • Refactor: You have left the code cleaner than you found it
  • Upgrade: N/A
  • Documentation: Not required (internal behavior improvement)
  • Self-review: I have reviewed my own code

Release note

NONE

Copy link
Copy Markdown
Collaborator

@EurFelux EurFelux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

This review was translated by Claude.

The overall direction is good, narrowing down the keywords can effectively reduce misclassification. There are a few issues that need to be fixed:

  1. The isNodeWarning regex is too broad — it will also filter out real Node.js errors in [ERR_*] format. It's recommended to narrow it down to only match the Warning: keyword
  2. Inconsistent proxy keywords between buildContextHint and errorClassifier — the classifier has proxy refused but the context hint doesn't, which will result in not getting the corresponding context hints after classification
  3. Tests lack coverage for proxy refused and stream closed

}

const errorChunks: string[] = []
const isNodeWarning = (chunk: string) => /^\(node:\d+\)\s*\[/.test(chunk.trim())
Copy link
Copy Markdown
Collaborator

@EurFelux EurFelux Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

This comment was translated by Claude.

This regex /^\(node:\d+\)\s*\[/ matches all diagnostic information in the format (node:<pid>) [<code>], not limited to warnings. Node.js diagnostic output includes multiple types:

  • (node:12345) [UNDICI-EHPA] ExperimentalWarning: ... ✅ Should be filtered
  • (node:12345) [DEP0001] DeprecationWarning: ... ✅ Should be filtered
  • (node:12345) [ERR_SOCKET_CLOSED] Error: ... ❌ Should not be filtered

The current regex will also swallow real errors in the [ERR_*] format. Suggest changing to only match messages containing Warning:

const isNodeWarning = (chunk: string) => /^\(node:\d+\).*Warning:/.test(chunk.trim())

Additionally, this helper has a large scope (defined at the top of the function, used hundreds of lines later). Suggest moving it to be defined locally within the stderr handler, or extracting it as a private method of ClaudeCodeService.


Original Content

这个正则 /^\(node:\d+\)\s*\[/ 匹配的是所有 (node:<pid>) [<code>] 格式的诊断信息,不仅限于 warning。Node.js 的诊断输出包括多种类型:

  • (node:12345) [UNDICI-EHPA] ExperimentalWarning: ... ✅ 应过滤
  • (node:12345) [DEP0001] DeprecationWarning: ... ✅ 应过滤
  • (node:12345) [ERR_SOCKET_CLOSED] Error: ... ❌ 不应过滤

当前的正则会把 [ERR_*] 格式的真实错误也吞掉。建议改为只匹配包含 Warning 的消息:

const isNodeWarning = (chunk: string) => /^\(node:\d+\).*Warning:/.test(chunk.trim())

另外这个 helper 的作用域很大(定义在函数顶部、几百行后才被使用),建议移到 stderr handler 内部就近定义,或者提取为 ClaudeCodeService 的 private 方法。

Comment on lines +99 to +100
msg.includes('proxy error') ||
msg.includes('proxy connection') ||
Copy link
Copy Markdown
Collaborator

@EurFelux EurFelux Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

This comment was translated by Claude.

In errorClassifier.ts, a new proxy refused category was added to the proxy category, but buildContextHint here only narrowed to proxy error / proxy connection, not including proxy refused.

This will cause an error classified as proxy (because it matched proxy refused) to fall back to the final generic fallback when generating context hints, losing proxy-related diagnostic context.

It is recommended to add proxy refused to maintain consistency between both sides:

msg.includes('proxy error') ||
msg.includes('proxy connection') ||
msg.includes('proxy refused') ||

Original Content

errorClassifier.ts 中 proxy 类别新增了 proxy refused,但 buildContextHint 这里只收窄到了 proxy error / proxy connection,没有包含 proxy refused

这会导致一个被分类为 proxy 的错误(因为匹配了 proxy refused)在生成上下文提示时走到最后的 generic fallback,拿不到 proxy 相关的诊断上下文。

建议补上 proxy refused,保持两边一致:

msg.includes('proxy error') ||
msg.includes('proxy connection') ||
msg.includes('proxy refused') ||

it('classifies proxy connection as proxy', () => {
const result = classifyError(makeError({ message: 'proxy connection failed' }))
expect(result.category).toBe('proxy')
})
Copy link
Copy Markdown
Collaborator

@EurFelux EurFelux Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

This comment was translated by Claude.

The tests cover proxy error and proxy connection, but are missing a positive test for proxy refused. Similarly, the stream category added the stream closed keyword but lacks corresponding tests.

Suggested additions:

it('classifies proxy refused as proxy', () => {
  const result = classifyError(makeError({ message: 'proxy refused connection' }))
  expect(result.category).toBe('proxy')
})

it('classifies stream closed as stream', () => {
  const result = classifyError(makeError({ message: 'stream closed unexpectedly' }))
  expect(result.category).toBe('stream')
})

Original Content

测试覆盖了 proxy errorproxy connection,但缺少对 proxy refused 的正向测试。同样,stream 分类新增了 stream closed 关键词但没有对应的测试。

建议补充:

it('classifies proxy refused as proxy', () => {
  const result = classifyError(makeError({ message: 'proxy refused connection' }))
  expect(result.category).toBe('proxy')
})

it('classifies stream closed as stream', () => {
  const result = classifyError(makeError({ message: 'stream closed unexpectedly' }))
  expect(result.category).toBe('stream')
})

@SiinXu SiinXu changed the title fix: narrow overly broad keywords in error diagnosis and add CherryIN context hotfix: narrow overly broad keywords in error diagnosis and add CherryIN context Apr 3, 2026
SiinXu

This comment was marked as low quality.

SiinXu and others added 2 commits April 3, 2026 19:15
… context

- Add isCherryInProvider() to identify CherryIN as ecosystem partner in diagnosis prompts
- Narrow proxy keywords from 'proxy' to 'proxy error'/'proxy connection'/'proxy refused'
- Narrow stream keywords from 'stream' to 'stream error'/'stream interrupted'/'stream closed'
- Narrow MCP keywords from 'mcp' to 'mcp server'/'mcp connection'/'mcp error'
- Remove bare 'json' from parse classifier to avoid false positives
- Filter Node.js experimental warnings from Claude Code stderr
- Add tests for narrowed keyword matching

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Siin Xu <31815270+SiinXu@users.noreply.github.com>
- Narrow isNodeWarning regex to match only `Warning:` lines, avoiding
  false suppression of real Node.js `[ERR_*]` errors
- Add missing `proxy refused` keyword to buildContextHint for consistency
  with errorClassifier
- Add test coverage for `proxy refused` and `stream closed` keywords

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Siin Xu <31815270+SiinXu@users.noreply.github.com>
@SiinXu SiinXu force-pushed the hotfix/error-diagnosis-keywords branch from 80fdd50 to 1e70e44 Compare April 3, 2026 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants