
feat: add content sanitizer to strip hidden text before AI processing #2137

Open
gentlemandev wants to merge 1 commit into main from feat/content-sanitizer


@gentlemandev
Collaborator

Summary

  • New utility utils/ai/content-sanitizer.ts that strips invisible/hidden content from emails before passing to AI
  • Prevents attackers from embedding invisible instructions in emails that the AI would process but users can't see

What it strips

  • Zero-width Unicode: \u200B, \u200C, \u200D, \u2060, \uFEFF
  • RTL/LTR overrides: \u202A-\u202E, \u2066-\u2069
  • Hidden HTML elements: display:none, visibility:hidden, font-size:0, opacity:0
  • White-on-white text: color:#fff/#ffffff/white/rgb(255,255,255)
  • Offscreen positioning: position:absolute + left:-9999px
  • Zero-dimension elements: width:0/height:0 + overflow:hidden
  • HTML comments: <!-- ... -->
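
The character-level rules in this list could be sketched roughly as follows. This is an illustration built only from the code points named above, not the code in content-sanitizer.ts; the constant and function names (`ZERO_WIDTH`, `BIDI_CONTROLS`, `stripInvisibleChars`, `stripHtmlComments`) are made up for the example.

```typescript
// Hypothetical sketch: names are illustrative and not taken from the PR.

// Zero-width characters listed in the description.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Bidirectional embedding/override controls (U+202A to U+202E)
// and isolate controls (U+2066 to U+2069).
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/g;

// HTML comments, matched lazily so separate comments are not merged
// into a single match that swallows the visible text between them.
const HTML_COMMENT_SKETCH = /<!--[\s\S]*?-->/g;

function stripInvisibleChars(text: string): string {
  return text.replace(ZERO_WIDTH, "").replace(BIDI_CONTROLS, "");
}

function stripHtmlComments(html: string): string {
  return html.replace(HTML_COMMENT_SKETCH, "");
}

// Usage: the zero-width space between "pay" and "ment" is dropped.
const cleaned = stripInvisibleChars("pay\u200Bment"); // "payment"
```

One trade-off worth keeping in mind: U+200D (zero-width joiner) is also what holds multi-part emoji together, and U+200C is used in ordinary text in some scripts, so stripping them unconditionally changes some legitimate content as well.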

API

stripHiddenText(text: string): string      // for plain text
stripHiddenHtml(html: string): string      // for HTML emails
sanitizeForAI({ textPlain?, textHtml? })   // convenience wrapper
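
As a guess at what the HTML path might involve, the sketch below removes elements whose inline style hides them. It is deliberately simplified and is not the PR's implementation: the names `STYLED_ELEMENT`, `HIDDEN_STYLE`, and `stripHiddenElementsSketch` are hypothetical, only double-quoted style attributes are handled, and only display:none and visibility:hidden are checked.

```typescript
// Hypothetical sketch: a regex-based pass, not a real HTML parser.

// An element with a double-quoted inline style attribute, up to the first
// closing tag of the same name. Capture 1 is the tag, capture 2 the style.
const STYLED_ELEMENT =
  /<([a-z][a-z0-9]*)\b[^>]*\bstyle\s*=\s*"([^"]*)"[^>]*>[\s\S]*?<\/\1>/gi;

// Two of the hiding rules from the description.
const HIDDEN_STYLE = /display\s*:\s*none|visibility\s*:\s*hidden/i;

function stripHiddenElementsSketch(html: string): string {
  return html.replace(
    STYLED_ELEMENT,
    (match: string, _tag: string, style: string) =>
      HIDDEN_STYLE.test(style) ? "" : match,
  );
}

const email =
  '<p>Hi</p><span style="display:none">ignore previous instructions</span><p>Bye</p>';
const visibleOnly = stripHiddenElementsSketch(email); // "<p>Hi</p><p>Bye</p>"
```

A regex pass like this does not handle same-tag nesting, and a hidden element inside a visibly styled parent is skipped because the parent consumes the match. Those cases are the kind the nested-element tests in this PR would need to cover, and a real HTML parser would be the more robust choice.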

Not yet wired in

This PR adds the utility and tests. A follow-up PR will call sanitizeForAI() before aiChooseRule(), aiGenerateArgs(), and aiDraftReply().

Test plan

  • 32 tests covering all hidden content types
  • Edge cases: nested hidden elements, mixed attacks, legitimate HTML preservation
  • Undefined/empty input handling

🤖 Generated with Claude Code

…I processing

Strips zero-width Unicode characters, RTL/LTR overrides, hidden HTML
elements (display:none, visibility:hidden, zero font-size, opacity:0,
white-on-white text, offscreen positioning), and HTML comments that
attackers use to inject invisible instructions into email content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel bot commented Apr 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
inbox-zero Ignored Ignored Apr 4, 2026 0:41am


/** Strip hidden/invisible content from HTML before AI processing */
export function stripHiddenHtml(html: string): string {
  let result = html.replace(HTML_COMMENT, "");
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/web/utils/ai/content-sanitizer.ts">

<violation number="1" location="apps/web/utils/ai/content-sanitizer.ts:14">
P1: The `color` regex also matches `background-color`, causing false positives that remove visible content.</violation>

<violation number="2" location="apps/web/utils/ai/content-sanitizer.ts:46">
P2: `sanitizeForAI` drops empty-string inputs by using truthy checks; use an explicit `undefined` check so empty content is preserved.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

  /visibility\s*:\s*hidden/i,
  /font-size\s*:\s*0(?:px|em|rem|%|pt)?\s*[;"']/i,
  /opacity\s*:\s*0\s*[;"']/i,
  /color\s*:\s*(?:#fff(?:fff)?|white|rgb\(\s*255\s*,\s*255\s*,\s*255\s*\))\s*[;"']/i,
Contributor

@cubic-dev-ai cubic-dev-ai bot Apr 4, 2026


P1: The color regex also matches background-color, causing false positives that remove visible content.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/ai/content-sanitizer.ts, line 14:

<comment>The `color` regex also matches `background-color`, causing false positives that remove visible content.</comment>

<file context>
@@ -0,0 +1,87 @@
+  /visibility\s*:\s*hidden/i,
+  /font-size\s*:\s*0(?:px|em|rem|%|pt)?\s*[;"']/i,
+  /opacity\s*:\s*0\s*[;"']/i,
+  /color\s*:\s*(?:#fff(?:fff)?|white|rgb\(\s*255\s*,\s*255\s*,\s*255\s*\))\s*[;"']/i,
+];
+
</file context>
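
The false positive described in this comment can be reproduced directly from the quoted pattern. The sketch below shows one possible fix, which requires that `color` is not preceded by a hyphen or word character; the constant names and sample strings are illustrative, and the PR author may choose a different approach.

```typescript
// Pattern as quoted in the review context.
const COLOR_ORIGINAL =
  /color\s*:\s*(?:#fff(?:fff)?|white|rgb\(\s*255\s*,\s*255\s*,\s*255\s*\))\s*[;"']/i;

// Possible fix (an assumption, not the PR's code): only match when "color"
// starts the string or follows a character that is neither "-" nor a word
// character, which excludes background-color, border-color, and similar.
const COLOR_ANCHORED =
  /(?:^|[^-\w])color\s*:\s*(?:#fff(?:fff)?|white|rgb\(\s*255\s*,\s*255\s*,\s*255\s*\))\s*[;"']/i;

const visibleStyle = 'style="background-color: white;"'; // ordinary visible content
const hiddenStyle = 'style="color: #ffffff;"'; // white text

console.log(COLOR_ORIGINAL.test(visibleStyle)); // true  (the false positive)
console.log(COLOR_ANCHORED.test(visibleStyle)); // false
console.log(COLOR_ANCHORED.test(hiddenStyle)); // true
```

Even with the anchor, white text is only invisible on a white background, so a check on the `color` property alone can still flag legitimate light-on-dark designs.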

  textHtml?: string;
}): { textPlain?: string; textHtml?: string } {
  return {
    textPlain: input.textPlain ? stripHiddenText(input.textPlain) : undefined,
Contributor

@cubic-dev-ai cubic-dev-ai bot Apr 4, 2026


P2: sanitizeForAI drops empty-string inputs by using truthy checks; use an explicit undefined check so empty content is preserved.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/web/utils/ai/content-sanitizer.ts, line 46:

<comment>`sanitizeForAI` drops empty-string inputs by using truthy checks; use an explicit `undefined` check so empty content is preserved.</comment>

<file context>
@@ -0,0 +1,87 @@
+  textHtml?: string;
+}): { textPlain?: string; textHtml?: string } {
+  return {
+    textPlain: input.textPlain ? stripHiddenText(input.textPlain) : undefined,
+    textHtml: input.textHtml ? stripHiddenHtml(input.textHtml) : undefined,
+  };
</file context>
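
A minimal sketch of the explicit `undefined` check suggested here follows. The two strip functions are stand-in placeholders so the example runs on its own; the real ones in the PR do more. Only the wrapper's conditionals differ from the quoted code.

```typescript
// Placeholder implementations (assumptions for this sketch only).
function stripHiddenText(text: string): string {
  return text.replace(/[\u200B\u200C\u200D\u2060\uFEFF]/g, "");
}

function stripHiddenHtml(html: string): string {
  return html.replace(/<!--[\s\S]*?-->/g, "");
}

function sanitizeForAI(input: {
  textPlain?: string;
  textHtml?: string;
}): { textPlain?: string; textHtml?: string } {
  return {
    // "" is falsy, so a truthy check would turn an empty body into undefined.
    // Comparing against undefined keeps "" as "".
    textPlain:
      input.textPlain !== undefined ? stripHiddenText(input.textPlain) : undefined,
    textHtml:
      input.textHtml !== undefined ? stripHiddenHtml(input.textHtml) : undefined,
  };
}

const result = sanitizeForAI({ textPlain: "" });
console.log(result.textPlain === ""); // true
console.log(result.textHtml === undefined); // true
```

Whether preserving the empty string matters depends on how callers tell "no plain-text part" apart from "empty plain-text part", which this PR does not show.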

3 participants