Skip to content

Commit 6c5c247

Browse files
nearestnaborsclaudecursoragent
authored
Prevent vale style review from removing code blocks (#701)
* Prevent vale style review from removing code blocks The LLM was making overly aggressive suggestions that removed entire code blocks from documentation. This adds multiple safeguards: - Add getLinesInCodeBlocks() to detect lines inside fenced code blocks - Skip any suggestion targeting lines inside code blocks - Reject suggestions where suggested text is <70% of original length - Strengthen LLM prompt with explicit "NEVER MODIFY CODE" instructions - Extract validation helpers to reduce cognitive complexity Fixes the issue where PR #700 removed 173 lines of code examples. Co-Authored-By: Claude Opus 4.5 <[email protected]> * Fix inconsistent length validation and prompt contradiction in vale-style-review - Make isDestructiveChange use trimmed lengths to match wouldRemoveContent - Update prompt to reflect actual validation logic (allows 30% shorter text) This fixes false positives where whitespace-only changes were incorrectly rejected, and aligns the LLM prompt with the actual 70% length threshold in the code. * Fix nested code block detection in getLinesInCodeBlocks The function now tracks the opening backtick count and only closes a code block when encountering a line with at least as many backticks. This prevents nested code blocks (e.g., 4+ backticks wrapping 3-backtick blocks) from being incorrectly detected. * Fix CommonMark code fence parsing bugs in vale-style-review - Enforce 3-space indentation limit for code fences per CommonMark spec - Validate closing fences only accept whitespace after backticks - Prevent misidentification of indented backticks as fence markers - Ensure non-whitespace content after backticks keeps fence open * Fix lint errors in vale-style-review.ts - Extract magic numbers to constants (MAX_FENCE_INDENT, TAB_STOP_WIDTH) - Reduce cognitive complexity in getLinesInCodeBlocks by extracting helper functions - Fix line formatting in isDestructiveChange Co-Authored-By: Claude Opus 4.5 <[email protected]> * Fix: Support tilde-fenced code blocks in vale-style-review The getLinesInCodeBlocks function now detects both backtick ( and ~~~ - Enhanced isValidClosingFence to match opening fence character - Updated getLinesInCodeBlocks to track fence type --------- Co-authored-by: Rachel Lee Nabors <[email protected]> Co-authored-by: Claude Opus 4.5 <[email protected]> Co-authored-by: Cursor Agent <[email protected]>
1 parent 0d24efc commit 6c5c247

File tree

1 file changed

+159
-11
lines changed

1 file changed

+159
-11
lines changed

scripts/vale-style-review.ts

Lines changed: 159 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,9 @@ const DIFF_HUNK_REGEX = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/;
4545
const OWNER = "ArcadeAI";
4646
const REPO = "docs";
4747
const MAX_LENGTH_CHANGE_RATIO = 0.5;
48+
const MIN_SUGGESTED_LENGTH_RATIO = 0.7; // Suggested must be at least 70% of original length
49+
const MAX_FENCE_INDENT = 3; // CommonMark allows 0-3 spaces before fence markers
50+
const TAB_STOP_WIDTH = 4; // CommonMark tab stops are at multiples of 4
4851

4952
// Load style guide
5053
const STYLE_GUIDE_PATH = join(__dirname, "..", "STYLEGUIDE.md");
@@ -288,16 +291,22 @@ ${STYLE_GUIDE}
288291
289292
TASK: Fix the Vale style issues listed below for this file. Return JSON with your suggestions.
290293
294+
CRITICAL - NEVER MODIFY CODE:
295+
- NEVER suggest changes to lines inside code blocks (lines between \`\`\` or ~~~ markers)
296+
- NEVER suggest changes to code examples, import statements, function calls, or variable names
297+
- SKIP any Vale issue that appears inside a code block or affects code
298+
291299
RULES:
292300
1. ONLY fix the specific issues listed - do not make any other changes
293301
2. Make MINIMAL changes - only change the specific word or phrase mentioned in the issue
294302
3. NEVER delete content, rewrite sentences, or change anything beyond the flagged issue
295303
4. If a message says "Use 'X' instead of 'Y'", find ONLY Y and replace with X - nothing else
296304
5. Preserve technical accuracy - never change code or technical details
297305
6. For passive voice - only fix if active voice is clearer
298-
7. If an issue should NOT be fixed (e.g., passive voice is appropriate), omit it
306+
7. If an issue should NOT be fixed (e.g., passive voice is appropriate, or it's inside code), OMIT IT
299307
8. The "original" field must contain the EXACT full line from the file
300308
9. The "suggested" field must be identical to "original" EXCEPT for the specific fix
309+
10. The "suggested" field should not be significantly shorter than the "original" - you are fixing style, not removing content
301310
302311
FILE: ${filename}
303312
@@ -396,6 +405,136 @@ async function getSuggestions(
396405
}
397406
}
398407

408+
// Convert leading whitespace (including tabs) to equivalent space count
409+
// Per CommonMark spec: tabs behave as if replaced by spaces with tab stop of 4
410+
// Tab stops are at positions 0, 4, 8, 12, etc. (multiples of 4)
411+
function countLeadingWhitespace(line: string): number {
412+
let position = 0;
413+
for (const char of line) {
414+
if (char === " ") {
415+
position += 1;
416+
} else if (char === "\t") {
417+
// Advance to next tab stop (multiple of TAB_STOP_WIDTH)
418+
position = Math.ceil((position + 1) / TAB_STOP_WIDTH) * TAB_STOP_WIDTH;
419+
} else {
420+
// Non-whitespace character - stop counting
421+
break;
422+
}
423+
}
424+
return position;
425+
}
426+
427+
// Count consecutive fence characters (backticks or tildes) at the start of a trimmed line
428+
// Returns both the count and the character type
429+
function countLeadingFenceChars(trimmedLine: string): {
430+
count: number;
431+
char: string;
432+
} {
433+
const firstChar = trimmedLine[0];
434+
if (firstChar !== "`" && firstChar !== "~") {
435+
return { count: 0, char: "" };
436+
}
437+
438+
let count = 0;
439+
for (const char of trimmedLine) {
440+
if (char === firstChar) {
441+
count += 1;
442+
} else {
443+
break;
444+
}
445+
}
446+
return { count, char: firstChar };
447+
}
448+
449+
// Check if a line could be a fence marker (has valid indentation and starts with ``` or ~~~)
450+
function isFenceCandidate(line: string): boolean {
451+
const leadingWhitespace = countLeadingWhitespace(line);
452+
const trimmedLine = line.trim();
453+
return (
454+
leadingWhitespace <= MAX_FENCE_INDENT &&
455+
(trimmedLine.startsWith("```") || trimmedLine.startsWith("~~~"))
456+
);
457+
}
458+
459+
// Check if a line is a valid closing fence
460+
function isValidClosingFence(
461+
trimmedLine: string,
462+
fenceInfo: { count: number; char: string },
463+
openingFenceInfo: { count: number; char: string }
464+
): boolean {
465+
// For closing fence: must use same character, have at least as many chars as opening,
466+
// AND only whitespace after the fence chars (per CommonMark spec)
467+
const afterFenceChars = trimmedLine.slice(fenceInfo.count);
468+
return (
469+
fenceInfo.char === openingFenceInfo.char &&
470+
fenceInfo.count >= openingFenceInfo.count &&
471+
afterFenceChars.trim() === ""
472+
);
473+
}
474+
475+
// Check which lines are inside code blocks (fenced with ``` or ~~~)
476+
// Returns a Set of line numbers (1-indexed) that are inside code blocks
477+
function getLinesInCodeBlocks(content: string): Set<number> {
478+
const linesInCodeBlocks = new Set<number>();
479+
const lines = content.split("\n");
480+
let inCodeBlock = false;
481+
let openingFenceInfo = { count: 0, char: "" };
482+
483+
for (let i = 0; i < lines.length; i += 1) {
484+
const line = lines[i];
485+
const lineNum = i + 1; // 1-indexed
486+
const trimmedLine = line.trim();
487+
488+
if (isFenceCandidate(line)) {
489+
const fenceInfo = countLeadingFenceChars(trimmedLine);
490+
491+
if (inCodeBlock) {
492+
linesInCodeBlocks.add(lineNum);
493+
if (isValidClosingFence(trimmedLine, fenceInfo, openingFenceInfo)) {
494+
inCodeBlock = false;
495+
openingFenceInfo = { count: 0, char: "" };
496+
}
497+
} else {
498+
// Opening a code block
499+
linesInCodeBlocks.add(lineNum);
500+
inCodeBlock = true;
501+
openingFenceInfo = fenceInfo;
502+
}
503+
} else if (inCodeBlock) {
504+
linesInCodeBlocks.add(lineNum);
505+
}
506+
}
507+
508+
return linesInCodeBlocks;
509+
}
510+
511+
// Validate that a suggestion has the required fields with correct types
512+
function hasValidFields(s: Suggestion): boolean {
513+
return (
514+
typeof s.line === "number" &&
515+
typeof s.original === "string" &&
516+
typeof s.suggested === "string" &&
517+
typeof s.rule === "string"
518+
);
519+
}
520+
521+
// Check if a suggestion would remove too much content
522+
function wouldRemoveContent(s: Suggestion): boolean {
523+
const originalLen = s.original.trim().length;
524+
const suggestedLen = s.suggested.trim().length;
525+
return (
526+
originalLen > 0 && suggestedLen < originalLen * MIN_SUGGESTED_LENGTH_RATIO
527+
);
528+
}
529+
530+
// Check if a suggestion makes destructive length changes
531+
function isDestructiveChange(s: Suggestion): boolean {
532+
const lengthDiff = Math.abs(
533+
s.suggested.trim().length - s.original.trim().length
534+
);
535+
return lengthDiff > s.original.trim().length * MAX_LENGTH_CHANGE_RATIO;
536+
}
537+
399538
// Format suggestions as GitHub review comments
400539
// Only includes suggestions for lines that are in the PR diff and don't already have Vale comments
401540
function formatReviewComments(options: {
@@ -414,6 +553,9 @@ function formatReviewComments(options: {
414553
} = options;
415554
const lines = fileContent.split("\n");
416555

556+
// Get lines that are inside code blocks - NEVER modify these
557+
const linesInCodeBlocks = getLinesInCodeBlocks(fileContent);
558+
417559
return suggestions
418560
.filter((s) => {
419561
// Skip if there's already a Vale comment on this line
@@ -423,30 +565,36 @@ function formatReviewComments(options: {
423565
return false;
424566
}
425567
// Validate required fields exist and have correct types
426-
if (
427-
typeof s.line !== "number" ||
428-
typeof s.original !== "string" ||
429-
typeof s.suggested !== "string" ||
430-
typeof s.rule !== "string"
431-
) {
568+
if (!hasValidFields(s)) {
432569
return false;
433570
}
434571
// Validate line number is in range
435572
if (s.line < 1 || s.line > lines.length) {
436573
return false;
437574
}
575+
// CRITICAL: Never modify lines inside code blocks
576+
if (linesInCodeBlocks.has(s.line)) {
577+
console.log(
578+
` Skipping line ${s.line} (inside code block - code must not be modified)`
579+
);
580+
return false;
581+
}
438582
// Validate line is in the PR diff (GitHub API requirement)
439583
if (!commentableLines.has(s.line)) {
440584
return false;
441585
}
442-
// Reject destructive suggestions (length change > 50% of original)
443-
const lengthDiff = Math.abs(s.suggested.length - s.original.length);
444-
if (lengthDiff > s.original.length * MAX_LENGTH_CHANGE_RATIO) {
586+
// Reject suggestions that remove content (suggested is too short)
587+
if (wouldRemoveContent(s)) {
445588
console.log(
446-
` Skipping destructive suggestion on line ${s.line} (length change: ${lengthDiff} chars)`
589+
` Skipping line ${s.line} (suggested text too short - would remove content)`
447590
);
448591
return false;
449592
}
593+
// Reject destructive suggestions (length change > 50% of original)
594+
if (isDestructiveChange(s)) {
595+
console.log(` Skipping destructive suggestion on line ${s.line}`);
596+
return false;
597+
}
450598
// Validate original content matches (loosely)
451599
const actualLine = lines[s.line - 1];
452600
return actualLine.includes(

0 commit comments

Comments
 (0)