fix(health): replace file ratio with test density scoring

awaliuddin · claude · awaliuddin · commit 01f1e40ee758 · 2026-03-12T13:22:51.000-07:00
File ratio counted test FILES, not tests, so adding 16 tests to the same
file left the score unchanged. New 3-tier system: real line coverage
(Istanbul) first, then test density via grep (test cases per source
file), then file ratio as a last resort.

37 tests / 3 files = 12.3 density → 20/20 (was 7/20 with file ratio).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
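
The tier selection described in the message can be sketched as a standalone function. This is a minimal illustration, not the code in `tools.mjs`: `scoreTests` and the `tier` field are hypothetical names, while the input fields and thresholds follow the ones named in this commit.

```javascript
// Hypothetical helper mirroring the three tiers described in this commit.
// Field names follow the metrics object in the diff; scoreTests itself is
// an illustration only.
function scoreTests({ testCoverage = null, testCaseCount = 0, sourceFiles = 0, testFileRatio = 0 }) {
  if (testCoverage !== null) {
    // Tier 1: real line coverage, scaled linearly onto 0-20
    return { tier: 1, points: Math.min(20, Math.round((testCoverage / 100) * 20)) };
  }
  if (testCaseCount > 0 && sourceFiles > 0) {
    // Tier 2: test cases per source file, bucketed into four bands
    const density = testCaseCount / sourceFiles;
    let points = 5;                        // below 1 test per file: sparse
    if (density >= 5) points = 20;         // thorough
    else if (density >= 3) points = 15;    // solid
    else if (density >= 1) points = 10;    // basic
    return { tier: 2, points, density };
  }
  // Tier 3: file ratio (a percentage), scaled linearly onto 0-20
  return { tier: 3, points: Math.min(20, Math.round((testFileRatio / 100) * 20)) };
}

// The example from the commit message: 37 tests across 3 source files.
console.log(scoreTests({ testCaseCount: 37, sourceFiles: 3 }).points); // 20
// One test file for three source files is a 33% ratio under the fallback.
console.log(scoreTests({ testFileRatio: 33 }).points); // 7
```

Because the density bands use `>=`, a density of exactly 3 lands in the "solid" band and exactly 5 in "thorough".
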
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "forge",
-  "version": "3.4.3",
+  "version": "3.4.4",
   "description": "AI-powered development governance \u2014 automated quality gates, agent orchestration, and project health monitoring for Claude Code",
   "author": {
     "name": "NXTG AI",
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,20 +6,36 @@ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Version
 
 ---
 
-## [3.4.3] — 2026-03-12
+## [3.4.4] — 2026-03-12
 
 ### Fixed
 
-- **Test scoring accuracy** — Health score now uses real line coverage (Istanbul/c8/nyc) for the Tests dimension when a coverage report exists. Previously, real coverage was displayed in the note but ignored for scoring — the score always used the file ratio proxy. Projects with 80% real coverage now score 16/20 instead of whatever the file ratio happened to be.
-- **File ratio proxy too punishing** — When no coverage report exists, the file ratio proxy now awards a 5-point floor for "has tests at all" plus up to 15 scaled by ratio. A project with 1 test file and 3 source files now scores 10/20 (was 7/20). The note tells users to run `--coverage` for accurate scoring.
+- **Test scoring measures real test quality** — Replaced the file ratio proxy (which counted test FILES, not tests) with a 3-tier scoring system:
+  1. **Real line coverage** from Istanbul/c8/nyc when a coverage report exists (gold standard)
+  2. **Test density** — counts actual `it()`/`test()`/`def test_`/`#[test]` declarations via grep (~50ms), scores by tests-per-source-file: <1 sparse (5pts), 1-3 basic (10pts), 3-5 solid (15pts), 5+ thorough (20pts)
+  3. **File ratio** as last resort when test case patterns can't be detected
+- Adding tests now moves the score. A project with 37 tests across 3 source files (12.3/file) scores **20/20** instead of the old 7/20. Previously, adding 16 tests to the same file changed the score by exactly zero.
 
 ### Tests
 
-- **26/26 vitest** (was 25 — 1 new test for scoring accuracy)
+- **27/27 vitest** (was 26 — 1 new test for density tier scoring)
 - **43/43 node:test** (unchanged)
 
 ---
 
+## [3.4.3] — 2026-03-12
+
+### Fixed
+
+- **Test scoring accuracy (patch, superseded by v3.4.4)** — Added 5-point floor for file ratio proxy and real coverage preference. This was an incremental patch; v3.4.4 replaces it with proper test density scoring.
+
+### Tests
+
+- **26/26 vitest**
+- **43/43 node:test**
+
+---
+
 ## [3.4.2] — 2026-03-12
 
 ### Fixed
@@ -121,6 +137,7 @@ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Version
 
 ---
 
+[3.4.4]: https://github.com/nxtg-ai/forge-plugin/compare/v3.4.3...v3.4.4
 [3.4.3]: https://github.com/nxtg-ai/forge-plugin/compare/v3.4.2...v3.4.3
 [3.4.2]: https://github.com/nxtg-ai/forge-plugin/compare/v3.4.1...v3.4.2
 [3.4.1]: https://github.com/nxtg-ai/forge-plugin/compare/v3.4.0...v3.4.1
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -196,7 +196,7 @@ Contextual knowledge, patterns, best practices...
 
 ## Key Dimensions
 
-- **Version:** 3.4.3
+- **Version:** 3.4.4
 - **Components:** 21 commands, 22 agents, 29 skills, 6 hooks, 8 MCP tools
 - **Build:** None (pure markdown, auto-loaded by Claude Code)
 - **MCP Server:** Node.js ES module (`@modelcontextprotocol/sdk@^1.12.1`)
diff --git a/plugins/nxtg-forge/.claude-plugin/plugin.json b/plugins/nxtg-forge/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "forge",
-  "version": "3.4.3",
+  "version": "3.4.4",
   "description": "AI-powered development governance \u2014 automated quality gates, agent orchestration, and project health monitoring for Claude Code",
   "author": {
     "name": "NXTG AI",
diff --git a/plugins/nxtg-forge/servers/governance-mcp/tests/health-score.test.mjs b/plugins/nxtg-forge/servers/governance-mcp/tests/health-score.test.mjs
@@ -105,16 +105,44 @@ describe('getHealthScore', () => {
     try { unlinkSync(join(root, '.env')); } catch {}
   });
 
-  it('test scoring uses real coverage when available, falls back to file ratio with 5pt floor', () => {
+  it('test scoring uses test density when no coverage report exists', () => {
     const root = getFixturePath();
     const result = getHealthScore(root);
     const testCheck = result.checks.find((c) => c.name === 'Test Coverage');
 
-    // Fixture: 2 source files, 1 test file → 50% file ratio, no coverage report
-    // Formula: 5 (floor) + round(50/100 * 15) = 5 + 8 = 13
-    expect(testCheck.points).toBe(13);
-    expect(testCheck.note).toContain('file ratio');
-    expect(testCheck.note).toContain('--coverage');
+    // Fixture: 2 source files, 1 test file with 4 test cases (it() calls)
+    // Density: 4 / 2 = 2.0 tests/file → tier "1-3 = basic" → 10/20
+    expect(testCheck.points).toBe(10);
+    expect(testCheck.note).toContain('4 tests');
+    expect(testCheck.note).toContain('/file');
+  });
+
+  it('test density scoring rewards adding more tests to the same file', () => {
+    const root = getFixturePath();
+
+    // Add more test cases to push density from 2.0 to 5.5 tests/file (thorough tier)
+    const extraTests = join(root, 'tests', 'extra.test.ts');
+    writeFileSync(extraTests, `import { describe, it, expect } from 'vitest';
+describe('extra', () => {
+  it('test a', () => { expect(1).toBe(1); });
+  it('test b', () => { expect(2).toBe(2); });
+  it('test c', () => { expect(3).toBe(3); });
+  it('test d', () => { expect(4).toBe(4); });
+  it('test e', () => { expect(5).toBe(5); });
+  it('test f', () => { expect(6).toBe(6); });
+  it('test g', () => { expect(7).toBe(7); });
+});
+`);
+
+    const result = getHealthScore(root);
+    const testCheck = result.checks.find((c) => c.name === 'Test Coverage');
+
+    // 4 original + 7 new = 11 test cases, 2 source files → density 5.5 → 20/20
+    expect(testCheck.points).toBe(20);
+    expect(testCheck.status).toBe('pass');
+
+    // Cleanup
+    try { unlinkSync(extraTests); } catch {}
   });
 
   it('grade letter matches score boundaries', () => {
diff --git a/plugins/nxtg-forge/servers/governance-mcp/tools.mjs b/plugins/nxtg-forge/servers/governance-mcp/tools.mjs
@@ -228,10 +228,25 @@ export function getCodeMetrics(root = process.env.FORGE_PROJECT_ROOT || process.
   const srcCount = parseInt(sourceFiles) || 0;
   const tstCount = parseInt(testFiles) || 0;
 
+  // Count individual test cases by grepping for test declarations (~50ms)
+  // Covers: JS/TS (it/test), Python (def test_), Rust (#[test]), Go (func Test)
+  const testCaseCount = (() => {
+    const patterns = {
+      node: `grep -rE "^\\s*(it|test)\\s*\\(" --include="*.test.*" --include="*.spec.*" . 2>/dev/null | wc -l`,
+      rust: `grep -rE "#\\[test\\]" --include="*.rs" . 2>/dev/null | wc -l`,
+      python: `grep -rE "^\\s*def test_" --include="*.py" . 2>/dev/null | wc -l`,
+      go: `grep -rE "^func Test" --include="*_test.go" . 2>/dev/null | wc -l`,
+    };
+    const cmd = patterns[projectType];
+    if (!cmd) return 0;
+    return parseInt(run(cmd, { cwd: appRoot, shell: "/bin/bash" })) || 0;
+  })();
+
   return {
     projectType,
     sourceFiles: srcCount,
     testFiles: tstCount,
+    testCaseCount,
     // testFileRatio: test files / source files (proxy metric, not real line coverage)
     testFileRatio: tstCount && srcCount ? Math.round((tstCount / srcCount) * 100) : 0,
     // testCoverage: actual line coverage % from Istanbul/c8 report, null if unavailable
@@ -269,18 +284,27 @@ export function getHealthScore(root = process.env.FORGE_PROJECT_ROOT || process.
     checks.push({ name: "Git Clean", status: "warn", points: 5, note: `${git.modified} modified, ${git.untracked} untracked` });
   }
 
-  // Has tests (20 pts) — prefer real line coverage, fall back to file ratio proxy
+  // Has tests (20 pts) — tiered: real coverage → test density → file ratio
   if (metrics.testFiles > 0) {
     let testScore;
     let coverageNote;
     if (metrics.testCoverage !== null) {
-      // Real line coverage from Istanbul/c8/nyc — use directly
+      // Tier 1: Real line coverage from Istanbul/c8/nyc
       testScore = Math.min(20, Math.round((metrics.testCoverage / 100) * 20));
       coverageNote = `${metrics.testCoverage}% line coverage`;
+    } else if (metrics.testCaseCount > 0 && metrics.sourceFiles > 0) {
+      // Tier 2: Test density — test cases per source file
+      // Benchmarks: <1 = sparse, 1-3 = basic, 3-5 = solid, 5+ = thorough
+      const density = metrics.testCaseCount / metrics.sourceFiles;
+      if (density >= 5) testScore = 20;
+      else if (density >= 3) testScore = 15;
+      else if (density >= 1) testScore = 10;
+      else testScore = 5;
+      coverageNote = `${metrics.testCaseCount} tests across ${metrics.sourceFiles} files (${density.toFixed(1)}/file)`;
     } else {
-      // File ratio proxy: 5 pts floor (has tests) + up to 15 scaled by ratio
-      testScore = Math.min(20, 5 + Math.round((metrics.testFileRatio / 100) * 15));
-      coverageNote = `${metrics.testFileRatio}% file ratio (run with --coverage for accurate scoring)`;
+      // Tier 3: File ratio fallback
+      testScore = Math.min(20, Math.round((metrics.testFileRatio / 100) * 20));
+      coverageNote = `${metrics.testFileRatio}% file ratio`;
     }
     score += testScore;
     checks.push({ name: "Test Coverage", status: testScore >= 15 ? "pass" : "warn", points: testScore, note: coverageNote });
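
For reference, the JS/TS pattern that `getCodeMetrics` passes to grep can be approximated in-process with a regular expression. The sketch below is illustrative only: `countTestCases` is a hypothetical name, and it assumes the file contents are already in a string, whereas the commit shells out to grep.

```javascript
// Hypothetical in-process equivalent of the grep pattern
//   ^\s*(it|test)\s*\(
// applied to the text of a single test file.
function countTestCases(source) {
  // "m" makes ^ match at every line start; "g" collects all matches
  const matches = source.match(/^\s*(it|test)\s*\(/gm);
  return matches ? matches.length : 0;
}

const sample = `import { describe, it, expect } from 'vitest';
describe('extra', () => {
  it('test a', () => { expect(1).toBe(1); });
  test('test b', () => { expect(2).toBe(2); });
  it.skip('test c', () => { expect(3).toBe(3); });
  // it('commented out', () => {});
});
`;

console.log(countTestCases(sample)); // 2
```

Like the grep version, this counts only lines that begin with `it(` or `test(`. Variants such as `it.skip(` or `it.each(` are not matched, and a declaration inside a block comment would still be counted, so the result is an estimate rather than an exact test count.
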
