Commit 01f1e40

awaliuddin and claude committed
fix(health): replace file ratio with test density scoring
File ratio counted test files, not tests, so adding 16 tests to the same file did not move the score at all. New 3-tier system: real line coverage (Istanbul), test density via grep (test cases per source file), and file ratio as a last resort. 37 tests / 3 files = 12.3 density → 20/20 (was 7/20 with file ratio).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
1 parent 0a6100a commit 01f1e40
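
The arithmetic in the commit message can be checked with a small sketch. The names `fileRatioScore` and `densityScore` are hypothetical and do not appear in the commit; the formulas mirror the Tier 3 and Tier 2 branches of the `tools.mjs` diff in this commit. The 7/20 figure assumes one test file against three source files, as in the 3.4.3 changelog entry.

```javascript
// Hypothetical helper names; formulas follow the Tier 3 and Tier 2 branches in tools.mjs.

// Tier 3: file ratio as an integer percentage, scaled to 20 points.
function fileRatioScore(testFiles, sourceFiles) {
  if (!testFiles || !sourceFiles) return 0;
  const ratioPct = Math.round((testFiles / sourceFiles) * 100);
  return Math.min(20, Math.round((ratioPct / 100) * 20));
}

// Tier 2: test cases per source file, bucketed into four tiers.
function densityScore(testCases, sourceFiles) {
  const density = testCases / sourceFiles;
  if (density >= 5) return 20; // thorough
  if (density >= 3) return 15; // solid
  if (density >= 1) return 10; // basic
  return 5;                    // sparse
}

console.log(fileRatioScore(1, 3)); // 7  (33% file ratio, however many tests the file holds)
console.log(densityScore(37, 3));  // 20 (12.3 tests per source file)
console.log(densityScore(4, 2));   // 10 (2.0 tests per source file, the fixture case)
console.log(densityScore(6, 2));   // 15 (exactly 3.0 lands in the "solid" tier)
```

Because the comparisons use `>=`, a density of exactly 3 or exactly 5 falls into the higher tier, which the "1-3" and "3-5" labels leave ambiguous.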

6 files changed

Lines changed: 87 additions & 18 deletions

.claude-plugin/plugin.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "forge",
-  "version": "3.4.3",
+  "version": "3.4.4",
   "description": "AI-powered development governance \u2014 automated quality gates, agent orchestration, and project health monitoring for Claude Code",
   "author": {
     "name": "NXTG AI",

CHANGELOG.md

Lines changed: 21 additions & 4 deletions
@@ -6,20 +6,36 @@ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Version
 
 ---
 
-## [3.4.3] — 2026-03-12
+## [3.4.4] — 2026-03-12
 
 ### Fixed
 
-- **Test scoring accuracy** — Health score now uses real line coverage (Istanbul/c8/nyc) for the Tests dimension when a coverage report exists. Previously, real coverage was displayed in the note but ignored for scoring — the score always used the file ratio proxy. Projects with 80% real coverage now score 16/20 instead of whatever the file ratio happened to be.
-- **File ratio proxy too punishing** — When no coverage report exists, the file ratio proxy now awards a 5-point floor for "has tests at all" plus up to 15 scaled by ratio. A project with 1 test file and 3 source files now scores 10/20 (was 7/20). The note tells users to run `--coverage` for accurate scoring.
+- **Test scoring measures real test quality** — Replaced the file ratio proxy (which counted test FILES, not tests) with a 3-tier scoring system:
+  1. **Real line coverage** from Istanbul/c8/nyc when a coverage report exists (gold standard)
+  2. **Test density** — counts actual `it()`/`test()`/`def test_`/`#[test]` declarations via grep (~50ms), scores by tests-per-source-file: <1 sparse (5pts), 1-3 basic (10pts), 3-5 solid (15pts), 5+ thorough (20pts)
+  3. **File ratio** as last resort when test case patterns can't be detected
+- Adding tests now moves the score. A project with 37 tests across 3 source files (12.3/file) scores **20/20** instead of the old 7/20. Previously, adding 16 tests to the same file changed the score by exactly zero.
 
 ### Tests
 
-- **26/26 vitest** (was 25 — 1 new test for scoring accuracy)
+- **27/27 vitest** (was 26 — 1 new test for density tier scoring)
 - **43/43 node:test** (unchanged)
 
 ---
 
+## [3.4.3] — 2026-03-12
+
+### Fixed
+
+- **Test scoring accuracy (patch, superseded by v3.4.4)** — Added 5-point floor for file ratio proxy and real coverage preference. This was an incremental patch; v3.4.4 replaces it with proper test density scoring.
+
+### Tests
+
+- **26/26 vitest**
+- **43/43 node:test**
+
+---
+
 ## [3.4.2] — 2026-03-12
 
 ### Fixed
@@ -121,6 +137,7 @@ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Version
 
 ---
 
+[3.4.4]: https://github.com/nxtg-ai/forge-plugin/compare/v3.4.3...v3.4.4
 [3.4.3]: https://github.com/nxtg-ai/forge-plugin/compare/v3.4.2...v3.4.3
 [3.4.2]: https://github.com/nxtg-ai/forge-plugin/compare/v3.4.1...v3.4.2
 [3.4.1]: https://github.com/nxtg-ai/forge-plugin/compare/v3.4.0...v3.4.1

CLAUDE.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -196,7 +196,7 @@ Contextual knowledge, patterns, best practices...
196196

197197
## Key Dimensions
198198

199-
- **Version:** 3.4.3
199+
- **Version:** 3.4.4
200200
- **Components:** 21 commands, 22 agents, 29 skills, 6 hooks, 8 MCP tools
201201
- **Build:** None (pure markdown, auto-loaded by Claude Code)
202202
- **MCP Server:** Node.js ES module (`@modelcontextprotocol/sdk@^1.12.1`)

plugins/nxtg-forge/.claude-plugin/plugin.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
{
22
"name": "forge",
3-
"version": "3.4.3",
3+
"version": "3.4.4",
44
"description": "AI-powered development governance \u2014 automated quality gates, agent orchestration, and project health monitoring for Claude Code",
55
"author": {
66
"name": "NXTG AI",

plugins/nxtg-forge/servers/governance-mcp/tests/health-score.test.mjs

Lines changed: 34 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -105,16 +105,44 @@ describe('getHealthScore', () => {
105105
try { unlinkSync(join(root, '.env')); } catch {}
106106
});
107107

108-
it('test scoring uses real coverage when available, falls back to file ratio with 5pt floor', () => {
108+
it('test scoring uses test density when no coverage report exists', () => {
109109
const root = getFixturePath();
110110
const result = getHealthScore(root);
111111
const testCheck = result.checks.find((c) => c.name === 'Test Coverage');
112112

113-
// Fixture: 2 source files, 1 test file → 50% file ratio, no coverage report
114-
// Formula: 5 (floor) + round(50/100 * 15) = 5 + 8 = 13
115-
expect(testCheck.points).toBe(13);
116-
expect(testCheck.note).toContain('file ratio');
117-
expect(testCheck.note).toContain('--coverage');
113+
// Fixture: 2 source files, 1 test file with 4 test cases (it() calls)
114+
// Density: 4 / 2 = 2.0 tests/file → tier "1-3 = basic" → 10/20
115+
expect(testCheck.points).toBe(10);
116+
expect(testCheck.note).toContain('4 tests');
117+
expect(testCheck.note).toContain('/file');
118+
});
119+
120+
it('test density scoring rewards adding more tests to the same file', () => {
121+
const root = getFixturePath();
122+
123+
// Add more test cases to push density from 2.0 to 5.5 tests/file (thorough tier)
124+
const extraTests = join(root, 'tests', 'extra.test.ts');
125+
writeFileSync(extraTests, `import { describe, it, expect } from 'vitest';
126+
describe('extra', () => {
127+
it('test a', () => { expect(1).toBe(1); });
128+
it('test b', () => { expect(2).toBe(2); });
129+
it('test c', () => { expect(3).toBe(3); });
130+
it('test d', () => { expect(4).toBe(4); });
131+
it('test e', () => { expect(5).toBe(5); });
132+
it('test f', () => { expect(6).toBe(6); });
133+
it('test g', () => { expect(7).toBe(7); });
134+
});
135+
`);
136+
137+
const result = getHealthScore(root);
138+
const testCheck = result.checks.find((c) => c.name === 'Test Coverage');
139+
140+
// 4 original + 7 new = 11 test cases, 2 source files → density 5.5 → 20/20
141+
expect(testCheck.points).toBe(20);
142+
expect(testCheck.status).toBe('pass');
143+
144+
// Cleanup
145+
try { unlinkSync(extraTests); } catch {}
118146
});
119147

120148
it('grade letter matches score boundaries', () => {
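
One caveat with the new density test: if an `expect` throws, the cleanup line never runs and `extra.test.ts` stays in the fixture, which would inflate the test count seen by later tests. A minimal sketch of a guard follows. `withTempFile` is a hypothetical helper that is not part of this commit, and the in-memory `memFs` object stands in for `node:fs` only so the sketch runs without touching disk.

```javascript
// Hypothetical helper, not part of this commit. `fs` is any object exposing
// writeFileSync and unlinkSync, such as the node:fs module.
function withTempFile(fs, path, contents, fn) {
  fs.writeFileSync(path, contents);
  try {
    return fn(path);
  } finally {
    // Runs on success and on a thrown assertion alike.
    try { fs.unlinkSync(path); } catch {}
  }
}

// In-memory stand-in for node:fs (assumption for the demo only).
const files = new Map();
const memFs = {
  writeFileSync: (p, data) => { files.set(p, data); },
  unlinkSync: (p) => { files.delete(p); },
};

let message = null;
try {
  withTempFile(memFs, 'tests/extra.test.ts', "it('a', () => {});", () => {
    throw new Error('assertion failed'); // simulates a failing expect()
  });
} catch (err) {
  message = err.message;
}
console.log(message, files.has('tests/extra.test.ts')); // assertion failed false
```

In the real test the assertions would go inside the callback, with the imported `node:fs` functions passed in place of `memFs`.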

plugins/nxtg-forge/servers/governance-mcp/tools.mjs

Lines changed: 29 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -228,10 +228,25 @@ export function getCodeMetrics(root = process.env.FORGE_PROJECT_ROOT || process.
228228
const srcCount = parseInt(sourceFiles) || 0;
229229
const tstCount = parseInt(testFiles) || 0;
230230

231+
// Count individual test cases by grepping for test declarations (~50ms)
232+
// Covers: JS/TS (it/test), Python (def test_), Rust (#[test]), Go (func Test)
233+
const testCaseCount = (() => {
234+
const patterns = {
235+
node: `grep -rE "^\\s*(it|test)\\s*\\(" --include="*.test.*" --include="*.spec.*" . 2>/dev/null | wc -l`,
236+
rust: `grep -rE "#\\[test\\]" --include="*.rs" . 2>/dev/null | wc -l`,
237+
python: `grep -rE "^\\s*def test_" --include="*.py" . 2>/dev/null | wc -l`,
238+
go: `grep -rE "^func Test" --include="*_test.go" . 2>/dev/null | wc -l`,
239+
};
240+
const cmd = patterns[projectType];
241+
if (!cmd) return 0;
242+
return parseInt(run(cmd, { cwd: appRoot, shell: "/bin/bash" })) || 0;
243+
})();
244+
231245
return {
232246
projectType,
233247
sourceFiles: srcCount,
234248
testFiles: tstCount,
249+
testCaseCount,
235250
// testFileRatio: test files / source files (proxy metric, not real line coverage)
236251
testFileRatio: tstCount && srcCount ? Math.round((tstCount / srcCount) * 100) : 0,
237252
// testCoverage: actual line coverage % from Istanbul/c8 report, null if unavailable
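
The `node` pattern counts only lines that begin, after optional whitespace, with `it(` or `test(`. A rough JavaScript equivalent of that regex, applied to a string instead of going through `grep`, shows what it counts and what it skips. `countTestCases` is a hypothetical name, and JavaScript's `\s` class is assumed to be close enough to grep's for this purpose.

```javascript
// Hypothetical stand-in for the grep call: count lines matching ^\s*(it|test)\s*\(
function countTestCases(source) {
  const declaration = /^\s*(it|test)\s*\(/;
  return source.split('\n').filter((line) => declaration.test(line)).length;
}

const sample = [
  "describe('suite', () => {",
  "  it('counts this', () => {});",
  "  test('and this', () => {});",
  "  it.each([1, 2])('skips modifiers like .each', () => {});",
  "  test.skip('and .skip', () => {});",
  "  // it('commented-out tests are skipped too', () => {});",
  "});",
].join('\n');

console.log(countTestCases(sample)); // 2
```

Declarations written with modifiers such as `it.each`, `it.only`, or `test.skip` are therefore not counted, so a suite that leans on them would report a lower density than it has. The `grep -r` call also recurses into every directory under `appRoot`; whether dependency directories are excluded elsewhere is not visible in this diff.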
@@ -269,18 +284,27 @@ export function getHealthScore(root = process.env.FORGE_PROJECT_ROOT || process.
     checks.push({ name: "Git Clean", status: "warn", points: 5, note: `${git.modified} modified, ${git.untracked} untracked` });
   }
 
-  // Has tests (20 pts) — prefer real line coverage, fall back to file ratio proxy
+  // Has tests (20 pts) — tiered: real coverage → test density → file ratio
   if (metrics.testFiles > 0) {
     let testScore;
     let coverageNote;
     if (metrics.testCoverage !== null) {
-      // Real line coverage from Istanbul/c8/nyc — use directly
+      // Tier 1: Real line coverage from Istanbul/c8/nyc
       testScore = Math.min(20, Math.round((metrics.testCoverage / 100) * 20));
       coverageNote = `${metrics.testCoverage}% line coverage`;
+    } else if (metrics.testCaseCount > 0 && metrics.sourceFiles > 0) {
+      // Tier 2: Test density — test cases per source file
+      // Benchmarks: <1 = sparse, 1-3 = basic, 3-5 = solid, 5+ = thorough
+      const density = metrics.testCaseCount / metrics.sourceFiles;
+      if (density >= 5) testScore = 20;
+      else if (density >= 3) testScore = 15;
+      else if (density >= 1) testScore = 10;
+      else testScore = 5;
+      coverageNote = `${metrics.testCaseCount} tests across ${metrics.sourceFiles} files (${density.toFixed(1)}/file)`;
     } else {
-      // File ratio proxy: 5 pts floor (has tests) + up to 15 scaled by ratio
-      testScore = Math.min(20, 5 + Math.round((metrics.testFileRatio / 100) * 15));
-      coverageNote = `${metrics.testFileRatio}% file ratio (run with --coverage for accurate scoring)`;
+      // Tier 3: File ratio fallback
+      testScore = Math.min(20, Math.round((metrics.testFileRatio / 100) * 20));
+      coverageNote = `${metrics.testFileRatio}% file ratio`;
     }
     score += testScore;
     checks.push({ name: "Test Coverage", status: testScore >= 15 ? "pass" : "warn", points: testScore, note: coverageNote });