Releases: nxtg-ai/forge-plugin
v3.6.0 — Encyclopedia-Grade Documentation
What's New
Encyclopedia-Grade Documentation Rewrite
Complete rewrite of all user-facing documentation. Every agent, command, and skill now has its own reference page with consistent structure: What It Does → When to Use → How It Works → Examples → Power Use Cases → Combines With → Level Progression (L1/L2/L3) → Tips & Gotchas.
By the numbers:
- 92 new documentation files
- 10,870 lines of documentation
- 112 lines average per doc
- Written by 5 parallel agent teams reading all source files
New Documentation
- `docs/agents/`: 33 individual agent reference pages
- `docs/commands/`: 23 individual command reference pages
- `docs/skills/`: 32 individual skill reference pages
- `docs/LEVELS.md`: Consolidated L1→L2→L3 journey with feature matrix, upgrade criteria, and downgrade path
- `docs/GLOSSARY.md`: 18 Forge terms explained three ways (analogy → definition → Forge-specific context)
- `docs/README.md`: Documentation hub with user journey, architecture diagram, quick reference
Quality Improvements (UAT-Driven)
- Agent differentiation table — "Which Quality Agent Do I Use?" comparison (Detective vs Guardian vs Security vs Compliance vs Crucible) with speed, depth, and use-case guidance
- Cross-reference links — Pipeline breadcrumbs on Planner, Builder, Guardian showing the feature pipeline flow
- Pre-release checklist — Step-by-step release workflow with agent references and failure recovery
- Mermaid workflow diagrams — Architecture, feature pipeline, daily workflow, release process
- Anti-pattern tables — "Common Rationalizations" in Guardian and Security docs (pattern from superpowers plugin)
- Skills clarification — "You don't need to configure anything" callout at top of skills README
- Agent count fixed — C-13 updated from "23 agents" to "33 agents"
- All 13 chapter docs linked — No more orphan documentation
Plugin Updates
- Description updated to reflect governance-native positioning
- Version bump: 3.5.1 → 3.6.0
forge-plugin v3.5.1 — WSL2 tmux Dashboard Fix
Bug Fixes
BUG 1 — Auto-open fails silently in tmux
open() fails in tmux sessions because DISPLAY/WSL_INTEROP env vars are not forwarded. Now tries two fallbacks in order:
- `powershell.exe Start "<url>"`: works without `DISPLAY`
- `wslview "<path>"`: via the `wslu` package
If all methods fail, the response includes a `hint:` with a pasteable URL so the user can open the dashboard manually.
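The fallback order above can be sketched as a loop over launchers. This is a minimal illustration, not the plugin's actual code: `openWithFallbacks`, the `launchers` array shape, and the `opened`/`via` fields are hypothetical, and the launchers here are stubs rather than real calls to `powershell.exe` or `wslview`.

```javascript
// Hypothetical sketch: each launcher returns true on success and
// false (or throws) on failure. Names and shapes are illustrative only.
function openWithFallbacks(url, launchers) {
  for (const { name, launch } of launchers) {
    try {
      if (launch(url)) {
        return { opened: true, via: name };
      }
    } catch (err) {
      // Record the failure and move on to the next launcher.
      console.warn(`launcher "${name}" failed: ${err.message}`);
    }
  }
  // Nothing worked: hand back a URL the user can paste into a browser.
  return { opened: false, hint: `Open manually: ${url}` };
}

// Stubbed launchers standing in for open(), powershell.exe, and wslview.
const result = openWithFallbacks("file:///tmp/example.html", [
  { name: "open", launch: () => false },
  { name: "powershell.exe", launch: () => { throw new Error("not found"); } },
  { name: "wslview", launch: () => true },
]);
```

In a real implementation each `launch` would presumably wrap a child process call; injecting them keeps the ordering logic testable without a desktop session.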
BUG 2 — Wrong path format for Windows browsers
file:///tmp/forge-dashboard-xxx.html cannot be opened by Windows browsers from WSL2. Now detects WSL2 via WSL_DISTRO_NAME or /proc/sys/fs/binfmt_misc/WSLInterop and returns the correct Windows-accessible URL:
file://///wsl.localhost/<distro>/tmp/forge-dashboard-xxx.html
Both path (Linux path) and browserUrl (Windows-ready URL) are now included in the response.
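A minimal sketch of the detection and URL conversion described above, assuming the two signals named in the note. The function names, the injected `env` and `fileExists` parameters, and the non-WSL fallback are assumptions for illustration; only the `path` and `browserUrl` field names and the URL shape come from the release note.

```javascript
// Interop marker path named in the release note.
const WSL_INTEROP_FILE = "/proc/sys/fs/binfmt_misc/WSLInterop";

// `env` and `fileExists` are injected so the sketch runs anywhere;
// real code would likely pass process.env and fs.existsSync.
function isWsl2(env, fileExists) {
  return Boolean(env.WSL_DISTRO_NAME) || fileExists(WSL_INTEROP_FILE);
}

function dashboardLocation(linuxPath, env, fileExists) {
  const distro = env.WSL_DISTRO_NAME;
  if (isWsl2(env, fileExists) && distro) {
    // Windows browsers reach the WSL filesystem through \\wsl.localhost\<distro>.
    return {
      path: linuxPath,
      browserUrl: `file://///wsl.localhost/${distro}${linuxPath}`,
    };
  }
  // Assumption: outside WSL2, or with no distro name, keep the plain file URL.
  return { path: linuxPath, browserUrl: `file://${linuxPath}` };
}

const location = dashboardLocation(
  "/tmp/forge-dashboard-xxx.html",
  { WSL_DISTRO_NAME: "Ubuntu" },
  () => false
);
```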
Test Suite
44/44 vitest tests pass. 43/43 node:test tests pass.
Full Changelog: v3.5.0...v3.5.1
forge-plugin v3.5.0 — CLA Enforcement
What's Changed
Added
- CLA enforcement: `contributor-assistant/github-action@v2` added to `.github/workflows/cla.yml`. All PR contributors must sign before a PR can be merged.
- CLA.md: Contributor License Agreement document (based on Apache ICLA terms). Stored in repo root; signatures recorded in `.github/cla-signatures.json`.
- CONTRIBUTING.md: Updated with CLA signing instructions. Sign by commenting: `I have read the CLA Document and I hereby sign the CLA`
Notes
- License unchanged: MIT.
- Bots and `dependabot[bot]` are auto-allowlisted.
- Action required: Add `CLA_PERSONAL_ACCESS_TOKEN` to repository secrets for the bot to commit signatures back to the repo.
Test Suite
44/44 vitest tests pass. 43/43 node:test tests pass.
Full Changelog: v3.4.9...v3.5.0
forge-plugin v3.4.9 — Documentation Quality Pass
What's Changed
Documentation
- JSDoc coverage: Added complete JSDoc blocks to all 11 exported functions in `tools.mjs`: `run`, `readJson`, `serverVersion`, `findApplicationRoot`, `getGovernanceState`, `getGitStatus`, `getCodeMetrics`, `getHealthScore`, `getTestResults`, `listCheckpoints`, `getSecurityScan`, `generateDashboard`. Coverage: 0% → 100%.
- CONTRIBUTING.md: New contributor guide covering clone/setup, both test suites (`npx vitest run` and `node --test`), file format templates for commands/agents/skills, MCP server dev workflow, and PR/commit conventions.
- C-13 agent count: Fixed `docs/C-13-agents-skills.md` heading: "22 Specialized Agents" → "23 Specialized Agents".
- marketplace.json: Synced version to 3.4.9 (was stale at 3.2.0).
- CHANGELOG.md: Added v3.4.8 entry documenting the CRUCIBLE remediation work.
Dependencies
- Hono dependency bumped via Dependabot (#3).
Test Suite
44/44 vitest tests pass. 43/43 node:test tests pass.
Full Changelog: v3.4.8...v3.4.9
v3.4.8 — CRUCIBLE Remediation
CRUCIBLE Protocol Remediation
Addresses P0/P1 findings from the CRUCIBLE test quality audit (2026-03-09).
Changes
index.mjs test coverage — Added unit tests with FORGE_TEST_MODE guard. Previously 0% coverage.
Hollow assertion elimination — Replaced all toBeDefined(), toBeTruthy(), and typeof assertions with specific value assertions using expect.objectContaining() and exact matches. Hollow rate: 0% (was 17.1%).
Silent catch block remediation — All 5 remaining silent catch blocks now log with console.warn() and structured error context. No more silently swallowed errors.
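As an illustration of the silent-catch remediation, the sketch below shows the general before/after shape. `parseJsonOrNull` and the fields in the logged context object are hypothetical; the release note only states that the catch blocks now call `console.warn()` with structured error context.

```javascript
// Hypothetical helper: parse JSON text, returning null on failure.
function parseJsonOrNull(text, source) {
  try {
    return JSON.parse(text);
  } catch (err) {
    // Before: an empty catch block swallowed the error.
    // After: log it with structured context, then fall back.
    console.warn("parseJsonOrNull failed", { source, message: err.message });
    return null;
  }
}

const parsed = parseJsonOrNull('{"version": "3.4.8"}', "example.json");
const broken = parseJsonOrNull("{", "broken.json"); // logs a warning
```

The same idea applies to the assertion change: a test that checks `parsed.version === "3.4.8"` fails when the value is wrong, whereas a `toBeDefined()`-style check would pass for any non-undefined value.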
Test Counts
87 tests (44 vitest + 43 node:test), 0 failures.
Full Changelog: v3.4.7...v3.4.8
v3.4.7 — Agent naming consistency + component count audit
What changed
Agent file naming cleanup
All 23 agent files renamed from legacy prefixed names to clean names:
- `[AFRG]-database.md` → `database.md`
- `[NXTG-CEO]-LOOP.md` → `ceo-loop.md`
- `forge-oracle.md` → `oracle.md`
- ... and 20 more
The Claude Code /plugin TUI now shows clean agent names instead of [AFRG]-database style prefixes.
Component count audit
Audited and corrected counts across 15+ files:
- Commands: 23 (was showing 21)
- Agents: 23 (was showing 22)
- Skills: 32 (was showing 29)
- Hooks: 7 (was showing 6)
Updated in: CLAUDE.md, UAT-Guide, docs pages, slash commands, and skills.
Update
/forge:update
Then restart Claude Code to pick up changes.
v3.4.6 — Dashboard null% fix + score stability
Fixed
- Dashboard "null% coverage": The HTML dashboard displayed `null% coverage` in the Test Files card when no Istanbul/c8 coverage report exists. It now shows the appropriate metric based on what's available:
  - Real coverage: `80% coverage`
  - Test density: `37 test cases`
  - File ratio fallback: `33% file ratio`
- Score stability note: The 80→75 bouncing (governance hook modifying `.claude/governance.json`) is already handled by the BUG-01 filter (v3.2.0), which excludes `.claude/` paths from git cleanliness scoring. If you see this bouncing, restart Claude Code after `/forge:update`, because MCP servers don't hot-reload.
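The dashboard label fallback in the first fix above can be sketched as a small selection function. The function name, parameter names, and the final "no test data" string are assumptions; only the three label formats come from the release note.

```javascript
// Hypothetical sketch: pick the Test Files label from whichever metric exists.
// A value of null means "not available".
function testFilesLabel({ coveragePct = null, testCases = null, fileRatioPct = null }) {
  if (coveragePct !== null) return `${coveragePct}% coverage`;
  if (testCases !== null) return `${testCases} test cases`;
  if (fileRatioPct !== null) return `${fileRatioPct}% file ratio`;
  // Assumption: some neutral text when nothing is known, never "null%".
  return "no test data";
}

const withCoverage = testFilesLabel({ coveragePct: 80, testCases: 37 });
const withDensity = testFilesLabel({ coveragePct: null, testCases: 37 });
const withRatio = testFilesLabel({ fileRatioPct: 33 });
```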
Upgrade
/forge:update
Important: Restart Claude Code after updating to pick up MCP server changes.
Full changelog: v3.4.5...v3.4.6
v3.4.5 — Production-Ready Test Density Scoring
What this release fixes
v3.4.4 introduced test density scoring but had a ship-blocking bug: the grep for counting test cases scanned node_modules, dist, and build artifacts. On a real React project (forge-ui), this inflated the count from 4,187 real matches to 14,130, a 237% over-count made up entirely of phantom matches.
Bug fixes
- P0 (`node_modules` inflation): Test case grep now uses `find` with the same `BUILD_ARTIFACT_EXCLUDES` as source file counting. No more phantom matches from dependency packages.
- Double-counting: v3.4.4 used two grep passes that counted `*.test.*` files inside `tests/` twice. Replaced with a single `find | xargs grep` pipeline.
- Missing patterns: Now detects:
  - `it.each()`, `test.skip()`, `test.only()`, `it.concurrent()` (data-driven/skipped tests)
  - `__tests__/` directory convention (Jest without `.test.` in filename)
  - `*.cy.*` files (Cypress E2E tests)
  - `#[tokio::test]`, `#[rstest]`, `#[actix_web::test]` (async Rust)
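To make the pattern list above concrete, here is a rough regex-based counter. It is an illustration only: the release describes a `find | xargs grep` pipeline, and the exact expressions it uses are not given, so `TEST_PATTERNS` and `countTestCases` are hypothetical stand-ins.

```javascript
// Hypothetical patterns approximating the test markers named in the notes.
const TEST_PATTERNS = [
  // it(), test(), plus .each/.skip/.only/.concurrent variants
  /\b(?:it|test)(?:\.(?:each|skip|only|concurrent))?\s*\(/g,
  // Python-style test functions
  /\bdef test_\w*/g,
  // Rust test attributes, including the async variants
  /#\[(?:test|tokio::test|rstest|actix_web::test)\]/g,
];

// Count every pattern match in one file's text.
function countTestCases(source) {
  let total = 0;
  for (const pattern of TEST_PATTERNS) {
    const matches = source.match(pattern);
    total += matches ? matches.length : 0;
  }
  return total;
}

const jsSample = `
it("adds", () => {});
test.skip("later", () => {});
it.each([[1], [2], [3]])("row %i", () => {});
`;
const rustSample = "#[test]\nfn a() {}\n#[tokio::test]\nasync fn b() {}\n";

const jsCount = countTestCases(jsSample);     // 3
const rustCount = countTestCases(rustSample); // 2
```

Note that the three-row `it.each` call contributes a single match, which is the parameterized-test undercount that any count-the-call-sites approach shares.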
Validated against 3 real projects
| Project | Stack | Tests | Source files | Density | Score |
|---|---|---|---|---|---|
| forge-ui | React 19 | 4,187 | 851 | 4.9/file | 15/20 (solid) |
| forge-orchestrator | Rust | 362 | 43 | 8.4/file | 20/20 (thorough) |
| game.clicker | React + Vite | 37 | 3 | 12.3/file | 20/20 (thorough) |
Known limitations (documented, not bugs)
- Distribution blindness — 100 tests in 1 file + 0 tests in 99 files averages to 1.0 density. Real line coverage (Tier 1) catches this; density (Tier 2) does not.
- Parameterized tests: `test.each()` with 10 rows counts as 1 test case. Go table-driven tests with `t.Run()` subtests are also undercounted.
- BDD/Cucumber: `.feature` files with `Scenario:` are not detected.
These limitations are inherent to any fast proxy. The scoring clearly recommends running --coverage for Tier 1 accuracy.
Upgrade
/forge:update
Then restart Claude Code.
Full changelog: v3.4.4...v3.4.5
v3.4.4 — Test Density Scoring (replaces file ratio)
What changed
The health score's test dimension now counts actual test cases, not just test files.
The problem (v3.4.1-v3.4.3)
The old metric counted test FILES vs source FILES. A project with 1 test file containing 37 tests and 3 source files scored 7/20 — and adding 16 more tests changed the score by exactly zero.
The fix: 3-tier scoring
| Tier | Method | When |
|---|---|---|
| 1 | Real line coverage (Istanbul/c8/nyc) | Coverage report exists |
| 2 | Test density: grep for `it()`/`test()`/`def test_`/`#[test]` | No coverage report, tests detected |
| 3 | File ratio (legacy fallback) | Test patterns not detected |
Test density benchmarks
| Density (tests/source file) | Rating | Score |
|---|---|---|
| < 1 | Sparse | 5/20 |
| 1 – 3 | Basic | 10/20 |
| 3 – 5 | Solid | 15/20 |
| 5+ | Thorough | 20/20 |
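The tier selection and the density benchmarks can be sketched together as follows. The function names and input shape are hypothetical, and the band boundaries are an assumption: the table does not say which band owns exactly 1, 3, or 5, so this sketch treats each lower bound as inclusive.

```javascript
// Hypothetical sketch: choose which scoring tier applies.
function pickTier({ hasCoverageReport, testCases }) {
  if (hasCoverageReport) return 1; // real line coverage
  if (testCases > 0) return 2;     // test density
  return 3;                        // legacy file ratio
}

// Map tests-per-source-file onto the benchmark bands.
// Assumes sourceFiles > 0; lower bounds are treated as inclusive.
function densityScore(testCases, sourceFiles) {
  const density = testCases / sourceFiles;
  if (density < 1) return { density, rating: "sparse", score: 5 };
  if (density < 3) return { density, rating: "basic", score: 10 };
  if (density < 5) return { density, rating: "solid", score: 15 };
  return { density, rating: "thorough", score: 20 };
}

const fixture = densityScore(4, 2);  // density 2.0, basic, 10/20
const clicker = densityScore(37, 3); // density about 12.3, thorough, 20/20
```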
Real impact
| Project | Tests | Source files | Before | After |
|---|---|---|---|---|
| game.clicker (37 tests) | 37 | 3 | 7/20 | 20/20 |
| game.clicker (21 tests) | 21 | 3 | 7/20 | 15/20 |
| Fixture (4 tests) | 4 | 2 | 10/20 | 10/20 |
Adding tests now visibly moves the score.
Tests
- 27/27 vitest (was 26)
- 43/43 node:test (unchanged)
Upgrade
/forge:update
Then restart Claude Code to pick up MCP server changes.
Full changelog: v3.4.3...v3.4.4
v3.4.3 — Test Scoring Accuracy Fix
Fixed
- Test scoring accuracy — Health score now uses real line coverage (Istanbul/c8/nyc) for the Tests dimension when a coverage report exists. Previously, real coverage was displayed in the note but ignored for scoring — the score always used the file ratio proxy.
- File ratio proxy too punishing: When no coverage report exists, the file ratio proxy now awards a 5-point floor for "has tests at all" plus up to 15 scaled by ratio. A project with 1 test file and 3 source files now scores 10/20 (was 7/20). The note tells users to run `--coverage` for accurate scoring.
Scoring breakdown
| Scenario | Before | After |
|---|---|---|
| 1 test file / 3 source files (33% ratio) | 7/20 | 10/20 |
| 2 test files / 4 source files (50% ratio) | 10/20 | 13/20 |
| Real coverage 80% (Istanbul report) | Used file ratio! | 16/20 |
| No tests at all | 0/20 | 0/20 (unchanged) |
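The rows above are consistent with the formulas sketched below. These are reconstructions, not the plugin's code: the rounding with `Math.round`, the cap of the ratio at 1, the legacy formula (inferred from the "Before" column), and all function names are assumptions. Inputs assume `sourceFiles > 0`.

```javascript
// Inferred legacy proxy: 20 points scaled directly by test/source file ratio.
function legacyFileRatioScore(testFiles, sourceFiles) {
  return Math.round(20 * (testFiles / sourceFiles));
}

// New proxy: 5-point floor for having any tests, plus up to 15 scaled by ratio.
function fileRatioScore(testFiles, sourceFiles) {
  if (testFiles === 0) return 0;
  const ratio = Math.min(testFiles / sourceFiles, 1); // assumption: cap at 100%
  return Math.round(5 + 15 * ratio);
}

// Real coverage: 20 points scaled by line coverage percentage.
function coverageScore(linePct) {
  return Math.round(20 * (linePct / 100));
}

const oneOfThree = fileRatioScore(1, 3); // 10 (legacy: 7)
const twoOfFour = fileRatioScore(2, 4);  // 13 (legacy: 10)
const eightyPct = coverageScore(80);     // 16
```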
Tests
- 26/26 vitest (was 25 — 1 new)
- 43/43 node:test (unchanged)
Upgrade
/forge:update
Full changelog: v3.4.2...v3.4.3