
names: collapse X-bridge docs to a one-line note #206

Merged
pudo merged 1 commit into main from pudo/cluster-xbridge-doc
May 3, 2026

@pudo commented May 3, 2026

Summary

The X-bridge case in run_cluster (a part referenced from two clusters when a later edge bridges two pre-existing clusters with no shared vertex) is structurally unreachable under the current monotone DP. The cursor walk in run_align can't produce the non-monotone overlap pattern an X-bridge requires. Confirmed empirically: 0 bridge events fired across the 818-case cases.csv corpus.

The plan docs and the run_cluster docstring previously framed it as a real-but-rare invariant violation that needed a union-find rewrite. After investigation that's misleading — the bug is real algorithmically but unreachable from real input today. This PR collapses the discussion to a one-line note in each of the three places it was tracked.

Background

Investigation triggered by a pudo/cluster-dsu branch (kept locally as a reference) that would have been a ~300-line rewrite — Dsu struct + iterative path-compressed find + union-by-rank + emit-order tick tracking + 6 unit tests. The rewrite is correct algorithmically and bit-identical on the corpus (0 outcome flips, 0 score diffs over 818 cases). But the X-bridge regression test it introduces exercises code that the real DP can never trigger — it builds synthetic AlignmentData directly.
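For orientation, the core of such a rewrite can be sketched as below. This is a minimal illustration and not the code from the pudo/cluster-dsu branch: only the `Dsu` name comes from the description above, while `cluster_count` and the layout that maps result part `r` to node `n_qry + r` are assumptions made for the example. The emit-order tick tracking is left out.

```rust
use std::cmp::Ordering;

/// Minimal disjoint-set union over `n` node indices (illustrative only).
struct Dsu {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl Dsu {
    fn new(n: usize) -> Self {
        Dsu { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Iterative find: locate the root, then point every node on the
    /// path directly at it (path compression without recursion).
    fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Union by rank; returns true if two distinct sets were merged.
    fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        match self.rank[ra].cmp(&self.rank[rb]) {
            Ordering::Less => self.parent[ra] = rb,
            Ordering::Greater => self.parent[rb] = ra,
            Ordering::Equal => {
                self.parent[rb] = ra;
                self.rank[ra] += 1;
            }
        }
        true
    }
}

/// Hypothetical helper: number of connected components in the bipartite
/// graph of query parts and result parts. Assumed layout: query part `q`
/// is node `q`, result part `r` is node `n_qry + r`.
fn cluster_count(n_qry: usize, n_res: usize, edges: &[(usize, usize)]) -> usize {
    let mut dsu = Dsu::new(n_qry + n_res);
    let mut merges = 0;
    for &(q, r) in edges {
        if dsu.union(q, n_qry + r) {
            merges += 1;
        }
    }
    n_qry + n_res - merges
}
```

With two query parts and two result parts, `cluster_count(2, 2, &[(0, 1), (1, 0)])` gives 2, and adding the bridging edge `(1, 1)` gives 1. Edge order does not matter here, which is the property an order-sensitive shared-vertex pass cannot guarantee.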

The structural argument: align.overlaps is built by a single monotone DP walk through the SEP-joined strings. Both qry_idx and res_idx only advance, never backtrack. So if (qp_a, rp_b) accumulates an Equal step, no later step can be at (qp_c, rp_d) with qp_c > qp_a and rp_d < rp_b — but that's exactly what an X-bridge needs (two non-vertex-sharing edges, then a third that bridges them). Under sort-order processing on lex-sorted edges, the bridge edge always lands at a position where it's a chain-via-shared-vertex instead.
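The argument can be checked on a toy model. The sketch below is a simplification and not the real `run_align`: the `Edge` alias, the `Step` enum, `walk`, and `has_crossing` are hypothetical names, and the cursors here move one whole part at a time instead of walking characters through SEP-joined strings.

```rust
/// Hypothetical stand-in for one overlap edge:
/// (query part index, result part index).
type Edge = (usize, usize);

/// Toy cursor moves. Both cursors can only advance, never go back.
#[derive(Clone, Copy)]
enum Step {
    Equal,
    AdvanceQuery,
    AdvanceResult,
}

/// Replays a monotone walk and records an edge at every Equal step.
fn walk(steps: &[Step]) -> Vec<Edge> {
    let (mut q, mut r) = (0usize, 0usize);
    let mut edges = Vec::new();
    for &step in steps {
        match step {
            Step::Equal => edges.push((q, r)),
            Step::AdvanceQuery => q += 1,
            Step::AdvanceResult => r += 1,
        }
    }
    edges
}

/// True if some pair of edges crosses: one edge has the larger query
/// index but the smaller result index. Lex-sorted input needs such a
/// pair before two vertex-disjoint edges can be followed by a bridge.
fn has_crossing(edges: &[Edge]) -> bool {
    edges.iter().any(|&(qa, rb)| {
        edges.iter().any(|&(qc, rd)| qc > qa && rd < rb)
    })
}
```

Any `walk` output has non-decreasing indices on both sides, so `has_crossing` is false for it. A hand-built edge list such as `[(0, 1), (1, 0), (1, 1)]` does cross, which is why a regression test for the case has to construct its input directly.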

The DSU rewrite is the right shape if the connectivity-rule replacement lands (threshold = 0 expands which edges qualify and could unlock the case), or if the DP ever stops being monotone. Neither is imminent. Better to revert and pull the DSU back when there's a concrete reason.

What this PR changes

| File | Change |
| --- | --- |
| rust/src/names/compare.rs (run_cluster docstring) | Replaces the "X-bridge limitation" paragraph with a short note explaining structural unreachability + what would need to change to unlock it. |
| plans/weighted-distance.md § Open spec knobs → Clustering rule fragility | Two sub-bullets collapsed to one sentence covering both threshold fragility and the DSU follow-up. |
| plans/arch-name-distance.md § Pairing rule | Drops the X-bridge limitation paragraph + redundant fragility bullets; keeps the threshold-fragility one-liner. |

Net diff: 17 insertions, 55 deletions.

No code behaviour change. No test changes.

Test plan

  • cargo test --release --features python — clean
  • pytest tests/ — 470 pass
  • mypy --strict rigour — clean
  • cargo fmt --check, cargo clippy --all-targets -- -D warnings (with and without --features python) — clean

🤖 Generated with Claude Code

The X-bridge case in the cluster-rule fragility tracking
(`run_cluster` mishandling a part bridged into two pre-existing
clusters) is structurally unreachable under the current monotone
DP — the cursor walk in `run_align` can't produce the
non-monotone overlap pattern an X-bridge requires. Confirmed
empirically: 0 bridge events fired across the 818-case
cases.csv corpus.

Investigation was triggered by the cluster-DSU branch, which
would have been a ~300-line rewrite to fix the case. The fix is
correct algorithmically but the bug is unreachable from real
input today, so the work is currently complexity without
measurable gain. Reverting to the shared-vertex implementation
on main and folding the analysis into a one-line code comment.
The DSU rewrite stays as a follow-up if the connectivity-rule
replacement lands (which would expand which edges qualify and
could unlock the case).

Plan-doc cleanup:
- weighted-distance.md § Open spec knobs → Clustering rule
  fragility: two sub-bullets collapsed to one sentence.
- arch-name-distance.md § Pairing rule: drops the X-bridge
  limitation paragraph + redundant fragility bullets; keeps the
  threshold-fragility one-liner.
- compare.rs `run_cluster` docstring: drops the "X-bridge
  limitation" paragraph; replaces with a short note explaining
  the structural unreachability and what would need to change to
  unlock it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pudo merged commit 057459f into main May 3, 2026
20 checks passed
@pudo deleted the pudo/cluster-xbridge-doc branch May 3, 2026 13:40