feat: v2-only SWE-Lancer tier list on GitHub Pages by devin-ai-integration[bot] · Pull Request #89 · laiso/ts-bench

devin-ai-integration · 2026-04-02T07:36:33Z

Summary

Adds a static site under docs/ to display v2 SWE-Lancer benchmark results on GitHub Pages, with per-agent detail pages including OGP metadata. V1 (Exercism) results are completely excluded. Closes #88.

What's included:

docs/index.html — Client-side dashboard with three tabs: Tier List (tiermaker-style horizontal rows), Historical Runs (sorted by date), and Task Breakdown (per-task pass/fail for the 5 SWE-Lancer tasks). Only v2 entries are displayed.
scripts/build-results-pages.ts — Bun script that reads public/data/leaderboard.json, filters to v2 entries only, and generates one HTML page per agent/model under docs/results/ with OGP metadata (og:title, og:description, twitter:card).
.github/workflows/gh-pages.yml — Unified workflow (merged the former swelancer-pages.yml into this single file). Builds both benchmark result pages and SWE-Lancer task browser, then deploys docs/ via actions/deploy-pages@v4. Triggers on pushes to main that touch relevant paths, plus workflow_dispatch.
Adds build:results-pages npm script. Generated docs/results/ and docs/data/leaderboard.json are gitignored (CI rebuilds).

Updates since last revision

Expanded sample data (16 agents): Added 10 more dummy v2 agents to leaderboard.json so every tier (S/A/B/C/D/F) has 2–4 agents. Distribution by tier: cursor, claude, devin (S); codex, gemini, opencode (A); cline, goose, kimi, qwen (B); aider, cursor (C); copilot, gemini (D); windsurf, opencode (F). All data is fabricated for demo purposes.
Agent icons: Downloaded 64×64 PNG icons for all agents via Google's favicon service, stored in docs/assets/icons/. Displayed at 32×32 in agent cards with graceful fallback (onerror hides broken images).
isV2Entry() consistency fix (scripts/build-results-pages.ts): Changed if (entry.tier) → if (entry.tier?.tier) to match the frontend's check, preventing the build script from generating orphan result pages that the frontend never links to.
esc() quote escaping fix (docs/index.html): Replaced the textContent→innerHTML trick with explicit manual escaping of all 5 characters (&, <, >, ", '). The old implementation didn't escape quotes, which could allow attribute injection when esc() output is placed inside alt="..." or src="..." attributes.
tierClass() allowlist (scripts/build-results-pages.ts): Added an allowlist check so only known tier letters (S/A/B/C/D/F) produce a CSS class, preventing class-name injection.
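The two injection fixes above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function names esc() and tierClass() come from the description, but the bodies, the HTML_ESCAPES and KNOWN_TIERS tables, and the `tier-s` style class-name format are assumptions.

```typescript
// Assumed lookup table: the five characters that matter in HTML text and attributes.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape by explicit replacement, so quotes are covered as well as &, < and >.
function esc(value: unknown): string {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Only known tier letters produce a CSS class; anything else yields no class.
// The "tier-" prefix is a hypothetical naming scheme.
const KNOWN_TIERS = new Set(["S", "A", "B", "C", "D", "F"]);

function tierClass(tier: string): string {
  return KNOWN_TIERS.has(tier) ? `tier-${tier.toLowerCase()}` : "";
}

// A hostile agent name can no longer close the alt attribute early.
const agentName = 'x" onerror="alert(1)';
const img = `<img alt="${esc(agentName)}" class="${tierClass("S")}">`;
console.log(img);
```

Because the escaped value contains no raw quote characters, it stays inside the attribute it was placed in, and an unexpected tier string simply produces an empty class rather than attacker-chosen class names.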

Previous updates (still applied)

Tier list maker layout: Horizontal tier rows (S → F) with colored labels and agent cards, matching tiermaker.com style.
Vercel-style dark theme: Black background (#000), dark surfaces (#111, #1a1a1a), light text (#ededed), underline-style tabs. Tier row colors: S (#ff7f7f), A (#ffbf7f), B (#ffdf7f), C (#ffff7f), D (#bfff7f), F (#7fffff).
V2-only: All v1 Exercism data filtered out client-side (isV2Entry()) and at build time.
XSS hardening: tierBadge() validates tier values against an allowlist. Build script uses escapeHtml() + class allowlist.
Workflow consolidation: Deleted swelancer-pages.yml; merged into gh-pages.yml.
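As a rough sketch of the v2 filter described above, assuming each leaderboard entry carries an optional tier object with a nested tier letter (the LeaderboardEntry and TierInfo shapes here are hypothetical, as the real leaderboard.json schema is not shown in this PR):

```typescript
// Hypothetical shapes; only the entry.tier?.tier access is taken from the PR description.
interface TierInfo {
  tier?: string;
}

interface LeaderboardEntry {
  agent: string;
  tier?: TierInfo;
}

// An entry counts as v2 only when the nested tier letter is present and non-empty.
function isV2Entry(entry: LeaderboardEntry): boolean {
  return Boolean(entry.tier?.tier);
}

const entries: LeaderboardEntry[] = [
  { agent: "v2-agent", tier: { tier: "S" } },
  { agent: "partial", tier: {} }, // a bare `if (entry.tier)` check would accept this one
  { agent: "v1-agent" },
];

const v2Only = entries.filter(isV2Entry);
console.log(v2Only.map((e) => e.agent)); // only "v2-agent" remains
```

Using the same predicate in the frontend and the build script keeps the set of generated pages identical to the set of linked pages, which is the point of the consistency fix.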

Review & Testing Checklist for Human

All 16 agents use fabricated data — The leaderboard.json entries are entirely made up for demo/layout purposes. Decide whether to keep them for the initial deploy or strip before merge. Real results will overwrite them when update-leaderboard.ts runs after actual v2 benchmarks.
Agent icons may not match actual logos — Icons were sourced from Google's favicon service using agent website domains. Verify the icons are reasonable representations (especially for agents whose websites may not have distinctive favicons).
Tier color inconsistency between pages — index.html uses tiermaker-style colors (S = #ff7f7f red, F = #7fffff cyan) while individual result pages in build-results-pages.ts use a different palette (S = #c9a000 gold, F = #f85149 red). Verify this is acceptable or unify.
V2 detection logic is duplicated in docs/index.html (client JS), scripts/build-results-pages.ts, and src/config/constants.ts. The three isV2Entry / tier-threshold implementations must stay in sync.
Enable GitHub Pages — Settings → Pages → Source: "GitHub Actions". The consolidated workflow (gh-pages.yml, which replaces the deleted swelancer-pages.yml) may require re-selecting the source.

Suggested test plan: Serve docs/ locally (python3 -m http.server 8080 --directory docs), verify all 3 tabs render with 16 agents distributed across all 6 tiers, click agent cards to confirm detail pages load with correct OGP tags, and resize to check mobile responsiveness.

Notes

Typecheck (bun run typecheck) and unit tests (bun test ./src — 90 pass) both pass with no regressions.
Build script generates 16 v2 result pages from the sample data (skips 16 v1 entries).

Link to Devin session: https://app.devin.ai/sessions/6a49c671c925468887611e948335d9af
Requested by: @laiso

- Add docs/index.html with leaderboard, historical runs, and task breakdown tabs
- Add scripts/build-results-pages.ts to generate per-agent result pages with OGP metadata
- Add .github/workflows/gh-pages.yml to deploy docs/ to GitHub Pages on leaderboard updates
- Add build:results-pages script to package.json

Closes #88

devin-ai-integration · 2026-04-02T07:36:36Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

- Filter leaderboard to show only v2 SWE-Lancer results (5 tasks)
- Add tier legend (S/A/B/C/D/F) and sort by tier rank
- Fix XSS: add tier allowlist in tierBadge() function
- Merge swelancer-pages.yml into unified gh-pages.yml
- Delete redundant swelancer-pages.yml workflow
- Update build script to skip non-v2 entries
- Update page title/description to reference SWE-Lancer v2

devin-ai-integration · 2026-04-02T11:39:43Z

Testing Results — v2-only SWE-Lancer Tier List

Tested locally at http://localhost:8080 serving docs/. All 5 tests passed.

v2-only Page Rendering (5/5 passed)

v2 branding — PASSED. Header shows "SWE-Lancer v2 benchmark". No "Exercism" or "TypeScript workloads" references.
Tier legend — PASSED. S 5/5, A 4/5, B 3/5, C 2/5, D 1/5, F 0/5 all visible with distinct colored badges.
Tier List tab empty state — PASSED. Shows "No v2 benchmark results yet." with --dataset v2 guidance. Zero v1 entries visible (all 23 filtered out).
Historical Runs tab — PASSED. Shows "No v2 results yet."
Task Breakdown tab — PASSED. Shows "No v2 task-level data available."
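The tier legend in the test results (S 5/5 down to F 0/5) implies a direct mapping from solved-task count to tier letter. A minimal sketch of that mapping, assuming exactly five tasks; the names tierForSolved and TIER_BY_SOLVED are hypothetical and the real thresholds live in the project's own constants:

```typescript
type Tier = "S" | "A" | "B" | "C" | "D" | "F";

// Index = number of solved tasks (0..5), following the legend: F 0/5 ... S 5/5.
const TIER_BY_SOLVED: readonly Tier[] = ["F", "D", "C", "B", "A", "S"];

// Map a solved count to its tier letter, rejecting values outside 0..5.
function tierForSolved(solved: number): Tier {
  if (!Number.isInteger(solved) || solved < 0 || solved > 5) {
    throw new RangeError(`solved must be an integer from 0 to 5, got ${solved}`);
  }
  return TIER_BY_SOLVED[solved] as Tier;
}

// Print the full legend for the 5-task benchmark.
for (let solved = 5; solved >= 0; solved--) {
  console.log(`${tierForSolved(solved)} ${solved}/5`);
}
```

Keeping the mapping in one table makes it easier to share between the client script and the build script, which the review checklist flags as a sync risk.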

Build Script v2 Filtering

$ bun scripts/build-results-pages.ts
Generated 0 v2 result pages (skipped 23 non-v2 entries)

Screenshots

Tier List (empty state)	Historical Runs	Task Breakdown
	Shows "No v2 results yet."	Shows "No v2 task-level data available."

…2 data

- Redesign leaderboard with horizontal tier rows (S/A/B/C/D/F) and agent cards
- Vercel-style dark theme: black background, dark surfaces, light text
- Tier colors matching tier list maker aesthetic (S: red to F: cyan)
- Add 6 sample v2 agents: cursor(S), codex(A), cline(B), aider(C), copilot(D), windsurf(F)
- Underline-style tab navigation
- Agent cards show name, model, solved count, avg time

…ote escaping

- Add 10 more dummy v2 agents to leaderboard.json (16 total) to populate all tier rows (S/A/B/C/D/F) with multiple agents each
- Download icons for new agents (claude, devin, gemini, opencode, goose, kimi, qwen) from Google favicon service
- Fix isV2Entry() in build script to check entry.tier?.tier (not just entry.tier) to match frontend logic, which prevents generating orphan pages
- Fix esc() in frontend to escape quotes (" and ') in addition to <>&, preventing attribute injection via alt/src attributes

devin-ai-integration

Devin Review found 2 new potential issues.

View 12 additional findings in Devin Review.

devin-ai-integration · 2026-04-02T14:34:21Z

docs/index.html

+          if (e.data.results[i].exercise === task) { found = e.data.results[i]; break; }
+        }
+        if (!found) {
+          html += '<td style="color:var(--text-muted)">-</td>';


🟡 Undefined CSS variable --text-muted used in index.html breakdown table

The renderBreakdown function at docs/index.html:499 uses color:var(--text-muted) in an inline style, but the page's CSS (:root at docs/index.html:13-24) defines only --text-secondary, not --text-muted. Because the variable is undefined and no fallback is given, the declaration is invalid at computed-value time; color is an inherited property, so the dash text takes the parent's color (--text: #ededed, a near-white light gray) instead of the intended muted gray. The generated result pages in scripts/build-results-pages.ts:191 do define --text-muted: #8b949e, which suggests the variable name was carried over from there and that --text-secondary is the intended variable in index.html.

Suggested change

html += '<td style="color:var(--text-muted)">-</td>';

html += '<td style="color:var(--text-secondary)">-</td>';

Fixed in 68a1670 — changed var(--text-muted) to var(--text-secondary) which is the correct variable defined in this page's :root.

docs/index.html

…text-secondary

- Escape agentName via esc() before building iconPath to prevent attribute injection in img src attribute
- Replace undefined --text-muted with --text-secondary in breakdown table for missing task cells

devin-ai-integration bot assigned laiso Apr 2, 2026

fix: escape tier value in HTML to prevent XSS

dc80611

Address Devin Review feedback: escape tier in body content and og:description attribute, whitelist tier CSS class names.

devin-ai-integration bot changed the title ~~feat: add GitHub Pages benchmark results display~~ feat: v2-only SWE-Lancer tier list on GitHub Pages Apr 2, 2026

devin-ai-integration bot added 2 commits April 2, 2026 14:13

feat: add agent icons to tier list cards

5300723

- Download favicons for cursor, codex, cline, aider, copilot, windsurf
- Display 32x32 icons above agent name in tier list cards
- Graceful fallback (hide icon) if image fails to load

laiso merged commit 9df2912 into main Apr 2, 2026
2 checks passed

laiso merged 7 commits into main from devin/1775115148-gh-pages-results
