feat: v2-only SWE-Lancer tier list on GitHub Pages #89
- Add docs/index.html with leaderboard, historical runs, and task breakdown tabs
- Add scripts/build-results-pages.ts to generate per-agent result pages with OGP metadata
- Add .github/workflows/gh-pages.yml to deploy docs/ to GitHub Pages on leaderboard updates
- Add build:results-pages script to package.json

Closes #88
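As a rough illustration of the per-agent OGP metadata this commit describes, the sketch below builds the three tags named in the PR (og:title, og:description, twitter:card). The `AgentResult` shape, the `ogpTags` helper, and the wording of the description are assumptions made for illustration; they are not taken from scripts/build-results-pages.ts.

```typescript
// Hypothetical shape of one leaderboard entry; the real schema may differ.
interface AgentResult {
  agent: string;
  model: string;
  tier: string;
  solved: number;
  total: number;
}

// Escape the five HTML-significant characters so values are safe inside attributes.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build the OGP/Twitter meta tags for one result page (illustrative helper).
function ogpTags(r: AgentResult): string {
  const title = escapeHtml(`${r.agent} (${r.model})`);
  const description = escapeHtml(`Tier ${r.tier}: solved ${r.solved}/${r.total} tasks`);
  return [
    `<meta property="og:title" content="${title}">`,
    `<meta property="og:description" content="${description}">`,
    `<meta name="twitter:card" content="summary">`,
  ].join("\n");
}
```

Every interpolated value passes through the escaper before it reaches a `content="..."` attribute, which is the property the later review feedback asks for.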
Address Devin Review feedback: escape tier in body content and og:description attribute, whitelist tier CSS class names.
- Filter leaderboard to show only v2 SWE-Lancer results (5 tasks)
- Add tier legend (S/A/B/C/D/F) and sort by tier rank
- Fix XSS: add tier allowlist in tierBadge() function
- Merge swelancer-pages.yml into unified gh-pages.yml
- Delete redundant swelancer-pages.yml workflow
- Update build script to skip non-v2 entries
- Update page title/description to reference SWE-Lancer v2
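The tier allowlist and the sort by tier rank mentioned above could look roughly like this. It is a minimal sketch: the CSS class names, the placeholder for unknown tiers, and the `tierRank` helper are assumptions, and the actual tierBadge() in docs/index.html may differ.

```typescript
const TIERS = ["S", "A", "B", "C", "D", "F"] as const;
type Tier = (typeof TIERS)[number];

// Type guard: true only for the six known tier letters.
function isTier(value: unknown): value is Tier {
  return typeof value === "string" && (TIERS as readonly string[]).indexOf(value) !== -1;
}

// Badge HTML. Unknown values render a neutral placeholder instead of being
// echoed into the markup, so untrusted input never reaches a class name or body.
function tierBadge(value: unknown): string {
  if (!isTier(value)) return '<span class="tier tier-unknown">?</span>';
  return `<span class="tier tier-${value.toLowerCase()}">${value}</span>`;
}

// Sort key: S = 0 ... F = 5; anything unrecognised sorts last.
function tierRank(value: unknown): number {
  return isTier(value) ? TIERS.indexOf(value) : TIERS.length;
}
```

Because only allowlisted letters are ever interpolated, no escaping is needed on the happy path, and a single constant drives both validation and ordering.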
Testing Results: v2-only SWE-Lancer Tier List
Tested locally.
v2-only Page Rendering (5/5 passed)
Build Script v2 Filtering
…2 data
- Redesign leaderboard with horizontal tier rows (S/A/B/C/D/F) and agent cards
- Vercel-style dark theme: black background, dark surfaces, light text
- Tier colors matching tier list maker aesthetic (S: red to F: cyan)
- Add 6 sample v2 agents: cursor(S), codex(A), cline(B), aider(C), copilot(D), windsurf(F)
- Underline-style tab navigation
- Agent cards show name, model, solved count, avg time
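One way to turn a flat list of entries into the horizontal tier rows described above is to bucket the cards by tier in a fixed order. This is a sketch only: the `AgentCard` fields mirror what the cards are said to show (name, model, solved count, average time), but the field names and the `groupByTier` helper are hypothetical.

```typescript
const TIER_ORDER: readonly string[] = ["S", "A", "B", "C", "D", "F"];

// Hypothetical card shape; field names are assumptions.
interface AgentCard {
  agent: string;
  model: string;
  tier: string;
  solved: number;
  avgTimeSec: number;
}

// Returns one bucket per tier, keyed in S..F order. Empty tiers keep an empty
// array so the row still renders; cards with an unknown tier are skipped.
function groupByTier(cards: AgentCard[]): Record<string, AgentCard[]> {
  const rows: Record<string, AgentCard[]> = {};
  for (const t of TIER_ORDER) rows[t] = [];
  for (const card of cards) {
    if (Object.prototype.hasOwnProperty.call(rows, card.tier)) {
      rows[card.tier].push(card);
    }
  }
  return rows;
}
```

Iterating `TIER_ORDER` when rendering, rather than the keys of the data, keeps the row order stable even when a tier has no agents.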
- Download favicons for cursor, codex, cline, aider, copilot, windsurf
- Display 32x32 icons above agent name in tier list cards
- Graceful fallback (hide icon) if image fails to load
…ote escaping
- Add 10 more dummy v2 agents to leaderboard.json (16 total) to populate
all tier rows (S/A/B/C/D/F) with multiple agents each
- Download icons for new agents (claude, devin, gemini, opencode, goose,
kimi, qwen) from Google favicon service
- Fix isV2Entry() in build script to check entry.tier?.tier (not just
entry.tier) to match frontend logic — prevents generating orphan pages
- Fix esc() in frontend to escape quotes (" and ') in addition to <>&
preventing attribute injection via alt/src attributes
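The isV2Entry() change above is easiest to see on an entry whose tier object exists but carries no tier letter. The sketch below assumes a minimal entry shape (`LeaderboardEntry`, `TierInfo`) that is not taken from the repository; only the `entry.tier?.tier` check itself comes from the commit message.

```typescript
// Assumed minimal shapes; the real leaderboard.json schema may have more fields.
interface TierInfo {
  tier?: string;
}
interface LeaderboardEntry {
  agent: string;
  tier?: TierInfo | null;
}

// Old check: any truthy tier object counts as v2, even an empty one.
function isV2EntryOld(entry: LeaderboardEntry): boolean {
  return Boolean(entry.tier);
}

// New check: require an actual tier letter, matching the frontend.
function isV2Entry(entry: LeaderboardEntry): boolean {
  return Boolean(entry.tier?.tier);
}
```

For `{ agent: "x", tier: {} }` the old check returns true and the new one returns false, which is the case that previously produced a result page the frontend never linked to.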
docs/index.html
Outdated
      if (e.data.results[i].exercise === task) { found = e.data.results[i]; break; }
    }
    if (!found) {
      html += '<td style="color:var(--text-muted)">-</td>';
🟡 Undefined CSS variable --text-muted used in index.html breakdown table
The renderBreakdown function at docs/index.html:499 uses color:var(--text-muted) in an inline style, but the page's CSS (:root at docs/index.html:13-24) defines only --text-secondary, not --text-muted. A var() reference to an undefined custom property with no fallback makes the declaration invalid at computed-value time, so color falls back to its inherited value: the dash renders in the parent's color (--text: #ededed, a near-white light gray) instead of the intended muted gray. The generated result pages do define --text-muted: #8b949e (scripts/build-results-pages.ts:191), so the name exists there but not in index.html, where --text-secondary is the intended variable.
Suggested change:
-   html += '<td style="color:var(--text-muted)">-</td>';
+   html += '<td style="color:var(--text-secondary)">-</td>';
Fixed in 68a1670 — changed var(--text-muted) to var(--text-secondary) which is the correct variable defined in this page's :root.
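A mismatch like this can be caught mechanically by comparing the custom properties a page references with the ones it declares. The following is a rough, regex-based sketch rather than a real CSS parser; `undefinedCssVars` is a hypothetical helper, and it assumes every declaration appears in the same source string as `--name:`.

```typescript
// Returns custom properties referenced via var(--x) without a fallback
// that are never declared as "--x: ..." anywhere in the given source.
function undefinedCssVars(source: string): string[] {
  const defined: Record<string, boolean> = {};
  const declRe = /(--[\w-]+)\s*:/g;
  let m: RegExpExecArray | null;
  while ((m = declRe.exec(source)) !== null) {
    defined[m[1]] = true;
  }

  const missing: string[] = [];
  const useRe = /var\(\s*(--[\w-]+)\s*(,?)/g;
  while ((m = useRe.exec(source)) !== null) {
    const name = m[1];
    const hasFallback = m[2] === ","; // var(--x, fallback) still resolves
    if (!defined[name] && !hasFallback && missing.indexOf(name) === -1) {
      missing.push(name);
    }
  }
  return missing;
}
```

Run against the contents of docs/index.html before the fix, a check of this kind would be expected to report `--text-muted`; it could be wired into the build script or a unit test if the team wants a guard against regressions.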
…text-secondary
- Escape agentName via esc() before building iconPath to prevent attribute injection in img src attribute
- Replace undefined --text-muted with --text-secondary in breakdown table for missing task cells
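The escaping described in this commit can be sketched as below: a manual escaper covering all five characters, applied to the agent name before it is placed in the src and alt attributes. The icon directory, the .png extension, and the `agentIcon` helper are assumptions for illustration; only esc(), the 32×32 size, and the onerror fallback are described in the PR.

```typescript
// Escape &, <, >, " and '. The ampersand goes first so that the entities
// produced by the later replacements are not escaped a second time.
function esc(value: unknown): string {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Illustrative icon markup: path layout and extension are assumptions.
function agentIcon(agentName: string): string {
  const safe = esc(agentName);
  const iconPath = `assets/icons/${safe}.png`;
  // onerror hides the image if the icon file is missing.
  return `<img src="${iconPath}" alt="${safe}" width="32" height="32" onerror="this.style.display='none'">`;
}
```

An escaper built on textContent and innerHTML serializes text-node content, which leaves quotes untouched; that is fine between tags but not inside a quoted attribute, hence the explicit quote handling here.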
Summary
Adds a static site under docs/ to display v2 SWE-Lancer benchmark results on GitHub Pages, with per-agent detail pages including OGP metadata. V1 (Exercism) results are completely excluded. Closes #88.

What's included:
- docs/index.html: client-side dashboard with three tabs, namely Tier List (tiermaker-style horizontal rows), Historical Runs (sorted by date), and Task Breakdown (per-task pass/fail for the 5 SWE-Lancer tasks). Only v2 entries are displayed.
- scripts/build-results-pages.ts: Bun script that reads public/data/leaderboard.json, filters to v2 entries only, and generates one HTML page per agent/model under docs/results/ with OGP metadata (og:title, og:description, twitter:card).
- .github/workflows/gh-pages.yml: unified workflow (the former swelancer-pages.yml is merged into this single file). Builds both the benchmark result pages and the SWE-Lancer task browser, then deploys docs/ via actions/deploy-pages@v4. Triggers on pushes to main that touch relevant paths, plus workflow_dispatch.
- build:results-pages npm script. Generated docs/results/ and docs/data/leaderboard.json are gitignored (CI rebuilds).

Updates since last revision
- Dummy agents: leaderboard.json now holds 16 dummy v2 agents so every tier (S/A/B/C/D/F) has 2–4 agents. Total: cursor, claude, devin (S); codex, gemini, opencode (A); cline, goose, kimi, qwen (B); aider, cursor (C); copilot, gemini (D); windsurf, opencode (F). All data is fabricated for demo purposes.
- Agent icons: stored under docs/assets/icons/. Displayed at 32×32 in agent cards with graceful fallback (onerror hides broken images).
- isV2Entry() consistency fix (scripts/build-results-pages.ts): Changed if (entry.tier) → if (entry.tier?.tier) to match the frontend's check, preventing the build script from generating orphan result pages that the frontend never links to.
- esc() quote escaping fix (docs/index.html): Replaced the textContent → innerHTML trick with explicit manual escaping of all 5 characters (&, <, >, ", '). The old implementation didn't escape quotes, which could allow attribute injection when esc() output is placed inside alt="..." or src="..." attributes.
- tierClass() allowlist (scripts/build-results-pages.ts): Added an allowlist check so only known tier letters (S/A/B/C/D/F) produce a CSS class, preventing class-name injection.

Previous updates (still applied)
- Vercel-style dark theme: black background (#000), dark surfaces (#111, #1a1a1a), light text (#ededed), underline-style tabs. Tier row colors: S (#ff7f7f), A (#ffbf7f), B (#ffdf7f), C (#ffff7f), D (#bfff7f), F (#7fffff).
- v2 filtering is applied in the frontend (isV2Entry()) and at build time.
- tierBadge() validates tier values against an allowlist. Build script uses escapeHtml() + class allowlist.
- Deleted swelancer-pages.yml; merged into gh-pages.yml.

Review & Testing Checklist for Human
- The leaderboard.json entries are entirely made up for demo/layout purposes. Decide whether to keep them for the initial deploy or strip them before merge. Real results will overwrite them when update-leaderboard.ts runs after actual v2 benchmarks.
- index.html uses tiermaker-style colors (S = #ff7f7f red, F = #7fffff cyan) while individual result pages in build-results-pages.ts use a different palette (S = #c9a000 gold, F = #f85149 red). Verify this is acceptable or unify.
- Tier logic is spread across docs/index.html (client JS), scripts/build-results-pages.ts, and src/config/constants.ts. The three isV2Entry / tier-threshold implementations must stay in sync.
- The GitHub Pages workflow (gh-pages.yml, formerly swelancer-pages.yml) may require re-selecting the source.

Suggested test plan: Serve docs/ locally (python3 -m http.server 8080 --directory docs), verify all 3 tabs render with 16 agents distributed across all 6 tiers, click agent cards to confirm detail pages load with correct OGP tags, and resize to check mobile responsiveness.

Notes
- Typecheck (bun run typecheck) and unit tests (bun test ./src, 90 pass) both pass with no regressions.

Link to Devin session: https://app.devin.ai/sessions/6a49c671c925468887611e948335d9af
Requested by: @laiso