feat: implement subscription-based (API key-less) authentication for agents #90
…agents

Implement all 7 tasks from the subscription-auth spec:

1. Add createAuthCacheArgs(), resolveAuthCachePath(), hasAuthCache() to src/utils/docker.ts
2. Add tryAnyEnv() to src/utils/env.ts; make API keys optional in claude/gemini/codex builders
3. Apply auth mounts in docker-strategy.ts for both v1 and v2 execution paths
4. Apply auth mounts in v2-container.ts (replace hardcoded claudeMount)
5. Implement --setup-auth <agent> CLI command for interactive Docker login
6. Update environment.md with subscription auth documentation
7. Add unit tests for new functions and subscription auth behavior

Auth priority: API key > auth cache volume > error with --setup-auth suggestion.
GitHub Actions unaffected (always uses API key secrets).

Closes #59
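The auth priority stated in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `Env` type, the `AuthMethod` union, the `resolveAuth()` helper, and the exact signature of `tryAnyEnv()` are assumptions made for the example, and `authCacheReady` stands in for whatever `hasAuthCache()` returns.

```typescript
// Hypothetical env map type so the sketch does not depend on process.env.
type Env = Record<string, string | undefined>;

// Non-throwing lookup: returns the first non-empty variable, or undefined.
// The name follows the PR's tryAnyEnv(); this signature is an assumption.
function tryAnyEnv(
  env: Env,
  varNames: string[],
): { envVar: string; value: string } | undefined {
  for (const envVar of varNames) {
    const value = env[envVar];
    if (value !== undefined && value !== "") {
      return { envVar, value };
    }
  }
  return undefined;
}

// Hypothetical result type describing which auth path was chosen.
type AuthMethod =
  | { kind: "api-key"; envVar: string }
  | { kind: "subscription" };

// Priority from the commit message: API key > auth cache volume > error.
// `authCacheReady` is a stand-in for the result of hasAuthCache(agent).
function resolveAuth(
  agent: string,
  env: Env,
  varNames: string[],
  authCacheReady: boolean,
): AuthMethod {
  const key = tryAnyEnv(env, varNames);
  if (key) {
    return { kind: "api-key", envVar: key.envVar };
  }
  if (authCacheReady) {
    return { kind: "subscription" };
  }
  throw new Error(
    `No API key (${varNames.join(", ")}) and no auth cache for ${agent}. ` +
      `Run --setup-auth ${agent} first.`,
  );
}
```

Passing the environment and the cache check in as arguments keeps the decision itself easy to unit-test without touching the real environment or file system.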
…duplicate agent list, remove unused import
Addressed both Devin Review findings in 3563b38:
Fixed in aa2fdd2:
Fixed in 1e2fa9d: New test added: "returns false when dir has files but no sentinel" confirms that having …
Gemini CLI does not have a 'login' sub-command — it authenticates interactively on first launch. Add AUTH_LOGIN_ARGS mapping so --setup-auth gemini runs 'gemini' (no args) while claude/codex still use '<agent> login'.
- Log which auth method is used (API key vs subscription) in all 3 agent builders so users can see which path was taken.
- Add missing await on expect(...).rejects.toThrow() in subscription auth tests (Claude, Gemini, Codex) so assertions are actually checked.
…subcommand)

None of the supported CLIs (Claude, Gemini, Codex) have a dedicated 'login' sub-command. They all authenticate interactively when launched for the first time. Update AUTH_LOGIN_ARGS to use empty arrays for all agents and update spec docs accordingly.
Codex CLI requires 'codex login --device-auth' for headless Device-Code flow authentication, unlike Claude and Gemini which authenticate interactively on first launch with no extra args.
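After this commit the mapping ends up with empty arrays for Claude and Gemini and `['login', '--device-auth']` for Codex. A minimal sketch of how such a mapping might be turned into a command line is shown below; the `Record` shape and the `buildLoginCommand()` helper are hypothetical, and the sketch assumes the CLI binary name equals the agent name.

```typescript
// Values as described in this PR; the Record shape is an assumption.
const AUTH_LOGIN_ARGS: Record<string, string[]> = {
  claude: [], // authenticates interactively on first launch
  gemini: [], // authenticates interactively on first launch
  codex: ["login", "--device-auth"], // headless device-code flow
};

// Hypothetical helper: assumes the CLI binary is named after the agent.
// The `?? []` fallback covers an agent that has no entry in the mapping.
function buildLoginCommand(agent: string): string[] {
  const extraArgs = AUTH_LOGIN_ARGS[agent] ?? [];
  return [agent, ...extraArgs];
}
```

For example, `buildLoginCommand("codex")` yields `["codex", "login", "--device-auth"]`, while `buildLoginCommand("gemini")` yields just `["gemini"]`.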
- SETUP_AUTH_IMAGE constant points to node:lts (lightweight, no build required)
- Mount local scripts/run-agent.sh into the container (read-only)
- run-agent.sh handles CLI installation via npm automatically
- Users no longer need to build ts-bench-container just for authentication
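The container setup described in this commit could be assembled along these lines. This is only a sketch under stated assumptions: the `SetupAuthPaths` interface, the `buildSetupAuthDockerArgs()` helper, the container-side mount points, and the way `run-agent.sh` is invoked (`bash <script> <command...>`) are all hypothetical and not taken from the PR. Only the `node:lts` image and the read-only script mount come from the commit message.

```typescript
// Image name from the PR's SETUP_AUTH_IMAGE constant.
const SETUP_AUTH_IMAGE = "node:lts";

// Hypothetical bundle of paths; the container-side paths are assumptions.
interface SetupAuthPaths {
  hostScript: string; // absolute host path to scripts/run-agent.sh
  hostAuthDir: string; // host auth cache dir, already expanded
  containerScript: string; // assumed mount point for the script
  containerAuthDir: string; // assumed dir where the CLI stores credentials
}

// Builds the argument list for `docker`, e.g. for spawn("docker", args).
function buildSetupAuthDockerArgs(
  paths: SetupAuthPaths,
  command: string[],
): string[] {
  return [
    "run",
    "--rm", // remove the container after the login session ends
    "-it", // interactive TTY so the login prompt is usable
    "-v",
    `${paths.hostScript}:${paths.containerScript}:ro`, // read-only script mount
    "-v",
    `${paths.hostAuthDir}:${paths.containerAuthDir}`, // persist credentials
    SETUP_AUTH_IMAGE,
    "bash", // assumed invocation style for run-agent.sh
    paths.containerScript,
    ...command,
  ];
}
```

Returning an argument array rather than a single shell string avoids quoting problems when host paths contain spaces.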
…o exit

Claude and Gemini CLIs enter interactive chat mode after successful login, so the user must Ctrl-C to exit, which produces a non-zero exit code. Previously the sentinel was only written on exit code 0. Now we also check for the agent-specific credential file (e.g. .credentials.json for Claude, oauth_creds.json for Gemini) and write the sentinel if credentials are found, regardless of exit code.
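The sentinel decision described in this commit might look roughly like the sketch below. The file names come from the PR; everything else is an assumption for illustration: the `Exists` predicate (a stand-in for a file-system check such as `existsSync`), the helper signatures, and the premise that the credential file sits directly inside the agent's auth cache directory.

```typescript
// Names from the PR; the Record shape is an assumption.
const AUTH_SENTINEL = ".ts-bench-auth";
const AUTH_CREDENTIAL_FILES: Record<string, string> = {
  claude: ".credentials.json",
  gemini: "oauth_creds.json",
  codex: "auth.json",
};

// Injected file-existence check, a stand-in for a real fs call.
type Exists = (path: string) => boolean;

// Assumes the credential file lives directly inside the auth cache dir.
function hasCredentialFile(authDir: string, agent: string, exists: Exists): boolean {
  const file = AUTH_CREDENTIAL_FILES[agent];
  return file !== undefined && exists(`${authDir}/${file}`);
}

// Path where the sentinel would be written for a given auth cache dir.
function sentinelPath(authDir: string): string {
  return `${authDir}/${AUTH_SENTINEL}`;
}

// Write the sentinel on a clean exit, or when credentials exist even though
// the CLI was interrupted (for example Ctrl-C after an interactive login).
function shouldWriteSentinel(
  exitCode: number,
  authDir: string,
  agent: string,
  exists: Exists,
): boolean {
  return exitCode === 0 || hasCredentialFile(authDir, agent, exists);
}
```

Injecting the existence check keeps the rule itself testable without creating real files; a caller would pass the real file-system check and then write `sentinelPath(authDir)` when the function returns `true`.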
Summary
Implements subscription-based authentication for Claude, Gemini, and Codex agents, allowing benchmarks to run without API keys when the user has a valid local login session. Auth state is persisted via Docker volumes at `~/.cache/ts-bench/auth/<agent>/`.

Key changes:
- `src/utils/docker.ts`: New `createAuthCacheArgs()`, `resolveAuthCachePath()`, `hasAuthCache()`, `hasCredentialFile()`; exported `AUTH_CACHE_AGENTS`, `AUTH_LOGIN_ARGS`, `AUTH_CREDENTIAL_FILES`, `AUTH_SENTINEL`. Mirrors the existing `createCliCacheArgs()` pattern.
- `src/utils/env.ts`: New `tryAnyEnv()`, a non-throwing variant of `requireAnyEnv()`.
- Agent builders (`claude.ts`, `gemini.ts`, `codex.ts`): API key fallback to auth cache, with `[auth]` log output showing which method is used. Only default/primary providers support subscription auth; explicit overrides (openrouter, dashscope, etc.) still require keys.
- `docker-strategy.ts`, `v2-container.ts`: Auth volume mounts replace the hardcoded `claudeMount`; `V2ContainerManager` accepts an `agentName` constructor param.
- `src/index.ts`: New `--setup-auth <agent>` command using the lightweight `node:lts` image (no need to build `ts-bench-container`). It mounts the local `scripts/run-agent.sh` into the container, uses the `AUTH_LOGIN_ARGS` mapping for agent-specific login commands, and writes the `.ts-bench-auth` sentinel on success or when credential files are detected (handles Ctrl-C exit from interactive CLIs).
- `src/config/constants.ts`: New `SETUP_AUTH_IMAGE = "node:lts"` constant.
- `PrepareContext.agentName` plumbed through execution strategies and runners (`agent.ts`, `test.ts`).
- `src/utils/agent-logger.ts`: `ClaudeLogCollector` reads logs from the auth cache volume path in Docker mode.

Auth priority: API key env var → auth cache volume (sentinel check) → error suggesting `--setup-auth`.

Updates since last revision
- Because the sentinel was previously written only on exit code 0, a Ctrl-C exit from an interactive CLI meant `hasAuthCache()` would return false despite valid credentials existing. Now `runSetupAuth()` also checks for agent-specific credential files (`AUTH_CREDENTIAL_FILES` mapping: `.credentials.json` for Claude, `oauth_creds.json` for Gemini, `auth.json` for Codex) via the new `hasCredentialFile()` helper. If credentials are found, the sentinel is written regardless of exit code.
- `--setup-auth` was exercised for Claude (`Logged in as kazuya@flare.run`), Gemini (`Signed in with Google: kazuya@flare.run`, `Plan: Google AI Ultra for Business`), and Codex (`Successfully logged in`) in the Devin session. All three agents installed via npm inside `node:lts`, launched correctly, and completed OAuth flows.

Review & Testing Checklist for Human
- `AUTH_CREDENTIAL_FILES` hardcodes `.credentials.json` (Claude), `oauth_creds.json` (Gemini), `auth.json` (Codex). If any agent CLI changes its credential file naming in a future version, the sentinel-writing fallback will silently stop working. Verify these match the current CLI versions.
- `--setup-auth` with `node:lts`: Run `bun src/index.ts --setup-auth claude` and verify: (a) the `node:lts` image is pulled/used (not `ts-bench-container`), (b) `run-agent.sh` is mounted and installs the agent CLI via npm, (c) the agent launches and shows the auth prompt, (d) the `.ts-bench-auth` sentinel appears in `~/.cache/ts-bench/auth/claude/` after Ctrl-C exit, (e) a subsequent `--docker` run without API key succeeds.
- `run-agent.sh` mount path assumption: `--setup-auth` resolves the script via `resolve(process.cwd(), 'scripts', 'run-agent.sh')`. If the CLI is invoked from a directory other than the repo root, the mount will fail. Verify this is acceptable or if an absolute path fallback is needed.
- `[auth]` log output: With API key: `[auth] <Agent>: using API key (...)`. After `--setup-auth` without API key: `[auth] <Agent>: using subscription auth (...)`.

Suggested manual test plan:
1. With `ANTHROPIC_API_KEY` set, run `bun src/index.ts --agent claude --exercise acronym --docker` → should work as before, see `[auth] Claude: using API key (ANTHROPIC_API_KEY)`
2. Run `bun src/index.ts --setup-auth claude` → should pull `node:lts`, mount `run-agent.sh`, install + launch Claude CLI
3. Verify the `~/.cache/ts-bench/auth/claude/.ts-bench-auth` sentinel exists
4. … `--docker` and without API key → should throw with `--setup-auth` suggestion

Notes
- … `hasAuthCache` sentinel logic), and 4 `tryAnyEnv` tests. TypeScript typecheck passes cleanly.
- `AUTH_LOGIN_ARGS`: Claude/Gemini → `[]` (no login subcommand), Codex → `['login', '--device-auth']`. The fallback `?? []` in `runSetupAuth()` only applies if a future agent is added to `AUTH_CACHE_AGENTS` but not `AUTH_LOGIN_ARGS`.
- `createAuthCacheArgs()` creates the host auth cache directory as a side effect on every Docker run, even when subscription auth isn't used. This is consistent with the existing `createCliCacheArgs` pattern.
- The `.ts-bench-auth` sentinel persists indefinitely. If credentials are revoked server-side, `hasAuthCache` will still return `true` and the agent will fail at runtime with a provider-specific error.
- `--setup-auth` argument parsing uses raw `process.argv.indexOf` before the CLI parser. `--setup-auth --help` treats `--help` as the agent name (caught by the supported-agents check, but the error message could be confusing).
- If `--setup-auth` runs leave root-owned files in `~/.cache/ts-bench/auth/`, the subscription-auth unit tests may fail because `rmSync` in the test teardown silently fails on root-owned files. Clean up with `sudo rm -rf ~/.cache/ts-bench/auth/<agent>` before running tests.

Link to Devin session: https://app.devin.ai/sessions/514b15c4c58a4d94a8a03a366ff4f4f5
Requested by: @laiso