feat: implement subscription-based (API key-less) authentication for agents by devin-ai-integration[bot] · Pull Request #90 · laiso/ts-bench

devin-ai-integration · 2026-04-02T07:37:10Z

Summary

Implements subscription-based authentication for Claude, Gemini, and Codex agents, allowing benchmarks to run without API keys when the user has a valid local login session. Auth state is persisted via Docker volumes at ~/.cache/ts-bench/auth/<agent>/.

Key changes:

src/utils/docker.ts: New createAuthCacheArgs(), resolveAuthCachePath(), hasAuthCache(), hasCredentialFile(), exported AUTH_CACHE_AGENTS, AUTH_LOGIN_ARGS, AUTH_CREDENTIAL_FILES, AUTH_SENTINEL — mirrors existing createCliCacheArgs() pattern
src/utils/env.ts: New tryAnyEnv() — non-throwing variant of requireAnyEnv()
Agent builders (claude.ts, gemini.ts, codex.ts): when no API key is set, fall back to the auth cache, with [auth] log output showing which method is used. Only default/primary providers support subscription auth; explicit overrides (openrouter, dashscope, etc.) still require keys
docker-strategy.ts, v2-container.ts: Auth volume mounts replace hardcoded claudeMount; V2ContainerManager accepts agentName constructor param
src/index.ts: New --setup-auth <agent> command using lightweight node:lts image (no need to build ts-bench-container); mounts local scripts/run-agent.sh into the container; uses AUTH_LOGIN_ARGS mapping for agent-specific login commands; writes .ts-bench-auth sentinel on success or when credential files are detected (handles Ctrl-C exit from interactive CLIs)
src/config/constants.ts: New SETUP_AUTH_IMAGE = "node:lts" constant
PrepareContext.agentName plumbed through execution strategies and runners (agent.ts, test.ts)
src/utils/agent-logger.ts: ClaudeLogCollector reads logs from auth cache volume path in Docker mode
Spec document + environment.md update
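
The relationship between requireAnyEnv() and the new tryAnyEnv() can be sketched as below. This is a minimal illustration, assuming both helpers take a list of candidate variable names and return the first one that is set; the real signatures in src/utils/env.ts are not shown in this description, and the Env type, the EnvMatch shape, the env parameter, and the error text are assumptions added to keep the example self-contained.

```typescript
// Hypothetical shapes; the real helpers in src/utils/env.ts may differ.
type Env = Record<string, string | undefined>;

interface EnvMatch {
  name: string;
  value: string;
}

// Non-throwing variant: returns the first candidate that is set and non-empty,
// or undefined when none of the candidates is present.
function tryAnyEnv(names: string[], env: Env): EnvMatch | undefined {
  for (const name of names) {
    const value = env[name];
    if (value !== undefined && value !== "") {
      return { name, value };
    }
  }
  return undefined;
}

// Throwing variant, expressed in terms of the non-throwing one.
function requireAnyEnv(names: string[], env: Env): EnvMatch {
  const match = tryAnyEnv(names, env);
  if (match === undefined) {
    throw new Error(`Missing required env var, expected one of: ${names.join(", ")}`);
  }
  return match;
}
```

With this split, a builder can call tryAnyEnv() first and decide for itself what to do when no key is found, instead of catching an exception.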

Auth priority: API key env var → auth cache volume (sentinel check) → error suggesting --setup-auth.
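
The priority order above can be sketched as a small decision function. This is an illustration only, not the code in the agent builders: resolveAuthMethod, AuthInputs, AuthMethod, and the error wording are hypothetical names, and the useDocker condition reflects the guard described in the 3563b38 follow-up comment in this thread.

```typescript
// Outcome of choosing an auth method for one agent run (hypothetical type).
type AuthMethod =
  | { kind: "api-key"; envVar: string }
  | { kind: "subscription" };

interface AuthInputs {
  agent: string;         // e.g. "claude"
  apiKeyEnvVar?: string; // name of the API key env var that was found, if any
  useDocker: boolean;    // subscription auth is only considered in Docker mode
  hasAuthCache: boolean; // true when the sentinel exists in the auth cache
}

function resolveAuthMethod(input: AuthInputs): AuthMethod {
  // 1. An API key always wins.
  if (input.apiKeyEnvVar !== undefined) {
    return { kind: "api-key", envVar: input.apiKeyEnvVar };
  }
  // 2. Otherwise fall back to the auth cache volume, Docker runs only.
  if (input.useDocker && input.hasAuthCache) {
    return { kind: "subscription" };
  }
  // 3. Nothing usable: fail and point the user at --setup-auth.
  throw new Error(
    `No API key found for ${input.agent}. Set an API key or run --setup-auth ${input.agent}.`,
  );
}
```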

Updates since last revision

Credential-file-based sentinel writing: Claude and Gemini CLIs enter interactive chat mode after a successful login, so the user must Ctrl-C to exit — producing a non-zero exit code. Previously the sentinel was only written on exit code 0, meaning hasAuthCache() would return false despite valid credentials existing. Now runSetupAuth() also checks for agent-specific credential files (AUTH_CREDENTIAL_FILES mapping: .credentials.json for Claude, oauth_creds.json for Gemini, auth.json for Codex) via new hasCredentialFile() helper. If credentials are found, the sentinel is written regardless of exit code.
End-to-end testing of all 3 agents: Successfully tested --setup-auth for Claude (Logged in as kazuya@flare.run), Gemini (Signed in with Google: kazuya@flare.run, Plan: Google AI Ultra for Business), and Codex (Successfully logged in) in the Devin session. All three agents installed via npm inside node:lts, launched correctly, and completed OAuth flows.
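
The sentinel-writing rule described above can be sketched as follows. This is a minimal sketch rather than the actual runSetupAuth() code: finalizeSetupAuth is a hypothetical helper, the authDir parameter is added for illustration, and the example assumes the credential file sits directly in the agent's auth cache directory, which this description does not confirm.

```typescript
import { existsSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const AUTH_SENTINEL = ".ts-bench-auth";

// File names as listed in this PR description.
const AUTH_CREDENTIAL_FILES: Partial<Record<string, string>> = {
  claude: ".credentials.json",
  gemini: "oauth_creds.json",
  codex: "auth.json",
};

// Assumption: the credential file is looked up at the top level of authDir.
function hasCredentialFile(agent: string, authDir: string): boolean {
  const file = AUTH_CREDENTIAL_FILES[agent];
  return file !== undefined && existsSync(join(authDir, file));
}

// Hypothetical helper: write the sentinel on a clean exit OR when credentials
// exist, so a Ctrl-C exit after a successful login still counts.
function finalizeSetupAuth(agent: string, authDir: string, exitCode: number): boolean {
  if (exitCode === 0 || hasCredentialFile(agent, authDir)) {
    writeFileSync(join(authDir, AUTH_SENTINEL), "");
    return true;
  }
  return false;
}

// Usage: simulate a non-zero exit after a login that left credentials behind.
const demoDir = mkdtempSync(join(tmpdir(), "ts-bench-auth-"));
writeFileSync(join(demoDir, ".credentials.json"), "{}");
const sentinelWritten = finalizeSetupAuth("claude", demoDir, 130);
console.log(sentinelWritten, existsSync(join(demoDir, AUTH_SENTINEL)));
```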

Review & Testing Checklist for Human

Credential file name assumptions — AUTH_CREDENTIAL_FILES hardcodes .credentials.json (Claude), oauth_creds.json (Gemini), auth.json (Codex). If any agent CLI changes its credential file naming in a future version, the sentinel-writing fallback will silently stop working. Verify these match the current CLI versions.
End-to-end --setup-auth with node:lts — Run bun src/index.ts --setup-auth claude and verify: (a) node:lts image is pulled/used (not ts-bench-container), (b) run-agent.sh is mounted and installs the agent CLI via npm, (c) agent launches and shows auth prompt, (d) .ts-bench-auth sentinel appears in ~/.cache/ts-bench/auth/claude/ after Ctrl-C exit, (e) subsequent --docker run without API key succeeds.
run-agent.sh mount path assumption — --setup-auth resolves the script via resolve(process.cwd(), 'scripts', 'run-agent.sh'). If the CLI is invoked from a directory other than the repo root, the mount will fail. Verify this is acceptable or if an absolute path fallback is needed.
Verify [auth] log output — With API key: [auth] <Agent>: using API key (...). After --setup-auth without API key: [auth] <Agent>: using subscription auth (...).
Regression: existing API-key-based flows unaffected — Run a normal benchmark with API key set.
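
For the run-agent.sh mount path item, one possible shape of an absolute-path fallback is sketched below. This is not part of the PR: resolveRunAgentScript, the repoRoot parameter, and the injected exists callback are hypothetical, and how the repository root would be derived (for example from the module location) is left open.

```typescript
import { resolve } from "node:path";

// Hypothetical helper: try the current working directory first (the behaviour
// described in the checklist), then a known repository root.
function resolveRunAgentScript(
  cwd: string,
  repoRoot: string,
  exists: (path: string) => boolean,
): string {
  const candidates = [
    resolve(cwd, "scripts", "run-agent.sh"),
    resolve(repoRoot, "scripts", "run-agent.sh"),
  ];
  for (const candidate of candidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  // Fail early with both locations instead of letting the Docker mount fail.
  throw new Error(`run-agent.sh not found; looked in: ${candidates.join(", ")}`);
}
```

In real code the exists callback would typically be existsSync from node:fs; it is injected here so the lookup order can be checked without touching the filesystem.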

Suggested manual test plan:

With ANTHROPIC_API_KEY set, run bun src/index.ts --agent claude --exercise acronym --docker → should work as before, see [auth] Claude: using API key (ANTHROPIC_API_KEY)
Run bun src/index.ts --setup-auth claude → should pull node:lts, mount run-agent.sh, install + launch Claude CLI
After login, Ctrl-C to exit → sentinel should still be written (credential file detected)
Run the same benchmark with --docker but without an API key → should use subscription auth, see [auth] Claude: using subscription auth (...)
Verify ~/.cache/ts-bench/auth/claude/.ts-bench-auth sentinel exists
Run without --docker and without API key → should throw with --setup-auth suggestion

Notes

All 111 unit tests pass, including 9 subscription-auth tests, 12 docker utility tests (4 for hasAuthCache sentinel logic), and 4 tryAnyEnv tests. TypeScript typecheck passes cleanly.
AUTH_LOGIN_ARGS: Claude/Gemini → [] (no login subcommand), Codex → ['login', '--device-auth']. The fallback ?? [] in runSetupAuth() only applies if a future agent is added to AUTH_CACHE_AGENTS but not AUTH_LOGIN_ARGS.
macOS credential storage: Agent CLIs typically use the system Keychain on macOS, but inside Docker (Linux) they fall back to file-based storage. This is why Approach B (container-local auth) works — credentials are written to files that persist in the volume.
createAuthCacheArgs() creates the host auth cache directory as a side effect on every Docker run, even when subscription auth isn't used. Consistent with existing createCliCacheArgs pattern.
The .ts-bench-auth sentinel persists indefinitely. If credentials are revoked server-side, hasAuthCache will still return true and the agent will fail at runtime with a provider-specific error.
--setup-auth argument parsing uses raw process.argv.indexOf before the CLI parser. --setup-auth --help treats --help as the agent name (caught by supported-agents check, but error message could be confusing).
Test suite caveat: If real --setup-auth runs leave root-owned files in ~/.cache/ts-bench/auth/, the subscription-auth unit tests may fail because rmSync in the test teardown silently fails on root-owned files. Clean up with sudo rm -rf ~/.cache/ts-bench/auth/<agent> before running tests.
Closes #59 (Support Subscription-based (API Key-less) Authentication for Agents).
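
The --setup-auth parsing caveat in the notes above could be handled along the following lines. This is a sketch of one possible follow-up, not the behaviour in this PR: parseSetupAuth, the result type, and the messages are hypothetical, and the agent list is taken from the three agents named in this description.

```typescript
const SUPPORTED_AGENTS = ["claude", "gemini", "codex"] as const;
type SetupAuthAgent = (typeof SUPPORTED_AGENTS)[number];

type SetupAuthParse =
  | { kind: "none" } // --setup-auth was not passed
  | { kind: "agent"; agent: SetupAuthAgent }
  | { kind: "error"; message: string };

function parseSetupAuth(argv: string[]): SetupAuthParse {
  const index = argv.indexOf("--setup-auth");
  if (index === -1) {
    return { kind: "none" };
  }
  const value = argv[index + 1];
  // A missing value or another flag (such as --help) is not an agent name.
  if (value === undefined || value.startsWith("-")) {
    return {
      kind: "error",
      message: `--setup-auth requires an agent name: ${SUPPORTED_AGENTS.join(", ")}`,
    };
  }
  const agent = SUPPORTED_AGENTS.find((candidate) => candidate === value);
  if (agent === undefined) {
    return {
      kind: "error",
      message: `Unsupported agent "${value}". Supported: ${SUPPORTED_AGENTS.join(", ")}`,
    };
  }
  return { kind: "agent", agent };
}
```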

Link to Devin session: https://app.devin.ai/sessions/514b15c4c58a4d94a8a03a366ff4f4f5
Requested by: @laiso

…agents

Implement all 7 tasks from the subscription-auth spec:

1. Add createAuthCacheArgs(), resolveAuthCachePath(), hasAuthCache() to src/utils/docker.ts
2. Add tryAnyEnv() to src/utils/env.ts; make API keys optional in claude/gemini/codex builders
3. Apply auth mounts in docker-strategy.ts for both v1 and v2 execution paths
4. Apply auth mounts in v2-container.ts (replace hardcoded claudeMount)
5. Implement --setup-auth <agent> CLI command for interactive Docker login
6. Update environment.md with subscription auth documentation
7. Add unit tests for new functions and subscription auth behavior

Auth priority: API key > auth cache volume > error with --setup-auth suggestion.
GitHub Actions unaffected (always uses API key secrets).
Closes #59

devin-ai-integration · 2026-04-02T07:48:22Z

Addressed both Devin Review findings in 3563b38:

Subscription auth fallback now guarded by useDocker — All three agent builders (claude.ts, gemini.ts, codex.ts) now check this.config.useDocker && hasAuthCache(agent) before falling back to subscription auth. In native mode, the builder always requires an explicit API key and shows the helpful error message.
TestRunner now passes agentName in PrepareContext — src/runners/test.ts now includes agentName: config.agent so the correct auth cache is mounted for non-claude agents in test containers.

devin-ai-integration · 2026-04-02T07:53:35Z

Fixed in aa2fdd2: ClaudeLogCollector now uses resolveAuthCachePath('claude') when config.useDocker is true, so it reads JSONL logs from ~/.cache/ts-bench/auth/claude/projects/ (the auth cache volume host path) instead of ~/.claude/projects/. Native mode still reads from ~/.claude/ as before.
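
The path selection described in this fix can be sketched as below. It is an illustration under assumptions: resolveClaudeProjectsDir is a hypothetical helper, not a method of ClaudeLogCollector, and the home parameter is added to both functions so the example can be checked without relying on the real home directory.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Host path of the per-agent auth cache, following the layout in the summary.
// The optional home parameter is an addition for this example.
function resolveAuthCachePath(agent: string, home: string = homedir()): string {
  return join(home, ".cache", "ts-bench", "auth", agent);
}

// Hypothetical helper: where Claude's JSONL logs are expected on the host.
function resolveClaudeProjectsDir(useDocker: boolean, home: string = homedir()): string {
  return useDocker
    ? join(resolveAuthCachePath("claude", home), "projects") // auth cache volume
    : join(home, ".claude", "projects"); // native mode, unchanged
}
```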

devin-ai-integration · 2026-04-02T08:04:36Z

Fixed in 1e2fa9d: hasAuthCache() now checks for a .ts-bench-auth sentinel file instead of "any file in the directory". The --setup-auth command writes this sentinel after a successful login. This prevents false positives from Claude's conversation log files (projects/ subdirectory) that persist after API-key-based runs.

New test added: "returns false when dir has files but no sentinel" confirms that having credentials.json alone (or log files) does not trigger the auth cache fallback.
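
The sentinel check and the false-positive case it avoids can be sketched like this. It is a simplified stand-in for hasAuthCache(): the real function presumably takes an agent name and resolves the directory itself, whereas this version takes the directory directly, and the log file name is made up for the demonstration.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const AUTH_SENTINEL = ".ts-bench-auth";

// Simplified: only the sentinel counts, not "any file in the directory".
function hasAuthCache(authDir: string): boolean {
  return existsSync(join(authDir, AUTH_SENTINEL));
}

// Demo: leftover conversation logs alone must not enable subscription auth.
const cacheDir = mkdtempSync(join(tmpdir(), "ts-bench-auth-"));
mkdirSync(join(cacheDir, "projects"));
writeFileSync(join(cacheDir, "projects", "session.jsonl"), ""); // hypothetical log file
const beforeSetup = hasAuthCache(cacheDir);

// --setup-auth writes the sentinel after a successful login.
writeFileSync(join(cacheDir, AUTH_SENTINEL), "");
const afterSetup = hasAuthCache(cacheDir);

console.log({ beforeSetup, afterSetup });
```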

Gemini CLI does not have a 'login' sub-command — it authenticates interactively on first launch. Add AUTH_LOGIN_ARGS mapping so --setup-auth gemini runs 'gemini' (no args) while claude/codex still use '<agent> login'.

- Log which auth method is used (API key vs subscription) in all 3 agent builders so users can see which path was taken.
- Add missing await on expect(...).rejects.toThrow() in subscription auth tests (Claude, Gemini, Codex) so assertions are actually checked.

…subcommand) None of the supported CLIs (Claude, Gemini, Codex) have a dedicated 'login' sub-command. They all authenticate interactively when launched for the first time. Update AUTH_LOGIN_ARGS to use empty arrays for all agents and update spec docs accordingly.

Codex CLI requires 'codex login --device-auth' for headless Device-Code flow authentication, unlike Claude and Gemini which authenticate interactively on first launch with no extra args.
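
The resulting per-agent mapping can be sketched as follows. The values come from this thread; buildLoginCommand is a hypothetical helper, and it assumes the CLI binary has the same name as the agent, which the actual --setup-auth flow (run through scripts/run-agent.sh) may not rely on.

```typescript
// Login arguments per agent, as described in this PR.
const AUTH_LOGIN_ARGS: Partial<Record<string, string[]>> = {
  claude: [], // authenticates interactively on first launch
  gemini: [], // authenticates interactively on first launch
  codex: ["login", "--device-auth"], // headless device-code flow
};

// Hypothetical helper: the command line to run for a given agent.
// The ?? [] fallback only matters for an agent missing from the mapping.
function buildLoginCommand(agent: string): string[] {
  return [agent, ...(AUTH_LOGIN_ARGS[agent] ?? [])];
}

console.log(buildLoginCommand("codex").join(" "));
```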

- SETUP_AUTH_IMAGE constant points to node:lts (lightweight, no build required)
- Mount local scripts/run-agent.sh into the container (read-only)
- run-agent.sh handles CLI installation via npm automatically
- Users no longer need to build ts-bench-container just for authentication
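
A rough sketch of how the docker run arguments for --setup-auth might be assembled is shown below. Only the node:lts image, the read-only script mount, and the auth cache volume come from this PR; buildSetupAuthDockerArgs, the SetupAuthMounts shape, the container-side paths, and the way run-agent.sh is invoked are all assumptions made for illustration.

```typescript
const SETUP_AUTH_IMAGE = "node:lts";

interface SetupAuthMounts {
  hostScriptPath: string;      // absolute host path to scripts/run-agent.sh
  hostAuthDir: string;         // e.g. ~/.cache/ts-bench/auth/<agent>
  containerScriptPath: string; // hypothetical mount target for the script
  containerAuthDir: string;    // hypothetical; depends on where the CLI stores credentials
}

// Hypothetical helper: builds the argument list passed to the docker binary.
function buildSetupAuthDockerArgs(mounts: SetupAuthMounts, command: string[]): string[] {
  return [
    "run",
    "--rm",
    "-it", // interactive terminal, needed for the login prompts
    "-v",
    `${mounts.hostScriptPath}:${mounts.containerScriptPath}:ro`, // read-only script
    "-v",
    `${mounts.hostAuthDir}:${mounts.containerAuthDir}`, // credentials persist on the host
    SETUP_AUTH_IMAGE,
    ...command,
  ];
}

// Usage with made-up paths; the command shape is an assumption as well.
const exampleArgs = buildSetupAuthDockerArgs(
  {
    hostScriptPath: "/repo/scripts/run-agent.sh",
    hostAuthDir: "/home/user/.cache/ts-bench/auth/claude",
    containerScriptPath: "/run-agent.sh",
    containerAuthDir: "/root/.claude",
  },
  ["bash", "/run-agent.sh", "claude"],
);
console.log(["docker", ...exampleArgs].join(" "));
```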

…o exit Claude and Gemini CLIs enter interactive chat mode after successful login, so the user must Ctrl-C to exit — producing a non-zero exit code. Previously the sentinel was only written on exit code 0. Now we also check for the agent-specific credential file (e.g. .credentials.json for Claude, oauth_creds.json for Gemini) and write the sentinel if credentials are found, regardless of exit code.

Resolve conflicts: combine env-file security (PR #91) with subscription-auth cache args (PR #90) and keep both test suites. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

devin-ai-integration bot assigned laiso Apr 2, 2026

fix: address review issues - pass agentName to V2ContainerManager, deduplicate agent list, remove unused import

2018aec


fix: guard hasAuthCache with useDocker check, pass agentName in TestRunner

3563b38


fix: update ClaudeLogCollector to use auth cache path in Docker mode

aa2fdd2


fix: use sentinel file in hasAuthCache to avoid false positives from log files

1e2fa9d

fix: use agent-specific login commands in --setup-auth

2b742f6


devin-ai-integration bot added 5 commits April 2, 2026 10:38

fix: codex uses 'codex login --device-auth' for setup-auth

573ea5c

devin-ai-integration bot mentioned this pull request Apr 2, 2026

fix: v2 container killed prematurely + IS_SANDBOX for Claude Docker + subscription auth merge #92

Merged

4 tasks

laiso merged commit 0c52ea0 into main Apr 2, 2026
1 check passed

laiso merged 11 commits into main from devin/1775114880-subscription-auth-spec
