feat: implement subscription-based (API key-less) authentication for agents #90

Merged
laiso merged 11 commits into main from devin/1775114880-subscription-auth-spec
Apr 2, 2026

@devin-ai-integration devin-ai-integration bot commented Apr 2, 2026

Summary

Implements subscription-based authentication for Claude, Gemini, and Codex agents, allowing benchmarks to run without API keys when the user has a valid local login session. Auth state is persisted via Docker volumes at ~/.cache/ts-bench/auth/<agent>/.

Key changes:

  • src/utils/docker.ts: New createAuthCacheArgs(), resolveAuthCachePath(), hasAuthCache(), hasCredentialFile(), exported AUTH_CACHE_AGENTS, AUTH_LOGIN_ARGS, AUTH_CREDENTIAL_FILES, AUTH_SENTINEL — mirrors existing createCliCacheArgs() pattern
  • src/utils/env.ts: New tryAnyEnv() — non-throwing variant of requireAnyEnv()
  • Agent builders (claude.ts, gemini.ts, codex.ts): API key fallback to auth cache with [auth] log output showing which method is used; only default/primary providers support subscription auth, explicit overrides (openrouter, dashscope, etc.) still require keys
  • docker-strategy.ts, v2-container.ts: Auth volume mounts replace hardcoded claudeMount; V2ContainerManager accepts agentName constructor param
  • src/index.ts: New --setup-auth <agent> command using lightweight node:lts image (no need to build ts-bench-container); mounts local scripts/run-agent.sh into the container; uses AUTH_LOGIN_ARGS mapping for agent-specific login commands; writes .ts-bench-auth sentinel on success or when credential files are detected (handles Ctrl-C exit from interactive CLIs)
  • src/config/constants.ts: New SETUP_AUTH_IMAGE = "node:lts" constant
  • PrepareContext.agentName plumbed through execution strategies and runners (agent.ts, test.ts)
  • src/utils/agent-logger.ts: ClaudeLogCollector reads logs from auth cache volume path in Docker mode
  • Spec document + environment.md update
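
A minimal sketch of how the auth cache path and volume arguments described above could fit together. The container-side directories and the `home` parameter are assumptions made for illustration; the PR only states the host path `~/.cache/ts-bench/auth/<agent>/`, so the real `createAuthCacheArgs()` may differ.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Agents the PR summary lists as supporting subscription auth.
const AUTH_CACHE_AGENTS = ["claude", "gemini", "codex"] as const;
type AuthAgent = (typeof AUTH_CACHE_AGENTS)[number];

// ASSUMPTION: container-side config directories. The PR does not list them.
const CONTAINER_AUTH_DIRS: Record<AuthAgent, string> = {
  claude: "/root/.claude",
  gemini: "/root/.gemini",
  codex: "/root/.codex",
};

// Host path: ~/.cache/ts-bench/auth/<agent>. `home` is a hypothetical
// parameter so the function can be exercised against a temp directory.
function resolveAuthCachePath(agent: AuthAgent, home: string = os.homedir()): string {
  return path.join(home, ".cache", "ts-bench", "auth", agent);
}

// Creates the host directory as a side effect (as the Notes section
// describes) and returns the `docker run` volume arguments.
function createAuthCacheArgs(agent: AuthAgent, home: string = os.homedir()): string[] {
  const hostDir = resolveAuthCachePath(agent, home);
  fs.mkdirSync(hostDir, { recursive: true });
  return ["-v", `${hostDir}:${CONTAINER_AUTH_DIRS[agent]}`];
}
```

The returned array is meant to be spread into the `docker run` argument list alongside any existing CLI cache mounts.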

Auth priority: API key env var → auth cache volume (sentinel check) → error suggesting --setup-auth.
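
The priority order above can be sketched roughly as follows. The signatures of `tryAnyEnv()` and `resolveAuthMethod()` are assumptions (the latter is a hypothetical helper, not a function named in this PR), and the subscription log line omits the parenthesised detail because the PR does not spell it out.

```typescript
type AuthMethod =
  | { kind: "api-key"; envVar: string; value: string }
  | { kind: "subscription" };

// Non-throwing lookup: the first non-empty variable wins, undefined otherwise.
// ASSUMPTION: the real tryAnyEnv() may have a different signature.
function tryAnyEnv(
  names: string[],
  env: Record<string, string | undefined> = process.env,
): { name: string; value: string } | undefined {
  for (const name of names) {
    const value = env[name];
    if (value) return { name, value };
  }
  return undefined;
}

// Hypothetical helper: API key first, then auth cache, then an error that
// points the user at --setup-auth.
function resolveAuthMethod(
  agent: string,
  envVars: string[],
  env: Record<string, string | undefined>,
  hasAuthCache: (agent: string) => boolean,
): AuthMethod {
  const key = tryAnyEnv(envVars, env);
  if (key) {
    console.log(`[auth] ${agent}: using API key (${key.name})`);
    return { kind: "api-key", envVar: key.name, value: key.value };
  }
  if (hasAuthCache(agent)) {
    console.log(`[auth] ${agent}: using subscription auth`);
    return { kind: "subscription" };
  }
  throw new Error(
    `No API key found for ${agent} (${envVars.join(", ")}). ` +
      `Set one, or run --setup-auth ${agent} to log in with a subscription.`,
  );
}
```

Passing `hasAuthCache` in as a callback keeps the sketch independent of the file system; in the PR the builders additionally require Docker mode before taking the subscription path.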

Updates since last revision

  • Credential-file-based sentinel writing: Claude and Gemini CLIs enter interactive chat mode after a successful login, so the user must Ctrl-C to exit — producing a non-zero exit code. Previously the sentinel was only written on exit code 0, meaning hasAuthCache() would return false despite valid credentials existing. Now runSetupAuth() also checks for agent-specific credential files (AUTH_CREDENTIAL_FILES mapping: .credentials.json for Claude, oauth_creds.json for Gemini, auth.json for Codex) via new hasCredentialFile() helper. If credentials are found, the sentinel is written regardless of exit code.
  • End-to-end testing of all 3 agents: Successfully tested --setup-auth for Claude (Logged in as kazuya@flare.run), Gemini (Signed in with Google: kazuya@flare.run, Plan: Google AI Ultra for Business), and Codex (Successfully logged in) in the Devin session. All three agents installed via npm inside node:lts, launched correctly, and completed OAuth flows.
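
A rough sketch of the credential-file fallback described above. It assumes the credential file sits at the top level of the agent's auth cache directory and that the sentinel content is unimportant; `finalizeSetupAuth()` is a hypothetical name, and the real `hasCredentialFile()` may take different parameters.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const AUTH_SENTINEL = ".ts-bench-auth";

// File names as listed in the PR description.
const AUTH_CREDENTIAL_FILES: Record<string, string> = {
  claude: ".credentials.json",
  gemini: "oauth_creds.json",
  codex: "auth.json",
};

// ASSUMPTION: the credential file lives directly inside the auth cache dir.
function hasCredentialFile(agent: string, authDir: string): boolean {
  const fileName = AUTH_CREDENTIAL_FILES[agent];
  return fileName !== undefined && fs.existsSync(path.join(authDir, fileName));
}

// Hypothetical helper: writes the sentinel on a clean exit OR when
// credentials exist, so a Ctrl-C exit after login still counts as success.
function finalizeSetupAuth(agent: string, authDir: string, exitCode: number | null): boolean {
  if (exitCode === 0 || hasCredentialFile(agent, authDir)) {
    // ASSUMPTION: sentinel content is arbitrary; a timestamp is used here.
    fs.writeFileSync(path.join(authDir, AUTH_SENTINEL), new Date().toISOString());
    return true;
  }
  return false;
}
```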

Review & Testing Checklist for Human

  • Credential file name assumptions: AUTH_CREDENTIAL_FILES hardcodes .credentials.json (Claude), oauth_creds.json (Gemini), auth.json (Codex). If any agent CLI changes its credential file naming in a future version, the sentinel-writing fallback will silently stop working. Verify these match the current CLI versions.
  • End-to-end --setup-auth with node:lts — Run bun src/index.ts --setup-auth claude and verify: (a) node:lts image is pulled/used (not ts-bench-container), (b) run-agent.sh is mounted and installs the agent CLI via npm, (c) agent launches and shows auth prompt, (d) .ts-bench-auth sentinel appears in ~/.cache/ts-bench/auth/claude/ after Ctrl-C exit, (e) subsequent --docker run without API key succeeds.
  • run-agent.sh mount path assumption: --setup-auth resolves the script via resolve(process.cwd(), 'scripts', 'run-agent.sh'). If the CLI is invoked from a directory other than the repo root, the mount will fail. Verify whether this is acceptable or whether an absolute path fallback is needed.
  • Verify [auth] log output — With API key: [auth] <Agent>: using API key (...). After --setup-auth without API key: [auth] <Agent>: using subscription auth (...).
  • Regression: existing API-key-based flows unaffected — Run a normal benchmark with API key set.
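
If an absolute path fallback for run-agent.sh turns out to be needed, one possible shape is sketched below. This is not what the PR does; `resolveRunAgentScript()` and the idea of passing several candidate roots are assumptions for illustration only.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: tries each candidate root in order and returns the
// first existing scripts/run-agent.sh, failing early with a clear message
// instead of letting the Docker mount fail later.
function resolveRunAgentScript(candidateRoots: string[]): string {
  for (const root of candidateRoots) {
    const scriptPath = path.resolve(root, "scripts", "run-agent.sh");
    if (fs.existsSync(scriptPath)) return scriptPath;
  }
  throw new Error(
    `scripts/run-agent.sh not found under: ${candidateRoots.join(", ")}. ` +
      `Run --setup-auth from the repository root.`,
  );
}
```

A caller could pass `[process.cwd(), repoRoot]`, where `repoRoot` is whatever absolute location the project decides to treat as the fallback.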

Suggested manual test plan:

  1. With ANTHROPIC_API_KEY set, run bun src/index.ts --agent claude --exercise acronym --docker → should work as before, see [auth] Claude: using API key (ANTHROPIC_API_KEY)
  2. Run bun src/index.ts --setup-auth claude → should pull node:lts, mount run-agent.sh, install + launch Claude CLI
  3. After login, Ctrl-C to exit → sentinel should still be written (credential file detected)
  4. Run without API key → should use subscription auth
  5. Verify ~/.cache/ts-bench/auth/claude/.ts-bench-auth sentinel exists
  6. Run without --docker and without API key → should throw with --setup-auth suggestion

Notes

  • All 111 unit tests pass, including 9 subscription-auth tests, 12 docker utility tests (4 for hasAuthCache sentinel logic), and 4 tryAnyEnv tests. TypeScript typecheck passes cleanly.
  • AUTH_LOGIN_ARGS: Claude/Gemini → [] (no login subcommand), Codex → ['login', '--device-auth']. The fallback ?? [] in runSetupAuth() only applies if a future agent is added to AUTH_CACHE_AGENTS but not AUTH_LOGIN_ARGS.
  • macOS credential storage: Agent CLIs typically use the system Keychain on macOS, but inside Docker (Linux) they fall back to file-based storage. This is why Approach B (container-local auth) works — credentials are written to files that persist in the volume.
  • createAuthCacheArgs() creates the host auth cache directory as a side effect on every Docker run, even when subscription auth isn't used. Consistent with existing createCliCacheArgs pattern.
  • The .ts-bench-auth sentinel persists indefinitely. If credentials are revoked server-side, hasAuthCache will still return true and the agent will fail at runtime with a provider-specific error.
  • --setup-auth argument parsing uses raw process.argv.indexOf before the CLI parser. --setup-auth --help treats --help as the agent name (caught by supported-agents check, but error message could be confusing).
  • Test suite caveat: If real --setup-auth runs leave root-owned files in ~/.cache/ts-bench/auth/, the subscription-auth unit tests may fail because rmSync in the test teardown silently fails on root-owned files. Clean up with sudo rm -rf ~/.cache/ts-bench/auth/<agent> before running tests.
  • Closes #59, "Support Subscription-based (API Key-less) Authentication for Agents".
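
To make the AUTH_LOGIN_ARGS and argument-parsing notes above concrete, here is a small sketch. The mapping values come from the notes; `parseSetupAuth()` is a hypothetical function, the assumption that each CLI binary is named after its agent is mine, and the check that rejects flag-like values such as `--help` is a possible tweak rather than the current behavior.

```typescript
const AUTH_CACHE_AGENTS = ["claude", "gemini", "codex"];

// Per the notes: Claude/Gemini authenticate on first launch (no extra args),
// Codex needs an explicit device-code login.
const AUTH_LOGIN_ARGS: Record<string, string[]> = {
  claude: [],
  gemini: [],
  codex: ["login", "--device-auth"],
};

// Hypothetical parser mirroring the raw argv.indexOf approach.
function parseSetupAuth(argv: string[]): { agent: string; loginCommand: string[] } | undefined {
  const index = argv.indexOf("--setup-auth");
  if (index === -1) return undefined;

  const agent = argv[index + 1];
  // Suggested tweak: treat a missing or flag-like value as "no agent given"
  // so `--setup-auth --help` produces a clearer message.
  if (agent === undefined || agent.startsWith("-")) {
    throw new Error(`--setup-auth requires an agent name (one of: ${AUTH_CACHE_AGENTS.join(", ")})`);
  }
  if (!AUTH_CACHE_AGENTS.includes(agent)) {
    throw new Error(`Unsupported agent "${agent}" (supported: ${AUTH_CACHE_AGENTS.join(", ")})`);
  }
  // ASSUMPTION: the CLI binary name equals the agent name. The `?? []`
  // fallback covers an agent added to AUTH_CACHE_AGENTS but not to
  // AUTH_LOGIN_ARGS.
  return { agent, loginCommand: [agent, ...(AUTH_LOGIN_ARGS[agent] ?? [])] };
}
```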

Link to Devin session: https://app.devin.ai/sessions/514b15c4c58a4d94a8a03a366ff4f4f5
Requested by: @laiso


…agents

Implement all 7 tasks from the subscription-auth spec:

1. Add createAuthCacheArgs(), resolveAuthCachePath(), hasAuthCache() to src/utils/docker.ts
2. Add tryAnyEnv() to src/utils/env.ts; make API keys optional in claude/gemini/codex builders
3. Apply auth mounts in docker-strategy.ts for both v1 and v2 execution paths
4. Apply auth mounts in v2-container.ts (replace hardcoded claudeMount)
5. Implement --setup-auth <agent> CLI command for interactive Docker login
6. Update environment.md with subscription auth documentation
7. Add unit tests for new functions and subscription auth behavior

Auth priority: API key > auth cache volume > error with --setup-auth suggestion.
GitHub Actions unaffected (always uses API key secrets).

Closes #59
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration

Addressed both Devin Review findings in 3563b38:

  1. Subscription auth fallback now guarded by useDocker — All three agent builders (claude.ts, gemini.ts, codex.ts) now check this.config.useDocker && hasAuthCache(agent) before falling back to subscription auth. In native mode, the builder always requires an explicit API key and shows the helpful error message.

  2. TestRunner now passes agentName in PrepareContext: src/runners/test.ts now includes agentName: config.agent so the correct auth cache is mounted for non-claude agents in test containers.

devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration

Fixed in aa2fdd2: ClaudeLogCollector now uses resolveAuthCachePath('claude') when config.useDocker is true, so it reads JSONL logs from ~/.cache/ts-bench/auth/claude/projects/ (the auth cache volume host path) instead of ~/.claude/projects/. Native mode still reads from ~/.claude/ as before.
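
The path selection described in this fix could look roughly like the sketch below. `resolveClaudeProjectsDir()` and its `home` parameter are hypothetical; only the two directory locations come from the comment above.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: picks where Claude's JSONL logs are read from.
// Docker mode -> auth cache volume host path; native mode -> ~/.claude.
function resolveClaudeProjectsDir(useDocker: boolean, home: string = os.homedir()): string {
  const base = useDocker
    ? path.join(home, ".cache", "ts-bench", "auth", "claude")
    : path.join(home, ".claude");
  return path.join(base, "projects");
}
```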

devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration

Fixed in 1e2fa9d: hasAuthCache() now checks for a .ts-bench-auth sentinel file instead of "any file in the directory". The --setup-auth command writes this sentinel after a successful login. This prevents false positives from Claude's conversation log files (projects/ subdirectory) that persist after API-key-based runs.

New test added: "returns false when dir has files but no sentinel" confirms that having credentials.json alone (or log files) does not trigger the auth cache fallback.
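
For illustration, the sentinel check can be sketched as below. This version takes the auth cache directory directly, which is an assumption made to keep the example self-contained; the real `hasAuthCache()` is called with an agent name.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const AUTH_SENTINEL = ".ts-bench-auth";

// Only the sentinel counts. Other files in the directory (credentials,
// conversation logs under projects/) do not trigger the fallback.
function hasAuthCache(authDir: string): boolean {
  return fs.existsSync(path.join(authDir, AUTH_SENTINEL));
}
```

Keying on a single marker file keeps the check cheap and avoids guessing at each CLI's on-disk layout, at the cost of the staleness issue mentioned in the PR notes.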

Gemini CLI does not have a 'login' sub-command — it authenticates
interactively on first launch.  Add AUTH_LOGIN_ARGS mapping so
--setup-auth gemini runs 'gemini' (no args) while claude/codex
still use '<agent> login'.
devin-ai-integration[bot]

This comment was marked as resolved.

- Log which auth method is used (API key vs subscription) in all 3
  agent builders so users can see which path was taken.
- Add missing await on expect(...).rejects.toThrow() in subscription
  auth tests (Claude, Gemini, Codex) so assertions are actually checked.
…subcommand)

None of the supported CLIs (Claude, Gemini, Codex) have a dedicated
'login' sub-command.  They all authenticate interactively when launched
for the first time.  Update AUTH_LOGIN_ARGS to use empty arrays for
all agents and update spec docs accordingly.
Codex CLI requires 'codex login --device-auth' for headless
Device-Code flow authentication, unlike Claude and Gemini which
authenticate interactively on first launch with no extra args.
- SETUP_AUTH_IMAGE constant points to node:lts (lightweight, no build required)
- Mount local scripts/run-agent.sh into the container (read-only)
- run-agent.sh handles CLI installation via npm automatically
- Users no longer need to build ts-bench-container just for authentication
…o exit

Claude and Gemini CLIs enter interactive chat mode after successful
login, so the user must Ctrl-C to exit — producing a non-zero exit
code.  Previously the sentinel was only written on exit code 0.

Now we also check for the agent-specific credential file (e.g.
.credentials.json for Claude, oauth_creds.json for Gemini) and write
the sentinel if credentials are found, regardless of exit code.
@laiso laiso merged commit 0c52ea0 into main Apr 2, 2026
1 check passed
laiso added a commit that referenced this pull request Apr 2, 2026
Resolve conflicts: combine env-file security (PR #91) with
subscription-auth cache args (PR #90) and keep both test suites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>