Custom AI SDK provider for using nexos.ai models with opencode.
Fixes compatibility issues when using Gemini, Claude, ChatGPT, Codex, and Codestral models through the nexos.ai API in opencode:

- **Gemini**: appends the missing `data: [DONE]` SSE signal (prevents hanging), inlines `$ref` in tool schemas (rejected by Vertex AI), fixes `finish_reason` for tool calls (`stop` → `tool_calls`)
- **Claude**: converts thinking params to snake_case (`budgetTokens` → `budget_tokens`), fixes `finish_reason` in thinking mode (`end_turn` → `stop` and `tool_use` → `tool_calls`, preventing an infinite retry loop), adds `cache_control` markers for prompt caching, strips `temperature` when thinking is enabled, strips `temperature` for Opus 4.7 (nexos.ai routes Opus 4.7 requests with `temperature` to a guardrails backend where streaming tool calls are broken)
- **ChatGPT/GPT**: strips `reasoning_effort: "none"` only for legacy / non-reasoning models (GPT 4.x, `Chat`, `Instant`, `oss`; modern GPT 5.x accept `"none"` natively), strips `temperature: false` (invalid value), strips `temperature` for non-Codex models (nexos.ai chat completions only supports the default temperature; Codex models via the Responses API support custom temperature)
- **Codex**: transparently redirects requests to `/v1/responses` (Responses API), since Codex models don't support `/v1/chat/completions`. Handles streaming, tool calls, reasoning effort, and cache token reporting.
- **Codestral**: sets `strict: false` in tool definitions when `strict` is `null` (the Mistral API rejects `null` for this field)
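As a concrete illustration of one of these rewrites, the Gemini `finish_reason` fix could look roughly like the sketch below. This is a hedged example, not the provider's actual code: the function name is invented, and a real implementation would also need to track tool-call state across chunks, since the chunk carrying `finish_reason` may not be the one carrying the tool-call delta.

```javascript
// Illustrative sketch: normalize a parsed SSE chunk so that uppercase
// "STOP" becomes "stop", and a chunk that carries tool calls reports
// "tool_calls" instead of "stop" (which opencode expects).
function fixFinishReason(chunk) {
  for (const choice of chunk.choices ?? []) {
    if (typeof choice.finish_reason === "string") {
      choice.finish_reason = choice.finish_reason.toLowerCase();
      if (choice.finish_reason === "stop" && choice.delta?.tool_calls?.length) {
        choice.finish_reason = "tool_calls";
      }
    }
  }
  return chunk;
}
```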
Set your API key:

```shell
export NEXOS_API_KEY="your-nexos-api-key"
```

Add the provider to your `~/.config/opencode/opencode.json`:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "nexos-ai": {
      "npm": "@crazy-goat/nexos-provider",
      "name": "Nexos AI",
      "env": ["NEXOS_API_KEY"],
      "options": {
        "baseURL": "https://api.nexos.ai/v1/",
        "timeout": 300000
      },
      "models": {
        "Gemini 2.5 Pro": {
          "name": "Gemini 2.5 Pro",
          "limit": { "context": 128000, "output": 64000 }
        },
        "Claude Sonnet 4.5": {
          "name": "Claude Sonnet 4.5",
          "limit": { "context": 200000, "output": 16000 },
          "options": {
            "thinking": { "type": "enabled", "budgetTokens": 1024 }
          },
          "variants": {
            "thinking-high": { "thinking": { "type": "enabled", "budgetTokens": 10000 } },
            "no-thinking": { "thinking": { "type": "disabled" } }
          }
        },
        "GPT 5": {
          "name": "GPT 5",
          "limit": { "context": 400000, "output": 128000 },
          "options": { "reasoningEffort": "medium" },
          "variants": {
            "high": { "reasoningEffort": "high" },
            "no-reasoning": { "reasoningEffort": "none" }
          }
        }
      }
    }
  }
}
```

Tip: You can automatically generate the config with all available nexos.ai models using `opencode-nexos-models-config`.
Warning: Gemini 3 models (Flash Preview, Pro Preview) do not work with tool calling through nexos.ai — see `known-bugs/gemini3-tools` for details.
Simple prompt:

```shell
opencode run "hello" -m "nexos-ai/Gemini 2.5 Pro"
```

With tool calling:

```shell
opencode run "list files in current directory" -m "nexos-ai/Gemini 2.5 Pro"
```

Claude with thinking:

```shell
opencode run "what is 2+2?" -m "nexos-ai/Claude Sonnet 4.5" --variant thinking-high
```

GPT with reasoning effort:

```shell
opencode run "what is 2+2?" -m "nexos-ai/GPT 5" --variant high
```

Or select the model interactively in opencode with Ctrl+X M.
opencode caches the provider in `~/.cache/opencode/`. To force an update to the latest version:

```shell
rm -rf ~/.cache/opencode/node_modules/@crazy-goat
```

The next time you run opencode, it will download the latest version from npm.
The provider exports `createNexosAI`, which creates a standard AI SDK provider with a custom fetch wrapper. Per-provider fixes live in separate modules:

```
opencode → createNexosAI → fetch wrapper → nexos.ai API
                                │
                                ├─ fix-gemini.mjs:    $ref inlining, finish_reason fix
                                ├─ fix-claude.mjs:    thinking params, end_turn/tool_use → stop/tool_calls
                                ├─ fix-chatgpt.mjs:   strips reasoning_effort:"none" for legacy models
                                ├─ fix-codex.mjs:     chat completions → Responses API
                                └─ fix-codestral.mjs: strict:null → false in tools
```
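The fetch-wrapper pattern can be sketched as below. This is an assumed shape, not the provider's actual module interface: each "fix" object matches on the outgoing request and may rewrite the request before it is sent and/or the response after it returns.

```javascript
// Hypothetical dispatch loop: run every matching fix's request rewrite,
// perform the underlying fetch, then run every matching response rewrite.
function wrapFetch(baseFetch, fixes) {
  return async (url, init = {}) => {
    let request = { url, init };
    for (const fix of fixes) {
      if (fix.matches(request)) request = fix.rewriteRequest?.(request) ?? request;
    }
    const response = await baseFetch(request.url, request.init);
    return fixes.reduce(
      (res, fix) =>
        fix.matches(request) && fix.rewriteResponse ? fix.rewriteResponse(res) : res,
      response
    );
  };
}
```

The design keeps each model family's quirks isolated in its own fix object, so adding support for a new backend means adding one entry to the `fixes` array rather than touching the wrapper.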
Test with a simple prompt:

```shell
opencode run "what is 2+2?" -m "nexos-ai/Gemini 2.5 Pro"
opencode run "what is 2+2?" -m "nexos-ai/Gemini 2.5 Flash"
opencode run "what is 2+2?" -m "nexos-ai/Claude Sonnet 4.5"
opencode run "what is 2+2?" -m "nexos-ai/GPT 5"
```

Test tool calling:

```shell
opencode run "list files in current directory" -m "nexos-ai/Gemini 2.5 Pro"
opencode run "list files in current directory" -m "nexos-ai/Claude Sonnet 4.5"
opencode run "list files in current directory" -m "nexos-ai/GPT 5"
opencode run "list files in current directory" -m "nexos-ai/GPT 5.3 Codex"
```

Test thinking/reasoning variants:

```shell
opencode run "what is 2+2?" -m "nexos-ai/Claude Sonnet 4.5" --variant thinking-high
opencode run "what is 2+2?" -m "nexos-ai/Gemini 2.5 Pro" --variant thinking-high
opencode run "what is 2+2?" -m "nexos-ai/GPT 5" --variant high
opencode run "what is 2+2?" -m "nexos-ai/GPT 5.3 Codex" --variant high
```

Run `check-models/check-all.mjs` to test all available models for simple prompts and tool calling:

```shell
node check-models/check-all.mjs
```

Test a single model:

```shell
node check-models/check-all.mjs "GPT 4.1"
```

Results are saved to `check-models/checks.md` — see the current compatibility status there.
The `known-bugs/` directory documents every API quirk the provider works around, one folder per issue. Each folder has a README and, where empirical reproduction adds value, a test script.
- `claude-prompt-caching` — `cache_control` marker strategy (4 breakpoints: system, tools, latest user, previous user), plus break-even math and real-session savings.
- `claude-finish-reason-end-turn` — in thinking mode, Claude leaks `end_turn` (natural end) and `tool_use` (tool call end) where opencode expects `stop`/`tool_calls`. Without the rewrites, opencode retries indefinitely on every thinking-mode turn.
- `claude-thinking-params` — `budgetTokens` → `budget_tokens` (snake_case), bump `max_tokens` when the budget exceeds it, strip `temperature` while thinking is enabled. (Historical: `thinking: {type: "disabled"}` stripping — upstream now accepts it, so the fix is a pass-through.)
- `claude-opus-47-temperature` — Opus 4.7 with any `temperature` routes to a guardrails backend where streaming tool calls are broken. The provider strips `temperature` for Opus 4.7.
- `claude-sonnet-46-cache` — Sonnet 4.6 on vertex-ai invalidates the cache when `cache_control` is on user messages; it also has a higher minimum token threshold than documented.
- `claude-cached-tokens-reporting` — Opus models report cache usage only via `prompt_tokens_details.cached_tokens`; the provider sums it into `prompt_tokens` for opencode's usage display.
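The thinking-param conversion described above can be illustrated with a small sketch. The function name and exact body shape are assumptions; it shows the three documented behaviors: snake_casing `budgetTokens`, bumping `max_tokens` when the budget exceeds it, and stripping `temperature` while thinking is enabled.

```javascript
// Hypothetical sketch of the Claude thinking-param fix applied to a
// chat-completions request body before it is sent.
function fixClaudeThinking(body) {
  if (!body.thinking) return body;
  const { budgetTokens, ...rest } = body.thinking;
  const fixed = {
    ...body,
    // camelCase from the opencode config becomes snake_case on the wire
    thinking: { ...rest, ...(budgetTokens !== undefined && { budget_tokens: budgetTokens }) },
  };
  if (fixed.thinking.type === "enabled") {
    delete fixed.temperature; // temperature is rejected while thinking is on
    // ensure max_tokens exceeds the thinking budget (headroom value is illustrative)
    if (budgetTokens && (fixed.max_tokens ?? 0) <= budgetTokens) {
      fixed.max_tokens = budgetTokens + 1024;
    }
  }
  return fixed;
}
```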
- `gemini-schema-restrictions` — Vertex AI rejects many JSON Schema keywords (`$ref`, `exclusiveMinimum`, `patternProperties`, `if`/`then`/`else`, `not`, `$schema`, etc.). The provider inlines refs and strips the rest.
- `gemini-stream-format` — four stream-format issues bundled: missing `[DONE]` sentinel, uppercase `STOP`, `stop` instead of `tool_calls` for tool use, and `content_blocks[].delta.thinking` instead of `reasoning_content`.
- `gemini3-tools` — Gemini 3 / 3.1 reject multi-turn tool-use replays because nexos.ai does not propagate `thought_signature`. The provider rewrites history into plain alternating turns.
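The `$ref` inlining can be sketched as a recursive substitution. This is a hedged example, not the provider's implementation: it assumes refs are local (`#/$defs/...`) and non-cyclic, and it also drops `$defs`/`$schema` on the way out, as Vertex AI rejects them.

```javascript
// Illustrative $ref inlining: replace each local "#/$defs/..." reference
// with the referenced schema so the result contains no $ref at all.
function inlineRefs(schema, defs = schema.$defs ?? {}) {
  if (Array.isArray(schema)) return schema.map((s) => inlineRefs(s, defs));
  if (typeof schema !== "object" || schema === null) return schema;
  if (typeof schema.$ref === "string") {
    const name = schema.$ref.replace("#/$defs/", "");
    return inlineRefs(defs[name], defs); // substitute the referenced schema
  }
  const out = {};
  for (const [key, value] of Object.entries(schema)) {
    if (key === "$defs" || key === "$schema") continue; // rejected by Vertex AI
    out[key] = inlineRefs(value, defs);
  }
  return out;
}
```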
- `gpt-chat-completions-limits` — legacy / non-reasoning GPT models (GPT 4.x, `Chat`, `Instant`, `oss`) reject `reasoning_effort: "none"`; modern GPT 5.x accept it. Additionally, `temperature: false` and custom `temperature` values are rejected for all non-Codex GPT models.
- `codex-responses-api` — Codex models require `/v1/responses`, not `/v1/chat/completions`. The provider redirects the URL and converts both directions (request schema, SSE stream, and usage).
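The URL hop in the Codex redirect is the simplest part and can be sketched as below; the function name and the model-matching heuristic are assumptions, and the real module additionally converts the request schema and SSE stream in both directions.

```javascript
// Hypothetical sketch of the Codex URL redirect: requests for Codex models
// aimed at /v1/chat/completions are re-targeted at /v1/responses.
function redirectCodexUrl(url, model) {
  if (/codex/i.test(model) && url.endsWith("/chat/completions")) {
    return url.replace(/\/chat\/completions$/, "/responses");
  }
  return url; // non-Codex models keep using chat completions
}
```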
- `codestral-strict-null` — the Mistral API rejects `strict: null` in tool function definitions. The provider coerces `null` → `false`.
- `kimi-fireworks-stream` — Kimi and GLM on fireworks-ai stream without a `data: [DONE]` or `usage` chunk. The provider's `TransformStream` synthesizes both on flush while preserving progressive streaming.
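The flush-time sentinel synthesis used for these streams (and for Gemini's missing `[DONE]`) can be sketched with a standard `TransformStream`. This is an illustrative example under the assumption that the stream chunks arrive as decoded strings; the real provider also synthesizes a `usage` chunk, which is omitted here.

```javascript
// Illustrative sketch: pass SSE chunks through unchanged, and if the
// upstream closes without ever emitting "data: [DONE]", append it on flush
// so clients waiting for the sentinel don't hang.
function ensureDoneSentinel() {
  let sawDone = false;
  return new TransformStream({
    transform(chunk, controller) {
      if (chunk.includes("data: [DONE]")) sawDone = true;
      controller.enqueue(chunk); // progressive streaming is preserved
    },
    flush(controller) {
      if (!sawDone) controller.enqueue("data: [DONE]\n\n");
    },
  });
}
```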
- `token-caching` — prefix caching matrix across Gemini / Claude / GPT. Gemini implicit caching only matches identical requests (no prefix match); the explicit `cachedContents` API is not exposed by nexos.ai.
- `thinking` — test harness for thinking / reasoning token reporting across models.
MIT