- Base path: `/api/v1`
- Endpoint: `POST /api/v1/chat/completions` (OpenAI-compatible)
- Purpose: Route chat requests to configured LLM providers with optional streaming and persistence.
- Scope note: Chat Dictionaries and the Document Generator are implemented as sub-routes under `/api/v1/chat` but are documented in Chatbook features. See `./Chatbook_Features_API_Documentation.md`.
- OpenAPI tags: `chat`, `chat-grammars`, `chat-dictionaries`, `chat-documents`
Conversation list/search, lifecycle updates, message trees, analytics, and knowledge-save endpoints live under `/api/v1/chat`. Session CRUD for character chats remains under `/api/v1/chats`.
Endpoints:
- `GET /api/v1/chat/conversations` — list/search conversations with filters and ranking (`order_by=bm25|recency|hybrid|topic`).
- `PATCH /api/v1/chat/conversations/{id}` — update state/topic/keywords with optimistic locking (`version` in body).
- `GET /api/v1/chat/conversations/{id}/tree` — root-thread tree view with `max_depth` + truncation.
- `GET /api/v1/chat/analytics` — UTC histogram buckets by date/topic/state.
- `POST /api/v1/chat/knowledge/save` — save a snippet to Notes/Flashcards with backlinks.
- `GET /api/v1/chat/grammars` — list saved user-scoped GBNF grammars for llama.cpp.
- `POST /api/v1/chat/grammars` — create a saved user-scoped GBNF grammar.
- `GET /api/v1/chat/grammars/{grammar_id}` — fetch one saved grammar.
- `PATCH /api/v1/chat/grammars/{grammar_id}` — update grammar text or metadata.
- `DELETE /api/v1/chat/grammars/{grammar_id}` — soft-delete by default; use `hard_delete=true` to permanently remove it.
Note:
- `/api/v1/chats` continues to serve character chat session CRUD and exports.
- Alias: `/api/v1/chats/conversations` maps to the conversation list/update/tree endpoints above.
Parameter glossary:
- `query`: full-text search term applied to conversation title.
- `state`: conversation lifecycle state (`in-progress`, `resolved`, `backlog`, `non-viable`).
- `topic_label`: exact topic label match; append `*` for prefix search.
- `keywords`: repeatable query parameter; all values must match (AND).
- `order_by`: `bm25` (text relevance), `recency` (last_modified), `hybrid` (weighted blend), `topic` (alphabetical topic).
- `start_date`/`end_date`: ISO-8601 range bounds for analytics.
- `bucket_granularity`: `day` or `week` for analytics buckets.
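As a sketch of how these parameters combine into a request, the helper below builds a `GET /api/v1/chat/conversations` URL from glossary parameters (the endpoint and parameter names come from this section; the client helper itself is illustrative):

```python
from urllib.parse import urlencode

def conversations_url(base="http://127.0.0.1:8000", **params):
    """Build a GET /api/v1/chat/conversations URL from glossary parameters.

    Repeatable parameters (e.g. `keywords`) may be passed as lists; per the
    glossary above, all keyword values must match (AND) server-side.
    """
    pairs = []
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            pairs.extend((key, v) for v in value)  # repeat the key per value
        else:
            pairs.append((key, value))
    return f"{base}/api/v1/chat/conversations?{urlencode(pairs)}"

url = conversations_url(
    query="deployment",
    state="in-progress",
    keywords=["infra", "k8s"],  # AND semantics: both must match
    order_by="hybrid",
)
```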
- Single-user: `X-API-KEY: <key>`
- Multi-user: `Authorization: Bearer <JWT>`
- Backward-compatible header alias: `Token: Bearer <JWT>` is accepted.
- Standard limits apply; heavy operations (streaming, tool calls) count toward per-user RPM/TPM.
- If authentication is required and missing/invalid, the endpoint returns `401`.
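A minimal sketch of choosing the right auth header per deployment profile (the header names come from above; the helper itself is illustrative):

```python
def auth_headers(api_key=None, jwt=None):
    """Return request headers for the single-user or multi-user profile.

    Exactly one credential should be supplied: an API key (single-user)
    or a JWT (multi-user). The server also accepts the legacy `Token:`
    alias, but `Authorization:` is the preferred form.
    """
    if api_key is not None:
        return {"X-API-KEY": api_key}
    if jwt is not None:
        return {"Authorization": f"Bearer {jwt}"}
    raise ValueError("one of api_key or jwt is required")

headers = auth_headers(api_key="secret-key")
```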
Slash commands:
- When enabled, user messages starting with `/command` are intercepted and processed by the Chat command router before reaching the LLM.
- Global enable/disable:
  - Env: `CHAT_COMMANDS_ENABLED=1`
  - Config: `[Chat-Commands] commands_enabled = true` in `Config_Files/config.txt`
- Per-request behavior can be adjusted via `slash_command_injection_mode` on the request (`system`, `preface`, or `replace`).
Follows an OpenAI-style chat payload with extensions.

Key fields:
- `model` (string): Target model. May be prefixed as `provider/model` (e.g., `anthropic/claude-opus-4-20250514`).
- `messages` (array): Conversation turns. Supports roles `system`, `user`, `assistant`, `tool`.
  - User message `content` may be a string or a list of parts: text and base64 data URI `image_url` using `data:image/...;base64,...` only (HTTP/HTTPS image URLs are not accepted).
- `stream` (bool): If true, returns Server-Sent Events (SSE) for streaming.
- `api_provider` (string, optional): Overrides provider selection. The server default is used if omitted.
- `prompt_template_name` (string, optional): Apply a named prompt template (alphanumeric, `_`, `-`).
- Conversation history controls (optional):
  - `history_message_limit` (int): How many past messages to load (default set by server; see Chat-Module config).
  - `history_message_order` (`asc`|`desc`): Oldest-first vs newest-first ordering when loading history.
- Common sampling params (provider-dependent): `temperature`, `top_p`, `max_tokens`, `n`, `frequency_penalty`, `presence_penalty`, `logprobs`, `top_logprobs`, `logit_bias`.
- Tools: `tools`, `tool_choice` (provider-dependent tool/function calling). `tool_choice` requires `tools` or the request is rejected.
- `response_format`: `{ "type": "text" | "json_object" }` (provider-dependent).
- Chat extensions: `character_id`, `conversation_id` (context hooks), `save_to_db` (persistence toggle).
- Continuation controls (tldw extension): `tldw_continuation` (optional).
  - Shape:
    - `from_message_id` (string, required): anchor message ID (max 128 chars).
    - `mode` (`branch`|`append`, required):
      - `branch`: rebuilds history from the anchor's ancestry (root -> ... -> anchor), excluding sibling/descendant branches past the anchor.
      - `append`: the anchor must be the current conversation tip; otherwise the request fails with `409`.
    - `assistant_prefill` (string, optional): assistant prefix injected before generation (provider behavior may vary).
  - Requirements:
    - `conversation_id` is required when `tldw_continuation` is present.
    - `conversation_id` must reference an existing conversation.
  - Persistence behavior:
    - When `save_to_db=true`, the generated assistant message is saved with `parent_message_id=<from_message_id>`.
    - `assistant_prefill` is context-only and is not saved as a separate message turn.
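The branch-mode history rebuild described above can be sketched as a walk up parent pointers from the anchor to the root. This is an illustration, not the server's internal schema; the `parent_message_id` field name mirrors the persistence note above, but the message-dict shape is assumed:

```python
def branch_history(messages_by_id, from_message_id):
    """Rebuild history root -> ... -> anchor by walking parent links.

    Siblings and descendants past the anchor are excluded simply by never
    visiting them: only the anchor's ancestor chain is collected.
    """
    chain = []
    node = messages_by_id[from_message_id]
    while node is not None:
        chain.append(node)
        parent_id = node.get("parent_message_id")
        node = messages_by_id.get(parent_id) if parent_id else None
    return list(reversed(chain))  # oldest (root) first

# Tiny tree: root -> a -> b, plus a sibling branch root -> c
msgs = {
    "root": {"id": "root", "parent_message_id": None, "content": "hi"},
    "a": {"id": "a", "parent_message_id": "root", "content": "q1"},
    "b": {"id": "b", "parent_message_id": "a", "content": "a1"},
    "c": {"id": "c", "parent_message_id": "root", "content": "q2"},
}
history = branch_history(msgs, "b")  # excludes the sibling "c"
```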
Provider-specific extensions:
- Bedrock guardrails:
  - `extra_headers`: include Bedrock guardrail headers like `X-Amzn-Bedrock-GuardrailIdentifier`, `X-Amzn-Bedrock-GuardrailVersion`, optional `X-Amzn-Bedrock-Trace`.
  - `extra_body`: include the `amazon-bedrock-guardrailConfig` object when needed.
  - Merge behavior: `extra_headers`/`extra_body` are additive; explicit headers/body keys in the request win on conflicts.
- llama.cpp advanced controls (`/api/v1/chat/completions` only in v1):
  - `thinking_budget_tokens` (int, optional): app-level thinking budget. Only accepted when the resolved provider is llama.cpp and the deployment has a configured mapping for the upstream request key.
  - `grammar_mode` (`none`|`library`|`inline`, optional): selects how the outbound GBNF grammar is resolved.
  - `grammar_id` (string, optional): required when `grammar_mode=library`.
  - `grammar_inline` (string, optional): required when `grammar_mode=inline`.
  - `grammar_override` (string, optional): request-only override when using a saved grammar.
  - Guardrails:
    - These fields are rejected with `400` if the resolved provider is not llama.cpp.
    - These fields are rejected with `400` when `strict_openai_compat` is active for the local-provider runtime.
    - First-class llama.cpp controls override reserved `extra_body` keys such as `grammar` and the configured thinking-budget request key.
  - Scope boundary:
    - v1 support is limited to `POST /api/v1/chat/completions`. `/api/v1/messages` does not yet accept these first-class llama.cpp fields.
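The additive merge behavior described for `extra_headers`/`extra_body` can be sketched as a plain dict merge (a simplification; the server's actual merge code may differ in detail):

```python
def merge_additive(explicit: dict, extra: dict) -> dict:
    """Merge extra_* fields additively; explicit request keys win on conflict."""
    merged = dict(extra)     # start from the additive extras
    merged.update(explicit)  # explicit headers/body keys take precedence
    return merged

headers = merge_additive(
    {"Content-Type": "application/json"},
    {
        "X-Amzn-Bedrock-GuardrailIdentifier": "gr-123",
        "Content-Type": "text/plain",  # conflicting key: explicit value wins
    },
)
```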
Saved grammar resource notes:
- Grammars are user-scoped and stored in the chat domain.
- Grammar records expose `validation_status` as `unchecked | valid | invalid`.
- `DELETE /api/v1/chat/grammars/{grammar_id}` soft-deletes unless `hard_delete=true` is sent.
Minimal example (non-streaming):

```bash
curl -s -X POST http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: $API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role":"user","content":"Hello!"}]
  }'
```

Streaming example (SSE):

```bash
curl -N -X POST http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: $API_KEY" \
  -d '{
    "model": "anthropic/claude-opus-4-20250514",
    "messages": [{"role":"user","content":"Stream this response."}],
    "stream": true
  }'
```

JSON mode example (`response_format=json_object`):

```bash
curl -s -X POST http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: $API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "response_format": {"type": "json_object"},
    "messages": [
      {"role":"system","content":"Return valid JSON only."},
      {"role":"user","content":"Summarize tldw_server with fields: summary, keywords[]"}
    ]
  }'
```

Tools example (function calling):

```bash
curl -s -X POST http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: $API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather by city",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```

Continuation example (branch):

```bash
curl -s -X POST http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: $API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "conversation_id": "conv_123",
    "save_to_db": true,
    "messages": [{"role":"user","content":"Continue from this point."}],
    "tldw_continuation": {
      "from_message_id": "msg_anchor_456",
      "mode": "branch"
    }
  }'
```

Continuation example (append + assistant prefill):

```bash
curl -s -X POST http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: $API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "conversation_id": "conv_123",
    "messages": [{"role":"user","content":"Refine the draft."}],
    "tldw_continuation": {
      "from_message_id": "msg_latest_tip",
      "mode": "append",
      "assistant_prefill": "Draft: "
    }
  }'
```

llama.cpp grammar example (inline):

```bash
curl -s -X POST http://127.0.0.1:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: $API_KEY" \
  -d '{
    "api_provider": "llama.cpp",
    "model": "llama.cpp/local-model",
    "messages": [{"role":"user","content":"Reply with ok only."}],
    "grammar_mode": "inline",
    "grammar_inline": "root ::= \"ok\""
  }'
```

- If `model` includes a provider prefix (`provider/model`), that provider is used unless `api_provider` is explicitly set.
- If no provider is specified, the server uses `DEFAULT_LLM_PROVIDER`.
- API key loading (precedence high → low):
  - Environment variables (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).
  - Dotenv files in the project root or `Config_Files`: `.env` and `.ENV` (both names supported, non-overriding by default).
  - `tldw_Server_API/Config_Files/config.txt` under `[API]` (e.g., `openai_api_key=...`).
- Optional failover: when enabled via server config, the Chat module may fall back to a healthy provider on upstream errors (disabled by default for stability).
  - Config key: `[Chat-Module] enable_provider_fallback = True` (default: `False`)
- Media type: `text/event-stream`.
- Heartbeats: Sent periodically (default 30s) to keep connections alive.
- Idle timeout: Default 300s of inactivity ends the stream with an error event.
- Completion: The server emits a single `data: [DONE]` at the end of a successful or error-shortened stream. Duplicate terminal markers are suppressed.

Config keys (Chat-Module):
- `streaming_idle_timeout_seconds` (default 300)
- `streaming_heartbeat_interval_seconds` (default 30)
Event shapes (examples):

```text
event: stream_start
data: {"conversation_id":"<id>","model":"<model>","timestamp":"<iso>"}

data: {"choices":[{"delta":{"content":"Hello"}}]}

: heartbeat 2025-01-01T00:00:30Z

data: [DONE]
```
Errors during streaming are emitted as SSE `data:` frames with an `{"error": {"message": "..."}}` payload; the server then terminates with a single `data: [DONE]`.

When continuation is active, stream metadata payloads include `tldw_continuation` (for example in `stream_start`, chunk payload metadata, `tool_results`, `stream_end`, and stream error frames).

Note: Stream chunks follow OpenAI-style `choices[].delta.content` for maximum client compatibility.
| Event/Line | Shape/Fields | Notes |
|---|---|---|
| `event: stream_start` | `data: { conversation_id, model, timestamp }` | Emitted once at start |
| Heartbeat | `: heartbeat <ISO-8601>` | Comment line; no `data:` payload |
| Delta chunk | `data: {"choices":[{"delta":{"content":"..."}}]}` | Repeats for each text delta |
| Error | `data: {"error": {"message": "..."}}` | Emitted and stream terminates |
| `event: stream_end` | `data: { conversation_id, success, timestamp }` | Emitted on graceful completion |
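Given the frame shapes above, a client-side consumer might look like this minimal sketch (it only handles the documented line types; a production client would also track `stream_start`/`stream_end` metadata):

```python
import json

def collect_stream_text(lines):
    """Accumulate delta content from documented SSE lines until [DONE].

    Comment lines (heartbeats, starting with ':') and `event:` lines are
    skipped; error frames raise.
    """
    text = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skips `event:` lines and ': heartbeat ...' comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        frame = json.loads(payload)
        if "error" in frame:
            raise RuntimeError(frame["error"]["message"])
        for choice in frame.get("choices", []):
            text.append(choice.get("delta", {}).get("content", ""))
    return "".join(text)

sample = [
    "event: stream_start",
    'data: {"conversation_id":"c1","model":"m","timestamp":"t"}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    ": heartbeat 2025-01-01T00:00:30Z",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
```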
- Non-streaming JSON uses OpenAI's `choices` shape and includes `tldw_conversation_id` to help clients track state.
- When continuation is applied, non-streaming responses include `tldw_continuation`, for example:
  - `applied: true`
  - `mode: "branch" | "append"`
  - `from_message_id: "<anchor_id>"`
  - `assistant_prefill` and `assistant_prefill_applied` when prefill was used
- Default behavior is ephemeral (no DB writes).
- Per-request opt-in: set `"save_to_db": true` to persist conversation/messages.
- Server default can be toggled without client changes:
  - Env: `CHAT_SAVE_DEFAULT=true` (highest precedence) or `DEFAULT_CHAT_SAVE=true`
  - Config file (`Config_Files/config.txt`): `[Chat-Module] chat_save_default = True` (or `default_save_to_db = True`)
  - Fallback legacy default: `[Auto-Save] save_character_chats`
- Stored content includes text and validated/decoded images. Invalid images are saved as placeholders to preserve turn continuity.
- Persistence guard: When `save_to_db=true` but no valid character/chat context is present (e.g., missing `character_id`/`conversation_id`), the server safely disables persistence for that request and returns a normal response. A warning is logged; no partial/invalid writes occur.
| Setting / Source | Value / Example | Effect / Notes | Precedence |
|---|---|---|---|
| Request body | `save_to_db: true` | Persist this request's conversation/messages | Highest |
| Env var | `CHAT_SAVE_DEFAULT=true` | Default persistence for requests | High |
| Env var (legacy) | `DEFAULT_CHAT_SAVE=true` | Default persistence for requests | High |
| Config file `[Chat-Module]` | `chat_save_default = True` | Default persistence (preferred key) | Medium |
| Config file `[Chat-Module]` (legacy) | `default_save_to_db = True` | Default persistence (legacy compatibility) | Medium |
| Fallback legacy | `[Auto-Save] save_character_chats` | Used only if above unset | Low |
| Response (non-stream) | `tldw_conversation_id` | Returned in JSON to help clients retain context | - |
| Response (stream) | `event: stream_start` | Includes `conversation_id` at stream start | - |
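The precedence table can be expressed as a small resolver. This is a sketch of the documented ordering, not the server's actual code; the function signature and env-var parsing are assumptions:

```python
def resolve_save_default(request_value=None, env=None, config=None, legacy_autosave=None):
    """Resolve save_to_db per the precedence table: request > env > config > legacy."""
    env = env or {}
    config = config or {}
    if request_value is not None:  # highest: per-request body field
        return request_value
    for var in ("CHAT_SAVE_DEFAULT", "DEFAULT_CHAT_SAVE"):  # env vars
        if var in env:
            return env[var].lower() == "true"
    for key in ("chat_save_default", "default_save_to_db"):  # [Chat-Module] config
        if key in config:
            return config[key]
    if legacy_autosave is not None:  # [Auto-Save] save_character_chats fallback
        return legacy_autosave
    return False  # default behavior is ephemeral
```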
- Images: Accepts `image/png`, `image/jpeg`, `image/webp`. Images must be supplied as base64 `data:image/...;base64,...` URIs; external HTTP/HTTPS image URLs are not supported for chat messages. Default max base64 payload ≈ 3MB.
- Messages: Default max messages per request: 1000.
- Text: Default per-message text limit: 400,000 characters.
- Images per request: Default max: 10.
- Oversized or invalid payloads return `400`/`413` with details.
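A client-side pre-check against the image limits above can catch rejections before a round trip. The limit values come from this section; the validation helper itself is illustrative, not the server's implementation:

```python
import base64
import re

ALLOWED_MIMES = {"image/png", "image/jpeg", "image/webp"}
MAX_BASE64_CHARS = 3 * 1024 * 1024  # default ~3MB base64 payload limit

def check_image_part(url: str) -> None:
    """Raise ValueError if an image_url part would be rejected by the API."""
    m = re.match(r"data:(image/[\w.+-]+);base64,(.+)$", url, re.DOTALL)
    if not m:
        raise ValueError("only data:image/...;base64,... URIs are accepted")
    mime, b64 = m.groups()
    if mime not in ALLOWED_MIMES:
        raise ValueError(f"unsupported image type: {mime}")
    if len(b64) > MAX_BASE64_CHARS:
        raise ValueError("base64 payload exceeds the default ~3MB limit")
    base64.b64decode(b64, validate=True)  # malformed base64 raises here

ok_uri = "data:image/png;base64," + base64.b64encode(b"\x89PNG fake bytes").decode()
check_image_part(ok_uri)  # passes silently
```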
Image message example:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,...."}}
  ]
}
```

Error status codes:
- `400` Invalid request (schema, limits, bad params)
- `401` Missing/invalid authentication
- `404` Resource not found (e.g., invalid character reference, continuation anchor not found, or anchor not in the requested conversation)
- `409` Conflict while persisting, or a continuation constraint conflict (for example, the `append` anchor is not the latest message)
- `413` Request payload too large (e.g., too many messages/images, text too long)
- `429` Rate limit exceeded (endpoint or upstream)
- `500` Internal server error (unexpected failure)
- `502`/`504` Upstream provider error/timeout
- `503` Service/configuration issue (e.g., missing API key, busy queue)
- HTTP status is usually `200` for successful connection establishment.
- Provider or validation errors are surfaced as SSE frames:
  - `data: {"error": {"message": "...", "type": "<ErrorClass>"}}`
  - followed by `data: [DONE]`
- Catastrophic failures before streaming starts (e.g., auth, grossly invalid body) still return HTTP errors as above.
Configurable under `[Chat-Module]` in `Config_Files/config.txt`:
- `rate_limit_per_minute`
- `rate_limit_per_user_per_minute`
- `rate_limit_per_conversation_per_minute`
- `rate_limit_tokens_per_minute`

When exceeded, the endpoint returns `429`.

Queued execution (optional):
- Enable job-queue processing for chat calls to smooth bursts.
- Env: `CHAT_QUEUED_EXECUTION=1` or config `[Chat-Module] queued_execution = True`
- Related settings when enabled: `max_queue_size`, `max_concurrent_requests`
- Metrics: Tracks request size, LLM latency, streaming chunks/heartbeats, DB transactions, and image processing.
- Audit: When enabled, logs API request metadata (user_id, request_id, model/provider, streaming) via the unified audit service.
- Logging: The server never logs API keys by default. For troubleshooting in non-production environments, you can enable masked key logging by setting `ALLOW_MASKED_KEY_LOG=true`. When enabled, logs may include a masked form of the key (first/last 4 chars). Do not enable in production.
- Image metrics now track per-image sizes when multiple images are included in a single user message.
- Endpoints (read-only operational state):
  - `GET /api/v1/chat/queue/status` - Queue size, concurrency, processed/rejected counts
  - `GET /api/v1/chat/queue/activity?limit=50` - Recent processed job summaries (most recent last)
- RBAC: Requires permission `system.logs` via `AuthPrincipal` claims (applies to both single-user and multi-user profiles).
- Intended for administrators/operations; avoid exposing in multi-tenant environments without RBAC.
- Location: Next.js WebUI (`apps/tldw-frontend`) → Chat page.
- Persistence: "Save to DB" checkbox uses server defaults.
- Providers/models: Dropdowns reflect configured providers and models.
- Chatbook features (Dictionaries, Document Generator, Import/Export): `./Chatbook_Features_API_Documentation.md`
- Character chat sessions API: see `./API_Design.md` (character chat endpoints overview)
Supporting endpoints for discovering providers and models:
- `GET /api/v1/llm/providers` - Configured providers and models
- `GET /api/v1/llm/providers/{provider}` - Details for a specific provider
- `GET /api/v1/llm/models` - Flat list of `<provider>/<model>` values (includes `image/<backend>` entries)
- `GET /api/v1/llm/models/metadata` - Flattened model capability metadata (includes `type=image` entries)
- Use filters like `?type=chat` or `?output_modality=text` to keep chat-only lists.
- Scope: Optional integration tests for supported providers (OpenAI, Anthropic, Cohere, DeepSeek, Google, Groq, Qwen, HuggingFace, Mistral, Bedrock, OpenRouter) and local backends (llama.cpp, Kobold, Ollama, Oobabooga, TabbyAPI, vLLM). Disabled by default to avoid accidental network calls. The exact set is determined at runtime from configuration.
- Opt-in flag: Set `RUN_COMMERCIAL_CHAT_TESTS=true` in your environment or `.env`.
- Keys: Provide real API keys via env, `.env`/`.ENV` (repo root or `tldw_Server_API/Config_Files/`), or `Config_Files/config.txt` `[API]` entries. Mock/test keys (e.g., `sk-mock...`, `test-...`) are ignored by the tests.
- Network: Ensure outbound network access when running these tests.

Quick key sanity check (no secrets printed):

```python
from tldw_Server_API.app.api.v1.schemas.chat_request_schemas import get_api_keys

keys = get_api_keys()
k = keys.get('openai') or ''
print({'openai_present': bool(k), 'length': len(k), 'masked': (k[:4] + '...' + k[-4:]) if k else None})
```

Run all commercial integration tests (Chat only):

```bash
export RUN_COMMERCIAL_CHAT_TESTS=true
export OPENAI_API_KEY="<real-openai-key>"  # plus others as needed
python -m pytest tldw_Server_API/tests/Chat -m "integration and external_api" -v
```

Target a specific OpenAI templating test:

```bash
python -m pytest tldw_Server_API/tests/Chat/test_chat_completions_integration.py::test_commercial_provider_with_template_and_char_data_openai_integration -v
```

Notes:
- The streaming test in this file is currently marked `@pytest.mark.skip` due to TestClient SSE limitations; unit tests cover streaming, and you can verify manually with `curl -N`.
- The providers list is dynamically filtered at runtime; tests are skipped if no eligible provider has a usable key.
- Provider failover is disabled by default for production stability (can be enabled in `[Chat-Module]`).
- Images in chat messages must be base64 data URIs within `image_url.url` (PNG, JPEG, WEBP).
- The API returns `tldw_conversation_id` in non-streaming responses to let clients maintain context.
- Keys not detected for a provider (e.g., OpenAI): verify env and dotenv files.
  - Check presence via `GET /api/v1/llm/providers` - the provider appears only when a usable key/base URL is configured.
  - The loader reads `.env`/`.ENV` from the project root and `tldw_Server_API/Config_Files/`, plus `[API]` keys in `config.txt`.
- Quick Python sanity check (no secrets printed):

```python
from tldw_Server_API.app.api.v1.schemas.chat_request_schemas import get_api_keys

keys = get_api_keys()
k = keys.get('openai') or ''
print({
    'openai_present': bool(k),
    'length': len(k),
    'masked': (k[:4] + '...' + k[-4:]) if k else None,
})
```