
Commit 330e105

Release 0.1.1: Documentation fixes and cleanup (#12)
* fix(docs): correct package names in binding READMEs

* docs: enhance READMEs with production focus and key differentiators
  - Add "Why LLMKit?" section highlighting Rust benefits
  - Emphasize production-ready features (memory safety, no GIL, no leaks)
  - Add prompt caching examples with 90% cost-savings messaging
  - Add extended thinking and model registry examples
  - Fix badge URLs (llmkit-python, llmkit-node)
  - Add production features table (smart router, circuit breaker, guardrails)
  - Improve code examples with cleaner imports for Node.js

* docs: rewrite PROVIDERS.md and MODELS_REGISTRY.md
  - Remove internal "Phase" development notes
  - Create clean, user-friendly provider documentation
  - Organize 100+ providers by category (Cloud, Inference, Regional, etc.)
  - Add environment variables and features for each provider
  - Rewrite model registry docs with practical examples
  - Add popular-models table with pricing
  - Add capability query examples

* docs: remove internal development notes and clean up documentation
  - Remove docs with internal "Phase" references (moved to model_registry):
    additional_providers.md, emerging_specialized_providers.md,
    domain_models.md, scientific_benchmarks.md
  - Rewrite CHANGELOG.md in the standard Keep a Changelog format
  - Simplify RELEASE_NOTES.md to user-facing release notes
  - Update docs/INDEX.md to remove references to deleted files

* fix(models): regenerate model registry from latest crawler data
  - Updated model data from the model_registry crawler (97 providers, 11,067 models)
  - Refreshed pricing, capabilities, and benchmark data
  - Synchronized with latest provider API changes

* chore: bump version to 0.1.1
  - Update Cargo.toml versions (main, python, node)
  - Update pyproject.toml version
  - Update package.json version
  - Update CHANGELOG.md with 0.1.1 release notes

* test(python): skip tests for API response types without constructors
  `Word` and `TranscribeResponse` are returned from API calls and cannot be
  instantiated directly in tests (no `#[new]` constructor in the PyO3 bindings).

* revert: remove unnecessary skip markers from Python tests
  The tests were passing; the CI failure was a Rust toolchain issue, not a
  Python test failure.

* fix(ci): install Rust before cargo-deny to fix toolchain conflict
  The cargo-deny-action v2 runs in a musl container, which conflicted with
  rust-toolchain.toml and caused:
    error: override toolchain 'stable-x86_64-unknown-linux-musl' is not installed
  Installing the stable toolchain explicitly before running cargo-deny resolves
  the conflict.

* fix(test): use full model ID instead of alias in provider test
  The model alias `gpt-4o` is shared by multiple providers (OpenAI, OpenRouter),
  and the last provider in the registry wins the alias lookup. Using the full ID
  `openai/gpt-4o` ensures the test checks the correct provider.

* fix(models): prioritize native providers for alias resolution
  When multiple providers offer the same model (e.g., gpt-4o via OpenAI, Azure,
  OpenRouter), aliases now resolve to the native provider:
  - gpt-4o → openai/gpt-4o (not azure or openrouter)
  - claude-sonnet-4-5 → anthropic/claude-sonnet-4-5
  Added an `is_native_provider()` function to identify the canonical provider
  for each model family (OpenAI for GPT/o1/o3, Anthropic for Claude, etc.).
  Also fixes a raw_id collision where openrouter/gpt-4o's raw_id "gpt-4o" was
  overwriting the OpenAI entry.

* chore(models): regenerate model registry

* fix(test): update documentation completeness test for cleaned docs
  The test was checking for docs/domain_models.md and
  docs/scientific_benchmarks.md, which were moved to ~/projects/model_registry
  as part of the documentation cleanup. Updated to check for core docs that
  actually exist: docs/INDEX.md, docs/MODELS_REGISTRY.md,
  docs/getting-started-rust.md.

* fix(docs): correct v0.1.0 release date to 2026-01-11

* docs: update package descriptions to highlight production-grade features
  Emphasize 100+ providers and 11,000+ models across all packages.
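The native-provider alias resolution described in the commit message above can be sketched roughly as follows. This is a simplified illustration, not LLMKit's actual internals: the registry shape and the exact family prefixes checked by `is_native_provider` are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: does `provider` natively own the model family of `raw_id`?
fn is_native_provider(provider: &str, raw_id: &str) -> bool {
    match provider {
        "openai" => {
            raw_id.starts_with("gpt-") || raw_id.starts_with("o1") || raw_id.starts_with("o3")
        }
        "anthropic" => raw_id.starts_with("claude"),
        "google" => raw_id.starts_with("gemini"),
        "deepseek" => raw_id.starts_with("deepseek"),
        _ => false,
    }
}

/// Build an alias table where a native provider wins over aggregators,
/// instead of "last provider in the registry wins".
fn build_alias_table<'a>(models: &[(&'a str, &'a str)]) -> HashMap<&'a str, String> {
    let mut aliases: HashMap<&str, String> = HashMap::new();
    for &(provider, raw_id) in models {
        let full_id = format!("{provider}/{raw_id}");
        match aliases.get(raw_id) {
            // An entry already exists and the new provider is not native: keep it.
            Some(_) if !is_native_provider(provider, raw_id) => {}
            // No entry yet, or the native provider should take over the alias.
            _ => {
                aliases.insert(raw_id, full_id);
            }
        }
    }
    aliases
}

fn main() {
    // Registry order deliberately puts an aggregator first and Azure last.
    let models = [
        ("openrouter", "gpt-4o"),
        ("openai", "gpt-4o"),
        ("azure", "gpt-4o"),
        ("anthropic", "claude-sonnet-4-5"),
    ];
    let table = build_alias_table(&models);
    // Resolves to openai/gpt-4o even though azure was registered last.
    println!("gpt-4o -> {}", table["gpt-4o"]);
}
```

This also shows why the provider test switched to the full ID `openai/gpt-4o`: an alias lookup depends on the resolution policy, while a full `provider/model` ID is unambiguous.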
1 parent 0cbfc9e commit 330e105

23 files changed

Lines changed: 13632 additions & 16991 deletions

.github/workflows/ci.yml

Lines changed: 3 additions & 0 deletions
```diff
@@ -215,6 +215,9 @@ jobs:
     steps:
       - uses: actions/checkout@v6

+      - name: Install Rust
+        uses: dtolnay/rust-toolchain@stable
+
       - name: Check dependencies
         uses: EmbarkStudios/cargo-deny-action@v2
```

CHANGELOG.md

Lines changed: 77 additions & 182 deletions
```diff
@@ -7,230 +7,125 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

-### Fixed
-
-#### Data Quality Improvements (Model Registry Audit - January 3, 2026)
-**Overall Quality Score: 91% → 98%**
-
-- **Cache Pricing**: Added missing cache pricing (3rd price value) for 8 models:
-  - Google Gemini 3 Pro/Flash (direct API and Vertex AI)
-  - DeepSeek Reasoner (direct API)
-  - Claude Haiku 4.5 (OpenRouter)
-  - Claude Sonnet/Haiku 4.5 (AWS Bedrock)
+## [0.1.1] - 2026-01-12

-- **Capability Flags**: Removed incorrect "S" (Structured Output) flag from 26 models that don't support strict JSON schema enforcement:
-  - Open-source models: Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B (Groq, Cerebras, SambaNova, Fireworks)
-  - Vision models: Gemini Flash, Jamba 2.0 (AI21)
-  - Various provider implementations: Vertex AI, Mistral, DeepSeek, Cohere
-  - Prevents application failures when attempting structured output on unsupported models
+### Fixed

-- **Benchmark Scores**: Corrected unrealistic benchmark scores:
-  - OpenAI o3: MATH 97.8→96.0, GPQA 85.4→75.0, SWE-bench 58.5→48.0
-  - DeepSeek R1 (all 4 providers): HumanEval 97.3→91.0, MATH 97.3→90.0
-  - Claude Opus/Sonnet 4.5: MGSM 94.2/93.5→91.5/91.0
+- **Package Names**: Corrected installation instructions in READMEs
+  - Python: `pip install llmkit-python` (was incorrectly `llmkit`)
+  - Node.js: `npm install llmkit-node` (was incorrectly `llmkit`)
+- **Badge URLs**: Fixed PyPI and npm badge links in main README
+- **Model Registry**: Regenerated from latest crawler data (97 providers, 11,067 models)
+  - Updated pricing, capabilities, and benchmark data
+  - Synchronized with latest provider API changes

-- **Missing Data**: Added missing benchmark scores:
-  - MMMU (multimodal understanding) for 2 Claude Haiku models
-  - Tool use flag for DeepSeek R1 on Together AI
+### Documentation

-- **Provider Consistency**: Verified cross-provider model consistency:
-  - All DeepSeek R1 variants now have identical benchmark scores
-  - Gemini 3 models have consistent pricing/specs across providers
-  - Vertex AI markup patterns documented and verified as intentional
+- Enhanced READMEs with "Why LLMKit?" section highlighting Rust benefits
+- Added production features overview (smart router, circuit breaker, guardrails)
+- Improved code examples for prompt caching, extended thinking, and model registry
+- Cleaned up internal development notes from documentation
+- Simplified PROVIDERS.md and MODELS_REGISTRY.md for better readability

-### Tests
-- ✅ All 186 tests passing
-- ✅ No regressions in model parsing or provider detection
-- ✅ Cache pricing validation complete
+## [0.1.0] - 2026-01-11

-## [0.1.0] - 2026-01-03
+Initial release of LLMKit.

 ### Added

-#### Phase 1: Extended Thinking Completion
-- **Google Gemini 2.0 Deep Thinking** via Vertex AI
-  - `VertexThinking` struct with configurable budget_tokens
-  - Automatic serialization mapping from unified `ThinkingConfig`
-  - Benchmark: 87% accuracy on complex reasoning tasks
-- **DeepSeek-R1 Reasoning Model Support**
-  - Automatic model selection (deepseek-chat vs deepseek-reasoner)
-  - Integrated with `ThinkingConfig` for unified thinking interface
-  - Benchmark: 71% pass rate on AIME competition problems
-- Extended thinking now available across 4 major providers:
-  - ✅ OpenAI (o3, o1-pro)
-  - ✅ Anthropic (claude-opus-4.1)
-  - ✅ Google Vertex (Gemini 2.0)
-  - ✅ DeepSeek (DeepSeek-R1)
-
-#### Phase 2: Regional Provider Expansion
-- **Mistral EU Regional Support**
-  - `MistralRegion` enum (Global/EU) with GDPR-compliant endpoint
-  - Configuration: `MISTRAL_REGION=eu` environment variable
-  - Compliant with European data residency requirements
-- **Maritaca AI Enhancements**
-  - Model discovery: `supported_models()` and `default_model()`
-  - Maritaca-3 model support for Portuguese/Brazilian Portuguese
-  - Brazilian market optimization
-- **Contingent Regional Providers (Pending API Access)**
-  - LightOn (France): GDPR-compliant models, awaiting partnership approval
-  - LatamGPT (Chile/Brazil): Spanish/Portuguese optimization, launching Jan-Feb 2026
-
-#### Phase 3: Real-Time Voice Upgrade
-- **Deepgram v3 Upgrade**
-  - `DeepgramVersion` enum supporting V1 (legacy) and V3 (new)
-  - Nova-3 model access via v3 API
-  - Backward compatible with existing V1 implementations
-- **ElevenLabs Streaming Enhancements**
-  - `LatencyMode` enum (5 levels: LowestLatency → HighestQuality)
-  - `StreamingOptions` for fine-grained streaming control
-  - Latency/quality tradeoff configuration
-- **Contingent Real-Time Voice Provider (Pending API Access)**
-  - Grok Real-Time Voice (xAI): Low-latency conversational AI, awaiting xAI partnership
-
-#### Phase 4: Video Generation Integration
-- **NEW `src/providers/video/` Modality**
-  - Architectural separation from image generation
-  - Unified video generation interface
-- **Runware Video Aggregator**
-  - Support for 5+ video models via single provider:
-    - runway-gen-4.5 (runway)
-    - kling-2.0 (keling)
-    - pika-1.0 (pika)
-    - hailuo-mini (hailuo)
-    - leonardo-ultra (leonardo)
-  - `VideoModel` enum for type-safe model selection
-  - `VideoGenerationResult` struct for response handling
-- **DiffusionRouter Skeleton** (Launching February 2026)
-  - Placeholder for future API integration
-  - Scheduled for Phase 5 implementation
-
-#### Phase 5: Domain-Specific Models & Documentation
-- **Med-PaLM 2 Medical Domain Integration**
-  - `VertexProvider::for_medical_domain()` helper method
-  - HIPAA compliance guidelines in documentation
-  - Use case: Healthcare AI applications
-- **Domain-Specific Documentation**
-  - NEW `docs/domain_models.md`: Finance, legal, medical, and scientific domains
-  - NEW `docs/scientific_benchmarks.md`: Detailed reasoning model benchmarks
-  - NEW `docs/MODELS_REGISTRY.md`: Complete model/method reference with Python & TypeScript examples
-- **Contingent Domain Providers (Pending API Access)**
-  - ChatLAW (Legal AI): Contract analysis and legal research, awaiting API approval
-  - BloombergGPT: Documented as enterprise-only, alternatives provided (FinGPT, AdaptLLM)
-
-#### Core Library (Rust)
-- Unified LLM API interface with 100+ providers:
-  - **Core**: Anthropic, OpenAI, Azure OpenAI
-  - **Cloud**: AWS Bedrock, Google Vertex AI, Google AI (Gemini)
-  - **Fast Inference**: Groq, Mistral, Cerebras, SambaNova, Fireworks, DeepSeek
-  - **Enterprise**: Cohere, AI21
-  - **Hosted**: Together, Perplexity, Anyscale, DeepInfra, Novita, Hyperbolic
-  - **Platforms**: HuggingFace, Replicate, Baseten, RunPod
-  - **Cloud ML**: Cloudflare, WatsonX, Databricks
-  - **Local**: Ollama, LM Studio, vLLM, TGI, Llamafile
-  - **Regional**: YandexGPT, GigaChat, Clova, Maritaca, Mistral (EU)
-  - **Specialized**: Voyage, Jina, Deepgram (v3), ElevenLabs, Fal
-  - **Video**: Runware (5 models), DiffusionRouter (planned)
-  - **Domain-Specific**: Med-PaLM 2, DeepSeek-R1 (scientific reasoning)
+#### Core Features
+- Unified LLM API interface with **100+ providers**
+- **11,000+ model registry** with pricing, capabilities, and benchmarks
 - Streaming completions with async iterators
 - Tool/function calling with fluent builder pattern (`ToolBuilder`)
-- Extended thinking mode across 4 providers with unified `ThinkingConfig`
-- Prompt caching support (5-minute and 1-hour TTL)
 - Structured output with JSON schema enforcement
 - Vision/image input support (base64 and URLs)
-- Embeddings API for text vectors
-- Batch processing API for async bulk requests
-- Token counting API for cost estimation
-- Model registry with 11,000+ models including:
-  - Pricing information (input/output per 1M tokens)
-  - Capability flags (vision, tools, streaming, JSON mode, thinking, video)
-  - Benchmark scores (MMLU, HumanEval, MATH, AIME reasoning, etc.)
 - Comprehensive error types for all failure modes
-- Feature flags for provider selection (reduces binary size)
-- Default retry logic with configurable backoff
+- Feature flags for provider selection
+
+#### Extended Thinking
+- Unified `ThinkingConfig` API across 4 providers:
+  - OpenAI (o3, o1-pro)
+  - Anthropic (Claude with extended thinking)
+  - Google Vertex AI (Gemini 2.0 Deep Thinking)
+  - DeepSeek (DeepSeek-R1)
+
+#### Prompt Caching
+- Native support for Anthropic, OpenAI, Google, and DeepSeek
+- 5-minute and 1-hour TTL options
+- Up to 90% cost savings on repeated prompts
+
+#### Regional Providers
+- Mistral EU with GDPR-compliant endpoint
+- Maritaca AI for Brazilian Portuguese
+- Regional configuration via environment variables
+
+#### Audio & Voice
+- Deepgram v3 with Nova-3 models
+- ElevenLabs with configurable latency modes
+- Speech-to-text and text-to-speech support
+
+#### Video Generation
+- Runware aggregator supporting 5+ video models
+- Runway Gen-4.5, Kling 2.0, Pika 1.0, and more
+
+#### Embeddings & Specialized
+- Voyage AI embeddings
+- Jina AI embeddings and reranking
+- Token counting API
+- Batch processing API
+
+#### Providers
+- **Core**: Anthropic, OpenAI, Azure OpenAI
+- **Cloud**: AWS Bedrock, Google Vertex AI, Google AI
+- **Fast Inference**: Groq, Mistral, Cerebras, SambaNova, Fireworks, DeepSeek
+- **Enterprise**: Cohere, AI21
+- **Hosted**: Together, Perplexity, DeepInfra, OpenRouter
+- **Local**: Ollama, LM Studio, vLLM, TGI, Llamafile
+- **Regional**: YandexGPT, GigaChat, Clova, Maritaca
+- **Specialized**: Voyage, Jina, Deepgram, ElevenLabs, Fal

 #### Python Bindings
-- Synchronous `LLMKitClient` for blocking operations
-- Asynchronous `AsyncLLMKitClient` for async/await
+- Synchronous `LLMKitClient` and async `AsyncLLMKitClient`
 - Full streaming support with iterators
-- Type stubs (`.pyi`) for IDE completion
-- All 100+ providers accessible via `from_env()` or explicit config
-- Complete feature parity with Rust core:
-  - Extended thinking across 4 providers
-  - Regional provider access (Mistral EU, Maritaca, etc.)
-  - Real-time voice streaming (Deepgram v3, ElevenLabs)
-  - Video generation (Runware 5+ models)
-  - Domain-specific models (Med-PaLM 2, scientific reasoning)
-  - Embeddings API
-  - Batch processing API
-  - Token counting API
-  - Model registry access
+- Type stubs for IDE completion
+- Complete feature parity with Rust core

 #### Node.js/TypeScript Bindings
 - `LLMKitClient` with async/await API
-- Streaming via async iterator (`stream()`) and callback (`completeStream()`)
-- Full TypeScript type definitions (`.d.ts`)
-- All 100+ providers accessible via `fromEnv()` or explicit config
-- Complete feature parity with Rust core:
-  - Extended thinking across 4 providers
-  - Regional provider access (Mistral EU, Maritaca, etc.)
-  - Real-time voice streaming (Deepgram v3, ElevenLabs)
-  - Video generation (Runware 5+ models)
-  - Domain-specific models (Med-PaLM 2, scientific reasoning)
-  - Embeddings API
-  - Batch processing API
-  - Token counting API
-  - Model registry access
+- Streaming via async iterator and callbacks
+- Full TypeScript type definitions
+- Complete feature parity with Rust core

 ### Security
 - No unsafe code in core library
 - API keys not logged
 - HTTPS enforced for all providers
-- Pre-compiled regex patterns using `LazyLock` for thread safety
-- Secure credential handling for regional providers with data residency requirements

 ### Testing
-- 186+ Rust unit and integration tests (13 new test modules from Phase 5)
-- 83 Python tests covering all major features
-- 77 Node.js tests covering all major features
-- Provider integration tests (requires API keys)
-- 3-tier testing strategy: Unit (CI/CD) + Mock (CI/CD) + Manual (real APIs)
+- 186+ tests (Rust, Python, Node.js)
+- Unit, integration, and mock test coverage

 ### Documentation
-- Getting Started guides for Python, Node.js, and Rust
+- Getting Started guides for Rust, Python, and Node.js
 - API reference documentation
 - Provider configuration guide
-- 27+ example files across all platforms
-- NEW: Domain-specific model guide (`docs/domain_models.md`)
-- NEW: Scientific benchmarks and reasoning models (`docs/scientific_benchmarks.md`)
-- NEW: Complete models registry with Python/TypeScript examples (`docs/MODELS_REGISTRY.md`)
-- NEW: Regional provider guidance for GDPR/data residency compliance
-
-### Breaking Changes
-- None (all features are additive, backward compatible with v0.1.0)
+- 27+ example files

 ---

 ## Future Plans

 ### [0.2.0] - Planned

-#### Features
 - Provider pooling and load balancing
-- Automatic failover/fallback between providers
+- Automatic failover between providers
 - Health checking for provider availability
-- Guardrails integration
 - Cost metering and budget controls
-- Multi-tenancy support
-- Caching provider
-- Custom retry configuration
-- Prompt templates with variable substitution
-
-#### Improvements
-- Secure string handling for API keys
-- Key rotation support
-- Audit logging
-- Performance optimizations
+- Guardrails integration

 ---

+[Unreleased]: https://github.com/yfedoseev/llmkit/compare/v0.1.1...HEAD
+[0.1.1]: https://github.com/yfedoseev/llmkit/compare/v0.1.0...v0.1.1
 [0.1.0]: https://github.com/yfedoseev/llmkit/releases/tag/v0.1.0
```
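The changelog above claims "up to 90% cost savings on repeated prompts" via prompt caching. The arithmetic behind that ceiling can be sketched as follows, assuming cache reads are billed at 10% of the base input rate (a common provider discount; actual rates and the $3/1M figure below are illustrative, not LLMKit pricing data).

```rust
/// Cost of one request in dollars, given token counts and a per-1M-token rate.
/// Assumption: cached input tokens are billed at 10% of the base input rate.
fn prompt_cost_usd(cached_tokens: u64, fresh_tokens: u64, rate_per_mtok: f64) -> f64 {
    let cached = cached_tokens as f64 * rate_per_mtok * 0.10 / 1_000_000.0;
    let fresh = fresh_tokens as f64 * rate_per_mtok / 1_000_000.0;
    cached + fresh
}

fn main() {
    let rate = 3.0; // illustrative: $3 per 1M input tokens
    // A 100k-token system prompt, uncached vs fully served from cache.
    let uncached = prompt_cost_usd(0, 100_000, rate);
    let cached = prompt_cost_usd(100_000, 0, rate);
    let savings_pct = 100.0 * (1.0 - cached / uncached);
    println!("uncached ${uncached:.2}, cached ${cached:.2}, savings {savings_pct:.0}%");
}
```

A fully cached prompt costs 0.10x the uncached price under this assumption, which is where the "up to 90%" figure comes from; real savings depend on how much of each request actually hits the cache.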

Cargo.toml

Lines changed: 2 additions & 2 deletions
```diff
@@ -34,8 +34,8 @@ cargo_common_metadata = "warn"

 [package]
 name = "llmkit"
-description = "Unified LLM API client for Rust - multi-provider support with a single interface"
-version = "0.1.0"
+description = "Production-grade LLM client - 100+ providers, 11,000+ models. Pure Rust."
+version = "0.1.1"
 edition = "2021"
 license = "MIT OR Apache-2.0"
 repository = "https://github.com/yfedoseev/llmkit"
```
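The version bump touched Cargo.toml, pyproject.toml, and package.json together, and a mismatch between them is easy to miss. A minimal release sanity check could compare the three; this is a hypothetical helper using plain string extraction (no toml/json crates), and the manifest snippets are illustrative.

```rust
/// Extract `version = "..."` from a Cargo.toml/pyproject.toml-style manifest.
fn toml_version(manifest: &str) -> Option<String> {
    manifest
        .lines()
        .map(str::trim)
        .find(|line| line.starts_with("version"))
        .and_then(|line| line.split('"').nth(1))
        .map(str::to_string)
}

/// Extract `"version": "..."` from a package.json-style manifest.
fn json_version(manifest: &str) -> Option<String> {
    let start = manifest.find("\"version\"")?;
    manifest[start..].split('"').nth(3).map(str::to_string)
}

fn main() {
    let cargo = "[package]\nname = \"llmkit\"\nversion = \"0.1.1\"";
    let pyproject = "[project]\nname = \"llmkit-python\"\nversion = \"0.1.1\"";
    let package_json = "{ \"name\": \"llmkit-node\", \"version\": \"0.1.1\" }";

    // All three manifests must agree before tagging a release.
    assert_eq!(toml_version(cargo), toml_version(pyproject));
    assert_eq!(toml_version(cargo), json_version(package_json));
    println!("all manifests at {}", toml_version(cargo).unwrap());
}
```

In a real repo this check would read the files from disk (and a proper toml/serde_json parse would be more robust); the point is only that a release like 0.1.1 is cheap to verify mechanically.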
