
Commit 330e105

Release 0.1.1: Documentation fixes and cleanup (#12)
* fix(docs): correct package names in binding READMEs

* docs: enhance READMEs with production focus and key differentiators
  - Add "Why LLMKit?" section highlighting Rust benefits
  - Emphasize production-ready features (memory safety, no GIL, no leaks)
  - Add prompt caching examples with 90% cost-savings messaging
  - Add extended thinking and model registry examples
  - Fix badge URLs (llmkit-python, llmkit-node)
  - Add production features table (smart router, circuit breaker, guardrails)
  - Improve code examples with cleaner imports for Node.js

* docs: rewrite PROVIDERS.md and MODELS_REGISTRY.md
  - Remove internal "Phase" development notes
  - Create clean, user-friendly provider documentation
  - Organize 100+ providers by category (Cloud, Inference, Regional, etc.)
  - Add environment variables and features for each provider
  - Rewrite model registry docs with practical examples
  - Add popular-models table with pricing
  - Add capability query examples

* docs: remove internal development notes and clean up documentation
  - Remove docs with internal "Phase" references (moved to model_registry):
    additional_providers.md, emerging_specialized_providers.md,
    domain_models.md, scientific_benchmarks.md
  - Rewrite CHANGELOG.md in the standard Keep a Changelog format
  - Simplify RELEASE_NOTES.md to user-facing release notes
  - Update docs/INDEX.md to remove references to deleted files

* fix(models): regenerate model registry from latest crawler data
  - Updated model data from the model_registry crawler (97 providers, 11,067 models)
  - Refreshed pricing, capabilities, and benchmark data
  - Synchronized with latest provider API changes

* chore: bump version to 0.1.1
  - Update Cargo.toml versions (main, python, node)
  - Update pyproject.toml version
  - Update package.json version
  - Update CHANGELOG.md with 0.1.1 release notes

* test(python): skip tests for API response types without constructors
  `Word` and `TranscribeResponse` are returned from API calls and cannot be
  instantiated directly in tests (no `#[new]` constructor in the PyO3 bindings).

* revert: remove unnecessary skip markers from Python tests
  The tests were passing; the CI failure was a Rust toolchain issue, not a
  Python test failure.

* fix(ci): install Rust before cargo-deny to fix toolchain conflict
  The cargo-deny-action v2 runs in a musl container, which conflicted with
  rust-toolchain.toml and caused:
    error: override toolchain 'stable-x86_64-unknown-linux-musl' is not installed
  Installing the stable toolchain explicitly before running cargo-deny resolves
  the conflict.

* fix(test): use full model ID instead of alias in provider test
  The model alias `gpt-4o` is shared by multiple providers (OpenAI, OpenRouter),
  and the last provider in the registry wins the alias lookup. Using the full ID
  `openai/gpt-4o` ensures the test checks the correct provider.

* fix(models): prioritize native providers for alias resolution
  When multiple providers offer the same model (e.g., gpt-4o via OpenAI, Azure,
  OpenRouter), aliases now resolve to the native provider:
  - gpt-4o → openai/gpt-4o (not azure or openrouter)
  - claude-sonnet-4-5 → anthropic/claude-sonnet-4-5
  Added an `is_native_provider()` function to identify the canonical provider
  for each model family (OpenAI for GPT/o1/o3, Anthropic for Claude, etc.).
  Also fixes a raw_id collision where openrouter/gpt-4o's raw_id "gpt-4o" was
  overwriting the OpenAI entry.

* chore(models): regenerate model registry

* fix(test): update documentation completeness test for cleaned docs
  The test was checking for docs/domain_models.md and
  docs/scientific_benchmarks.md, which were moved to ~/projects/model_registry
  as part of the documentation cleanup. Updated to check for core docs that
  actually exist: docs/INDEX.md, docs/MODELS_REGISTRY.md,
  docs/getting-started-rust.md.

* fix(docs): correct v0.1.0 release date to 2026-01-11

* docs: update package descriptions to highlight production-grade features
  Emphasize 100+ providers and 11,000+ models across all packages.
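The native-provider alias resolution described in the commit message above can be sketched roughly as follows. This is a simplified illustration, not LLMKit's actual internals: the registry shape and the exact family prefixes checked by `is_native_provider` are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: does `provider` natively own the model family of `raw_id`?
fn is_native_provider(provider: &str, raw_id: &str) -> bool {
    match provider {
        "openai" => {
            raw_id.starts_with("gpt-") || raw_id.starts_with("o1") || raw_id.starts_with("o3")
        }
        "anthropic" => raw_id.starts_with("claude"),
        "google" => raw_id.starts_with("gemini"),
        "deepseek" => raw_id.starts_with("deepseek"),
        _ => false,
    }
}

/// Build an alias table where a native provider wins over aggregators,
/// instead of "last provider in the registry wins".
fn build_alias_table<'a>(models: &[(&'a str, &'a str)]) -> HashMap<&'a str, String> {
    let mut aliases: HashMap<&str, String> = HashMap::new();
    for &(provider, raw_id) in models {
        let full_id = format!("{provider}/{raw_id}");
        match aliases.get(raw_id) {
            // An entry already exists and the new provider is not native: keep it.
            Some(_) if !is_native_provider(provider, raw_id) => {}
            // No entry yet, or the native provider should take over the alias.
            _ => {
                aliases.insert(raw_id, full_id);
            }
        }
    }
    aliases
}

fn main() {
    // Registry order deliberately puts an aggregator first and Azure last.
    let models = [
        ("openrouter", "gpt-4o"),
        ("openai", "gpt-4o"),
        ("azure", "gpt-4o"),
        ("anthropic", "claude-sonnet-4-5"),
    ];
    let table = build_alias_table(&models);
    // Resolves to openai/gpt-4o even though azure was registered last.
    println!("gpt-4o -> {}", table["gpt-4o"]);
}
```

This also shows why the provider test switched to the full ID `openai/gpt-4o`: an alias lookup depends on the resolution policy, while a full `provider/model` ID is unambiguous.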
1 parent 0cbfc9e commit 330e105

23 files changed

Lines changed: 13632 additions & 16991 deletions

.github/workflows/ci.yml

Lines changed: 3 additions & 0 deletions
```diff
@@ -215,6 +215,9 @@ jobs:
     steps:
       - uses: actions/checkout@v6

+      - name: Install Rust
+        uses: dtolnay/rust-toolchain@stable
+
       - name: Check dependencies
         uses: EmbarkStudios/cargo-deny-action@v2
```

CHANGELOG.md

Lines changed: 77 additions & 182 deletions
```diff
@@ -7,230 +7,125 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

-### Fixed
-
-#### Data Quality Improvements (Model Registry Audit - January 3, 2026)
-**Overall Quality Score: 91% → 98%**
-
-- **Cache Pricing**: Added missing cache pricing (3rd price value) for 8 models:
-  - Google Gemini 3 Pro/Flash (direct API and Vertex AI)
-  - DeepSeek Reasoner (direct API)
-  - Claude Haiku 4.5 (OpenRouter)
-  - Claude Sonnet/Haiku 4.5 (AWS Bedrock)
+## [0.1.1] - 2026-01-12

-- **Capability Flags**: Removed incorrect "S" (Structured Output) flag from 26 models that don't support strict JSON schema enforcement:
-  - Open-source models: Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B (Groq, Cerebras, SambaNova, Fireworks)
-  - Vision models: Gemini Flash, Jamba 2.0 (AI21)
-  - Various provider implementations: Vertex AI, Mistral, DeepSeek, Cohere
-  - Prevents application failures when attempting structured output on unsupported models
+### Fixed

-- **Benchmark Scores**: Corrected unrealistic benchmark scores:
-  - OpenAI o3: MATH 97.8→96.0, GPQA 85.4→75.0, SWE-bench 58.5→48.0
-  - DeepSeek R1 (all 4 providers): HumanEval 97.3→91.0, MATH 97.3→90.0
-  - Claude Opus/Sonnet 4.5: MGSM 94.2/93.5→91.5/91.0
+- **Package Names**: Corrected installation instructions in READMEs
+  - Python: `pip install llmkit-python` (was incorrectly `llmkit`)
+  - Node.js: `npm install llmkit-node` (was incorrectly `llmkit`)
+- **Badge URLs**: Fixed PyPI and npm badge links in main README
+- **Model Registry**: Regenerated from latest crawler data (97 providers, 11,067 models)
+  - Updated pricing, capabilities, and benchmark data
+  - Synchronized with latest provider API changes

-- **Missing Data**: Added missing benchmark scores:
-  - MMMU (multimodal understanding) for 2 Claude Haiku models
-  - Tool use flag for DeepSeek R1 on Together AI
+### Documentation

-- **Provider Consistency**: Verified cross-provider model consistency:
-  - All DeepSeek R1 variants now have identical benchmark scores
-  - Gemini 3 models have consistent pricing/specs across providers
-  - Vertex AI markup patterns documented and verified as intentional
+- Enhanced READMEs with "Why LLMKit?" section highlighting Rust benefits
+- Added production features overview (smart router, circuit breaker, guardrails)
+- Improved code examples for prompt caching, extended thinking, and model registry
+- Cleaned up internal development notes from documentation
+- Simplified PROVIDERS.md and MODELS_REGISTRY.md for better readability

-### Tests
-- ✅ All 186 tests passing
-- ✅ No regressions in model parsing or provider detection
-- ✅ Cache pricing validation complete
+## [0.1.0] - 2026-01-11

-## [0.1.0] - 2026-01-03
+Initial release of LLMKit.

 ### Added

-#### Phase 1: Extended Thinking Completion
-- **Google Gemini 2.0 Deep Thinking** via Vertex AI
-  - `VertexThinking` struct with configurable budget_tokens
-  - Automatic serialization mapping from unified `ThinkingConfig`
-  - Benchmark: 87% accuracy on complex reasoning tasks
-- **DeepSeek-R1 Reasoning Model Support**
-  - Automatic model selection (deepseek-chat vs deepseek-reasoner)
-  - Integrated with `ThinkingConfig` for unified thinking interface
-  - Benchmark: 71% pass rate on AIME competition problems
-- Extended thinking now available across 4 major providers:
-  - ✅ OpenAI (o3, o1-pro)
-  - ✅ Anthropic (claude-opus-4.1)
-  - ✅ Google Vertex (Gemini 2.0)
-  - ✅ DeepSeek (DeepSeek-R1)
-
-#### Phase 2: Regional Provider Expansion
-- **Mistral EU Regional Support**
-  - `MistralRegion` enum (Global/EU) with GDPR-compliant endpoint
-  - Configuration: `MISTRAL_REGION=eu` environment variable
-  - Compliant with European data residency requirements
-- **Maritaca AI Enhancements**
-  - Model discovery: `supported_models()` and `default_model()`
-  - Maritaca-3 model support for Portuguese/Brazilian Portuguese
-  - Brazilian market optimization
-- **Contingent Regional Providers (Pending API Access)**
-  - LightOn (France): GDPR-compliant models, awaiting partnership approval
-  - LatamGPT (Chile/Brazil): Spanish/Portuguese optimization, launching Jan-Feb 2026
-
-#### Phase 3: Real-Time Voice Upgrade
-- **Deepgram v3 Upgrade**
-  - `DeepgramVersion` enum supporting V1 (legacy) and V3 (new)
-  - Nova-3 model access via v3 API
-  - Backward compatible with existing V1 implementations
-- **ElevenLabs Streaming Enhancements**
-  - `LatencyMode` enum (5 levels: LowestLatency → HighestQuality)
-  - `StreamingOptions` for fine-grained streaming control
-  - Latency/quality tradeoff configuration
-- **Contingent Real-Time Voice Provider (Pending API Access)**
-  - Grok Real-Time Voice (xAI): Low-latency conversational AI, awaiting xAI partnership
-
-#### Phase 4: Video Generation Integration
-- **NEW `src/providers/video/` Modality**
-  - Architectural separation from image generation
-  - Unified video generation interface
-- **Runware Video Aggregator**
-  - Support for 5+ video models via single provider:
-    - runway-gen-4.5 (runway)
-    - kling-2.0 (keling)
-    - pika-1.0 (pika)
-    - hailuo-mini (hailuo)
-    - leonardo-ultra (leonardo)
-  - `VideoModel` enum for type-safe model selection
-  - `VideoGenerationResult` struct for response handling
-- **DiffusionRouter Skeleton** (Launching February 2026)
-  - Placeholder for future API integration
-  - Scheduled for Phase 5 implementation
-
-#### Phase 5: Domain-Specific Models & Documentation
-- **Med-PaLM 2 Medical Domain Integration**
-  - `VertexProvider::for_medical_domain()` helper method
-  - HIPAA compliance guidelines in documentation
-  - Use case: Healthcare AI applications
-- **Domain-Specific Documentation**
-  - NEW `docs/domain_models.md`: Finance, legal, medical, and scientific domains
-  - NEW `docs/scientific_benchmarks.md`: Detailed reasoning model benchmarks
-  - NEW `docs/MODELS_REGISTRY.md`: Complete model/method reference with Python & TypeScript examples
-- **Contingent Domain Providers (Pending API Access)**
-  - ChatLAW (Legal AI): Contract analysis and legal research, awaiting API approval
-  - BloombergGPT: Documented as enterprise-only, alternatives provided (FinGPT, AdaptLLM)
-
-#### Core Library (Rust)
-- Unified LLM API interface with 100+ providers:
-  - **Core**: Anthropic, OpenAI, Azure OpenAI
-  - **Cloud**: AWS Bedrock, Google Vertex AI, Google AI (Gemini)
-  - **Fast Inference**: Groq, Mistral, Cerebras, SambaNova, Fireworks, DeepSeek
-  - **Enterprise**: Cohere, AI21
-  - **Hosted**: Together, Perplexity, Anyscale, DeepInfra, Novita, Hyperbolic
-  - **Platforms**: HuggingFace, Replicate, Baseten, RunPod
-  - **Cloud ML**: Cloudflare, WatsonX, Databricks
-  - **Local**: Ollama, LM Studio, vLLM, TGI, Llamafile
-  - **Regional**: YandexGPT, GigaChat, Clova, Maritaca, Mistral (EU)
-  - **Specialized**: Voyage, Jina, Deepgram (v3), ElevenLabs, Fal
-  - **Video**: Runware (5 models), DiffusionRouter (planned)
-  - **Domain-Specific**: Med-PaLM 2, DeepSeek-R1 (scientific reasoning)
+#### Core Features
+- Unified LLM API interface with **100+ providers**
+- **11,000+ model registry** with pricing, capabilities, and benchmarks
 - Streaming completions with async iterators
 - Tool/function calling with fluent builder pattern (`ToolBuilder`)
-- Extended thinking mode across 4 providers with unified `ThinkingConfig`
-- Prompt caching support (5-minute and 1-hour TTL)
 - Structured output with JSON schema enforcement
 - Vision/image input support (base64 and URLs)
-- Embeddings API for text vectors
-- Batch processing API for async bulk requests
-- Token counting API for cost estimation
-- Model registry with 11,000+ models including:
-  - Pricing information (input/output per 1M tokens)
-  - Capability flags (vision, tools, streaming, JSON mode, thinking, video)
-  - Benchmark scores (MMLU, HumanEval, MATH, AIME reasoning, etc.)
 - Comprehensive error types for all failure modes
-- Feature flags for provider selection (reduces binary size)
-- Default retry logic with configurable backoff
+- Feature flags for provider selection
+
+#### Extended Thinking
+- Unified `ThinkingConfig` API across 4 providers:
+  - OpenAI (o3, o1-pro)
+  - Anthropic (Claude with extended thinking)
+  - Google Vertex AI (Gemini 2.0 Deep Thinking)
+  - DeepSeek (DeepSeek-R1)
+
+#### Prompt Caching
+- Native support for Anthropic, OpenAI, Google, and DeepSeek
+- 5-minute and 1-hour TTL options
+- Up to 90% cost savings on repeated prompts
+
+#### Regional Providers
+- Mistral EU with GDPR-compliant endpoint
+- Maritaca AI for Brazilian Portuguese
+- Regional configuration via environment variables
+
+#### Audio & Voice
+- Deepgram v3 with Nova-3 models
+- ElevenLabs with configurable latency modes
+- Speech-to-text and text-to-speech support
+
+#### Video Generation
+- Runware aggregator supporting 5+ video models
+- Runway Gen-4.5, Kling 2.0, Pika 1.0, and more
+
+#### Embeddings & Specialized
+- Voyage AI embeddings
+- Jina AI embeddings and reranking
+- Token counting API
+- Batch processing API
+
+#### Providers
+- **Core**: Anthropic, OpenAI, Azure OpenAI
+- **Cloud**: AWS Bedrock, Google Vertex AI, Google AI
+- **Fast Inference**: Groq, Mistral, Cerebras, SambaNova, Fireworks, DeepSeek
+- **Enterprise**: Cohere, AI21
+- **Hosted**: Together, Perplexity, DeepInfra, OpenRouter
+- **Local**: Ollama, LM Studio, vLLM, TGI, Llamafile
+- **Regional**: YandexGPT, GigaChat, Clova, Maritaca
+- **Specialized**: Voyage, Jina, Deepgram, ElevenLabs, Fal

 #### Python Bindings
-- Synchronous `LLMKitClient` for blocking operations
-- Asynchronous `AsyncLLMKitClient` for async/await
+- Synchronous `LLMKitClient` and async `AsyncLLMKitClient`
 - Full streaming support with iterators
-- Type stubs (`.pyi`) for IDE completion
-- All 100+ providers accessible via `from_env()` or explicit config
-- Complete feature parity with Rust core:
-  - Extended thinking across 4 providers
-  - Regional provider access (Mistral EU, Maritaca, etc.)
-  - Real-time voice streaming (Deepgram v3, ElevenLabs)
-  - Video generation (Runware 5+ models)
-  - Domain-specific models (Med-PaLM 2, scientific reasoning)
-  - Embeddings API
-  - Batch processing API
-  - Token counting API
-  - Model registry access
+- Type stubs for IDE completion
+- Complete feature parity with Rust core

 #### Node.js/TypeScript Bindings
 - `LLMKitClient` with async/await API
-- Streaming via async iterator (`stream()`) and callback (`completeStream()`)
-- Full TypeScript type definitions (`.d.ts`)
-- All 100+ providers accessible via `fromEnv()` or explicit config
-- Complete feature parity with Rust core:
-  - Extended thinking across 4 providers
-  - Regional provider access (Mistral EU, Maritaca, etc.)
-  - Real-time voice streaming (Deepgram v3, ElevenLabs)
-  - Video generation (Runware 5+ models)
-  - Domain-specific models (Med-PaLM 2, scientific reasoning)
-  - Embeddings API
-  - Batch processing API
-  - Token counting API
-  - Model registry access
+- Streaming via async iterator and callbacks
+- Full TypeScript type definitions
+- Complete feature parity with Rust core

 ### Security
 - No unsafe code in core library
 - API keys not logged
 - HTTPS enforced for all providers
-- Pre-compiled regex patterns using `LazyLock` for thread safety
-- Secure credential handling for regional providers with data residency requirements

 ### Testing
-- 186+ Rust unit and integration tests (13 new test modules from Phase 5)
-- 83 Python tests covering all major features
-- 77 Node.js tests covering all major features
-- Provider integration tests (requires API keys)
-- 3-tier testing strategy: Unit (CI/CD) + Mock (CI/CD) + Manual (real APIs)
+- 186+ tests (Rust, Python, Node.js)
+- Unit, integration, and mock test coverage

 ### Documentation
-- Getting Started guides for Python, Node.js, and Rust
+- Getting Started guides for Rust, Python, and Node.js
 - API reference documentation
 - Provider configuration guide
-- 27+ example files across all platforms
-- NEW: Domain-specific model guide (`docs/domain_models.md`)
-- NEW: Scientific benchmarks and reasoning models (`docs/scientific_benchmarks.md`)
-- NEW: Complete models registry with Python/TypeScript examples (`docs/MODELS_REGISTRY.md`)
-- NEW: Regional provider guidance for GDPR/data residency compliance
-
-### Breaking Changes
-- None (all features are additive, backward compatible with v0.1.0)
+- 27+ example files

 ---

 ## Future Plans

 ### [0.2.0] - Planned

-#### Features
 - Provider pooling and load balancing
-- Automatic failover/fallback between providers
+- Automatic failover between providers
 - Health checking for provider availability
-- Guardrails integration
 - Cost metering and budget controls
-- Multi-tenancy support
-- Caching provider
-- Custom retry configuration
-- Prompt templates with variable substitution
-
-#### Improvements
-- Secure string handling for API keys
-- Key rotation support
-- Audit logging
-- Performance optimizations
+- Guardrails integration

 ---

+[Unreleased]: https://github.com/yfedoseev/llmkit/compare/v0.1.1...HEAD
+[0.1.1]: https://github.com/yfedoseev/llmkit/compare/v0.1.0...v0.1.1
 [0.1.0]: https://github.com/yfedoseev/llmkit/releases/tag/v0.1.0
```
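The changelog above claims "up to 90% cost savings on repeated prompts" via prompt caching. The arithmetic behind that ceiling can be sketched as follows, assuming cache reads are billed at 10% of the base input rate (a common provider discount; actual rates and the $3/1M figure below are illustrative, not LLMKit pricing data).

```rust
/// Cost of one request in dollars, given token counts and a per-1M-token rate.
/// Assumption: cached input tokens are billed at 10% of the base input rate.
fn prompt_cost_usd(cached_tokens: u64, fresh_tokens: u64, rate_per_mtok: f64) -> f64 {
    let cached = cached_tokens as f64 * rate_per_mtok * 0.10 / 1_000_000.0;
    let fresh = fresh_tokens as f64 * rate_per_mtok / 1_000_000.0;
    cached + fresh
}

fn main() {
    let rate = 3.0; // illustrative: $3 per 1M input tokens
    // A 100k-token system prompt, uncached vs fully served from cache.
    let uncached = prompt_cost_usd(0, 100_000, rate);
    let cached = prompt_cost_usd(100_000, 0, rate);
    let savings_pct = 100.0 * (1.0 - cached / uncached);
    println!("uncached ${uncached:.2}, cached ${cached:.2}, savings {savings_pct:.0}%");
}
```

A fully cached prompt costs 0.10x the uncached price under this assumption, which is where the "up to 90%" figure comes from; real savings depend on how much of each request actually hits the cache.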

Cargo.toml

Lines changed: 2 additions & 2 deletions
```diff
@@ -34,8 +34,8 @@ cargo_common_metadata = "warn"

 [package]
 name = "llmkit"
-description = "Unified LLM API client for Rust - multi-provider support with a single interface"
-version = "0.1.0"
+description = "Production-grade LLM client - 100+ providers, 11,000+ models. Pure Rust."
+version = "0.1.1"
 edition = "2021"
 license = "MIT OR Apache-2.0"
 repository = "https://github.com/yfedoseev/llmkit"
```
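The version bump touched Cargo.toml, pyproject.toml, and package.json together, and a mismatch between them is easy to miss. A minimal release sanity check could compare the three; this is a hypothetical helper using plain string extraction (no toml/json crates), and the manifest snippets are illustrative.

```rust
/// Extract `version = "..."` from a Cargo.toml/pyproject.toml-style manifest.
fn toml_version(manifest: &str) -> Option<String> {
    manifest
        .lines()
        .map(str::trim)
        .find(|line| line.starts_with("version"))
        .and_then(|line| line.split('"').nth(1))
        .map(str::to_string)
}

/// Extract `"version": "..."` from a package.json-style manifest.
fn json_version(manifest: &str) -> Option<String> {
    let start = manifest.find("\"version\"")?;
    manifest[start..].split('"').nth(3).map(str::to_string)
}

fn main() {
    let cargo = "[package]\nname = \"llmkit\"\nversion = \"0.1.1\"";
    let pyproject = "[project]\nname = \"llmkit-python\"\nversion = \"0.1.1\"";
    let package_json = "{ \"name\": \"llmkit-node\", \"version\": \"0.1.1\" }";

    // All three manifests must agree before tagging a release.
    assert_eq!(toml_version(cargo), toml_version(pyproject));
    assert_eq!(toml_version(cargo), json_version(package_json));
    println!("all manifests at {}", toml_version(cargo).unwrap());
}
```

In a real repo this check would read the files from disk (and a proper toml/serde_json parse would be more robust); the point is only that a release like 0.1.1 is cheap to verify mechanically.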
