DEP: Dynamo 1.0 API and Non Feature Requirements #54
Initial draft defining API stability guarantees, component boundaries, and non-feature requirements for the Dynamo 1.0 GA release. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Populated registry with 14 components including planner, router, frontend, backend wrappers, bindings, core libraries, KVBM, and deployment/DevEx resources. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive documentation of planner component including:
- Public interface (classes, methods, config)
- Internal dependencies (Python modules, Rust crates)
- External dependencies (Python packages, services)
- User/developer interaction patterns
- Packaging and container info
- Observability (metrics, dashboards)
- 1.0 standardization checklist
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents extension points for:
- Custom PlannerConnector implementations
- Custom load predictors (BasePredictor ABC)
- Configuration tuning (CLI args, env vars)
- Custom metrics sources
- Adding new backends
Also documents current limitations and workarounds.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents planner's current architecture:
- Autonomous control loop (no REST API)
- Input sources (CLI, profiles, Prometheus, etcd, K8s)
- Output destinations (K8s DGD, etcd, metrics, logs)
- etcd key schema for VirtualConnector
Identifies gaps and proposes 1.0 standardization options:
- Option A: Add REST API
- Option B: Standardize observability only
- Option C: Event-driven architecture
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive documentation following planner template:
- Overview and location
- Internal dependencies (Python, Rust crates)
- Public interface (endpoints, KvRouterConfig)
- User/developer interaction (embedded, standalone, K8s)
- Service interface & I/O contract
- 1.0 standardization recommendations
- Customization & extension points
- Tests and related docs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add detailed component summaries following a consistent template for:
- Frontend (HTTP entry point)
- Backend vLLM (vLLM engine wrapper)
- Backend SGLang (SGLang engine wrapper)
- Backend TensorRT-LLM (TRT-LLM engine wrapper)
- Python Bindings (PyO3/Maturin)
- C Bindings (libdynamo_llm_capi)
- Core Libraries (dynamo-runtime, dynamo-llm)
- KVBM (KV Block Manager)
- Deployment (K8s operator, Helm charts)
- Recipes (production deployment templates)
- Examples (tutorials and reference implementations)
- Benchmarks (performance evaluation tools)
Each summary includes:
- Overview and location
- Internal dependencies
- Public interface
- User/developer interaction
- Packaging & containers
- Service interface & I/O contract
- Observability
- 1.0 Standardization checklist
- Customization & extension
- Related documentation and tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add detailed component summaries for external LLM inference platforms:
nvidia-lpu (Groq) components:
- Nova: Agent conductor for inference engines
- Neutrino: High-performance inference proxy (HTTP/gRPC to Cap'n Proto)
- Sombrero: LLM load balancer built on Pingora
- Supernova: K8s controllers for model deployments
llm-d components:
- Main: Orchestration layer, recipes, documentation
- Inference Scheduler: EPP for request routing (extends GIE)
- KV Cache: Distributed KV cache tracking
- Inference Sim: Lightweight vLLM simulator
- Benchmark: Automated benchmarking framework
Each summary includes Dynamo equivalent mapping for API comparison.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Document which Dynamo components interact with the KV Router:
- Callers: Frontend (primary), Standalone Router
- Publishers: vLLM, TRT-LLM, SGLang, Mocker workers
- Configuration providers: Frontend CLI, Python bindings
Include interaction flow diagram showing request and event flows, key router methods called, and worker registration flags.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds comprehensive documentation for Dynamo's integration with the Kubernetes Gateway API Inference Extension (GAIE) and kGateway:
- Custom EPP (Endpoint Picker Processor) with Dynamo KV routing via C FFI
- Helm chart templates for InferencePool, InferenceModel CRDs
- Plugin chain configuration (dyn-kv, picker, cleanup)
- Architecture diagram showing Gateway → EPP → Frontend → Workers flow
- NVIDIA extension fields for disaggregated serving (nvext)
- Comparison to llm-d's native Go implementation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create DEP for K8s API versioning strategy (v1alpha1 → v1beta1 → v1)
- Document CRD field stability requirements for all 5 CRDs
- Add label, port, and environment variable registries
- Define deprecation policy and graduation criteria
- Update deployment-component-summary.md with API formalization section
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create DEP analyzing 31 CLI arguments across 5 categories
- Document validation requirements (port ranges, float bounds, mode checks)
- Identify 10+ missing DYN_* environment variables
- Propose argument grouping for better discoverability
- Add implementation priority phases
- Update frontend-component-summary.md with complete argument tables
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
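The three kinds of validation named in this commit (port ranges, float bounds, mode checks) could be sketched with the standard `argparse` module as follows. This is a minimal illustration only: the flag names `--http-port`, `--overlap-weight`, and `--router-mode`, their defaults, and the cross-flag rule are hypothetical and are not taken from the DEP or from the actual frontend CLI.

```python
import argparse


def port(value: str) -> int:
    """Parse a TCP port and reject anything outside 1-65535."""
    number = int(value)
    if not 1 <= number <= 65535:
        raise argparse.ArgumentTypeError(f"port must be in 1-65535, got {number}")
    return number


def unit_float(value: str) -> float:
    """Parse a float and reject anything outside [0.0, 1.0]."""
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError(f"value must be in [0.0, 1.0], got {number}")
    return number


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags, chosen only to show one example of each check.
    parser = argparse.ArgumentParser(prog="frontend-sketch")
    parser.add_argument("--http-port", type=port, default=8000)
    parser.add_argument("--overlap-weight", type=unit_float, default=1.0)
    parser.add_argument(
        "--router-mode",
        choices=["round-robin", "random", "kv"],
        default="round-robin",
    )
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Mode check: a cross-argument rule that a per-flag `type=` cannot express.
    if args.router_mode != "kv" and args.overlap_weight != 1.0:
        parser.error("--overlap-weight only applies when --router-mode is kv")
    return args
```

Putting range checks in `type=` callables keeps the error message next to the flag it belongs to, while cross-flag rules run once after parsing; both paths exit with argparse's usual status code 2.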
Comprehensive analysis of inter-component contracts:
- Worker registration (MDC schema, etcd keys, capabilities)
- Request/response schemas (locations, extensions)
- KV cache events (JSON schema, NATS subjects)
- Health check contract
- Metrics contract
Critical enforcement gaps identified:
- Python bridge loses type safety (Dict[str, Any])
- No capability negotiation between frontend/backends
- Response validation is minimal
- No protocol versioning
Proposes:
- Pydantic models for Python type safety
- Capability declaration at registration
- Semantic response validation
- Protocol version headers
Also updates frontend-component-summary.md with contract section.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
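As a rough illustration of how a typed registration payload with a capability declaration and a protocol version might replace a bare `Dict[str, Any]`, here is a sketch. It uses standard-library dataclasses rather than the Pydantic models the commit proposes, so that it runs without extra dependencies. The class `WorkerRegistration`, its field names, the capability strings, and `PROTOCOL_VERSION` are all hypothetical and do not reflect Dynamo's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Iterable

PROTOCOL_VERSION = 1  # hypothetical version number, for illustration only


@dataclass(frozen=True)
class WorkerRegistration:
    """Hypothetical typed view of a worker registration payload."""

    model_name: str
    protocol_version: int
    capabilities: FrozenSet[str] = field(default_factory=frozenset)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "WorkerRegistration":
        """Validate an untyped payload once, at the bridge boundary."""
        name = raw.get("model_name")
        if not isinstance(name, str) or not name:
            raise ValueError("model_name must be a non-empty string")

        version = raw.get("protocol_version")
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(version, int) or isinstance(version, bool):
            raise ValueError("protocol_version must be an integer")
        if version != PROTOCOL_VERSION:
            raise ValueError(
                f"unsupported protocol_version {version}, expected {PROTOCOL_VERSION}"
            )

        caps = raw.get("capabilities", [])
        if not isinstance(caps, (list, tuple, set, frozenset)) or not all(
            isinstance(cap, str) for cap in caps
        ):
            raise ValueError("capabilities must be a collection of strings")

        return cls(model_name=name, protocol_version=version, capabilities=frozenset(caps))


def missing_capabilities(
    registration: WorkerRegistration, required: Iterable[str]
) -> FrozenSet[str]:
    """Return the required capabilities the worker did not declare."""
    return frozenset(required) - registration.capabilities
```

A frontend could call `missing_capabilities` at registration time and refuse to route a request type to a worker that never declared support for it, instead of discovering the mismatch from a failed response.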
Defines three-tier stability system (Experimental, Beta, Stable) with:
- Industry research from PyTorch, Kubernetes, Rust, Google APIs, PEP 702
- Marking mechanisms for Python, Rust, CLI, K8s CRDs, env vars
- Graduation criteria and deprecation timelines
- Concrete Dynamo feature examples for each stability level
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
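One possible shape for a Python marking mechanism covering the three tiers is a decorator that tags a function with its level and warns when an Experimental API is called. This is a sketch under assumptions: the names `Stability`, `ExperimentalWarning`, `stability`, and the two example functions are hypothetical, and the choice to warn only for Experimental (not Beta) is an assumption rather than something the DEP specifies.

```python
import functools
import warnings
from enum import Enum


class Stability(Enum):
    EXPERIMENTAL = "experimental"
    BETA = "beta"
    STABLE = "stable"


class ExperimentalWarning(FutureWarning):
    """Emitted when an Experimental API is called."""


def stability(level: Stability):
    """Tag a callable with a stability level; warn on Experimental use."""

    def decorate(func):
        if level is not Stability.EXPERIMENTAL:
            # Beta and Stable APIs are tagged for tooling but stay silent.
            func.__stability__ = level
            return func

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__qualname__} is experimental and may change without notice",
                category=ExperimentalWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)

        wrapper.__stability__ = level
        return wrapper

    return decorate


# Hypothetical APIs used only to demonstrate the decorator.
@stability(Stability.EXPERIMENTAL)
def experimental_feature(x: int) -> int:
    return x + 1


@stability(Stability.STABLE)
def stable_feature(x: int) -> int:
    return x * 2
```

Storing the level on a `__stability__` attribute lets documentation generators or tests enumerate the public surface by tier, and a dedicated warning category lets users filter Experimental warnings separately from ordinary deprecations.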
Signed-off-by: Graham King <grahamk@nvidia.com>
| Bindings modules: | ||
| - dynamo.runtime | ||
| - dynamo.llm | ||
| - dynamo.nixl_connect |
nit: I wouldn't consider nixl connect a binding.
| # Goals: | ||
| 1. Minimize the public interface, so we have less to support and more flexibility to evolve. | ||
| 2. Organize the public interface to make it more intuitive. |
3. Document our basic rationale / vision in selecting the API.
| Progress: https://github.com/ai-dynamo/dynamo/pull/5412 and https://github.com/ai-dynamo/dynamo/pull/5458 | ||
| Action: Identify more removals. This is challenging, we don't know what people are using. | ||
| ## Move internal bindings (meaning used by `components/` but not intended for public use otherwise) under new `_internal` module. |
I would consider anything used by components as 'public' from the bindings module. Internal in my mind means use within the module.
deps/proposal-bindings-v2.md
| ## Move internal bindings (meaning used by `components/` but not intended for public use otherwise) under new `_internal` module. | ||
| These should not be exposed: Namespace, Component, CancellationToken, Context, ModelDeploymentCard, ModelRuntimeConfig. |
I think we'll need to see a code example here.
Thinking again: functionally / usage-wise we have four big areas, and ideally we create modules for them. This would be more than 'bindings':
frontend
backend
router
common
| ## Prometheus metrics | ||
| They allow Python users to use Prometheus metrics with Dynamo labels attached. They are used instead of the official Prometheus library, and team consensus appears to be that this is necessary. |
I would like to probe further on whether this is needed.
If we are really simplifying the component, namespace, and endpoint variables out, then it's not clear the labels are as important.
Right now the two main use cases would be 'kvstats' and, potentially, new metrics from multimodal.
The only other option to consider would be to use a Python-native construct.
deps/proposal-bindings-v2.md
| ## Misc tidy ups | ||
| . `register_llm` has a lot of required params, simplify. | ||
| . Look at `Client` (router) interface. Simplify, possibly rename. Merge with KV Router? |
I think client depends on the runtime interface simplifications.
nnshah1
left a comment
just some comments
- Replace DEP content with full requirements from temporary.md
- Add Component Registry with support levels and standardization priorities
- Add Testing Requirements section with test pyramid and coverage targets
- Add Framework Requirements for vLLM/SGLang version support
- Add Developer Velocity Requirements with metrics and ownership model
- Add CI/CD Requirements with P0/P1 priorities
- Add Packaging and Repo Structure section
- Remove proposal-bindings-v2.md (moved to separate PR #68)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Template for documenting P0 components with sections for:
- Overview and Quick Start
- Design (architecture, relationships, data flow)
- Configuration (CLI flags, env vars with tables)
- Public API (for Library components)
- Metrics & Observability
- Customization & Extension Points
- Testing (locations, coverage targets)
- Error Handling
- 1.0 Standardization Status
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Graham King <grahamk@nvidia.com>
| - `--model-path` - Model name or path | ||
| - `--tokenizer-path` - Tokenizer path | ||
| **Worker Mode (DisaggregationMode):** |
| ```bash | ||
| # Node 0 (leader) | ||
| python -m dynamo.sglang --model-path MODEL --node-rank 0 --tp-size 8 |
Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>
Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>
Summary
Initial draft DEP defining what "1.0" implies for Dynamo GA release:
Goals
Status
This is an early draft - Requirements, Opens, and Proposal sections are TBD.
🤖 Generated with Claude Code