DEP: Dynamo 1.0 API and Non Feature Requirements #54

Draft
nnshah1 wants to merge 23 commits into main from dynamo_api

@nnshah1 nnshah1 commented Jan 8, 2026

Summary

Initial draft DEP defining what "1.0" implies for Dynamo GA release:

  • API stability guarantees
  • Component and module boundaries
  • Non-feature requirements (tracing, logging, observability, naming, testing)

Goals

  • Clear versioned public interfaces
  • Modular components supporting replacement, reuse, and customization
  • Consistent DX driven from industry standards
  • Extension support without deep modifications
  • Support for gen AI use cases targeted for 2026

Status

This is an early draft - Requirements, Opens, and Proposal sections are TBD.


🤖 Generated with Claude Code

nnshah1 and others added 18 commits January 8, 2026 10:11
Initial draft defining API stability guarantees, component boundaries,
and non-feature requirements for the Dynamo 1.0 GA release.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Populated registry with 14 components including planner, router,
frontend, backend wrappers, bindings, core libraries, KVBM, and
deployment/DevEx resources.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive documentation of planner component including:
- Public interface (classes, methods, config)
- Internal dependencies (Python modules, Rust crates)
- External dependencies (Python packages, services)
- User/developer interaction patterns
- Packaging and container info
- Observability (metrics, dashboards)
- 1.0 standardization checklist

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents extension points for:
- Custom PlannerConnector implementations
- Custom load predictors (BasePredictor ABC)
- Configuration tuning (CLI args, env vars)
- Custom metrics sources
- Adding new backends

Also documents current limitations and workarounds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Documents planner's current architecture:
- Autonomous control loop (no REST API)
- Input sources (CLI, profiles, Prometheus, etcd, K8s)
- Output destinations (K8s DGD, etcd, metrics, logs)
- etcd key schema for VirtualConnector

Identifies gaps and proposes 1.0 standardization options:
- Option A: Add REST API
- Option B: Standardize observability only
- Option C: Event-driven architecture

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comprehensive documentation following planner template:
- Overview and location
- Internal dependencies (Python, Rust crates)
- Public interface (endpoints, KvRouterConfig)
- User/developer interaction (embedded, standalone, K8s)
- Service interface & I/O contract
- 1.0 standardization recommendations
- Customization & extension points
- Tests and related docs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add detailed component summaries following a consistent template for:
- Frontend (HTTP entry point)
- Backend vLLM (vLLM engine wrapper)
- Backend SGLang (SGLang engine wrapper)
- Backend TensorRT-LLM (TRT-LLM engine wrapper)
- Python Bindings (PyO3/Maturin)
- C Bindings (libdynamo_llm_capi)
- Core Libraries (dynamo-runtime, dynamo-llm)
- KVBM (KV Block Manager)
- Deployment (K8s operator, Helm charts)
- Recipes (production deployment templates)
- Examples (tutorials and reference implementations)
- Benchmarks (performance evaluation tools)

Each summary includes:
- Overview and location
- Internal dependencies
- Public interface
- User/developer interaction
- Packaging & containers
- Service interface & I/O contract
- Observability
- 1.0 Standardization checklist
- Customization & extension
- Related documentation and tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add detailed component summaries for external LLM inference platforms:

nvidia-lpu (Groq) components:
- Nova: Agent conductor for inference engines
- Neutrino: High-performance inference proxy (HTTP/gRPC to Cap'n Proto)
- Sombrero: LLM load balancer built on Pingora
- Supernova: K8s controllers for model deployments

llm-d components:
- Main: Orchestration layer, recipes, documentation
- Inference Scheduler: EPP for request routing (extends GIE)
- KV Cache: Distributed KV cache tracking
- Inference Sim: Lightweight vLLM simulator
- Benchmark: Automated benchmarking framework

Each summary includes Dynamo equivalent mapping for API comparison.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Document which Dynamo components interact with the KV Router:
- Callers: Frontend (primary), Standalone Router
- Publishers: vLLM, TRT-LLM, SGLang, Mocker workers
- Configuration providers: Frontend CLI, Python bindings

Include interaction flow diagram showing request and event flows,
key router methods called, and worker registration flags.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds comprehensive documentation for Dynamo's integration with the
Kubernetes Gateway API Inference Extension (GAIE) and kGateway:

- Custom EPP (Endpoint Picker Processor) with Dynamo KV routing via C FFI
- Helm chart templates for InferencePool, InferenceModel CRDs
- Plugin chain configuration (dyn-kv, picker, cleanup)
- Architecture diagram showing Gateway → EPP → Frontend → Workers flow
- NVIDIA extension fields for disaggregated serving (nvext)
- Comparison to llm-d's native Go implementation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create DEP for K8s API versioning strategy (v1alpha1 → v1beta1 → v1)
- Document CRD field stability requirements for all 5 CRDs
- Add label, port, and environment variable registries
- Define deprecation policy and graduation criteria
- Update deployment-component-summary.md with API formalization section

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create DEP analyzing 31 CLI arguments across 5 categories
- Document validation requirements (port ranges, float bounds, mode checks)
- Identify 10+ missing DYN_* environment variables
- Propose argument grouping for better discoverability
- Add implementation priority phases
- Update frontend-component-summary.md with complete argument tables

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
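
The validation requirements listed in the commit above (port ranges, float bounds, mode checks) and the proposed argument grouping could look roughly like the following. This is a minimal sketch using only `argparse` from the standard library; the program name, the flags `--port`, `--mode`, and `--weight`, and the group names are hypothetical and are not Dynamo's actual CLI arguments.

```python
import argparse


def port(value: str) -> int:
    """argparse type: accept only integers in the valid TCP port range."""
    number = int(value)  # a ValueError here is reported by argparse as an invalid value
    if not 1 <= number <= 65535:
        raise argparse.ArgumentTypeError(f"port must be in 1..65535, got {number}")
    return number


def unit_interval(value: str) -> float:
    """argparse type: accept only floats in the closed interval [0, 1]."""
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError(f"value must be in [0, 1], got {number}")
    return number


def build_parser() -> argparse.ArgumentParser:
    # All flag and group names below are hypothetical, chosen only for illustration.
    parser = argparse.ArgumentParser(prog="example-frontend")

    network = parser.add_argument_group("network")
    network.add_argument("--port", type=port, default=8000)

    routing = parser.add_argument_group("routing")
    # Mode check: argparse rejects any value outside `choices`.
    routing.add_argument(
        "--mode", choices=["aggregated", "disaggregated"], default="aggregated"
    )
    routing.add_argument("--weight", type=unit_interval, default=0.5)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--port", "8080", "--mode", "disaggregated"])
    print(args)
```

Putting the bounds checks in `type=` callables keeps validation next to the argument definition, and argument groups make `--help` output easier to scan.
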
Comprehensive analysis of inter-component contracts:

- Worker registration (MDC schema, etcd keys, capabilities)
- Request/response schemas (locations, extensions)
- KV cache events (JSON schema, NATS subjects)
- Health check contract
- Metrics contract

Critical enforcement gaps identified:
- Python bridge loses type safety (Dict[str, Any])
- No capability negotiation between frontend/backends
- Response validation is minimal
- No protocol versioning

Proposes:
- Pydantic models for Python type safety
- Capability declaration at registration
- Semantic response validation
- Protocol version headers

Also updates frontend-component-summary.md with contract section.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
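
To make the proposals in the commit above more concrete (typed models instead of `Dict[str, Any]`, capability declaration at registration, and protocol versioning), here is a rough sketch. It uses standard-library dataclasses as a stand-in for the Pydantic models the commit proposes, and every name in it (`WorkerRegistration`, `PROTOCOL_VERSION`, the field names, the capability strings) is hypothetical rather than taken from Dynamo.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Set

PROTOCOL_VERSION = "1.0"  # hypothetical version string for illustration


@dataclass(frozen=True)
class WorkerRegistration:
    """Typed view of a registration payload (hypothetical schema)."""

    model_name: str
    protocol_version: str
    capabilities: FrozenSet[str] = field(default_factory=frozenset)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "WorkerRegistration":
        """Validate an untyped dict at the boundary instead of passing it through."""
        name = raw.get("model_name")
        if not isinstance(name, str) or not name:
            raise ValueError("model_name must be a non-empty string")
        version = raw.get("protocol_version")
        if not isinstance(version, str):
            raise ValueError("protocol_version must be a string")
        caps = raw.get("capabilities", [])
        if not isinstance(caps, (list, tuple, set, frozenset)) or not all(
            isinstance(c, str) for c in caps
        ):
            raise ValueError("capabilities must be a collection of strings")
        return cls(name, version, frozenset(caps))


def is_compatible(reg: WorkerRegistration, required: Set[str]) -> bool:
    """Accept a worker only if the major version matches and it declares every
    capability the caller requires."""
    same_major = reg.protocol_version.split(".")[0] == PROTOCOL_VERSION.split(".")[0]
    return same_major and required <= reg.capabilities


if __name__ == "__main__":
    reg = WorkerRegistration.from_dict(
        {"model_name": "example", "protocol_version": "1.2", "capabilities": ["chat"]}
    )
    print(is_compatible(reg, {"chat"}))
```

Validating once at the boundary means downstream code can rely on the typed object, and a frontend can reject a worker that lacks a required capability before routing requests to it.
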
Defines three-tier stability system (Experimental, Beta, Stable) with:
- Industry research from PyTorch, Kubernetes, Rust, Google APIs, PEP 702
- Marking mechanisms for Python, Rust, CLI, K8s CRDs, env vars
- Graduation criteria and deprecation timelines
- Concrete Dynamo feature examples for each stability level

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
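
One possible shape for the Python marking mechanism mentioned in the commit above is a decorator that records the stability tier and warns on use of experimental APIs. This is only a sketch under assumptions: the names `Stability`, `ExperimentalWarning`, `stability`, and the two example functions are hypothetical, and the DEP itself may choose a different mechanism.

```python
import functools
import warnings
from enum import Enum


class Stability(Enum):
    EXPERIMENTAL = "experimental"
    BETA = "beta"
    STABLE = "stable"


class ExperimentalWarning(FutureWarning):
    """Emitted when an experimental API is called (hypothetical category)."""


def stability(level: Stability):
    """Attach a stability tier to a function; experimental ones warn when called."""

    def decorate(func):
        if level is not Stability.EXPERIMENTAL:
            func.__stability__ = level  # metadata only, no runtime overhead
            return func

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__qualname__} is experimental and may change without notice",
                ExperimentalWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)

        wrapper.__stability__ = level
        return wrapper

    return decorate


@stability(Stability.EXPERIMENTAL)
def new_feature() -> int:
    return 1


@stability(Stability.STABLE)
def settled_feature() -> int:
    return 2


if __name__ == "__main__":
    print(new_feature.__stability__, settled_feature.__stability__)
```

Storing the tier as an attribute lets documentation tooling or tests enumerate the public surface by stability level without calling anything.
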
Signed-off-by: Graham King <grahamk@nvidia.com>
Bindings modules:
- dynamo.runtime
- dynamo.llm
- dynamo.nixl_connect

nnshah1 commented:

nit: I wouldn't consider `nixl_connect` a binding.

# Goals:

1. Minimize the public interface, so we have less to support and more flexibility to evolve.
2. Organize the public interface to make it more intuitive.

nnshah1 commented:

3. Document our basic rationale / vision in selecting the API.

Progress: https://github.com/ai-dynamo/dynamo/pull/5412 and https://github.com/ai-dynamo/dynamo/pull/5458
Action: Identify more removals. This is challenging, we don't know what people are using.

## Move internal bindings (meaning used by `components/` but not intended for public use otherwise) under new `_internal` module.

nnshah1 commented:

I would consider anything used by components as 'public' from the bindings module. Internal in my mind means use within the module.


## Move internal bindings (meaning used by `components/` but not intended for public use otherwise) under new `_internal` module.

These should not be exposed: Namespace, Component, CancellationToken, Context, ModelDeploymentCard, ModelRuntimeConfig.

nnshah1 commented:

I think we'll need to see a code example here.

Thinking again: functionally and usage-wise we have four big areas, and ideally we create a module for each. This would be more than 'bindings':

- frontend
- backend
- router
- common


## Prometheus metrics

They allow Python users to use Prometheus metrics with Dynamo labels attached. They are used instead of the official Prometheus library, and team consensus appears to be that this is necessary.

nnshah1 commented:

Would like to probe further on whether this is needed.

If we are really simplifying the component, namespace, and endpoint variables out, then it's not clear the labels are as important.

Right now, the two main use cases would be 'kvstats' and, potentially, new metrics from multimodal.

The only other option to consider would be to use a Python-native construct.

## Misc tidy ups

. `register_llm` has a lot of required params, simplify.
. Look at `Client` (router) interface. Simplify, possibly rename. Merge with KV Router?

nnshah1 commented:

I think client depends on the runtime interface simplifications.

nnshah1 left a comment:

just some comments

nnshah1 and others added 3 commits February 5, 2026 05:51
- Replace DEP content with full requirements from temporary.md
- Add Component Registry with support levels and standardization priorities
- Add Testing Requirements section with test pyramid and coverage targets
- Add Framework Requirements for vLLM/SGLang version support
- Add Developer Velocity Requirements with metrics and ownership model
- Add CI/CD Requirements with P0/P1 priorities
- Add Packaging and Repo Structure section
- Remove proposal-bindings-v2.md (moved to separate PR #68)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Template for documenting P0 components with sections for:
- Overview and Quick Start
- Design (architecture, relationships, data flow)
- Configuration (CLI flags, env vars with tables)
- Public API (for Library components)
- Metrics & Observability
- Customization & Extension Points
- Testing (locations, coverage targets)
- Error Handling
- 1.0 Standardization Status

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Graham King <grahamk@nvidia.com>
- `--model-path` - Model name or path
- `--tokenizer-path` - Tokenizer path

**Worker Mode (DisaggregationMode):**

ishandhanani commented Feb 10, 2026:

--disaggregation-mode


```bash
# Node 0 (leader)
python -m dynamo.sglang --model-path MODEL --node-rank 0 --tp-size 8
```

Review comment:

also need --nnodes

Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>
Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>