[BUG]: After switching backend from SGLang to TensorRT-LLM, requests fail with no instances for backend/generate until frontend restart #8263

@Monokaix

Description

Describe the Bug

This issue manifests specifically after changing the deployment backend from SGLang to TensorRT-LLM (while keeping the same served model name / Dynamo namespace). On Kubernetes via DynamoGraphDeployment (DGD), the graph can report Ready and both frontend and worker pods can be Running/Ready, but /v1/chat/completions (or completions) returns HTTP 500 with an error like:

Failed to generate completions: no instances found for endpoint ".../backend/generate"

Frontend logs may also show a model card checksum mismatch relative to the model’s canonical checksum (i.e., a newly registered worker’s Model Deployment Card (MDC) is rejected).

Workaround observed: deleting and recreating the frontend pod restores successful traffic. This suggests the frontend holds incorrect or stuck state for discovery events / the canonical MDC across the backend swap, compounded by the contents of the discovery store (KV/etcd) and event ordering during the rollout.
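The suspected failure mode can be modeled with a minimal, hypothetical sketch (all names invented; this is not the actual dynamo code): the first accepted Model Deployment Card pins a canonical checksum per model name, later cards with a different checksum are rejected, and the pinned checksum is never cleared when the last instance deregisters.

```rust
use std::collections::HashMap;

/// Hypothetical per-model frontend state (names invented for illustration).
#[derive(Default)]
struct ModelState {
    canonical_checksum: Option<String>,
    instances: Vec<String>, // instance ids serving backend/generate
}

#[derive(Default)]
struct Frontend {
    models: HashMap<String, ModelState>,
}

impl Frontend {
    /// Register a worker's MDC; reject it if its checksum differs from the
    /// canonical one already pinned for this model name.
    fn register(&mut self, model: &str, instance: &str, mdcsum: &str) -> Result<(), String> {
        let state = self.models.entry(model.to_string()).or_default();
        if let Some(canon) = &state.canonical_checksum {
            if canon != mdcsum {
                return Err(format!(
                    "MDC checksum mismatch: canonical={canon}, got={mdcsum}"
                ));
            }
        } else {
            state.canonical_checksum = Some(mdcsum.to_string());
        }
        state.instances.push(instance.to_string());
        Ok(())
    }

    fn deregister(&mut self, model: &str, instance: &str) {
        if let Some(state) = self.models.get_mut(model) {
            state.instances.retain(|i| i != instance);
            // Bug hypothesis: canonical_checksum is NOT cleared here, so a
            // later worker with a different MDC fingerprint is rejected even
            // though zero instances remain for the model.
        }
    }

    fn instance_count(&self, model: &str) -> usize {
        self.models.get(model).map_or(0, |s| s.instances.len())
    }
}

fn main() {
    let mut fe = Frontend::default();
    // SGLang worker registers, then goes away during the rollout.
    fe.register("qwen", "sglang-0", "sha:aaa").unwrap();
    fe.deregister("qwen", "sglang-0");
    // TensorRT-LLM worker for the same model name produces a different mdcsum.
    let res = fe.register("qwen", "trtllm-0", "sha:bbb");
    assert!(res.is_err());
    assert_eq!(fe.instance_count("qwen"), 0); // routable instances stay at zero
    println!("rejected: {res:?}");
}
```

Under this model, the frontend-restart workaround makes sense: a fresh frontend starts with no pinned checksum, so the TensorRT-LLM worker's MDC becomes the new canonical card.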

Steps to Reproduce

  1. Deploy a DGD with SGLang as the backend (frontend + worker, typical single-replica setup).
  2. Wait until the DGD is Ready; call /v1/chat/completions (or completions) and confirm success.
  3. Change the DGD spec to TensorRT-LLM (e.g., switch backendFramework / worker image / args), keeping the same served-model-name and dynamoNamespace as before.
  4. Wait until the DGD is Ready again (confirm new frontend/worker pods exist and are Ready).
  5. Repeat the same API request as step 2.
  6. Observe HTTP 500 and no instances found for endpoint ".../backend/generate" in frontend logs; optionally observe MDC checksum / canonical checksum mismatch logs.
  7. Delete the frontend pod only, let it recreate, repeat step 5: requests succeed again.

Expected Behavior

After the DGD reports Ready following an SGLang → TensorRT-LLM backend change, the system should converge to routable backend/generate instances without manual pod surgery, either through a correct discovery lifecycle (old registrations removed before incompatible MDCs appear) and/or through defined upgrade semantics (operator/docs: drain order, discovery cleanup, forced frontend roll).
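One possible convergence rule, sketched below with the same invented names as above (this is a suggestion, not the actual dynamo implementation): drop the pinned canonical checksum once the model has no live instances, so the next registered MDC, even a differently fingerprinted one, becomes the new canonical card.

```rust
#[derive(Default, Debug)]
struct ModelState {
    canonical_checksum: Option<String>,
    instances: Vec<String>,
}

/// Sketch of a fix: clearing the canonical checksum when the last instance
/// deregisters lets a subsequent, incompatible-but-valid MDC be accepted.
fn deregister(state: &mut ModelState, instance: &str) {
    state.instances.retain(|i| i != instance);
    if state.instances.is_empty() {
        state.canonical_checksum = None;
    }
}

fn main() {
    let mut s = ModelState {
        canonical_checksum: Some("sha:aaa".into()),
        instances: vec!["sglang-0".into()],
    };
    deregister(&mut s, "sglang-0");
    assert!(s.canonical_checksum.is_none()); // next MDC is accepted fresh
    println!("state after rollout: {s:?}");
}
```

This depends on the old worker deregistering before the new one registers; if registrations can interleave during a rollout, the operator may additionally need to enforce drain order or roll the frontend.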

Actual Behavior

Following the SGLang → TensorRT-LLM switch, requests fail with no instances found for endpoint ".../backend/generate" despite Kubernetes reporting Ready. Restarting only the frontend pod clears the failure.

Environment

dynamo v0.7.1

Additional Context

  • Likely related to strict per-model-name canonical MDC checksum handling in lib/llm/src/discovery/watcher.rs and ModelDeploymentCard::mdcsum() in lib/llm/src/model_card.rs, plus DiscoveryInstance::Model { card_json, ... } in lib/runtime/src/discovery/mod.rs.

  • Error string originates from lib/runtime/src/pipeline/network/egress/push_router.rs when zero available instances exist for the endpoint client.

  • Asymmetry: vLLM ↔ SGLang transitions may not hit this as often; SGLang → TensorRT-LLM appears more prone (TRT stack often produces a materially different MDC fingerprint vs SGLang for the same HF model name).
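For illustration, the router-side failure has roughly this shape (a hypothetical sketch, not the actual push_router.rs code; the endpoint string is a placeholder): with an empty instance list for the endpoint, dispatch can only return the error seen in the frontend logs.

```rust
/// Hypothetical dispatch: pick any instance for the endpoint, or fail with
/// the "no instances found" error observed in the logs.
fn route<'a>(instances: &'a [String], endpoint: &str) -> Result<&'a String, String> {
    instances
        .first()
        .ok_or_else(|| format!("no instances found for endpoint \"{endpoint}\""))
}

fn main() {
    // Zero instances: the request can only fail, regardless of pod Readiness.
    let err = route(&[], "backend/generate").unwrap_err();
    assert!(err.contains("no instances found"));
    println!("{err}");
}
```

This is why Kubernetes Readiness is misleading here: the pods are healthy, but the frontend's view of discoverable instances for the endpoint is empty.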

Screenshots

No response
