Labels
AutoDeploy (<NV> AutoDeploy Backend), Performance (TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.)
Description
🚀 The feature, motivation and pitch
@lucaslie found differences between the logits post-processing implementations in AutoDeploy (AD) and the PyTorch backend (PT). We should study these differences to understand the performance implications of the two implementations.
Slack: https://nvidia.slack.com/archives/C08T55LHSG4/p1756943076165769
Alternatives
No response
Additional context
No response
Metadata
Status
In review