
[Feature]: Review logits post-processing performance #7532

@nzmora-nvidia

Description


🚀 The feature, motivation and pitch

@lucaslie found differences between the AutoDeploy (AD) and PT logits post-processing implementations. We should study these differences to understand the performance implications of the two implementations.

Slack: https://nvidia.slack.com/archives/C08T55LHSG4/p1756943076165769

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata


Labels

  • AutoDeploy: <NV> AutoDeploy Backend
  • Performance: TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.

Type

Projects

Status

In review

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
