[None][fix] Optimize TorchSampler process_logprobs by tongyuantongyu · Pull Request #13380 · NVIDIA/TensorRT-LLM

tongyuantongyu · 2026-04-23T10:56:18Z

Summary by CodeRabbit

Refactor
- Optimized logprob processing to improve the performance of token sampling operations.

Description

The old code at `TensorRT-LLM/tensorrt_llm/_torch/pyexecutor/sampler.py`, line 4174 in 54c3915:

k=max(requests[req_id].py_num_logprobs for req_id in group_req_indices),

iterates over the torch tensor `group_req_indices` element by element in a per-request loop, which is slow. This PR rewrites the logic to avoid that. It also merges three loops into one and simplifies the logic.
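The cost of element-wise tensor access can be illustrated with a minimal sketch. The data below is hypothetical and is not taken from `sampler.py`; only the shape of the `max(...)` expression mirrors the quoted line. Iterating a torch tensor yields one 0-dim tensor per element, and each must be converted to a Python integer before it can index a list. If the tensor lives on a GPU, each conversion may also force a synchronization.

```python
import torch

# Hypothetical stand-ins: per-request logprob counts and a tensor of request indices.
num_logprobs = [0, 3, 1, 5, 2]
group_req_indices = torch.tensor([1, 3, 4])

# Slow pattern: each iteration produces a 0-dim tensor that is converted
# to a Python int when it is used as a list index.
k_slow = max(num_logprobs[i] for i in group_req_indices)

# Faster pattern: convert once to a Python list, then stay in plain Python.
k_fast = max(num_logprobs[i] for i in group_req_indices.tolist())

print(k_slow, k_fast)
```

Both expressions compute the same value; the second only changes where the tensor-to-Python conversion happens. The PR goes one step further than this sketch by avoiding the tensor lookup in the loop altogether.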

Test Coverage

No functional change; behavior is guarded by the existing logprobs tests.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-04-23T10:57:39Z

📝 Walkthrough


The logprob processing in the sampler is refactored to precompute request index partitions in Python, separating non-beam-search and beam-search requests upfront. The top-k operation now uses the precomputed maximum logprobs value instead of computing it from group indices, and beam-search logprob scattering uses precomputed partition lists directly.
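The partitioning described in the walkthrough could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the PR: the `Request` class, its `use_beam_search` field, and the function `partition_logprob_requests` are hypothetical names. Only `py_num_logprobs` and `max_num_logprobs_no_beam_search` appear in the source.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for the sampler's request object."""
    py_num_logprobs: int
    use_beam_search: bool  # assumed flag, not confirmed by the source


def partition_logprob_requests(requests):
    """Split request indices by beam-search use in a single Python pass.

    Returns (no_beam_indices, beam_indices, max_num_logprobs_no_beam_search).
    """
    no_beam_indices = []
    beam_indices = []
    max_num_logprobs_no_beam_search = 0
    for idx, req in enumerate(requests):
        if req.use_beam_search:
            beam_indices.append(idx)
        else:
            no_beam_indices.append(idx)
            # Track the top-k size while partitioning, so no second loop is needed.
            max_num_logprobs_no_beam_search = max(
                max_num_logprobs_no_beam_search, req.py_num_logprobs)
    return no_beam_indices, beam_indices, max_num_logprobs_no_beam_search


requests = [Request(2, False), Request(4, True), Request(5, False)]
no_beam, beam, max_k = partition_logprob_requests(requests)
print(no_beam, beam, max_k)
```

With lists like these, the top-k size is available as a plain integer before any tensor work starts, and the beam-search path can be guarded with a simple `if beam:` check on the precomputed list.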

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Logprob Partitioning and Processing `tensorrt_llm/_torch/pyexecutor/sampler.py` | Refactored logprob handling to precompute request index partitions for non-beam-search and beam-search requests; modified the `torch.topk` parameter `k` to use the precomputed `max_num_logprobs_no_beam_search`; updated the beam-search logprob scatter execution guard to check for the precomputed partition list instead of a feature flag. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: optimizing the TorchSampler's process_logprobs function, which directly relates to the code changes refactoring logprob processing logic. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description clearly explains the performance optimization: it removes slow per-request torch tensor access, merges three loops into one, and simplifies the logic. It provides context with a GitHub link and confirms no functional change with existing test coverage. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment `@coderabbitai help` to get the list of available commands and usage tips.

tongyuantongyu · 2026-04-23T11:00:23Z

/bot run

tongyuantongyu · 2026-04-24T06:03:25Z

/bot run

tensorrt-cicd · 2026-04-24T06:09:03Z

PR_Github #45345 [ run ] triggered by Bot. Commit: 5464f4e Link to invocation

tensorrt-cicd · 2026-04-24T12:08:47Z

PR_Github #45345 [ run ] completed with state SUCCESS. Commit: 5464f4e
/LLM/main/L0_MergeRequest_PR pipeline #35591 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Yuan Tong <[email protected]>

tongyuantongyu · 2026-04-27T02:33:16Z

/bot run

tensorrt-cicd · 2026-04-27T02:39:21Z

PR_Github #45625 [ run ] triggered by Bot. Commit: 2087607 Link to invocation

tensorrt-cicd · 2026-04-27T07:32:17Z

PR_Github #45625 [ run ] completed with state SUCCESS. Commit: 2087607
/LLM/main/L0_MergeRequest_PR pipeline #35838 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

tongyuantongyu · 2026-04-27T10:54:38Z

/bot run

tensorrt-cicd · 2026-04-27T11:00:33Z

PR_Github #45719 [ run ] triggered by Bot. Commit: 2087607 Link to invocation

tensorrt-cicd · 2026-04-27T14:47:24Z

PR_Github #45719 [ run ] completed with state SUCCESS. Commit: 2087607
/LLM/main/L0_MergeRequest_PR pipeline #35920 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: Yuan Tong <[email protected]>

tongyuantongyu self-assigned this Apr 23, 2026

tongyuantongyu requested a review from a team as a code owner April 23, 2026 10:56

tongyuantongyu requested a review from leslie-fang25 April 23, 2026 10:56

tongyuantongyu force-pushed the ytong/sampler-logprob-perf branch from 7468dd6 to 33fe7e1 Compare April 23, 2026 10:57

tongyuantongyu requested review from ixlmar and stnie and removed request for leslie-fang25 April 23, 2026 10:58

stnie reviewed Apr 23, 2026

Comment thread tensorrt_llm/_torch/pyexecutor/sampler.py Outdated

Comment thread tensorrt_llm/_torch/pyexecutor/sampler.py Outdated

tongyuantongyu force-pushed the ytong/sampler-logprob-perf branch from 33fe7e1 to 9f7cb9f Compare April 24, 2026 03:15

tongyuantongyu requested a review from stnie April 24, 2026 06:02

tongyuantongyu force-pushed the ytong/sampler-logprob-perf branch from 9f7cb9f to 5464f4e Compare April 24, 2026 06:02

tongyuantongyu added 2 commits April 27, 2026 10:33

[None][fix] Optimize TorchSampler process_logprobs

bc0a9cc

Signed-off-by: Yuan Tong <[email protected]>

cleanup

2087607

Signed-off-by: Yuan Tong <[email protected]>

tongyuantongyu force-pushed the ytong/sampler-logprob-perf branch from 5464f4e to 2087607 Compare April 27, 2026 02:33

tongyuantongyu requested a review from joyang-nv April 28, 2026 02:23

joyang-nv approved these changes Apr 30, 2026

tongyuantongyu merged commit 0c9a668 into NVIDIA:main Apr 30, 2026
5 checks passed

evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request May 4, 2026

[None][fix] Optimize TorchSampler process_logprobs (NVIDIA#13380)

161f337

Signed-off-by: Yuan Tong <[email protected]>

yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026

[None][fix] Optimize TorchSampler process_logprobs (NVIDIA#13380)

aef4036

Signed-off-by: Yuan Tong <[email protected]>

4 participants
