Skip to content

[None][fix] Optimize TorchSampler process_logprobs#13380

Merged
tongyuantongyu merged 2 commits into
NVIDIA:mainfrom
tongyuantongyu:ytong/sampler-logprob-perf
Apr 30, 2026
Merged

[None][fix] Optimize TorchSampler process_logprobs#13380
tongyuantongyu merged 2 commits into
NVIDIA:mainfrom
tongyuantongyu:ytong/sampler-logprob-perf

Conversation

@tongyuantongyu
Copy link
Copy Markdown
Member

@tongyuantongyu tongyuantongyu commented Apr 23, 2026

Summary by CodeRabbit

  • Refactor
    • Optimized logprob processing for improved performance efficiency in token sampling operations.

Description

The old code

k=max(requests[req_id].py_num_logprobs for req_id in group_req_indices),

is accessing the torch tensor group_req_indices in a for-each-request loop which is slow. Rewrite the logic to avoid that. Also merged 3 loops into one and simplified the logic.

Test Coverage

No functional change, behavior guarded by existed logprobs tests.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@tongyuantongyu tongyuantongyu self-assigned this Apr 23, 2026
@tongyuantongyu tongyuantongyu requested a review from a team as a code owner April 23, 2026 10:56
@tongyuantongyu tongyuantongyu force-pushed the ytong/sampler-logprob-perf branch from 7468dd6 to 33fe7e1 Compare April 23, 2026 10:57
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

The logprob processing in the sampler is refactored to precompute request index partitions in Python, separating non-beam-search and beam-search requests upfront. The top-k operation now uses the precomputed maximum logprobs value instead of computing it from group indices, and beam-search logprob scattering uses precomputed partition lists directly.

Changes

Cohort / File(s) Summary
Logprob Partitioning and Processing
tensorrt_llm/_torch/pyexecutor/sampler.py
Refactored logprob handling to precompute request index partitions for non-beam-search and beam-search requests; modified torch.topk parameter k to use precomputed max_num_logprobs_no_beam_search; updated beam-search logprob scatter execution guard to check precomputed partition list existence instead of feature flag.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: optimizing the TorchSampler's process_logprobs function, which directly relates to the code changes refactoring logprob processing logic.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description clearly explains the performance optimization: removes slow per-request torch tensor access, merges three loops into one, and simplifies logic. It provides context with a GitHub link and confirms no functional change with existing test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@tongyuantongyu tongyuantongyu requested review from ixlmar and stnie and removed request for leslie-fang25 April 23, 2026 10:58
@tongyuantongyu
Copy link
Copy Markdown
Member Author

/bot run

Comment thread tensorrt_llm/_torch/pyexecutor/sampler.py Outdated
Comment thread tensorrt_llm/_torch/pyexecutor/sampler.py Outdated
@tongyuantongyu tongyuantongyu force-pushed the ytong/sampler-logprob-perf branch from 33fe7e1 to 9f7cb9f Compare April 24, 2026 03:15
@tongyuantongyu tongyuantongyu requested a review from stnie April 24, 2026 06:02
@tongyuantongyu tongyuantongyu force-pushed the ytong/sampler-logprob-perf branch from 9f7cb9f to 5464f4e Compare April 24, 2026 06:02
@tongyuantongyu
Copy link
Copy Markdown
Member Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45345 [ run ] triggered by Bot. Commit: 5464f4e Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45345 [ run ] completed with state SUCCESS. Commit: 5464f4e
/LLM/main/L0_MergeRequest_PR pipeline #35591 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@tongyuantongyu tongyuantongyu force-pushed the ytong/sampler-logprob-perf branch from 5464f4e to 2087607 Compare April 27, 2026 02:33
@tongyuantongyu
Copy link
Copy Markdown
Member Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45625 [ run ] triggered by Bot. Commit: 2087607 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45625 [ run ] completed with state SUCCESS. Commit: 2087607
/LLM/main/L0_MergeRequest_PR pipeline #35838 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@tongyuantongyu
Copy link
Copy Markdown
Member Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45719 [ run ] triggered by Bot. Commit: 2087607 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45719 [ run ] completed with state SUCCESS. Commit: 2087607
/LLM/main/L0_MergeRequest_PR pipeline #35920 completed with status: 'SUCCESS'

CI Report

Link to invocation

@tongyuantongyu tongyuantongyu merged commit 0c9a668 into NVIDIA:main Apr 30, 2026
5 checks passed
evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request May 4, 2026
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants