[TRTLLM-11508][refactor] decouple MTP num_nextn_predict_layers from max_draft_len #12341

Merged
QiJune merged 3 commits into NVIDIA:main from zhaoyangwang-nvidia:refactor-mtp-nlayers
May 8, 2026

Conversation

@zhaoyangwang-nvidia
Collaborator

@zhaoyangwang-nvidia zhaoyangwang-nvidia commented Mar 19, 2026

Summary by CodeRabbit

  • Chores
    • Updated multi-token prediction (MTP) speculative decoding configuration: the num_nextn_predict_layers parameter has been replaced with max_draft_len across all configuration files and examples. Users must update their MTP configurations to use the new parameter name with equivalent values.
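For configurations held as plain dictionaries, the rename described above could be applied with a small helper like the one below. This is only a sketch: the function name migrate_mtp_config and the "decoding_type" key layout are assumptions made for illustration, not part of this PR.

```python
from copy import deepcopy

OLD_KEY = "num_nextn_predict_layers"
NEW_KEY = "max_draft_len"


def migrate_mtp_config(speculative_config: dict) -> dict:
    """Return a copy of the config with the old MTP key renamed.

    Hypothetical helper: the dict layout (a "decoding_type" entry next to
    the MTP parameter) is an assumption, not taken from the PR.
    """
    cfg = deepcopy(speculative_config)
    # Leave non-MTP speculative configs untouched.
    if cfg.get("decoding_type") != "MTP":
        return cfg
    if OLD_KEY in cfg:
        old_value = cfg.pop(OLD_KEY)
        # If max_draft_len is already set explicitly, keep that value.
        cfg.setdefault(NEW_KEY, old_value)
    return cfg


old = {"decoding_type": "MTP", "num_nextn_predict_layers": 3}
print(migrate_mtp_config(old))  # {'decoding_type': 'MTP', 'max_draft_len': 3}
```

The helper copies its input, so the original configuration object is left unchanged.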

Description

The internal field num_nextn_predict_layers_from_model_config has been removed and replaced by num_nextn_predict_layers.

The original num_nextn_predict_layers field in MTPDecodingConfig, which conflated two separate concerns, was split into two fields with clear responsibilities:

| Field | Source | Role |
| --- | --- | --- |
| max_draft_len | User-facing | Controls how many draft tokens to produce |
| num_nextn_predict_layers | Auto-populated from model (internal) | How many MTP layers actually exist in the checkpoint |

Parameter Logic Per Mode

Eagle MTP (e.g. DeepSeek-V3, model has only 1 MTP layer)

• num_nextn_predict_layers = 1 (read from model)
• max_draft_len = N (set by user, default 1)
• Behavior: runs the single MTP layer N times, producing N draft tokens

Vanilla MTP (model has multiple MTP layers)

• num_nextn_predict_layers = M (read from model)
• User does not set max_draft_len → automatically uses M, runs all layers
• User explicitly sets max_draft_len = N:
	○ N < M: prints a warning, uses N layers, produces N draft tokens
	○ N >= M: uses M layers, produces M draft tokens
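The per-mode rules above can be summarized in a short Python sketch. The function resolve_num_draft_tokens is hypothetical and written only to restate the description; the actual resolution logic in the PR may be structured differently.

```python
import warnings
from typing import Optional


def resolve_num_draft_tokens(
    num_nextn_predict_layers: int,
    max_draft_len: Optional[int],
    use_mtp_vanilla: bool,
) -> int:
    """Return how many draft tokens are produced per step.

    Hypothetical helper that mirrors the rules in the PR description.
    """
    if not use_mtp_vanilla:
        # Eagle MTP: the single MTP layer is run max_draft_len times
        # (default 1 when the user sets nothing).
        return 1 if max_draft_len is None else max_draft_len

    # Vanilla MTP: one draft token per MTP layer in the checkpoint.
    if max_draft_len is None:
        return num_nextn_predict_layers
    if max_draft_len < num_nextn_predict_layers:
        warnings.warn(
            f"max_draft_len={max_draft_len} is smaller than the "
            f"{num_nextn_predict_layers} MTP layers in the checkpoint; "
            f"only the first {max_draft_len} layers are used."
        )
        return max_draft_len
    # max_draft_len >= number of layers: limited by the checkpoint.
    return num_nextn_predict_layers
```

For example, an Eagle MTP model with one MTP layer and max_draft_len = 3 yields 3 draft tokens, while a vanilla MTP model with 3 layers and max_draft_len = 5 yields 3.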

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39550 [ run ] triggered by Bot. Commit: dd33dcc Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39558 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39558 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30775 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39583 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39583 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30795 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39665 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39665 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30869 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39717 [ run ] triggered by Bot. Commit: 4289529 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39717 [ run ] completed with state SUCCESS. Commit: 4289529
/LLM/main/L0_MergeRequest_PR pipeline #30914 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39735 [ run ] triggered by Bot. Commit: 4289529 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39735 [ run ] completed with state SUCCESS. Commit: 4289529
/LLM/main/L0_MergeRequest_PR pipeline #30930 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39785 [ run ] triggered by Bot. Commit: 5187058 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39785 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 3/21.

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39813 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39813 [ run ] completed with state SUCCESS. Commit: a79ca0f
/LLM/main/L0_MergeRequest_PR pipeline #30990 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39820 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39820 [ run ] completed with state SUCCESS. Commit: a79ca0f
/LLM/main/L0_MergeRequest_PR pipeline #30997 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46997 [ run ] triggered by Bot. Commit: 4d892ab Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46997 [ run ] completed with state SUCCESS. Commit: 4d892ab
/LLM/main/L0_MergeRequest_PR pipeline #36974 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47021 [ run ] triggered by Bot. Commit: 4d892ab Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47021 [ run ] completed with state SUCCESS. Commit: 4d892ab
/LLM/main/L0_MergeRequest_PR pipeline #36997 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47068 [ run ] triggered by Bot. Commit: 1bf3eca Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47068 [ run ] completed with state SUCCESS. Commit: 1bf3eca
/LLM/main/L0_MergeRequest_PR pipeline #37041 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47102 [ run ] triggered by Bot. Commit: 1bf3eca Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47102 [ run ] completed with state SUCCESS. Commit: 1bf3eca
/LLM/main/L0_MergeRequest_PR pipeline #37071 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@github-actions

github-actions Bot commented May 7, 2026

⚠️ Bot command ignored: The /bot command must appear at the very beginning of the comment (no leading blank lines or spaces). Please post a new comment with /bot as the first character.

@zhaoyangwang-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47158 [ run ] triggered by Bot. Commit: e136f55 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47158 [ run ] completed with state SUCCESS. Commit: e136f55
/LLM/main/L0_MergeRequest_PR pipeline #37118 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@xxi-nv
Collaborator

xxi-nv commented May 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47208 [ run ] triggered by Bot. Commit: e136f55 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47208 [ run ] completed with state SUCCESS. Commit: e136f55
/LLM/main/L0_MergeRequest_PR pipeline #37164 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@xxi-nv
Collaborator

xxi-nv commented May 8, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47266 [ run ] triggered by Bot. Commit: e136f55 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47266 [ run ] completed with state SUCCESS. Commit: e136f55
/LLM/main/L0_MergeRequest_PR pipeline #37210 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@QiJune QiJune merged commit 517203a into NVIDIA:main May 8, 2026
6 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026
zhaoyangwang-nvidia added a commit to zhaoyangwang-nvidia/TensorRT-LLM that referenced this pull request May 22, 2026
The decoupling refactor (NVIDIA#12341) made num_nextn_predict_layers a
checkpoint-only property and clamped max_draft_len to it for vanilla MTP.
This silently dropped the shared-weights hack that lets vanilla run on
single-MTP-layer checkpoints (e.g., DeepSeek-V3-Lite) with
max_draft_len > ckpt MTP count, breaking the lone CI coverage of the
vanilla path (TestDeepSeekV3Lite::test_fp8_block_scales[mtp=vanilla]).

Restore the hack by expanding pretrained_config.num_nextn_predict_layers
to max_draft_len before model construction when use_mtp_vanilla=True,
preserving the original checkpoint count as _ckpt_num_nextn_predict_layers
for weight loader mod-indexing and FP8 exclude_modules duplication.
Updates the affected modeling files (deepseekv3, glm, nemotron_h,
exaone_moe) to read ckpt_nextn from the preserved field, with a fallback
to num_nextn_predict_layers when no expansion happened.

Remove the waive for the failing test.

Signed-off-by: ZhaoyangWang <[email protected]>
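
The expansion and mod-indexing described in the commit message above might look roughly like the following. This is a sketch under stated assumptions: the helper names expand_mtp_layers and ckpt_layer_index are hypothetical, a SimpleNamespace stands in for the real pretrained config object, and the guard max_draft_len > checkpoint count is an assumption about when the expansion applies.

```python
from types import SimpleNamespace


def expand_mtp_layers(pretrained_config, max_draft_len: int, use_mtp_vanilla: bool):
    """Expand the layer count for vanilla MTP before model construction.

    Hypothetical helper; only the two field names come from the commit message.
    """
    ckpt_layers = pretrained_config.num_nextn_predict_layers
    # Assumption: expand only when more draft tokens are requested than
    # the checkpoint has MTP layers.
    if use_mtp_vanilla and max_draft_len > ckpt_layers:
        # Preserve the real checkpoint count for the weight loader.
        pretrained_config._ckpt_num_nextn_predict_layers = ckpt_layers
        pretrained_config.num_nextn_predict_layers = max_draft_len
    return pretrained_config


def ckpt_layer_index(pretrained_config, mtp_layer_idx: int) -> int:
    """Map a (possibly expanded) MTP layer index to a checkpoint layer."""
    # Fall back to num_nextn_predict_layers when no expansion happened.
    ckpt_nextn = getattr(
        pretrained_config,
        "_ckpt_num_nextn_predict_layers",
        pretrained_config.num_nextn_predict_layers,
    )
    return mtp_layer_idx % ckpt_nextn


# Single-MTP-layer checkpoint run in vanilla mode with max_draft_len = 3:
cfg = expand_mtp_layers(SimpleNamespace(num_nextn_predict_layers=1), 3, True)
print([ckpt_layer_index(cfg, i) for i in range(cfg.num_nextn_predict_layers)])
# [0, 0, 0]: all three model layers share the one checkpoint layer's weights
```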