[TRTLLM-11508][refactor] decouple MTP num_nextn_predict_layers from max_draft_len by zhaoyangwang-nvidia · Pull Request #12341 · NVIDIA/TensorRT-LLM

zhaoyangwang-nvidia · 2026-03-19T06:14:29Z

Summary by CodeRabbit

Chores
- Updated multi-token prediction (MTP) speculative decoding configuration: the num_nextn_predict_layers parameter has been replaced with max_draft_len across all configuration files and examples. Users must update their MTP configurations to use the new parameter name with equivalent values.
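As a rough illustration of the rename described above, the sketch below migrates a plain dictionary of MTP speculative-decoding options from the old key to the new one. The helper name `migrate_mtp_config` and the dictionary layout are assumptions made for this example; they are not part of the TensorRT-LLM API, and the real backward-compatibility handling in `MTPDecodingConfig` may differ.

```python
import warnings
from typing import Any, Dict


def migrate_mtp_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rename the deprecated key to max_draft_len.

    Returns a new dict and leaves the input untouched.
    """
    out = dict(cfg)
    if "num_nextn_predict_layers" in out:
        old_value = out.pop("num_nextn_predict_layers")
        # Refuse to guess if both keys are present with different values.
        if "max_draft_len" in out and out["max_draft_len"] != old_value:
            raise ValueError(
                "both num_nextn_predict_layers and max_draft_len are set "
                "with different values")
        warnings.warn(
            "num_nextn_predict_layers is deprecated; use max_draft_len",
            DeprecationWarning,
            stacklevel=2,
        )
        out["max_draft_len"] = old_value
    return out


# Example: an old-style config keeps its value under the new name.
old_cfg = {"decoding_type": "MTP", "num_nextn_predict_layers": 3}
new_cfg = migrate_mtp_config(old_cfg)
```

Configs that already use `max_draft_len` pass through unchanged, so a helper like this could be applied unconditionally while updating existing files.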

Description

The internal field num_nextn_predict_layers_from_model_config has been removed and replaced by num_nextn_predict_layers.

The original num_nextn_predict_layers field in MTPDecodingConfig, which conflated two separate concerns, was split into two fields with clear responsibilities:

| Field | Source | Role |
|---|---|---|
| `max_draft_len` | User-facing | Controls how many draft tokens to produce |
| `num_nextn_predict_layers` | Auto-populated from model (internal) | How many MTP layers actually exist in the checkpoint |

Parameter Logic Per Mode

Eagle MTP (e.g. DeepSeek-V3, model has only 1 MTP layer)

• num_nextn_predict_layers = 1 (read from model)
• max_draft_len = N (set by user, default 1)
• Behavior: runs the single MTP layer N times, producing N draft tokens

Vanilla MTP (model has multiple MTP layers)

• num_nextn_predict_layers = M (read from model)
• User does not set max_draft_len → automatically uses M, runs all layers
• User explicitly sets max_draft_len = N:
	○ N < M: prints a warning, uses N layers, produces N draft tokens
	○ N >= M: uses M, produces M draft tokens
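The per-mode rules above can be summarized in a small function. This is a minimal sketch of the logic as described in this PR description, not the actual TensorRT-LLM code: the function name `resolve_draft_len` and its signature are hypothetical, and only the `use_mtp_vanilla` flag name comes from this PR's follow-up commit message.

```python
import warnings
from typing import Optional


def resolve_draft_len(num_nextn_predict_layers: int,
                      max_draft_len: Optional[int] = None,
                      use_mtp_vanilla: bool = False) -> int:
    """Return how many draft tokens are produced per step (illustrative)."""
    if not use_mtp_vanilla:
        # Eagle MTP: the MTP layer from the checkpoint is run repeatedly,
        # so the user value is used directly (default 1).
        return 1 if max_draft_len is None else max_draft_len

    # Vanilla MTP: one draft token per MTP layer in the checkpoint.
    if max_draft_len is None:
        return num_nextn_predict_layers
    if max_draft_len < num_nextn_predict_layers:
        warnings.warn(
            f"max_draft_len={max_draft_len} is smaller than the "
            f"{num_nextn_predict_layers} MTP layers in the checkpoint; "
            f"only {max_draft_len} layers will be used")
        return max_draft_len
    # max_draft_len >= number of layers: clamp to what the checkpoint has.
    return num_nextn_predict_layers
```

For example, an Eagle-style model with one MTP layer and `max_draft_len=3` yields 3 draft tokens, while a vanilla model with 3 MTP layers and `max_draft_len=5` is clamped to 3.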

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

zhaoyangwang-nvidia · 2026-03-19T06:18:41Z

/bot run

tensorrt-cicd · 2026-03-19T06:24:26Z

PR_Github #39550 [ run ] triggered by Bot. Commit: dd33dcc Link to invocation

zhaoyangwang-nvidia · 2026-03-19T06:56:17Z

/bot run

tensorrt-cicd · 2026-03-19T07:01:50Z

PR_Github #39558 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

tensorrt-cicd · 2026-03-19T09:12:29Z

PR_Github #39558 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30775 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

zhaoyangwang-nvidia · 2026-03-19T09:17:45Z

/bot run

tensorrt-cicd · 2026-03-19T09:24:16Z

PR_Github #39583 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

tensorrt-cicd · 2026-03-19T11:18:56Z

PR_Github #39583 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30795 completed with status: 'FAILURE'

zhaoyangwang-nvidia · 2026-03-20T01:49:21Z

/bot run

tensorrt-cicd · 2026-03-20T01:54:56Z

PR_Github #39665 [ run ] triggered by Bot. Commit: da046c5 Link to invocation

tensorrt-cicd · 2026-03-20T03:50:41Z

PR_Github #39665 [ run ] completed with state SUCCESS. Commit: da046c5
/LLM/main/L0_MergeRequest_PR pipeline #30869 completed with status: 'FAILURE'

zhaoyangwang-nvidia · 2026-03-20T08:08:18Z

/bot run

tensorrt-cicd · 2026-03-20T08:14:27Z

PR_Github #39717 [ run ] triggered by Bot. Commit: 4289529 Link to invocation

tensorrt-cicd · 2026-03-20T11:40:48Z

PR_Github #39717 [ run ] completed with state SUCCESS. Commit: 4289529
/LLM/main/L0_MergeRequest_PR pipeline #30914 completed with status: 'FAILURE'

zhaoyangwang-nvidia · 2026-03-20T11:53:57Z

/bot run

tensorrt-cicd · 2026-03-20T11:59:36Z

PR_Github #39735 [ run ] triggered by Bot. Commit: 4289529 Link to invocation

tensorrt-cicd · 2026-03-20T14:26:43Z

PR_Github #39735 [ run ] completed with state SUCCESS. Commit: 4289529
/LLM/main/L0_MergeRequest_PR pipeline #30930 completed with status: 'FAILURE'

zhaoyangwang-nvidia · 2026-03-21T03:45:56Z

/bot run

tensorrt-cicd · 2026-03-21T03:51:47Z

PR_Github #39785 [ run ] triggered by Bot. Commit: 5187058 Link to invocation

tensorrt-cicd · 2026-03-21T03:51:48Z

PR_Github #39785 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 3/21.

Link to invocation

zhaoyangwang-nvidia · 2026-03-22T08:22:20Z

/bot run

tensorrt-cicd · 2026-03-22T08:28:15Z

PR_Github #39813 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation

tensorrt-cicd · 2026-03-22T10:21:33Z

PR_Github #39813 [ run ] completed with state SUCCESS. Commit: a79ca0f
/LLM/main/L0_MergeRequest_PR pipeline #30990 completed with status: 'FAILURE'

zhaoyangwang-nvidia · 2026-03-22T10:41:16Z

/bot run

tensorrt-cicd · 2026-03-22T10:46:50Z

PR_Github #39820 [ run ] triggered by Bot. Commit: a79ca0f Link to invocation

tensorrt-cicd · 2026-03-22T12:45:48Z

PR_Github #39820 [ run ] completed with state SUCCESS. Commit: a79ca0f
/LLM/main/L0_MergeRequest_PR pipeline #30997 completed with status: 'FAILURE'

tensorrt-cicd · 2026-05-06T12:45:58Z

PR_Github #46997 [ run ] triggered by Bot. Commit: 4d892ab Link to invocation

tensorrt-cicd · 2026-05-06T14:31:22Z

PR_Github #46997 [ run ] completed with state SUCCESS. Commit: 4d892ab
/LLM/main/L0_MergeRequest_PR pipeline #36974 completed with status: 'FAILURE'

zhaoyangwang-nvidia · 2026-05-06T15:15:39Z

/bot run

tensorrt-cicd · 2026-05-06T15:21:56Z

PR_Github #47021 [ run ] triggered by Bot. Commit: 4d892ab Link to invocation

tensorrt-cicd · 2026-05-06T16:13:38Z

PR_Github #47021 [ run ] completed with state SUCCESS. Commit: 4d892ab
/LLM/main/L0_MergeRequest_PR pipeline #36997 completed with status: 'FAILURE'

zhaoyangwang-nvidia · 2026-05-07T01:18:36Z

/bot run

tensorrt-cicd · 2026-05-07T01:24:11Z

PR_Github #47068 [ run ] triggered by Bot. Commit: 1bf3eca Link to invocation

tensorrt-cicd · 2026-05-07T03:23:58Z

PR_Github #47068 [ run ] completed with state SUCCESS. Commit: 1bf3eca
/LLM/main/L0_MergeRequest_PR pipeline #37041 completed with status: 'FAILURE'

zhaoyangwang-nvidia · 2026-05-07T03:45:47Z

/bot run

tensorrt-cicd · 2026-05-07T03:51:21Z

PR_Github #47102 [ run ] triggered by Bot. Commit: 1bf3eca Link to invocation

tensorrt-cicd · 2026-05-07T05:21:06Z

PR_Github #47102 [ run ] completed with state SUCCESS. Commit: 1bf3eca
/LLM/main/L0_MergeRequest_PR pipeline #37071 completed with status: 'FAILURE'

zhaoyangwang-nvidia · 2026-05-07T07:28:44Z

/bot run

github-actions · 2026-05-07T07:28:58Z

⚠️ Bot command ignored: The /bot command must appear at the very beginning of the comment (no leading blank lines or spaces). Please post a new comment with /bot as the first character.

zhaoyangwang-nvidia · 2026-05-07T07:42:07Z

/bot run

tensorrt-cicd · 2026-05-07T07:47:42Z

PR_Github #47158 [ run ] triggered by Bot. Commit: e136f55 Link to invocation

tensorrt-cicd · 2026-05-07T12:13:25Z

PR_Github #47158 [ run ] completed with state SUCCESS. Commit: e136f55
/LLM/main/L0_MergeRequest_PR pipeline #37118 completed with status: 'FAILURE'

xxi-nv · 2026-05-07T12:40:49Z

/bot run

tensorrt-cicd · 2026-05-07T12:46:29Z

PR_Github #47208 [ run ] triggered by Bot. Commit: e136f55 Link to invocation

tensorrt-cicd · 2026-05-07T15:31:29Z

PR_Github #47208 [ run ] completed with state SUCCESS. Commit: e136f55
/LLM/main/L0_MergeRequest_PR pipeline #37164 completed with status: 'FAILURE'

xxi-nv · 2026-05-08T01:12:41Z

/bot run

tensorrt-cicd · 2026-05-08T01:19:36Z

PR_Github #47266 [ run ] triggered by Bot. Commit: e136f55 Link to invocation

tensorrt-cicd · 2026-05-08T06:44:14Z

PR_Github #47266 [ run ] completed with state SUCCESS. Commit: e136f55
/LLM/main/L0_MergeRequest_PR pipeline #37210 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

…ax_draft_len (NVIDIA#12341) Signed-off-by: ZhaoyangWang <[email protected]>

The decoupling refactor (NVIDIA#12341) made num_nextn_predict_layers a checkpoint-only property and clamped max_draft_len to it for vanilla MTP. This silently dropped the shared-weights hack that lets vanilla run on single-MTP-layer checkpoints (e.g., DeepSeek-V3-Lite) with max_draft_len > ckpt MTP count, breaking the lone CI coverage of the vanilla path (TestDeepSeekV3Lite::test_fp8_block_scales[mtp=vanilla]).

Restore the hack by expanding pretrained_config.num_nextn_predict_layers to max_draft_len before model construction when use_mtp_vanilla=True, preserving the original checkpoint count as _ckpt_num_nextn_predict_layers for weight loader mod-indexing and FP8 exclude_modules duplication.

Updates the affected modeling files (deepseekv3, glm, nemotron_h, exaone_moe) to read ckpt_nextn from the preserved field, with a fallback to num_nextn_predict_layers when no expansion happened. Remove the waive for the failing test.

Signed-off-by: ZhaoyangWang <[email protected]>
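The shared-weights expansion described in this commit message can be sketched roughly as follows. The attribute names `num_nextn_predict_layers` and `_ckpt_num_nextn_predict_layers` and the `use_mtp_vanilla` flag come from the message itself; the helper functions `expand_mtp_layers` and `ckpt_layer_for`, the use of `SimpleNamespace` as a stand-in config, and the choice to expand only when `max_draft_len` exceeds the checkpoint count are assumptions made for illustration.

```python
from types import SimpleNamespace


def expand_mtp_layers(pretrained_config, max_draft_len: int,
                      use_mtp_vanilla: bool):
    """Hypothetical: grow the MTP layer count so vanilla MTP can reuse weights."""
    if not use_mtp_vanilla:
        return pretrained_config
    ckpt_nextn = pretrained_config.num_nextn_predict_layers
    if max_draft_len > ckpt_nextn:
        # Remember how many MTP layers the checkpoint really contains.
        pretrained_config._ckpt_num_nextn_predict_layers = ckpt_nextn
        pretrained_config.num_nextn_predict_layers = max_draft_len
    return pretrained_config


def ckpt_layer_for(pretrained_config, mtp_layer_idx: int) -> int:
    """Map a constructed MTP layer to the checkpoint layer whose weights it loads."""
    ckpt_nextn = getattr(pretrained_config,
                         "_ckpt_num_nextn_predict_layers",
                         pretrained_config.num_nextn_predict_layers)
    # Mod-indexing: constructed layers wrap around the checkpoint layers.
    return mtp_layer_idx % ckpt_nextn


# A single-MTP-layer checkpoint asked to draft 3 tokens in vanilla mode.
cfg = SimpleNamespace(num_nextn_predict_layers=1)
expand_mtp_layers(cfg, max_draft_len=3, use_mtp_vanilla=True)
sources = [ckpt_layer_for(cfg, i) for i in range(cfg.num_nextn_predict_layers)]
```

In this example all three constructed layers map back to checkpoint layer 0, which is the weight sharing the message refers to; when no expansion happens, the `getattr` fallback makes the mapping the identity.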

github-actions Bot assigned zhaoyangwang-nvidia Mar 19, 2026

zhaoyangwang-nvidia force-pushed the refactor-mtp-nlayers branch from da046c5 to 4289529 Compare March 20, 2026 08:08

zhaoyangwang-nvidia force-pushed the refactor-mtp-nlayers branch from 4289529 to 5187058 Compare March 21, 2026 03:45

zhaoyangwang-nvidia force-pushed the refactor-mtp-nlayers branch from 5187058 to a79ca0f Compare March 22, 2026 08:22

zhaoyangwang-nvidia added 3 commits May 7, 2026 00:22

[TRTLLM-11508][refactor] decouple MTP num_nextn_predict_layers from m…

d586f50

…ax_draft_len Signed-off-by: ZhaoyangWang <[email protected]>

[TRTLLM-11508][refactor] add backward compat for deprecated MTPDecodi…

81620e5

…ngConfig.num_nextn_predict_layers Signed-off-by: ZhaoyangWang <[email protected]>

Fix new ci issue

e136f55

Signed-off-by: ZhaoyangWang <[email protected]>

zhaoyangwang-nvidia force-pushed the refactor-mtp-nlayers branch from 1bf3eca to e136f55 Compare May 7, 2026 07:25

QiJune merged commit 517203a into NVIDIA:main May 8, 2026
6 checks passed

yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026

[TRTLLM-11508][refactor] decouple MTP num_nextn_predict_layers from m…

65295c7

…ax_draft_len (NVIDIA#12341) Signed-off-by: ZhaoyangWang <[email protected]>

zhaoyangwang-nvidia mentioned this pull request May 22, 2026

[https://nvbugs/6195110][fix] Restore MTP vanilla shared-weights path #14457

Open

1 task
