[None][feat] Mamba optimization and mixed quantization support for nemotron-h by Wanli-Jiang · Pull Request #11972 · NVIDIA/TensorRT-LLM

Wanli-Jiang · 2026-03-06T04:19:00Z

Features:

Enable FlashInfer for MTP
Add stochastic rounding for Mamba SSM cache
Include [None][feat] Super LayerWise Quant #11998
Include [None][fix] Mamba2 Chunking Issue #11959

TODO:

Waiting for the FlashInfer release. For now, use a nightly FlashInfer build.

Summary by CodeRabbit

New Features
- Added Mamba SSM stochastic rounding option to enable improved numerical precision for state updates when using float16 cache with Mamba-based models.
Chores
- Updated package dependency specification to use direct wheel reference instead of strict version pinning.
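
To illustrate why stochastic rounding can help a float16 state cache, here is a minimal pure-Python sketch. It is not the FlashInfer kernel used by this PR; the helper names (`round_nearest_fp16`, `round_stochastic_fp16`, `accumulate`) are hypothetical, and the sketch assumes finite inputs with magnitude at most 65504. It repeatedly adds an increment that is smaller than half the float16 spacing near 1.0, which round-to-nearest silently discards.

```python
import math
import random
import struct


def _to_bits(value: float) -> int:
    """Round a non-negative float to the nearest fp16 and return its bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def _from_bits(bits: int) -> float:
    """Decode an fp16 bit pattern back into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def round_nearest_fp16(x: float) -> float:
    """Ordinary round-to-nearest conversion to fp16."""
    return math.copysign(_from_bits(_to_bits(abs(x))), x)


def round_stochastic_fp16(x: float, rng: random.Random) -> float:
    """Round x to one of its two fp16 neighbours.

    The neighbour farther from zero is chosen with probability equal to the
    fraction of the gap that |x| has already covered, so the result is
    unbiased in expectation.
    """
    mag = abs(x)
    bits = _to_bits(mag)
    nearest = _from_bits(bits)
    if nearest == mag:
        return x  # already representable in fp16
    # For non-negative fp16 values, consecutive bit patterns are consecutive numbers.
    lo_bits = bits if nearest < mag else bits - 1
    lo, hi = _from_bits(lo_bits), _from_bits(lo_bits + 1)
    p_up = (mag - lo) / (hi - lo)
    chosen = hi if rng.random() < p_up else lo
    return math.copysign(chosen, x)


def accumulate(step: float, n: int, rounder) -> float:
    """Add `step` to an fp16-rounded state n times, starting from 1.0."""
    state = 1.0
    for _ in range(n):
        state = rounder(state + step)
    return state


rng = random.Random(0)
nearest_result = accumulate(1e-4, 1000, round_nearest_fp16)
stochastic_result = accumulate(1e-4, 1000, lambda v: round_stochastic_fp16(v, rng))
```

With round-to-nearest the state never leaves 1.0, while the stochastic variant tracks the true sum (about 1.1) on average, at the cost of some noise per step.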

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

NVShreyas · 2026-03-06T16:46:29Z

@Wanli-Jiang I think to test, we can add an extra line in the dockerfile here to install flashinfer nightly -
RUN pip install --force-reinstall --no-deps https://github.com/flashinfer-ai/flashinfer/releases/download/nightly-v0.6.5-20260305/flashinfer_python-0.6.5.dev20260305-py3-none-any.whl

Wanli-Jiang · 2026-03-09T05:50:57Z

/bot run --disable-fail-fast

coderabbitai · 2026-03-09T05:55:17Z

📝 Walkthrough

Walkthrough

This pull request introduces support for Mamba SSM stochastic rounding by adding a new configuration field that flows from CLI arguments through config classes to the Mamba2 mixer implementation. The change includes conditional routing logic between FlashInfer and native execution paths based on hardware capabilities and dtype constraints. The FlashInfer dependency is updated to a URL-based installation reference.

Changes

Cohort / File(s)	Summary
Configuration Fields `tensorrt_llm/llmapi/llm_args.py`, `tensorrt_llm/models/modeling_utils.py`	Added `mamba_ssm_stochastic_rounding` boolean field to KvCacheConfig and QuantConfig with default False and documentation indicating it applies to float16 cache dtype.
CLI and Argument Handling `examples/llm-api/quickstart_advanced.py`	Added `--mamba_ssm_stochastic_rounding` CLI flag (store_true, default False) and threaded it into KvCacheConfig initialization via `setup_llm`.
Config Propagation `tensorrt_llm/_torch/pyexecutor/model_loader.py`	Extended `validate_and_set_mamba_ssm_cache_dtype` signature to accept `mamba_ssm_stochastic_rounding` parameter and propagate it to `config.quant_config`.
Implementation Logic `tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py`	Implemented conditional routing logic to enable stochastic rounding based on head_dim, dtype (float16), and FlashInfer availability. Added kwargs construction for both MTP and non-MTP execution paths with optional `rand_seed` when stochastic rounding is active. Includes warning when stochastic rounding is requested but unavailable.
Dependency Management `requirements.txt`	Replaced pinned FlashInfer version with URL-based wheel installation reference.
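
As a rough illustration of the CLI row above, the following sketch shows how a `store_true` flag could be threaded into a config object. `KvCacheConfigSketch` and `make_kv_cache_config` are hypothetical stand-ins, not the real `KvCacheConfig` or `setup_llm`; the `--mamba_ssm_cache_dtype` option is included only as an assumed companion setting.

```python
import argparse
from dataclasses import dataclass


@dataclass
class KvCacheConfigSketch:
    """Hypothetical stand-in holding only the two fields discussed here."""
    mamba_ssm_cache_dtype: str = "auto"
    mamba_ssm_stochastic_rounding: bool = False


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumed companion option; stochastic rounding is documented for float16 caches.
    parser.add_argument("--mamba_ssm_cache_dtype", default="auto")
    # The flag named in the summary table: off unless passed explicitly.
    parser.add_argument("--mamba_ssm_stochastic_rounding",
                        action="store_true",
                        default=False)
    return parser


def make_kv_cache_config(args: argparse.Namespace) -> KvCacheConfigSketch:
    """Copy the parsed CLI values into the config object."""
    return KvCacheConfigSketch(
        mamba_ssm_cache_dtype=args.mamba_ssm_cache_dtype,
        mamba_ssm_stochastic_rounding=args.mamba_ssm_stochastic_rounding,
    )


default_cfg = make_kv_cache_config(build_parser().parse_args([]))
enabled_cfg = make_kv_cache_config(build_parser().parse_args(
    ["--mamba_ssm_cache_dtype", "float16", "--mamba_ssm_stochastic_rounding"]))
```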

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as CLI Parser
    participant KvCache as KvCacheConfig
    participant Loader as ModelLoader
    participant QuantCfg as QuantConfig
    participant Mixer as Mamba2Mixer
    
    User->>CLI: --mamba_ssm_stochastic_rounding flag
    CLI->>KvCache: args.mamba_ssm_stochastic_rounding
    KvCache->>Loader: kv_cache_config.mamba_ssm_stochastic_rounding
    Loader->>QuantCfg: validate_and_set_mamba_ssm_cache_dtype()
    QuantCfg->>QuantCfg: Set mamba_ssm_stochastic_rounding
    QuantCfg->>Mixer: config.mamba_ssm_stochastic_rounding
    
    Mixer->>Mixer: Check head_dim in [64, 128]?
    Mixer->>Mixer: Check dtype == float16?
    Mixer->>Mixer: Check FlashInfer available?
    
    alt All conditions met
        Mixer->>Mixer: _use_stochastic_rounding = True
        Mixer->>Mixer: Add rand_seed to kwargs
    else Conditions not met
        Mixer->>Mixer: _use_stochastic_rounding = False
        Mixer->>Mixer: Emit warning
    end
    
    alt _use_flashinfer enabled
        Mixer->>Mixer: Route to FlashInfer path
    else
        Mixer->>Mixer: Route to native implementation
    end
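
The three checks in the diagram can be sketched as a single gating function. This is an assumption-laden illustration rather than the mixer's actual code: `resolve_stochastic_rounding` is a hypothetical name, the dtype is represented as a plain string, and FlashInfer availability is passed in as a boolean instead of being probed. Following the summary table, the warning is emitted only when stochastic rounding was requested but cannot be honored.

```python
import warnings

# Head dimensions listed in the sequence diagram.
SUPPORTED_HEAD_DIMS = (64, 128)


def resolve_stochastic_rounding(requested: bool, head_dim: int,
                                cache_dtype: str,
                                flashinfer_available: bool) -> bool:
    """Return True only if stochastic rounding was requested and is usable."""
    if not requested:
        return False
    usable = bool(head_dim in SUPPORTED_HEAD_DIMS
                  and cache_dtype == "float16"
                  and flashinfer_available)
    if not usable:
        # Fall back silently in behavior, but tell the user why.
        warnings.warn(
            "Stochastic rounding requested but unavailable "
            f"(head_dim={head_dim}, cache_dtype={cache_dtype}, "
            f"flashinfer_available={flashinfer_available}); disabling it.")
    return usable
```

Returning a plain boolean keeps the later routing decision (FlashInfer path versus native path) independent of how the request was validated.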

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	PR description is incomplete and does not follow the template structure, missing detailed Description, Test Coverage, and partially filled checklist sections.	Complete the Description section explaining what and why; detail the Test Coverage section with relevant tests; and fill in all PR Checklist items with clear explanations or confirmations.
Title check	❓ Inconclusive	The PR title mentions 'Mamba optimization and mixed quantization support' but the primary changes focus on enabling FlashInfer for MTP and adding stochastic rounding for Mamba SSM cache, which are more specific to the actual changeset.	Consider using a more precise title that directly references the main feature: enable FlashInfer for MTP and add stochastic rounding for Mamba SSM, or align the title with the actual scope of changes.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tensorrt_llm/_torch/pyexecutor/model_loader.py (1)

37-52: ⚠️ Potential issue | 🟡 Minor

Fail fast when stochastic rounding resolves to a non-FP16 cache dtype.

"auto" is only resolved here, but the new flag is copied through unconditionally. If the resolved Mamba SSM cache dtype ends up as BF16/FP32, the config still carries an unusable stochastic-rounding request deeper into the runtime instead of rejecting it at the first point where the actual dtype is known.

Suggested fix

 def validate_and_set_mamba_ssm_cache_dtype(
         config: ModelConfig,
         mamba_ssm_cache_dtype: str,
         mamba_ssm_stochastic_rounding: bool = False) -> None:
@@
-    config.quant_config.mamba_ssm_cache_dtype = mamba_ssm_cache_dtype
-    config.quant_config.mamba_ssm_stochastic_rounding = mamba_ssm_stochastic_rounding
+    config.quant_config.mamba_ssm_cache_dtype = mamba_ssm_cache_dtype
+    if mamba_ssm_stochastic_rounding and mamba_ssm_cache_dtype != torch.float16:
+        raise ValueError(
+            "kv_cache_config.mamba_ssm_stochastic_rounding requires "
+            'kv_cache_config.mamba_ssm_cache_dtype="float16"'
+        )
+    config.quant_config.mamba_ssm_stochastic_rounding = mamba_ssm_stochastic_rounding

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/model_loader.py` around lines 37 - 52, In
validate_and_set_mamba_ssm_cache_dtype, after resolving mamba_ssm_cache_dtype
(via str_dtype_to_torch or config.pretrained_config.torch_dtype), immediately
check if mamba_ssm_stochastic_rounding is True and the resolved dtype is not
torch.float16 (FP16); if so, raise a ValueError (or similar) rejecting the
incompatible combination instead of silently storing it on
config.quant_config.mamba_ssm_stochastic_rounding; otherwise continue to set
config.quant_config.mamba_ssm_cache_dtype and mamba_ssm_stochastic_rounding as
before. Ensure you reference the resolved mamba_ssm_cache_dtype and the boolean
mamba_ssm_stochastic_rounding within validate_and_set_mamba_ssm_cache_dtype (and
use ModelConfig/quant_config fields) so the check occurs before writing into
config.quant_config.
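
A self-contained sketch of the fail-fast idea, under stated assumptions: dtypes are plain strings instead of `torch.dtype` values, and `ModelConfigSketch` / `QuantConfigSketch` are hypothetical stand-ins for the real config classes. The point it shows is the ordering: resolve `"auto"` first, validate second, and write to the config last.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QuantConfigSketch:
    """Hypothetical stand-in for the quant config fields touched here."""
    mamba_ssm_cache_dtype: Optional[str] = None
    mamba_ssm_stochastic_rounding: bool = False


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in; torch_dtype plays the role of the model's dtype."""
    torch_dtype: str = "bfloat16"
    quant_config: QuantConfigSketch = field(default_factory=QuantConfigSketch)


def validate_and_set_mamba_ssm_cache_dtype(
        config: ModelConfigSketch,
        mamba_ssm_cache_dtype: str,
        mamba_ssm_stochastic_rounding: bool = False) -> None:
    # Step 1: resolve "auto" to the model dtype so the real dtype is known.
    resolved = (config.torch_dtype
                if mamba_ssm_cache_dtype == "auto" else mamba_ssm_cache_dtype)
    # Step 2: reject the incompatible combination before touching the config.
    if mamba_ssm_stochastic_rounding and resolved != "float16":
        raise ValueError(
            "mamba_ssm_stochastic_rounding requires a float16 Mamba SSM "
            f"cache, but the resolved dtype is {resolved!r}")
    # Step 3: only now store both values.
    config.quant_config.mamba_ssm_cache_dtype = resolved
    config.quant_config.mamba_ssm_stochastic_rounding = mamba_ssm_stochastic_rounding
```

Because the check runs before any assignment, a rejected request leaves the config untouched.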

tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update copyright year to include 2026.

The copyright header currently shows 2022-2024, but this file has meaningful modifications in 2026. As per coding guidelines, the copyright header should reflect the year of the latest meaningful modification.

Suggested fix

-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py` at line 1, Update the file
copyright header to reflect the latest modification year: change the existing
"2022-2024" string in the top-of-file comment to "2022-2026" so the header reads
"Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES"; locate the header in
tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py (top file comment) and perform
the replacement while preserving the SPDX and surrounding comment formatting.

🧹 Nitpick comments (1)

tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py (1)
444-453: Minor inconsistency: dt_softplus differs between MTP and non-MTP paths.

The MTP path uses dt_softplus=True (line 405) while the non-MTP path uses dt_softplus=self.delta_softplus (line 449). If this is intentional for speculative decoding behavior, consider adding a brief comment explaining why MTP always uses True.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py` around lines 444 - 453,
The dt_softplus flag is inconsistent between the MTP path (where
dt_softplus=True is hard-coded) and the non-MTP path (where
dt_softplus=self.delta_softplus) around the selective_state_update call; either
make them consistent or document the intentional difference. Locate the MTP
branch that builds ssu_kwargs with dt_softplus=True and the non-MTP branch that
sets dt_softplus=self.delta_softplus (used when calling selective_state_update /
selective_state_update in mamba2_mixer) and add a short inline comment
explaining why MTP forces True for speculative decoding (or change the MTP
assignment to use self.delta_softplus if it should match behavior) so the
difference is explicit and not surprising.
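
One way to make the difference explicit is to build the keyword arguments for both paths in a single helper. The sketch below is hypothetical: `build_ssu_kwargs` is an invented name, only `dt_softplus` and `rand_seed` come from the review text, and whether MTP should really force `dt_softplus=True` is exactly the open question raised above.

```python
from typing import Any, Dict, Optional


def build_ssu_kwargs(*, is_mtp: bool, delta_softplus: bool,
                     use_stochastic_rounding: bool,
                     rand_seed: Optional[int] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for a state-update call (illustrative only)."""
    kwargs: Dict[str, Any] = {
        # The reviewed code hard-codes True on the MTP path and follows the
        # module setting otherwise; keeping both in one expression documents it.
        "dt_softplus": True if is_mtp else delta_softplus,
    }
    if use_stochastic_rounding:
        if rand_seed is None:
            raise ValueError("rand_seed is required when stochastic rounding is active")
        # Only pass the seed when the feature is actually enabled.
        kwargs["rand_seed"] = rand_seed
    return kwargs
```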

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0068c16a-62e8-4517-af61-f1083da999f5

📥 Commits

Reviewing files that changed from the base of the PR and between 1074aa9 and 9761690.

📒 Files selected for processing (6)

examples/llm-api/quickstart_advanced.py
requirements.txt
tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py
tensorrt_llm/_torch/pyexecutor/model_loader.py
tensorrt_llm/llmapi/llm_args.py
tensorrt_llm/models/modeling_utils.py

tensorrt-cicd · 2026-03-09T05:56:28Z

PR_Github #38209 [ run ] triggered by Bot. Commit: 9761690 Link to invocation

tensorrt-cicd · 2026-03-09T09:45:00Z

PR_Github #38209 [ run ] completed with state SUCCESS. Commit: 9761690
/LLM/main/L0_MergeRequest_PR pipeline #29599 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…r Mamba SSM cache Signed-off-by: Wanli Jiang <[email protected]>

Signed-off-by: Wanli Jiang <[email protected]>

Signed-off-by: Izzy Putterman <[email protected]>

Superjomn

LGTM on the llmapi changes.

Wanli-Jiang · 2026-03-10T04:06:55Z

/bot run --disable-fail-fast

Wanli-Jiang · 2026-03-10T04:07:06Z

/bot run --stage-list "Build-Docker-Images"

tensorrt-cicd · 2026-03-10T04:13:10Z

PR_Github #38368 [ run ] triggered by Bot. Commit: 373dd0a Link to invocation

tensorrt-cicd · 2026-03-10T04:13:49Z

PR_Github #38369 [ run ] triggered by Bot. Commit: 373dd0a Link to invocation

tensorrt-cicd · 2026-03-10T04:13:51Z

PR_Github #38368 [ run ] completed with state ABORTED. Commit: 373dd0a

Link to invocation

Wanli-Jiang · 2026-03-10T04:21:44Z

/bot help

github-actions · 2026-03-10T04:21:50Z

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Wanli-Jiang · 2026-03-10T04:22:40Z

/bot run --disable-fail-fast --extra-stage "Build-Docker-Images"

tensorrt-cicd · 2026-03-10T04:28:20Z

PR_Github #38372 [ run ] triggered by Bot. Commit: 373dd0a Link to invocation

sunnyqgg

LGTM

QiJune

LGTM for the API change.

Wanli-Jiang · 2026-03-10T12:16:10Z

/bot run --only-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2026-03-10T12:22:17Z

PR_Github #38437 [ run ] triggered by Bot. Commit: 373dd0a Link to invocation

tensorrt-cicd · 2026-03-10T15:47:41Z

PR_Github #38437 [ run ] completed with state SUCCESS. Commit: 373dd0a
/LLM/main/L0_MergeRequest_PR pipeline #29797 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Wanli Jiang <[email protected]>

Wanli-Jiang · 2026-03-11T02:52:22Z

/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-3,DGX_H100-4_GPUs-PyTorch-Others-2" --disable-fail-fast

tensorrt-cicd · 2026-03-11T02:58:49Z

PR_Github #38524 [ run ] triggered by Bot. Commit: 2e26d16 Link to invocation

tensorrt-cicd · 2026-03-11T05:50:00Z

PR_Github #38524 [ run ] completed with state SUCCESS. Commit: 2e26d16
/LLM/main/L0_MergeRequest_PR pipeline #29870 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

Wanli-Jiang · 2026-03-11T11:19:38Z

/bot skip --comment "Skipped since the duplicated PR12072 has passed CI testing"

tensorrt-cicd · 2026-03-11T11:25:03Z

PR_Github #38584 [ ] completed with state FAILURE. Commit: 2e26d16
Not allowed on merged PR

Link to invocation

…motron-h (NVIDIA#11972) Signed-off-by: Wanli Jiang <[email protected]> Signed-off-by: Izzy Putterman <[email protected]> Co-authored-by: Izzy Putterman <[email protected]>

github-actions Bot assigned Wanli-Jiang Mar 6, 2026

Wanli-Jiang force-pushed the user/williamj/support-stochastic-rounding branch from 6b6516f to 9761690 Compare March 9, 2026 05:46

Wanli-Jiang marked this pull request as ready for review March 9, 2026 05:48

Wanli-Jiang requested review from a team as code owners March 9, 2026 05:48

Wanli-Jiang requested review from kaiyux, nv-guomingz, omera-nv and syuoni March 9, 2026 05:48

Wanli-Jiang commented Mar 9, 2026

Comment thread requirements.txt Outdated

coderabbitai Bot reviewed Mar 9, 2026

Wanli-Jiang mentioned this pull request Mar 9, 2026

[None][feat] PR collections for PBR #12026

Closed

1 task

Wanli-Jiang force-pushed the user/williamj/support-stochastic-rounding branch from 9761690 to c49539c Compare March 9, 2026 09:15

Wanli-Jiang commented Mar 9, 2026

Comment thread tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py Outdated

nv-guomingz approved these changes Mar 9, 2026

Wanli-Jiang and others added 3 commits March 9, 2026 19:09

[None][feat] Enable FlashInfer for MTP and add stochastic rounding fo…

aaeabae

…r Mamba SSM cache Signed-off-by: Wanli Jiang <[email protected]>

[None][feat] Add philox_rounds for mamba cache fp16 stochastic rounding

10c457c

Signed-off-by: Wanli Jiang <[email protected]>

[None][feat] Super LayerWise Quant

e999d84

Signed-off-by: Izzy Putterman <[email protected]>

sunnyqgg reviewed Mar 10, 2026

Comment thread tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py Outdated

Superjomn approved these changes Mar 10, 2026

kaiyux reviewed Mar 10, 2026

Comment thread tensorrt_llm/models/modeling_utils.py

Wanli-Jiang requested a review from lucaslie March 10, 2026 04:04

Wanli-Jiang changed the title ~~[None][feat] Enable FlashInfer for MTP and add stochastic rounding for Mamba SSM cache~~ [None][feat] Mamba optimization and mixed quantization support for nemotron-h Mar 10, 2026

sunnyqgg approved these changes Mar 10, 2026

QiJune approved these changes Mar 10, 2026

Wanli-Jiang added 2 commits March 10, 2026 18:47

Upgrade flashinfer-python to 0.6.6

a58f7c5

Signed-off-by: Wanli Jiang <[email protected]>

Fix CI test failure

2e26d16

Signed-off-by: Wanli Jiang <[email protected]>

tijyojwad approved these changes Mar 11, 2026

yihwang-nv mentioned this pull request Mar 11, 2026

[None][chore] Update flashinfer to 0.6.6 #12094

Closed

1 task

Wanli-Jiang enabled auto-merge (squash) March 11, 2026 11:22

Wanli-Jiang merged commit 73fca4e into NVIDIA:main Mar 11, 2026
8 checks passed

Wanli-Jiang mentioned this pull request Mar 11, 2026

[None][feat] Duplicated PR for CI tests #12072

Closed

1 task
