[None][fix] skip inference_mode() when torch.compile=True for gemma3 fp8 #12367
54ea19a to 1505b7d
📝 Walkthrough
This pull request adds a new
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/models/modeling_gemma3.py`:
- Line 1: Add the required NVIDIA Apache-2.0 copyright/license header as the
very first lines of tensorrt_llm/_torch/models/modeling_gemma3.py (i.e., place
it before the existing executable statement "import functools"); include the
correct NVIDIA copyright statement with the year of latest meaningful
modification and the full Apache License 2.0 notice used across the repo so the
file conforms to the repository header policy.
- Around line 35-39: The compiled branch in the wrapper function (guarded by
torch.compiler.is_compiling()) directly calls func and thus loses the grad-off
semantics applied in the non-compiled branch via torch.inference_mode(); update
wrapper so that when torch.compiler.is_compiling() is True it executes func
inside a torch.no_grad() context (i.e., wrap the call to func in
torch.no_grad()) to ensure consistent gradient-disabled behavior between
compiled and non-compiled paths while keeping the existing
torch.inference_mode() for the non-compiled branch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a89e8eed-7b9d-4d11-b09d-62255ca821af
📒 Files selected for processing (4)
tensorrt_llm/_torch/models/modeling_gemma3.py
tests/integration/defs/accuracy/test_llm_api_pytorch.py
tests/integration/test_lists/qa/llm_function_core.txt
tests/integration/test_lists/test-db/l0_h100.yml
Couple of questions based on MR description:
4cfe756 to 8dbcb50
/bot run
PR_Github #42585 [ run ] triggered by Bot. Commit:
PR_Github #42585 [ run ] completed with state
/bot run
PR_Github #42648 [ run ] triggered by Bot. Commit:
PR_Github #42648 [ run ] completed with state
/bot run
Signed-off-by: Anurag Mukkara <[email protected]>
/bot run
/bot run
/bot kill
PR_Github #42736 [ kill ] completed with state
/bot run
PR_Github #42737 [ run ] triggered by Bot. Commit:
PR_Github #42737 [ run ] completed with state
/bot run
PR_Github #42764 [ run ] triggered by Bot. Commit:
PR_Github #42764 [ run ] completed with state
Summary by CodeRabbit
Release Notes
Refactor
Tests
Description
Add a conditional decorator that skips torch.inference_mode() when running inside a torch.compile Dynamo trace. This addresses the error:
RuntimeError: Cannot set version_counter for inference tensor.
Set maybe_execute_in_parallel(.., disable_on_compile=True) in QKNormRopeAttention, following the pattern of "[https://nvbugs/6029220][fix] Disable multi-stream in maybe_execute_i…" #12659.
Test Coverage
Parameterize tests/integration/defs/accuracy/test_llm_api_pytorch.py::test_fp8_prequantized with torch_compile=[True, False]
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.