[None][fix] Add streaming support to no </think> for nemotron model#12176
Conversation
Signed-off-by: Joyjit Daw <[email protected]> Made-with: Cursor
Override finish() in NemotronV3ReasoningParser so that when force_nonempty_content is not set and no </think> closing tag is found, reasoning stays in reasoning_content and content remains empty. When force_nonempty_content is set, the parent behavior is preserved (reasoning moved to content). Also clears accumulated reasoning on closing tag detection to reduce memory. Signed-off-by: Joyjit Daw <[email protected]> Made-with: Cursor
📝 WalkthroughWalkthroughThe PR introduces streaming finish handling to reasoning parsers by adding Changes
Sequence Diagram(s)sequenceDiagram
participant Streaming as Streaming Handler
participant PostProc as PostProcess Handler
participant Parser as Reasoning Parser
participant State as Parser State
Streaming->>PostProc: apply_reasoning_parser(text, streaming=true, finished=false)
PostProc->>Parser: parse_delta(text)
Parser->>State: Accumulate reasoning segment
Parser-->>PostProc: (content, reasoning_content)
PostProc-->>Streaming: (processed_content, reasoning)
Note over Streaming: Stream continues...
Streaming->>PostProc: apply_reasoning_parser(text, streaming=true, finished=true)
PostProc->>Parser: parse_delta(final_text)
PostProc->>Parser: finish()
Parser->>State: Flush accumulated_reasoning
Parser->>State: Check closing tag status
Parser-->>PostProc: ReasoningParserResult
PostProc->>PostProc: Merge finish result
PostProc-->>Streaming: (final_content, final_reasoning)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unittest/llmapi/test_reasoning_parser.py (1)
208-225: Flake8 indentation warnings.Static analysis flagged indentation issues at lines 51, 106, and 225. These are minor formatting issues but should be addressed for consistency.
Suggested fix for line 225
`@pytest.mark.parametrize`(("delta_texts", "finish_content", "finish_reasoning", "reasoning_completed", "chat_template_kwargs"), [ - (["a", "b"], "", "", False, None), + (["a", "b"], "", "", False, None),Apply similar fixes to lines 51 and 106 to align with the hanging indent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/llmapi/test_reasoning_parser.py` around lines 208 - 225, The parametrize block for the test (pytest.mark.parametrize with parameter names "delta_texts", "finish_content", "finish_reasoning", "reasoning_completed", "chat_template_kwargs") has inconsistent hanging indent causing Flake8 warnings; reformat the multi-line tuple and the list of parameter tuples so continuation lines align under the first opening parenthesis (or use a consistent hanging indent style) and apply the same alignment fix to the other similar blocks referenced around the symbols R1_END and the parameter sets to eliminate the indentation warnings at the three reported locations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/llmapi/reasoning_parser.py`:
- Around line 234-244: The finish() implementation in NemotronV3 clears
_accumulated_reasoning when the parser is in_reasoning and no closing tag was
found, but only returns the remaining _buffer as reasoning_content, losing
accumulated content; modify finish() so that when the early-return branch runs
(not _force_nonempty_content and in_reasoning and not _found_closing_tag) it
combines _accumulated_reasoning with _buffer (e.g., concatenate non-empty pieces
or prefer accumulated if present) into the returned
ReasoningParserResult(reasoning_content=...) before clearing state, ensuring
behavior matches DeepSeekR1Parser.finish(); keep uses of _buffer,
_accumulated_reasoning, in_reasoning and _found_closing_tag and still
call/return _maybe_swap_content(super().finish()) in the other path.
---
Nitpick comments:
In `@tests/unittest/llmapi/test_reasoning_parser.py`:
- Around line 208-225: The parametrize block for the test
(pytest.mark.parametrize with parameter names "delta_texts", "finish_content",
"finish_reasoning", "reasoning_completed", "chat_template_kwargs") has
inconsistent hanging indent causing Flake8 warnings; reformat the multi-line
tuple and the list of parameter tuples so continuation lines align under the
first opening parenthesis (or use a consistent hanging indent style) and apply
the same alignment fix to the other similar blocks referenced around the symbols
R1_END and the parameter sets to eliminate the indentation warnings at the three
reported locations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8c0a48ce-7a10-49e5-a90d-b2c12692695a
📒 Files selected for processing (4)
tensorrt_llm/llmapi/reasoning_parser.pytensorrt_llm/serve/postprocess_handlers.pytensorrt_llm/serve/responses_utils.pytests/unittest/llmapi/test_reasoning_parser.py
Signed-off-by: Joyjit Daw <[email protected]>
Signed-off-by: Joyjit Daw <[email protected]>
|
/bot run --disable-fail-fast |
|
PR_Github #38887 [ run ] triggered by Bot. Commit: |
Signed-off-by: Joyjit Daw <[email protected]>
1d6fc47 to
78d36d5
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #38895 [ run ] triggered by Bot. Commit: |
|
/bot help |
GitHub Bot Help
Provide a user friendly way for developers to interact with a Jenkins server. Run See details below for each supported subcommand. Details
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with pull request. skip
Skip testing for latest commit on pull request. reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break. |
|
/bot run |
|
PR_Github #38926 [ run ] triggered by Bot. Commit: |
|
PR_Github #38926 [ run ] completed with state |
|
PR_Github #38895 [ run ] completed with state
|
|
/bot run --reuse-test 30203 |
|
PR_Github #38928 [ run ] triggered by Bot. Commit: |
|
PR_Github #38928 [ run ] completed with state |
|
/bot run --reuse-test 30203 |
|
PR_Github #38952 [ run ] triggered by Bot. Commit: |
|
PR_Github #38952 [ run ] completed with state |
…VIDIA#12176) Signed-off-by: Joyjit Daw <[email protected]>
…VIDIA#12176) Signed-off-by: Joyjit Daw <[email protected]>
Summary by CodeRabbit
New Features
Bug Fixes
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.