[TRTLLM-11246][feat] Add tool parser support for GLM-4 models by JunyiXu-nv · Pull Request #11986 · NVIDIA/TensorRT-LLM

JunyiXu-nv · 2026-03-06T13:52:10Z

Summary by CodeRabbit

New Features
- Added GLM-4 tool calling support with real-time streaming XML-to-JSON conversion
- Implemented type-aware argument extraction with robust fallback parsing strategies
- Added automatic type inference from tool definitions for improved argument handling and validation
Tests
- Introduced comprehensive test suite for GLM-4 tool parser covering detection, streaming, multi-tool scenarios, and edge cases

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Junyi Xu <[email protected]>

JunyiXu-nv · 2026-03-06T14:03:49Z

/bot run

tensorrt-cicd · 2026-03-06T14:10:13Z

PR_Github #38034 [ run ] triggered by Bot. Commit: b08fe7e Link to invocation

tensorrt-cicd · 2026-03-06T14:35:20Z

PR_Github #38034 [ run ] completed with state FAILURE. Commit: b08fe7e
/LLM/main/L0_MergeRequest_PR pipeline #29460 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Junyi Xu <[email protected]>

JunyiXu-nv · 2026-03-09T02:37:22Z

/bot run

coderabbitai · 2026-03-09T02:40:15Z

📝 Walkthrough

Walkthrough

Introduces a new GLM-4 tool parser module implementing streaming XML-to-JSON parsing for GLM-4 tool calls. The implementation includes a state machine for incremental parsing, type inference from tool definitions, and argument parsing with multiple fallback strategies integrated into the existing parser factory.

Changes

Cohort / File(s)	Summary
GLM-4 Tool Parser Implementation `tensorrt_llm/serve/tool_parser/glm4_parser.py`	New module implementing Glm4ToolParser class with StreamState enum and helper functions. Core features: streaming XML-to-JSON conversion via state machine, type inference from tool definitions, argument parsing with JSON/ast.literal_eval/quoting fallbacks, and both one-shot and streaming detection/parsing modes.
Parser Factory Registration `tensorrt_llm/serve/tool_parser/tool_parser_factory.py`	Registers new Glm4ToolParser in ToolParserFactory parsers mapping under "glm4" key via import and factory extension.
Type Inference Utilities `tensorrt_llm/serve/tool_parser/utils.py`	Adds infer_type_from_json_schema() function to extract parameter types from JSON Schema definitions, supporting type, anyOf/oneOf, enum, allOf, properties, and items structures.
Test Suite `tests/unittest/llmapi/apps/test_tool_parsers.py`	Adds comprehensive TestGlm4ToolParser test class covering streaming parsing, multi-tool handling, parameter parsing, number type coercion, format compliance, and structural tagging verification. (Note: test suite appears duplicated in diff.)

Sequence Diagram

sequenceDiagram
    participant Stream as Text Stream
    participant Parser as Glm4ToolParser
    participant State as StreamState
    participant Tools as Tool Definitions
    participant Output as Parsed Result

    Stream->>Parser: parse_streaming_increment(new_text)
    
    loop For Each Character
        Parser->>State: Update state (INIT → BETWEEN → IN_KEY → WAITING_VALUE → IN_VALUE)
        Parser->>Parser: Accumulate XML fragments
    end
    
    alt Tool Call Complete
        Parser->>Parser: Extract func_name & arguments
        Parser->>Tools: get_argument_type(func_name, arg_key)
        Tools-->>Parser: Expected type (string/number/integer/boolean)
        Parser->>Parser: parse_arguments(value, inferred_type) with fallbacks
        Parser->>Output: Emit ToolCallItem with parsed args
    else Partial Data
        Parser->>Output: Return intermediate state
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	⚠️ Warning	The PR description is incomplete—it only contains the template structure without filling in the required Description and Test Coverage sections.	Fill in the Description section with details about the GLM-4 tool parser implementation and the Test Coverage section listing relevant tests that validate the changes.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: adding tool parser support for GLM-4 models, with appropriate format including JIRA ticket and feature type.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

tensorrt_llm/serve/tool_parser/glm4_parser.py (2)

477-480: Minor: Duplicate logic branches.

Lines 477-478 and 479-480 have identical logic. Consider consolidating:

Suggested simplification

             elif arg_type == "string":
                 if isinstance(parsed_value, str):
                     arguments[arg_key] = parsed_value
                 elif isinstance(parsed_value, (dict, list)):
                     arguments[arg_key] = json.dumps(parsed_value, ensure_ascii=False)
                 else:
                     arguments[arg_key] = str(parsed_value)
-            elif arg_type is None:
-                arguments[arg_key] = parsed_value if is_good_json else arg_value
             else:
+                # Covers arg_type is None and all other types
                 arguments[arg_key] = parsed_value if is_good_json else arg_value

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/tool_parser/glm4_parser.py` around lines 477 - 480, The
two branches with identical bodies can be collapsed: remove the duplicate
elif/else and replace with a single assignment that sets arguments[arg_key] =
parsed_value if is_good_json else arg_value (using the existing
arg_type/arg_key/parsed_value/is_good_json/arg_value variables) so the logic is
not repeated in the function containing this code; ensure any needed handling
for arg_type being None is preserved by performing the assignment once in place
of both branches.

162-164: Consider narrowing exception handling.

Per coding guidelines, avoid broad exception handling. While catching all exceptions at parsing boundaries prevents crashes from malformed model output, consider catching more specific exceptions (e.g., re.error, json.JSONDecodeError, KeyError, AttributeError) to avoid masking unexpected bugs.

Suggested narrowing

-        except Exception as e:
+        except (re.error, json.JSONDecodeError, KeyError, AttributeError, IndexError) as e:
             logger.error(f"Error in detect_and_parse: {e}")
             return StreamingParseResult(normal_text=text)

The same applies to lines 440-441 and 455-457.

tensorrt_llm/serve/tool_parser/utils.py (1)

5-5: Consider using built-in generics for Python 3.10+.

Per codebase conventions (Python 3.10+), you can use dict and Optional from builtins instead of importing from typing. However, since Any still requires the typing import, this is a minor nit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/tool_parser/utils.py` at line 5, The import in utils.py
currently pulls Dict and Optional from typing; update it to use built-in
generics and PEP 604 union syntax: import only Any from typing (e.g., "from
typing import Any"), then change any type annotations that used Dict[...] to use
built-in dict[...] and change Optional[T] to T | None (or T | None with Any
where applicable) so you no longer import Dict/Optional from typing; keep Any
imported from typing as needed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/serve/tool_parser/glm4_parser.py`:
- Line 1: This file (glm4_parser.py) is missing the required NVIDIA copyright
header and Apache-2.0 license block; add the standard NVIDIA copyright header at
the top of the file (replacing or preceding the existing attribution comment),
include the correct year for the latest meaningful modification, and ensure the
full Apache License 2.0 notice and SPDX identifier are present so the file
header matches other TensorRT-LLM source files.

---

Nitpick comments:
In `@tensorrt_llm/serve/tool_parser/glm4_parser.py`:
- Around line 477-480: The two branches with identical bodies can be collapsed:
remove the duplicate elif/else and replace with a single assignment that sets
arguments[arg_key] = parsed_value if is_good_json else arg_value (using the
existing arg_type/arg_key/parsed_value/is_good_json/arg_value variables) so the
logic is not repeated in the function containing this code; ensure any needed
handling for arg_type being None is preserved by performing the assignment once
in place of both branches.

In `@tensorrt_llm/serve/tool_parser/utils.py`:
- Line 5: The import in utils.py currently pulls Dict and Optional from typing;
update it to use built-in generics and PEP 604 union syntax: import only Any
from typing (e.g., "from typing import Any"), then change any type annotations
that used Dict[...] to use built-in dict[...] and change Optional[T] to T | None
(or T | None with Any where applicable) so you no longer import Dict/Optional
from typing; keep Any imported from typing as needed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bf922efa-dc3e-4fbd-8698-4b98195daa1a

📥 Commits

Reviewing files that changed from the base of the PR and between 5b0c956 and 59b1f11.

📒 Files selected for processing (4)

tensorrt_llm/serve/tool_parser/glm4_parser.py
tensorrt_llm/serve/tool_parser/tool_parser_factory.py
tensorrt_llm/serve/tool_parser/utils.py
tests/unittest/llmapi/apps/test_tool_parsers.py

tensorrt-cicd · 2026-03-09T02:43:36Z

PR_Github #38177 [ run ] triggered by Bot. Commit: 59b1f11 Link to invocation

tensorrt-cicd · 2026-03-09T07:51:53Z

PR_Github #38177 [ run ] completed with state SUCCESS. Commit: 59b1f11
/LLM/main/L0_MergeRequest_PR pipeline #29576 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JunyiXu-nv · 2026-03-09T08:03:13Z

/bot run

tensorrt-cicd · 2026-03-09T08:08:28Z

PR_Github #38228 [ run ] triggered by Bot. Commit: 59b1f11 Link to invocation

tensorrt-cicd · 2026-03-09T11:05:05Z

PR_Github #38228 [ run ] completed with state SUCCESS. Commit: 59b1f11
/LLM/main/L0_MergeRequest_PR pipeline #29616 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

Link to invocation

Superjomn

LGTM

…#11986) Signed-off-by: Junyi Xu <[email protected]>

[TRTLLM-11246][feat] Add tool parser support for GLM-4 models

b08fe7e

Signed-off-by: Junyi Xu <[email protected]>

JunyiXu-nv requested a review from a team as a code owner March 6, 2026 13:52

JunyiXu-nv requested a review from Superjomn March 6, 2026 13:52

github-actions Bot assigned JunyiXu-nv Mar 6, 2026

JunyiXu-nv requested a review from LinPoly March 6, 2026 14:03

Fix nosec

59b1f11

Signed-off-by: Junyi Xu <[email protected]>

coderabbitai Bot reviewed Mar 9, 2026

View reviewed changes

Comment thread tensorrt_llm/serve/tool_parser/glm4_parser.py

Superjomn approved these changes Mar 10, 2026

View reviewed changes

Superjomn merged commit b411e14 into NVIDIA:main Mar 10, 2026
7 checks passed

bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Mar 12, 2026

[TRTLLM-11246][feat] Add tool parser support for GLM-4 models (NVIDIA…

27847db

…#11986) Signed-off-by: Junyi Xu <[email protected]>

tianyuz-nv pushed a commit to wanqian-nv/TensorRT-LLM that referenced this pull request Mar 19, 2026

[TRTLLM-11246][feat] Add tool parser support for GLM-4 models (NVIDIA…

07f5beb

…#11986) Signed-off-by: Junyi Xu <[email protected]>

limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026

[TRTLLM-11246][feat] Add tool parser support for GLM-4 models (NVIDIA…

100de41

…#11986) Signed-off-by: Junyi Xu <[email protected]>

longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026

[TRTLLM-11246][feat] Add tool parser support for GLM-4 models (NVIDIA…

7223e78

…#11986) Signed-off-by: Junyi Xu <[email protected]>

Conversation

JunyiXu-nv commented Mar 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

JunyiXu-nv commented Mar 6, 2026

Uh oh!

tensorrt-cicd commented Mar 6, 2026

Uh oh!

tensorrt-cicd commented Mar 6, 2026

Uh oh!

JunyiXu-nv commented Mar 9, 2026

Uh oh!

coderabbitai Bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tensorrt-cicd commented Mar 9, 2026

Uh oh!

tensorrt-cicd commented Mar 9, 2026

Uh oh!

JunyiXu-nv commented Mar 9, 2026

Uh oh!

tensorrt-cicd commented Mar 9, 2026

Uh oh!

tensorrt-cicd commented Mar 9, 2026

Uh oh!

Superjomn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

JunyiXu-nv commented Mar 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 9, 2026 •

edited

Loading