[None][test] Add E2E logprobs test for disaggregated serving via OpenAI API by yingguo-trt · Pull Request #12275 · NVIDIA/TensorRT-LLM

yingguo-trt · 2026-03-17T07:32:44Z

Summary by CodeRabbit

New Features
- Added support for optional logprobs field in chat completion responses.
- Added new disaggregated serving configurations for Llama 3.1 and Deepseek models with UCX backend support.
Tests
- Added integration test for logprobs serving in disaggregated inference scenarios.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…serving Signed-off-by: yingguo-trt <[email protected]>

Signed-off-by: yingguo-trt <[email protected]>

…etter coverage - Rename to test_disaggregated_logprobs_serving to distinguish from the single-GPU variant in test_disaggregated_single_gpu.py - Switch model from DeepSeek-V3-Lite-fp8 (MoE) to Llama-3.1-8B-Instruct (dense) to avoid CUTLASS DeepGemm JIT incompatibility on Blackwell GPUs - Remove skip_pre_hopper since dense model doesn't require Hopper - Remove streaming/api_type parametrize to avoid 4x cluster startups; test all combinations (completions/chat x streaming/non-streaming) within a single cluster session - Add streaming vs non-streaming consistency check with logprob comparison - Extract SSE parsing and logprobs validation into helper functions - Update test list entries in all CI config files Signed-off-by: yingguo-trt <[email protected]>

Signed-off-by: yingguo-trt <[email protected]>

When logprobs=True but top_logprobs is not requested, ChatCompletionLogProbsContent left top_logprobs as None, causing Pydantic validation to fail. Set top_logprobs=[] explicitly when top_logprobs is False. Signed-off-by: yingguo-trt <[email protected]>

Signed-off-by: yingguo-trt <[email protected]>

yingguo-trt · 2026-03-17T07:45:14Z

/bot run

coderabbitai · 2026-03-17T07:46:19Z

📝 Walkthrough

Walkthrough

This pull request adds logprobs serving support for disaggregated inference by updating the OpenAI protocol schema to make top_logprobs optional, introducing new disaggregated configuration files, implementing an integration test for logprobs serving across streaming and non-streaming endpoints, and registering the test in the test suite.

Changes

Cohort / File(s)	Summary
OpenAI Protocol Schema `tensorrt_llm/serve/openai_protocol.py`	Changed `top_logprobs` field type from `List[ChatCompletionLogProb]` to `Optional[List[ChatCompletionLogProb]]` to allow the field to be explicitly optional in the public schema.
Disaggregated Configuration Files `tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml`, `tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_llama31_8b_ucx.yaml`	Added MOE backend AUTO configuration to existing config and created new Llama-3.1-8B disaggregated setup config with tensor parallelism, UCX transceiver backend, and logits gathering enabled.
Integration Test Implementation `tests/integration/defs/disaggregated/test_disaggregated.py`	Added new test function `test_disaggregated_logprobs_serving` that validates logprobs extraction and comparison across streaming and non-streaming OpenAI-compatible endpoints (completions and chat). Note: Test implementation appears duplicated within the file.
Test List Registrations `tests/integration/test_lists/qa/llm_function_core.txt`, `tests/integration/test_lists/qa/llm_function_core_sanity.txt`	Registered new test case `disaggregated/test_disaggregated.py::test_disaggregated_logprobs_serving[llama-3.1-8b-instruct]` in test discovery lists.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	⚠️ Warning	The pull request description is incomplete and largely empty. All required sections (Description, Test Coverage) lack substantive content beyond placeholder HTML comments.	Complete the Description section by explaining the issue and solution. Populate Test Coverage section with a clear list of relevant tests that safeguard the changes. Ensure the PR title follows the template format with a valid ticket identifier and type.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main change: adding an end-to-end logprobs test for disaggregated serving via OpenAI API, matching the primary focus of the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

You can enable review details to help with troubleshooting, context usage and more.

Enable the reviews.review_details setting to include review details such as the model used, the time taken for each step and more in the review comments.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

tests/integration/defs/disaggregated/test_disaggregated.py (1)

2257-2293: Tighten the top_logprobs=3 assertion.

This only checks that top_logprobs is non-empty, so returning one candidate for a top_logprobs=3 request would still pass. It is worth enforcing at least the requested upper bound, and ideally that the sampled token is present in the returned candidates.

🔍 Tighten the check

                     for item in content:
                         top_lps = item.get("top_logprobs")
                         assert top_lps is not None and len(top_lps) > 0, (
                             f"top_logprobs should be non-empty when requested: {item}"
                         )
+                        assert len(top_lps) <= top_lp_payload["top_logprobs"], (
+                            f"top_logprobs exceeded requested limit: {item}")
+                        assert any(tl["token"] == item["token"]
+                                   for tl in top_lps), (
+                                       f"sampled token missing from top_logprobs: {item}")
                         for tl in top_lps:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/disaggregated/test_disaggregated.py` around lines 2257
- 2293, Enforce that the returned top_logprobs match the requested bound and
include the sampled token: after building top_lp_payload (which sets
"top_logprobs": 3) and retrieving lp_obj/content/top_lps, assert len(top_lps) >=
top_lp_payload["top_logprobs"] (or hardcode 3) and also assert the sampled token
from the content item (e.g., item.get("token") or item.get("text") — whichever
field holds the generated token in this response shape) appears among the
top_lps tokens using a membership check like any(tl["token"] == sampled_token
for tl in top_lps); use the existing variables top_lp_payload, lp_obj, content,
item, top_lps and tl to locate and implement the checks.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/serve/openai_protocol.py`:
- Around line 510-511: Change the ChatCompletionLogProbsContent.top_logprobs
field to use a non-nullable list with a default factory so it always serializes
as [] instead of null; specifically, replace the
Optional[List[ChatCompletionLogProb]] = None declaration in class
ChatCompletionLogProbsContent with a List[ChatCompletionLogProb] defaulting to
Field(default_factory=list) (matching the pattern used by CompletionLogProbs) so
model_dump without exclude_none produces an empty list.

In `@tests/integration/defs/disaggregated/test_disaggregated.py`:
- Around line 2241-2255: The test can silently pass when there are no comparable
logprob entries; after obtaining ns_logprobs and st_logprobs (and before the
for-loop that skips i==0 and None entries) add an explicit check that there is
at least one comparable entry: compute comparable_count = sum(1 for i,(n,s) in
enumerate(zip(ns_logprobs,st_logprobs)) if i != 0 and n is not None and s is not
None) and assert comparable_count > 0 with a clear message including api_type,
ns_logprobs and st_logprobs; keep the existing np.isclose comparisons unchanged
but rely on this new assertion to fail when both paths return no comparable
logprobs.
- Around line 2181-2189: This test uses the "llama31_8b_ucx" config but doesn't
set the UCX env vars; before calling setup_disagg_cluster, update the
environment passed (llm_venv._new_env) to include TRTLLM_USE_UCX_KVCACHE="1" and
an appropriate UCX_TLS (e.g., "tcp,self,sm") so the UCX code path is exercised;
either mutate a copy of llm_venv._new_env or create a new env dict and pass that
into setup_disagg_cluster (refer to config_file, llama31_8b_ucx, and the
setup_disagg_cluster call to locate where to change).

---

Nitpick comments:
In `@tests/integration/defs/disaggregated/test_disaggregated.py`:
- Around line 2257-2293: Enforce that the returned top_logprobs match the
requested bound and include the sampled token: after building top_lp_payload
(which sets "top_logprobs": 3) and retrieving lp_obj/content/top_lps, assert
len(top_lps) >= top_lp_payload["top_logprobs"] (or hardcode 3) and also assert
the sampled token from the content item (e.g., item.get("token") or
item.get("text") — whichever field holds the generated token in this response
shape) appears among the top_lps tokens using a membership check like
any(tl["token"] == sampled_token for tl in top_lps); use the existing variables
top_lp_payload, lp_obj, content, item, top_lps and tl to locate and implement
the checks.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c42b77c7-9a58-4506-8daf-c003c86ca197

📥 Commits

Reviewing files that changed from the base of the PR and between ed9618b and b2a4901.

📒 Files selected for processing (6)

tensorrt_llm/serve/openai_protocol.py
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_llama31_8b_ucx.yaml
tests/integration/defs/disaggregated/test_disaggregated.py
tests/integration/test_lists/qa/llm_function_core.txt
tests/integration/test_lists/qa/llm_function_core_sanity.txt

tensorrt-cicd · 2026-03-17T07:51:55Z

PR_Github #39219 [ run ] triggered by Bot. Commit: de35027 Link to invocation

…sertion Signed-off-by: yingguo-trt <[email protected]>

Signed-off-by: yingguo-trt <[email protected]>

yingguo-trt · 2026-03-17T08:26:54Z

/bot run

tensorrt-cicd · 2026-03-17T08:33:40Z

PR_Github #39224 [ run ] triggered by Bot. Commit: a210de6 Link to invocation

tensorrt-cicd · 2026-03-17T11:23:20Z

PR_Github #39224 [ run ] completed with state SUCCESS. Commit: a210de6
/LLM/main/L0_MergeRequest_PR pipeline #30476 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

yingguo-trt · 2026-03-18T01:55:32Z

/bot reuse-pipeline

tensorrt-cicd · 2026-03-18T02:02:32Z

PR_Github #39348 [ reuse-pipeline ] triggered by Bot. Commit: a210de6 Link to invocation

tensorrt-cicd · 2026-03-18T02:03:57Z

PR_Github #39348 [ reuse-pipeline ] completed with state SUCCESS. Commit: a210de6
Can't reuse PR_Github #39224 with status: FAILED

Link to invocation

yingguo-trt · 2026-03-18T03:26:54Z

/bot run --stage-list ""

tensorrt-cicd · 2026-03-18T03:36:13Z

PR_Github #39376 [ run ] triggered by Bot. Commit: a210de6 Link to invocation

tensorrt-cicd · 2026-03-18T04:52:15Z

PR_Github #39376 [ run ] completed with state SUCCESS. Commit: a210de6
/LLM/main/L0_MergeRequest_PR pipeline #30617 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

yingguo-trt · 2026-03-18T05:14:25Z

/bot reuse-pipeline

tensorrt-cicd · 2026-03-18T05:20:06Z

PR_Github #39384 [ reuse-pipeline ] triggered by Bot. Commit: a210de6 Link to invocation

tensorrt-cicd · 2026-03-18T05:27:19Z

PR_Github #39384 [ reuse-pipeline ] completed with state SUCCESS. Commit: a210de6
Reusing PR_Github #39376 (Partly Tested) for commit a210de6

Link to invocation

…AI API (NVIDIA#12275) Signed-off-by: yingguo-trt <[email protected]>

yingguo-trt added 21 commits March 17, 2026 15:26

[None][test] add test_disaggregated_logprobs for 4-GPU disaggregated …

5fc140f

…serving Signed-off-by: yingguo-trt <[email protected]>

[None][test] add test_disaggregated_logprobs for 4-GPU disaggregated …

07e6fa8

…serving Signed-off-by: yingguo-trt <[email protected]>

fix

16a2b19

Signed-off-by: yingguo-trt <[email protected]>

fix pytest marker parameter

e29f05e

Signed-off-by: yingguo-trt <[email protected]>

add server_start_timeout

80cfd50

Signed-off-by: yingguo-trt <[email protected]>

adapte GB200 moe backend

e72f1cd

Signed-off-by: yingguo-trt <[email protected]>

[None][test] switch test_disaggregated_logprobs to Llama-3.1-8B-Instruct

768e7d2

Signed-off-by: yingguo-trt <[email protected]>

modify test name in *.txt

cff693f

Signed-off-by: yingguo-trt <[email protected]>

remove testcase in test-db

3f6d9b1

Signed-off-by: yingguo-trt <[email protected]>

set temperature=0

0484004

Signed-off-by: yingguo-trt <[email protected]>

adapte rtol&atol

2e23e69

Signed-off-by: yingguo-trt <[email protected]>

update docstring

54a276d

Signed-off-by: yingguo-trt <[email protected]>

Fix Chat API 500 when top_logprobs not requested & add releated test

35da924

Signed-off-by: yingguo-trt <[email protected]>

refactor test case

5cbd1a1

Signed-off-by: yingguo-trt <[email protected]>

Skip first-token logprob comparison in disaggregated streaming test

1f6af7b

Signed-off-by: yingguo-trt <[email protected]>

modify backend to UCX

c2718d8

Signed-off-by: yingguo-trt <[email protected]>

modify backend to UCX

bcb0eb4

Signed-off-by: yingguo-trt <[email protected]>

modify backend to UCX

86e5b64

Signed-off-by: yingguo-trt <[email protected]>

fix pre-commit error

b2a4901

Signed-off-by: yingguo-trt <[email protected]>

yingguo-trt requested review from LarryXFly and xinhe-nv March 17, 2026 07:32

yingguo-trt requested review from a team as code owners March 17, 2026 07:32

yingguo-trt requested a review from hchings March 17, 2026 07:32

github-actions Bot assigned yingguo-trt Mar 17, 2026

revert

de35027

Signed-off-by: yingguo-trt <[email protected]>

yingguo-trt removed the request for review from hchings March 17, 2026 07:40

xinhe-nv approved these changes Mar 17, 2026

View reviewed changes

coderabbitai Bot reviewed Mar 17, 2026

View reviewed changes

Comment thread tensorrt_llm/serve/openai_protocol.py

Comment thread tests/integration/defs/disaggregated/test_disaggregated.py

Comment thread tests/integration/defs/disaggregated/test_disaggregated.py

LarryXFly approved these changes Mar 17, 2026

View reviewed changes

yingguo-trt added 2 commits March 17, 2026 16:12

test_disaggregated_logprobs_serving: add UCX env, tighten logprobs as…

beb868d

…sertion Signed-off-by: yingguo-trt <[email protected]>

fix pre-commit error

a210de6

Signed-off-by: yingguo-trt <[email protected]>

yingguo-trt enabled auto-merge (squash) March 18, 2026 02:05

nv-guomingz approved these changes Mar 18, 2026

View reviewed changes

yingguo-trt merged commit c20a86f into NVIDIA:main Mar 18, 2026
5 checks passed

limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026

[None][test] Add E2E logprobs test for disaggregated serving via Open…

2cadb83

…AI API (NVIDIA#12275) Signed-off-by: yingguo-trt <[email protected]>

longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026

[None][test] Add E2E logprobs test for disaggregated serving via Open…

bfb0d30

…AI API (NVIDIA#12275) Signed-off-by: yingguo-trt <[email protected]>

Conversation

yingguo-trt commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

yingguo-trt commented Mar 17, 2026

Uh oh!

coderabbitai Bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tensorrt-cicd commented Mar 17, 2026

Uh oh!

yingguo-trt commented Mar 17, 2026

Uh oh!

tensorrt-cicd commented Mar 17, 2026

Uh oh!

tensorrt-cicd commented Mar 17, 2026

Uh oh!

yingguo-trt commented Mar 18, 2026

Uh oh!

tensorrt-cicd commented Mar 18, 2026

Uh oh!

tensorrt-cicd commented Mar 18, 2026

Uh oh!

yingguo-trt commented Mar 18, 2026

Uh oh!

tensorrt-cicd commented Mar 18, 2026

Uh oh!

tensorrt-cicd commented Mar 18, 2026

Uh oh!

yingguo-trt commented Mar 18, 2026

Uh oh!

tensorrt-cicd commented Mar 18, 2026

Uh oh!

tensorrt-cicd commented Mar 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

yingguo-trt commented Mar 17, 2026 •

edited

Loading

coderabbitai Bot commented Mar 17, 2026 •

edited

Loading