Skip to content

[None][test] Add E2E logprobs test for disaggregated serving via OpenAI API#12275

Merged
yingguo-trt merged 24 commits into
NVIDIA:mainfrom
yingguo-trt:feat/disagg-logprobs-multi-gpu-test
Mar 18, 2026
Merged

[None][test] Add E2E logprobs test for disaggregated serving via OpenAI API#12275
yingguo-trt merged 24 commits into
NVIDIA:mainfrom
yingguo-trt:feat/disagg-logprobs-multi-gpu-test

Conversation

@yingguo-trt
Copy link
Copy Markdown
Collaborator

@yingguo-trt yingguo-trt commented Mar 17, 2026

Summary by CodeRabbit

  • New Features

    • Added support for optional logprobs field in chat completion responses.
    • Added new disaggregated serving configurations for Llama 3.1 and Deepseek models with UCX backend support.
  • Tests

    • Added integration test for logprobs serving in disaggregated inference scenarios.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
…etter coverage

- Rename to test_disaggregated_logprobs_serving to distinguish from
  the single-GPU variant in test_disaggregated_single_gpu.py
- Switch model from DeepSeek-V3-Lite-fp8 (MoE) to Llama-3.1-8B-Instruct
  (dense) to avoid CUTLASS DeepGemm JIT incompatibility on Blackwell GPUs
- Remove skip_pre_hopper since dense model doesn't require Hopper
- Remove streaming/api_type parametrize to avoid 4x cluster startups;
  test all combinations (completions/chat x streaming/non-streaming)
  within a single cluster session
- Add streaming vs non-streaming consistency check with logprob comparison
- Extract SSE parsing and logprobs validation into helper functions
- Update test list entries in all CI config files

Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
When logprobs=True but top_logprobs is not requested, ChatCompletionLogProbsContent
left top_logprobs as None, causing Pydantic validation to fail. Set top_logprobs=[]
explicitly when top_logprobs is False.

Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
@yingguo-trt yingguo-trt removed the request for review from hchings March 17, 2026 07:40
@yingguo-trt
Copy link
Copy Markdown
Collaborator Author

/bot run

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

This pull request adds logprobs serving support for disaggregated inference by updating the OpenAI protocol schema to make top_logprobs optional, introducing new disaggregated configuration files, implementing an integration test for logprobs serving across streaming and non-streaming endpoints, and registering the test in the test suite.

Changes

Cohort / File(s) Summary
OpenAI Protocol Schema
tensorrt_llm/serve/openai_protocol.py
Changed top_logprobs field type from List[ChatCompletionLogProb] to Optional[List[ChatCompletionLogProb]] to allow the field to be explicitly optional in the public schema.
Disaggregated Configuration Files
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml, tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_llama31_8b_ucx.yaml
Added MOE backend AUTO configuration to existing config and created new Llama-3.1-8B disaggregated setup config with tensor parallelism, UCX transceiver backend, and logits gathering enabled.
Integration Test Implementation
tests/integration/defs/disaggregated/test_disaggregated.py
Added new test function test_disaggregated_logprobs_serving that validates logprobs extraction and comparison across streaming and non-streaming OpenAI-compatible endpoints (completions and chat). Note: Test implementation appears duplicated within the file.
Test List Registrations
tests/integration/test_lists/qa/llm_function_core.txt, tests/integration/test_lists/qa/llm_function_core_sanity.txt
Registered new test case disaggregated/test_disaggregated.py::test_disaggregated_logprobs_serving[llama-3.1-8b-instruct] in test discovery lists.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The pull request description is incomplete and largely empty. All required sections (Description, Test Coverage) lack substantive content beyond placeholder HTML comments. Complete the Description section by explaining the issue and solution. Populate Test Coverage section with a clear list of relevant tests that safeguard the changes. Ensure the PR title follows the template format with a valid ticket identifier and type.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: adding an end-to-end logprobs test for disaggregated serving via OpenAI API, matching the primary focus of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can enable review details to help with troubleshooting, context usage and more.

Enable the reviews.review_details setting to include review details such as the model used, the time taken for each step and more in the review comments.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/integration/defs/disaggregated/test_disaggregated.py (1)

2257-2293: Tighten the top_logprobs=3 assertion.

This only checks that top_logprobs is non-empty, so returning one candidate for a top_logprobs=3 request would still pass. It is worth enforcing at least the requested upper bound, and ideally that the sampled token is present in the returned candidates.

🔍 Tighten the check
                     for item in content:
                         top_lps = item.get("top_logprobs")
                         assert top_lps is not None and len(top_lps) > 0, (
                             f"top_logprobs should be non-empty when requested: {item}"
                         )
+                        assert len(top_lps) <= top_lp_payload["top_logprobs"], (
+                            f"top_logprobs exceeded requested limit: {item}")
+                        assert any(tl["token"] == item["token"]
+                                   for tl in top_lps), (
+                                       f"sampled token missing from top_logprobs: {item}")
                         for tl in top_lps:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/disaggregated/test_disaggregated.py` around lines 2257
- 2293, Enforce that the returned top_logprobs match the requested bound and
include the sampled token: after building top_lp_payload (which sets
"top_logprobs": 3) and retrieving lp_obj/content/top_lps, assert len(top_lps) >=
top_lp_payload["top_logprobs"] (or hardcode 3) and also assert the sampled token
from the content item (e.g., item.get("token") or item.get("text") — whichever
field holds the generated token in this response shape) appears among the
top_lps tokens using a membership check like any(tl["token"] == sampled_token
for tl in top_lps); use the existing variables top_lp_payload, lp_obj, content,
item, top_lps and tl to locate and implement the checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/serve/openai_protocol.py`:
- Around line 510-511: Change the ChatCompletionLogProbsContent.top_logprobs
field to use a non-nullable list with a default factory so it always serializes
as [] instead of null; specifically, replace the
Optional[List[ChatCompletionLogProb]] = None declaration in class
ChatCompletionLogProbsContent with a List[ChatCompletionLogProb] defaulting to
Field(default_factory=list) (matching the pattern used by CompletionLogProbs) so
model_dump without exclude_none produces an empty list.

In `@tests/integration/defs/disaggregated/test_disaggregated.py`:
- Around line 2241-2255: The test can silently pass when there are no comparable
logprob entries; after obtaining ns_logprobs and st_logprobs (and before the
for-loop that skips i==0 and None entries) add an explicit check that there is
at least one comparable entry: compute comparable_count = sum(1 for i,(n,s) in
enumerate(zip(ns_logprobs,st_logprobs)) if i != 0 and n is not None and s is not
None) and assert comparable_count > 0 with a clear message including api_type,
ns_logprobs and st_logprobs; keep the existing np.isclose comparisons unchanged
but rely on this new assertion to fail when both paths return no comparable
logprobs.
- Around line 2181-2189: This test uses the "llama31_8b_ucx" config but doesn't
set the UCX env vars; before calling setup_disagg_cluster, update the
environment passed (llm_venv._new_env) to include TRTLLM_USE_UCX_KVCACHE="1" and
an appropriate UCX_TLS (e.g., "tcp,self,sm") so the UCX code path is exercised;
either mutate a copy of llm_venv._new_env or create a new env dict and pass that
into setup_disagg_cluster (refer to config_file, llama31_8b_ucx, and the
setup_disagg_cluster call to locate where to change).

---

Nitpick comments:
In `@tests/integration/defs/disaggregated/test_disaggregated.py`:
- Around line 2257-2293: Enforce that the returned top_logprobs match the
requested bound and include the sampled token: after building top_lp_payload
(which sets "top_logprobs": 3) and retrieving lp_obj/content/top_lps, assert
len(top_lps) >= top_lp_payload["top_logprobs"] (or hardcode 3) and also assert
the sampled token from the content item (e.g., item.get("token") or
item.get("text") — whichever field holds the generated token in this response
shape) appears among the top_lps tokens using a membership check like
any(tl["token"] == sampled_token for tl in top_lps); use the existing variables
top_lp_payload, lp_obj, content, item, top_lps and tl to locate and implement
the checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c42b77c7-9a58-4506-8daf-c003c86ca197

📥 Commits

Reviewing files that changed from the base of the PR and between ed9618b and b2a4901.

📒 Files selected for processing (6)
  • tensorrt_llm/serve/openai_protocol.py
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_llama31_8b_ucx.yaml
  • tests/integration/defs/disaggregated/test_disaggregated.py
  • tests/integration/test_lists/qa/llm_function_core.txt
  • tests/integration/test_lists/qa/llm_function_core_sanity.txt

Comment thread tensorrt_llm/serve/openai_protocol.py
Comment thread tests/integration/defs/disaggregated/test_disaggregated.py
Comment thread tests/integration/defs/disaggregated/test_disaggregated.py
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39219 [ run ] triggered by Bot. Commit: de35027 Link to invocation

@yingguo-trt
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39224 [ run ] triggered by Bot. Commit: a210de6 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39224 [ run ] completed with state SUCCESS. Commit: a210de6
/LLM/main/L0_MergeRequest_PR pipeline #30476 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yingguo-trt
Copy link
Copy Markdown
Collaborator Author

/bot reuse-pipeline

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39348 [ reuse-pipeline ] triggered by Bot. Commit: a210de6 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39348 [ reuse-pipeline ] completed with state SUCCESS. Commit: a210de6
Can't reuse PR_Github #39224 with status: FAILED

Link to invocation

@yingguo-trt yingguo-trt enabled auto-merge (squash) March 18, 2026 02:05
@yingguo-trt
Copy link
Copy Markdown
Collaborator Author

/bot run --stage-list ""

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39376 [ run ] triggered by Bot. Commit: a210de6 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39376 [ run ] completed with state SUCCESS. Commit: a210de6
/LLM/main/L0_MergeRequest_PR pipeline #30617 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@yingguo-trt
Copy link
Copy Markdown
Collaborator Author

/bot reuse-pipeline

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39384 [ reuse-pipeline ] triggered by Bot. Commit: a210de6 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39384 [ reuse-pipeline ] completed with state SUCCESS. Commit: a210de6
Reusing PR_Github #39376 (Partly Tested) for commit a210de6

Link to invocation

@yingguo-trt yingguo-trt merged commit c20a86f into NVIDIA:main Mar 18, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants