[None][test] Add E2E logprobs test for disaggregated serving via OpenAI API#12275
Conversation
…serving Signed-off-by: yingguo-trt <[email protected]>
…serving Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
…etter coverage - Rename to test_disaggregated_logprobs_serving to distinguish from the single-GPU variant in test_disaggregated_single_gpu.py - Switch model from DeepSeek-V3-Lite-fp8 (MoE) to Llama-3.1-8B-Instruct (dense) to avoid CUTLASS DeepGemm JIT incompatibility on Blackwell GPUs - Remove skip_pre_hopper since dense model doesn't require Hopper - Remove streaming/api_type parametrize to avoid 4x cluster startups; test all combinations (completions/chat x streaming/non-streaming) within a single cluster session - Add streaming vs non-streaming consistency check with logprob comparison - Extract SSE parsing and logprobs validation into helper functions - Update test list entries in all CI config files Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
When logprobs=True but top_logprobs is not requested, ChatCompletionLogProbsContent left top_logprobs as None, causing Pydantic validation to fail. Set top_logprobs=[] explicitly when top_logprobs is False. Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
|
/bot run |
📝 WalkthroughWalkthroughThis pull request adds logprobs serving support for disaggregated inference by updating the OpenAI protocol schema to make top_logprobs optional, introducing new disaggregated configuration files, implementing an integration test for logprobs serving across streaming and non-streaming endpoints, and registering the test in the test suite. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip You can enable review details to help with troubleshooting, context usage and more.Enable the |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/integration/defs/disaggregated/test_disaggregated.py (1)
2257-2293: Tighten thetop_logprobs=3assertion.This only checks that
top_logprobsis non-empty, so returning one candidate for atop_logprobs=3request would still pass. It is worth enforcing at least the requested upper bound, and ideally that the sampled token is present in the returned candidates.🔍 Tighten the check
for item in content: top_lps = item.get("top_logprobs") assert top_lps is not None and len(top_lps) > 0, ( f"top_logprobs should be non-empty when requested: {item}" ) + assert len(top_lps) <= top_lp_payload["top_logprobs"], ( + f"top_logprobs exceeded requested limit: {item}") + assert any(tl["token"] == item["token"] + for tl in top_lps), ( + f"sampled token missing from top_logprobs: {item}") for tl in top_lps:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/disaggregated/test_disaggregated.py` around lines 2257 - 2293, Enforce that the returned top_logprobs match the requested bound and include the sampled token: after building top_lp_payload (which sets "top_logprobs": 3) and retrieving lp_obj/content/top_lps, assert len(top_lps) >= top_lp_payload["top_logprobs"] (or hardcode 3) and also assert the sampled token from the content item (e.g., item.get("token") or item.get("text") — whichever field holds the generated token in this response shape) appears among the top_lps tokens using a membership check like any(tl["token"] == sampled_token for tl in top_lps); use the existing variables top_lp_payload, lp_obj, content, item, top_lps and tl to locate and implement the checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/serve/openai_protocol.py`:
- Around line 510-511: Change the ChatCompletionLogProbsContent.top_logprobs
field to use a non-nullable list with a default factory so it always serializes
as [] instead of null; specifically, replace the
Optional[List[ChatCompletionLogProb]] = None declaration in class
ChatCompletionLogProbsContent with a List[ChatCompletionLogProb] defaulting to
Field(default_factory=list) (matching the pattern used by CompletionLogProbs) so
model_dump without exclude_none produces an empty list.
In `@tests/integration/defs/disaggregated/test_disaggregated.py`:
- Around line 2241-2255: The test can silently pass when there are no comparable
logprob entries; after obtaining ns_logprobs and st_logprobs (and before the
for-loop that skips i==0 and None entries) add an explicit check that there is
at least one comparable entry: compute comparable_count = sum(1 for i,(n,s) in
enumerate(zip(ns_logprobs,st_logprobs)) if i != 0 and n is not None and s is not
None) and assert comparable_count > 0 with a clear message including api_type,
ns_logprobs and st_logprobs; keep the existing np.isclose comparisons unchanged
but rely on this new assertion to fail when both paths return no comparable
logprobs.
- Around line 2181-2189: This test uses the "llama31_8b_ucx" config but doesn't
set the UCX env vars; before calling setup_disagg_cluster, update the
environment passed (llm_venv._new_env) to include TRTLLM_USE_UCX_KVCACHE="1" and
an appropriate UCX_TLS (e.g., "tcp,self,sm") so the UCX code path is exercised;
either mutate a copy of llm_venv._new_env or create a new env dict and pass that
into setup_disagg_cluster (refer to config_file, llama31_8b_ucx, and the
setup_disagg_cluster call to locate where to change).
---
Nitpick comments:
In `@tests/integration/defs/disaggregated/test_disaggregated.py`:
- Around line 2257-2293: Enforce that the returned top_logprobs match the
requested bound and include the sampled token: after building top_lp_payload
(which sets "top_logprobs": 3) and retrieving lp_obj/content/top_lps, assert
len(top_lps) >= top_lp_payload["top_logprobs"] (or hardcode 3) and also assert
the sampled token from the content item (e.g., item.get("token") or
item.get("text") — whichever field holds the generated token in this response
shape) appears among the top_lps tokens using a membership check like
any(tl["token"] == sampled_token for tl in top_lps); use the existing variables
top_lp_payload, lp_obj, content, item, top_lps and tl to locate and implement
the checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c42b77c7-9a58-4506-8daf-c003c86ca197
📒 Files selected for processing (6)
tensorrt_llm/serve/openai_protocol.pytests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yamltests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_llama31_8b_ucx.yamltests/integration/defs/disaggregated/test_disaggregated.pytests/integration/test_lists/qa/llm_function_core.txttests/integration/test_lists/qa/llm_function_core_sanity.txt
|
PR_Github #39219 [ run ] triggered by Bot. Commit: |
…sertion Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
|
/bot run |
|
PR_Github #39224 [ run ] triggered by Bot. Commit: |
|
PR_Github #39224 [ run ] completed with state
|
|
/bot reuse-pipeline |
|
PR_Github #39348 [ reuse-pipeline ] triggered by Bot. Commit: |
|
PR_Github #39348 [ reuse-pipeline ] completed with state |
|
/bot run --stage-list "" |
|
PR_Github #39376 [ run ] triggered by Bot. Commit: |
|
PR_Github #39376 [ run ] completed with state |
|
/bot reuse-pipeline |
|
PR_Github #39384 [ reuse-pipeline ] triggered by Bot. Commit: |
|
PR_Github #39384 [ reuse-pipeline ] completed with state |
…AI API (NVIDIA#12275) Signed-off-by: yingguo-trt <[email protected]>
…AI API (NVIDIA#12275) Signed-off-by: yingguo-trt <[email protected]>
Summary by CodeRabbit
New Features
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.