[None][chore] Align perf benchmark output format#12067
Conversation
Signed-off-by: yingguo-trt <[email protected]>
… in disagg perf tests Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
…rk backend Signed-off-by: yingguo-trt <[email protected]>
… log reporter Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
📝 WalkthroughWalkthroughThis PR enhances benchmark failure reporting in disaggregated scenarios by introducing failure request counting capabilities. The shell script outputs failure metrics from JSON results, while the reporting module adds configurable extraction logic to parse failures from either JSON or log sources based on a benchmark configuration flag. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/disaggregated/slurm/benchmark/run_benchmark_nv_sa.sh`:
- Around line 194-201: The failure-suppression at the end of the Python here-doc
(the "2>/dev/null || true" attached to the command `python -
"${output_dir}/result.json" <<-'PYEOF' ... PYEOF`) hides missing or malformed
result.json files; remove the `2>/dev/null || true` and either allow the Python
process to surface errors (so the script fails visibly) or modify the Python
block to catch exceptions, print a clear error message including the bad
filename, and exit with a non‑zero status; keep the existing JSON logic but
ensure the here-doc (`PYEOF`) returns a non-zero exit on parse/open failures so
missing/malformed result.json is not treated as a successful run.
In `@tests/integration/defs/perf/disagg/reporting/report.py`:
- Around line 199-207: The concurrency suffix used when building base_test_name
no longer matches the pattern expected by _get_network_name, so update the
suffix to match the parser: change the construction of base_test_name (the line
that currently uses f"{test_name}_con:{concurrency}") to use the same "-con-<n>"
format that _get_network_name expects (e.g., f"{test_name}-con-{concurrency}"),
ensuring network_name is parsed correctly by _get_network_name.
- Around line 102-114: The code currently only catches json.JSONDecodeError when
reading result_json_path so filesystem races or permission errors during open()
will raise and abort parsing; wrap the open()+json.load() in a try that first
catches OSError (or more specific FileNotFoundError, PermissionError) to log and
continue, and then separately catch json.JSONDecodeError to log and continue,
ensuring you only broaden the except to those specific exceptions and still
update/skip total_requests and failed_requests correctly when data is available;
use the existing symbols result_json_path, logger, data, total_requests, and
failed_requests to locate and implement the changes.
- Around line 137-146: The NV-SA branch is being chosen only when self.config is
a dict and then relies solely on raw_results' "concurrency" entries, causing
valid NV-SA runs (e.g., when self.config is a BenchmarkConfig instance or when
metric extraction returned no matches) to fall back to 0/0; update the gating
and extraction: determine use_nv_sa by checking the NV-SA flag in either a dict
(self.config.get("benchmark", {}).get("use_nv_sa_benchmark")) or a
BenchmarkConfig instance via getattr(self.config, "benchmark", None) and look
for a truthy use_nv_sa_benchmark there, then call
_extract_request_counts_from_json(concurrencies) but if that returns
empty/invalid counts, implement a fallback that scans raw_results for
concurrency_*/result.json files (or their JSON payloads) and parse them to
compute failed_requests and total_requests before falling back to
_extract_request_counts_from_log(log_content); reference variables/functions:
use_nv_sa, self.config, raw_results, _extract_request_counts_from_json, and
_extract_request_counts_from_log.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a2d79d8f-4e76-4ff1-b31e-edb9b681fdc6
📒 Files selected for processing (2)
examples/disaggregated/slurm/benchmark/run_benchmark_nv_sa.shtests/integration/defs/perf/disagg/reporting/report.py
Signed-off-by: yingguo-trt <[email protected]>
|
/bot skip --comment "Not cover in CI pipelines" |
fredricz-20070104
left a comment
There was a problem hiding this comment.
It's okay for me. Please ask for Kaiyu, Xie's review.
|
PR_Github #38399 [ skip ] triggered by Bot. Commit: |
|
PR_Github #38399 [ skip ] completed with state |
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Signed-off-by: yingguo-trt <[email protected]>
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.