Skip to content

[None][chore] Align perf benchmark output format#12067

Merged
yingguo-trt merged 9 commits into
NVIDIA:mainfrom
yingguo-trt:feat/align-perf-benchmark-output-format
Mar 10, 2026
Merged

[None][chore] Align perf benchmark output format#12067
yingguo-trt merged 9 commits into
NVIDIA:mainfrom
yingguo-trt:feat/align-perf-benchmark-output-format

Conversation

@yingguo-trt
Copy link
Copy Markdown
Collaborator

@yingguo-trt yingguo-trt commented Mar 10, 2026

Summary by CodeRabbit

  • Tests & Benchmarking
    • Enhanced benchmark failure reporting to display total failed requests alongside existing metrics.
    • Improved reporting logic to support multiple benchmark output formats with flexible extraction strategies.
    • Refined request count aggregation for more accurate failure tracking across benchmarks.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

This PR enhances benchmark failure reporting in disaggregated scenarios by introducing failure request counting capabilities. The shell script outputs failure metrics from JSON results, while the reporting module adds configurable extraction logic to parse failures from either JSON or log sources based on a benchmark configuration flag.

Changes

Cohort / File(s) Summary
Benchmark Failure Reporting
examples/disaggregated/slurm/benchmark/run_benchmark_nv_sa.sh
Adds post-benchmark logic to load result.json and output total failed request count calculated as num_prompts - completed.
Report Parsing Logic
tests/integration/defs/perf/disagg/reporting/report.py
Introduces configurable request-count extraction with two new helper methods: log-based and JSON-based parsers. Updated parse() to select extraction strategy based on benchmark.use_nv_sa_benchmark flag. Refined ResultSaver to write headers only on first write and improved docstring consistency.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is empty; it contains only the template with unfilled sections and no actual explanation of the changes. Fill in the Description section explaining what changes were made and why. Provide a Test Coverage section listing relevant tests. Complete the PR Checklist appropriately.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title '[None][chore] Align perf benchmark output format' clearly and concisely describes the main change: aligning performance benchmark output format across multiple benchmark scripts and test reporting logic.
Docstring Coverage ✅ Passed Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/disaggregated/slurm/benchmark/run_benchmark_nv_sa.sh`:
- Around line 194-201: The failure-suppression at the end of the Python here-doc
(the "2>/dev/null || true" attached to the command `python -
"${output_dir}/result.json" <<-'PYEOF' ... PYEOF`) hides missing or malformed
result.json files; remove the `2>/dev/null || true` and either allow the Python
process to surface errors (so the script fails visibly) or modify the Python
block to catch exceptions, print a clear error message including the bad
filename, and exit with a non‑zero status; keep the existing JSON logic but
ensure the here-doc (`PYEOF`) returns a non-zero exit on parse/open failures so
missing/malformed result.json is not treated as a successful run.

In `@tests/integration/defs/perf/disagg/reporting/report.py`:
- Around line 199-207: The concurrency suffix used when building base_test_name
no longer matches the pattern expected by _get_network_name, so update the
suffix to match the parser: change the construction of base_test_name (the line
that currently uses f"{test_name}_con:{concurrency}") to use the same "-con-<n>"
format that _get_network_name expects (e.g., f"{test_name}-con-{concurrency}"),
ensuring network_name is parsed correctly by _get_network_name.
- Around line 102-114: The code currently only catches json.JSONDecodeError when
reading result_json_path so filesystem races or permission errors during open()
will raise and abort parsing; wrap the open()+json.load() in a try that first
catches OSError (or more specific FileNotFoundError, PermissionError) to log and
continue, and then separately catch json.JSONDecodeError to log and continue,
ensuring you only broaden the except to those specific exceptions and still
update/skip total_requests and failed_requests correctly when data is available;
use the existing symbols result_json_path, logger, data, total_requests, and
failed_requests to locate and implement the changes.
- Around line 137-146: The NV-SA branch is being chosen only when self.config is
a dict and then relies solely on raw_results' "concurrency" entries, causing
valid NV-SA runs (e.g., when self.config is a BenchmarkConfig instance or when
metric extraction returned no matches) to fall back to 0/0; update the gating
and extraction: determine use_nv_sa by checking the NV-SA flag in either a dict
(self.config.get("benchmark", {}).get("use_nv_sa_benchmark")) or a
BenchmarkConfig instance via getattr(self.config, "benchmark", None) and look
for a truthy use_nv_sa_benchmark there, then call
_extract_request_counts_from_json(concurrencies) but if that returns
empty/invalid counts, implement a fallback that scans raw_results for
concurrency_*/result.json files (or their JSON payloads) and parse them to
compute failed_requests and total_requests before falling back to
_extract_request_counts_from_log(log_content); reference variables/functions:
use_nv_sa, self.config, raw_results, _extract_request_counts_from_json, and
_extract_request_counts_from_log.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a2d79d8f-4e76-4ff1-b31e-edb9b681fdc6

📥 Commits

Reviewing files that changed from the base of the PR and between a20de88 and 58d0bb6.

📒 Files selected for processing (2)
  • examples/disaggregated/slurm/benchmark/run_benchmark_nv_sa.sh
  • tests/integration/defs/perf/disagg/reporting/report.py

Comment thread examples/disaggregated/slurm/benchmark/run_benchmark_nv_sa.sh
Comment thread tests/integration/defs/perf/disagg/reporting/report.py
Comment thread tests/integration/defs/perf/disagg/reporting/report.py
Comment thread tests/integration/defs/perf/disagg/reporting/report.py
@yingguo-trt
Copy link
Copy Markdown
Collaborator Author

/bot skip --comment "Not cover in CI pipelines"

Copy link
Copy Markdown
Collaborator

@fredricz-20070104 fredricz-20070104 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay for me. Please ask for Kaiyu, Xie's review.

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38399 [ skip ] triggered by Bot. Commit: 5622a58 Link to invocation

@yingguo-trt yingguo-trt enabled auto-merge (squash) March 10, 2026 07:28
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38399 [ skip ] completed with state SUCCESS. Commit: 5622a58
Skipping testing for commit 5622a58

Link to invocation

@yingguo-trt yingguo-trt merged commit 81350b7 into NVIDIA:main Mar 10, 2026
5 checks passed
@yingguo-trt yingguo-trt deleted the feat/align-perf-benchmark-output-format branch March 11, 2026 02:24
bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Mar 12, 2026
tianyuz-nv pushed a commit to wanqian-nv/TensorRT-LLM that referenced this pull request Mar 19, 2026
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants