[None][chore] Alltoall benchmark script refine (second time).#12192
Conversation
|
/bot run --disable-fail-fast |
|
PR_Github #38847 [ run ] triggered by Bot. Commit: |
📝 WalkthroughWalkthroughThe changes refactor the benchmarking script's profile resolution to return four values instead of five, add explicit CLI overrides for profile parameters, integrate per-iteration kernel timing spans from CUPTI analysis, relax CUPTI initialization gating, and propagate worker environment variables across MPI launch modes. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip You can get early access to new features in CodeRabbit.Enable the |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/microbenchmarks/bench_moe_comm.py`:
- Around line 1005-1008: Replace the boolean-or fallbacks with explicit None
checks and validation: for hidden_size, top_k, and num_experts_total use
int(args.hidden_size) if args.hidden_size is not None else int(prof.hidden_size)
(and similarly for top_k and num_experts_total), then immediately validate each
is a positive integer and raise a clear ValueError if <= 0; for quant_algo keep
the explicit None check (quant_algo = args.quant if args.quant is not None else
prof.quant_algo) rather than using "or" so empty/zero values aren't silently
discarded. Ensure you reference the existing symbols hidden_size, top_k,
num_experts_total, quant_algo, args, and prof when making the change.
- Around line 871-875: The --quant argparse entry is failing because type=lambda
s: QuantAlgo[str(s).upper()] returns a QuantAlgo member while choices=[q.name
for q in QuantAlgo] contains uppercase names, causing mismatch; fix by making
both type and choices use the enum values (lowercase) consistently: replace the
type lambda with something like lambda s: QuantAlgo(s.lower()) (or QuantAlgo(s)
if inputs are already canonical) and set choices to [q.value for q in QuantAlgo]
so argparse validates against the same string form; adjust the "--quant" option
where QuantAlgo and the current lambda are referenced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6ee64098-1feb-496c-9391-d1ccd730ff04
📒 Files selected for processing (1)
tests/microbenchmarks/bench_moe_comm.py
|
PR_Github #38847 [ run ] completed with state
|
|
/bot run --disable-fail-fast --reuse-test |
|
PR_Github #38869 [ run ] triggered by Bot. Commit: |
|
PR_Github #38869 [ run ] completed with state
|
Signed-off-by: Bo Li <[email protected]>
…ate OP duration, instead of CUDA event which may have overhead. Signed-off-by: Bo Li <[email protected]>
Signed-off-by: Bo Li <[email protected]>
c9fccd5 to
2bafd5d
Compare
|
/bot run --disable-fail-fast |
1 similar comment
|
/bot run --disable-fail-fast |
Signed-off-by: Bo Li <[email protected]>
|
/bot run --disable-fail-fast |
|
PR_Github #38993 [ run ] triggered by Bot. Commit: |
|
PR_Github #38994 [ run ] triggered by Bot. Commit: |
|
PR_Github #38994 [ run ] completed with state
|
|
/bot run --disable-fail-fast --reuse-test |
|
PR_Github #39025 [ run ] triggered by Bot. Commit: |
|
PR_Github #39025 [ run ] completed with state
|
|
/bot run --disable-fail-fast --reuse-test |
|
PR_Github #39046 [ run ] triggered by Bot. Commit: |
|
PR_Github #39046 [ run ] completed with state
|
|
/bot run --disable-fail-fast --reuse-test |
|
PR_Github #39061 [ run ] triggered by Bot. Commit: |
|
PR_Github #39061 [ run ] completed with state |
…#12192) Signed-off-by: Bo Li <[email protected]>
…#12192) Signed-off-by: Bo Li <[email protected]>
…#12192) Signed-off-by: Bo Li <[email protected]>
Further refine AlltoAll benchmark script.
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.Summary by CodeRabbit
New Features
Improvements
Changes