[TRTLLM-10061][fix] Use ceil_div for head/size calculations#12441
Conversation
d8eb15f to
74a6a7d
Compare
📝 WalkthroughWalkthroughThe pull request replaces floor-division calculations with ceiling-division calculations in model configuration bindings. A Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/bot run |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/model_config.py (1)
740-748:⚠️ Potential issue | 🔴 CriticalAssertion will fail when
attn_cp_size > 1due to mismatched division factors.The assertion at line 741 checks
hidden_size % num_heads == 0, butnum_headsandhidden_sizeare computed with different divisors:
num_headsis divided byattn_tp_size * attn_cp_size(line 662-663)hidden_sizeis divided by onlyattn_tp_size(line 665-666)When
attn_cp_size > 1, these unequal divisions will produce values that no longer satisfy the divisibility constraint.Example: With
attn_tp_size=2,attn_cp_size=2,num_attention_heads=11,hidden_size=100:
num_heads = ceil_div(11, 4) = 3hidden_size = ceil_div(100, 2) = 5050 % 3 ≠ 0→ assertion failsThis defeats the stated PR objective to support models with head counts not evenly divisible by parallelism factors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/model_config.py` around lines 740 - 748, The assertion hidden_size % num_heads == 0 can fail when attn_cp_size > 1 because num_heads and hidden_size were divided by different parallelism factors; replace the assert with a safe computation that derives head_size = ceil_div(hidden_size, num_heads) (or equivalent integer ceiling division already used elsewhere) and assign that to head_size while logging a warning about rounding instead of crashing; update the block that sets head_size (referencing head_size, hidden_size, num_heads and the existing ceil_div helper) to compute and use the rounded head size rather than asserting exact divisibility.
🧹 Nitpick comments (3)
tests/unittest/_torch/test_model_config.py (2)
76-77: Duplicateceil_divdefinition—consider reusing from production code.This helper duplicates the one in
model_config.pyandpyexecutor/_util.py. If the utility were a public import, the test could reuse it and stay in sync automatically.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/_torch/test_model_config.py` around lines 76 - 77, The test defines a duplicate helper ceil_div which duplicates the implementation in production (model_config.py and pyexecutor/_util.py); remove the local ceil_div in tests/unittest/_torch/test_model_config.py and import the canonical function from the production utility (e.g., import ceil_div from pyexecutor._util or the public function in model_config) so the test reuses the single source of truth and stays in sync with production code.
56-67: Test doesn't exercise ceiling-division edge cases.The test uses
num_attention_heads=16with parallelism factors that multiply to 8 (e.g.,tp_size=4, cp_size=2). Since16 / 8 = 2exactly,ceil_div(16, 8) == 16 // 8. The test passes regardless of whether ceiling or floor division is used.Consider adding a test case where values are not evenly divisible to validate that ceiling division produces the expected behavior:
🧪 Proposed test case for non-evenly divisible heads
`@pytest.mark.parametrize`( "num_attention_heads,expected_sharded_heads", [ pytest.param(16, 2, id="evenly_divisible"), # 16 / 8 = 2 pytest.param(10, 2, id="ceil_rounds_up"), # ceil(10 / 8) = 2 pytest.param(9, 2, id="ceil_rounds_up_2"), # ceil(9 / 8) = 2 ], ) def test_ceil_div_sharding(num_attention_heads, expected_sharded_heads): ...🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/_torch/test_model_config.py` around lines 56 - 67, The current test test_get_bindings_model_config_attention_dp_attn_tp_override only uses num_attention_heads that divide evenly by the product of parallelism factors, so it doesn't catch ceiling-division bugs; update the test to parametrically exercise non-evenly-divisible head counts by adding cases (e.g., num_attention_heads=10 and 9) alongside the existing 16 and compute the expected per-shard head count (expected_sharded_heads = ceil(num_attention_heads / (tp_size * cp_size))) and assert the resulting Mapping/make_pretrained_config behavior produces that expected_sharded_heads (referencing Mapping, make_pretrained_config and test_get_bindings_model_config_attention_dp_attn_tp_override to locate where to add the new parametrized inputs and assertions).tensorrt_llm/_torch/model_config.py (1)
659-660: Consider importing the existingceil_divutility.A typed
ceil_divhelper already exists intensorrt_llm/_torch/pyexecutor/_util.py(lines 53-54). Reusing it avoids duplication and maintains consistency across the codebase.♻️ Proposed refactor to reuse existing utility
+from tensorrt_llm._torch.pyexecutor._util import ceil_div ... - def ceil_div(a, b): - return (a + b - 1) // b🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/model_config.py` around lines 659 - 660, Remove the local duplicate ceil_div function in model_config and import the existing typed utility from tensorrt_llm._torch.pyexecutor._util; replace the local definition with "from tensorrt_llm._torch.pyexecutor._util import ceil_div" and ensure any callers in model_config continue to use ceil_div (preserving type compatibility), so the code reuses the single canonical implementation instead of duplicating it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/model_config.py`:
- Around line 662-666: The CI formatting hooks failed for
tensorrt_llm/_torch/model_config.py; run the project's formatters locally and
commit the fixes: run the pre-commit formatter (pre-commit run --files
tensorrt_llm/_torch/model_config.py) or run yapf/ruff-format directly to
reformat the occurrences around ceil_div usage for num_heads and hidden_size
(references: ceil_div, num_heads, hidden_size, self.pretrained_config,
attn_tp_size, attn_cp_size) so the file matches the project's style and passes
CI.
In `@tests/unittest/_torch/test_model_config.py`:
- Around line 73-77: CI failed due to formatting in
tests/unittest/_torch/test_model_config.py; run the configured formatters and
apply their fixes: execute the pre-commit hook (pre-commit run --files
tests/unittest/_torch/test_model_config.py) or run yapf/ruff-format manually,
then ensure the block around the attn_tp_size/attn_cp_size assignments and the
ceil_div function follows project style (correct spacing, line breaks, and
indentation) so that function ceil_div and the surrounding lines satisfy yapf
and ruff-format checks before committing.
---
Outside diff comments:
In `@tensorrt_llm/_torch/model_config.py`:
- Around line 740-748: The assertion hidden_size % num_heads == 0 can fail when
attn_cp_size > 1 because num_heads and hidden_size were divided by different
parallelism factors; replace the assert with a safe computation that derives
head_size = ceil_div(hidden_size, num_heads) (or equivalent integer ceiling
division already used elsewhere) and assign that to head_size while logging a
warning about rounding instead of crashing; update the block that sets head_size
(referencing head_size, hidden_size, num_heads and the existing ceil_div helper)
to compute and use the rounded head size rather than asserting exact
divisibility.
---
Nitpick comments:
In `@tensorrt_llm/_torch/model_config.py`:
- Around line 659-660: Remove the local duplicate ceil_div function in
model_config and import the existing typed utility from
tensorrt_llm._torch.pyexecutor._util; replace the local definition with "from
tensorrt_llm._torch.pyexecutor._util import ceil_div" and ensure any callers in
model_config continue to use ceil_div (preserving type compatibility), so the
code reuses the single canonical implementation instead of duplicating it.
In `@tests/unittest/_torch/test_model_config.py`:
- Around line 76-77: The test defines a duplicate helper ceil_div which
duplicates the implementation in production (model_config.py and
pyexecutor/_util.py); remove the local ceil_div in
tests/unittest/_torch/test_model_config.py and import the canonical function
from the production utility (e.g., import ceil_div from pyexecutor._util or the
public function in model_config) so the test reuses the single source of truth
and stays in sync with production code.
- Around line 56-67: The current test
test_get_bindings_model_config_attention_dp_attn_tp_override only uses
num_attention_heads that divide evenly by the product of parallelism factors, so
it doesn't catch ceiling-division bugs; update the test to parametrically
exercise non-evenly-divisible head counts by adding cases (e.g.,
num_attention_heads=10 and 9) alongside the existing 16 and compute the expected
per-shard head count (expected_sharded_heads = ceil(num_attention_heads /
(tp_size * cp_size))) and assert the resulting Mapping/make_pretrained_config
behavior produces that expected_sharded_heads (referencing Mapping,
make_pretrained_config and
test_get_bindings_model_config_attention_dp_attn_tp_override to locate where to
add the new parametrized inputs and assertions).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2305cc2e-d696-43ca-8760-b0f2d3c3e314
📒 Files selected for processing (2)
tensorrt_llm/_torch/model_config.pytests/unittest/_torch/test_model_config.py
|
PR_Github #39895 [ run ] triggered by Bot. Commit: |
|
PR_Github #39895 [ run ] completed with state
|
…config Replace Python floor division (//) with ceil_div when computing num_heads, hidden_size, num_kv_heads, and mlp_hidden_size in get_bindings_model_config. This ensures correct sharding for models whose head counts are not evenly divisible by the parallelism factors. Signed-off-by: Xiwen Yu <[email protected]>
74a6a7d to
27ef0bf
Compare
|
/bot run |
|
PR_Github #39899 [ run ] triggered by Bot. Commit: |
|
PR_Github #39899 [ run ] completed with state |
…2441) Signed-off-by: Xiwen Yu <[email protected]>
(Part of LinearAttention support of C++ KVCacheManager)
Replace Python floor division (//) with ceil_div when computing num_heads, hidden_size, num_kv_heads, and mlp_hidden_size in get_bindings_model_config. This ensures correct sharding for models whose head counts are not evenly divisible by the parallelism factors and aligns behavior with other code paths.
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.