Skip to content

[TRTLLM-10061][fix] Use ceil_div for head/size calculations#12441

Merged
VALLIS-NERIA merged 1 commit into
NVIDIA:mainfrom
VALLIS-NERIA:user/xiweny/ceil_div_model_config
Mar 23, 2026
Merged

[TRTLLM-10061][fix] Use ceil_div for head/size calculations#12441
VALLIS-NERIA merged 1 commit into
NVIDIA:mainfrom
VALLIS-NERIA:user/xiweny/ceil_div_model_config

Conversation

@VALLIS-NERIA
Copy link
Copy Markdown
Collaborator

@VALLIS-NERIA VALLIS-NERIA commented Mar 23, 2026

(Part of LinearAttention support of C++ KVCacheManager)

Replace Python floor division (//) with ceil_div when computing num_heads, hidden_size, num_kv_heads, and mlp_hidden_size in get_bindings_model_config. This ensures correct sharding for models whose head counts are not evenly divisible by the parallelism factors and aligns behavior with other code paths.

Summary by CodeRabbit

  • Bug Fixes
    • Updated model configuration dimension calculations to use ceiling division for tensor parallelism, improving accuracy when splitting non-evenly divisible dimensions.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@VALLIS-NERIA VALLIS-NERIA self-assigned this Mar 23, 2026
@VALLIS-NERIA VALLIS-NERIA requested a review from a team as a code owner March 23, 2026 06:46
@VALLIS-NERIA VALLIS-NERIA force-pushed the user/xiweny/ceil_div_model_config branch from d8eb15f to 74a6a7d Compare March 23, 2026 06:48
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 23, 2026

📝 Walkthrough

Walkthrough

The pull request replaces floor-division calculations with ceiling-division calculations in model configuration bindings. A ceil_div helper function is introduced to compute partitioned dimensions for attention heads, hidden size, and MLP hidden size across both production code and corresponding test assertions.

Changes

Cohort / File(s) Summary
Model Config Bindings
tensorrt_llm/_torch/model_config.py
Introduced ceil_div helper function in get_bindings_model_config and replaced floor-division (//) with ceiling division for computing num_attention_heads, hidden_size, num_key_value_heads (scalar and per-layer variants), and intermediate_size. Special-case handling for DeciLMForCausalLM FFN multiplier also updated to use ceiling division.
Model Config Tests
tests/unittest/_torch/test_model_config.py
Updated test_get_bindings_model_config_attention_dp_attn_tp_override to add ceil_div helper and revised all assertion expectations for num_heads, hidden_size, num_kv_heads_per_layer, num_kv_heads(), and mlp_hidden_size to use ceiling division instead of floor division.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description provides a clear technical summary but lacks complete sections from the template (Description, Test Coverage sections are empty). Expand the Description section with more details about the issue, solution, and impact. Provide explicit list of test cases that cover the changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: replacing floor division with ceil_div for head/size calculations in the model config.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@VALLIS-NERIA
Copy link
Copy Markdown
Collaborator Author

/bot run

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/model_config.py (1)

740-748: ⚠️ Potential issue | 🔴 Critical

Assertion will fail when attn_cp_size > 1 due to mismatched division factors.

The assertion at line 741 checks hidden_size % num_heads == 0, but num_heads and hidden_size are computed with different divisors:

  • num_heads is divided by attn_tp_size * attn_cp_size (line 662-663)
  • hidden_size is divided by only attn_tp_size (line 665-666)

When attn_cp_size > 1, these unequal divisions will produce values that no longer satisfy the divisibility constraint.

Example: With attn_tp_size=2, attn_cp_size=2, num_attention_heads=11, hidden_size=100:

  • num_heads = ceil_div(11, 4) = 3
  • hidden_size = ceil_div(100, 2) = 50
  • 50 % 3 ≠ 0 → assertion fails

This defeats the stated PR objective to support models with head counts not evenly divisible by parallelism factors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/model_config.py` around lines 740 - 748, The assertion
hidden_size % num_heads == 0 can fail when attn_cp_size > 1 because num_heads
and hidden_size were divided by different parallelism factors; replace the
assert with a safe computation that derives head_size = ceil_div(hidden_size,
num_heads) (or equivalent integer ceiling division already used elsewhere) and
assign that to head_size while logging a warning about rounding instead of
crashing; update the block that sets head_size (referencing head_size,
hidden_size, num_heads and the existing ceil_div helper) to compute and use the
rounded head size rather than asserting exact divisibility.
🧹 Nitpick comments (3)
tests/unittest/_torch/test_model_config.py (2)

76-77: Duplicate ceil_div definition—consider reusing from production code.

This helper duplicates the one in model_config.py and pyexecutor/_util.py. If the utility were a public import, the test could reuse it and stay in sync automatically.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/test_model_config.py` around lines 76 - 77, The test
defines a duplicate helper ceil_div which duplicates the implementation in
production (model_config.py and pyexecutor/_util.py); remove the local ceil_div
in tests/unittest/_torch/test_model_config.py and import the canonical function
from the production utility (e.g., import ceil_div from pyexecutor._util or the
public function in model_config) so the test reuses the single source of truth
and stays in sync with production code.

56-67: Test doesn't exercise ceiling-division edge cases.

The test uses num_attention_heads=16 with parallelism factors that multiply to 8 (e.g., tp_size=4, cp_size=2). Since 16 / 8 = 2 exactly, ceil_div(16, 8) == 16 // 8. The test passes regardless of whether ceiling or floor division is used.

Consider adding a test case where values are not evenly divisible to validate that ceiling division produces the expected behavior:

🧪 Proposed test case for non-evenly divisible heads
`@pytest.mark.parametrize`(
    "num_attention_heads,expected_sharded_heads",
    [
        pytest.param(16, 2, id="evenly_divisible"),  # 16 / 8 = 2
        pytest.param(10, 2, id="ceil_rounds_up"),    # ceil(10 / 8) = 2
        pytest.param(9, 2, id="ceil_rounds_up_2"),   # ceil(9 / 8) = 2
    ],
)
def test_ceil_div_sharding(num_attention_heads, expected_sharded_heads):
    ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/test_model_config.py` around lines 56 - 67, The current
test test_get_bindings_model_config_attention_dp_attn_tp_override only uses
num_attention_heads that divide evenly by the product of parallelism factors, so
it doesn't catch ceiling-division bugs; update the test to parametrically
exercise non-evenly-divisible head counts by adding cases (e.g.,
num_attention_heads=10 and 9) alongside the existing 16 and compute the expected
per-shard head count (expected_sharded_heads = ceil(num_attention_heads /
(tp_size * cp_size))) and assert the resulting Mapping/make_pretrained_config
behavior produces that expected_sharded_heads (referencing Mapping,
make_pretrained_config and
test_get_bindings_model_config_attention_dp_attn_tp_override to locate where to
add the new parametrized inputs and assertions).
tensorrt_llm/_torch/model_config.py (1)

659-660: Consider importing the existing ceil_div utility.

A typed ceil_div helper already exists in tensorrt_llm/_torch/pyexecutor/_util.py (lines 53-54). Reusing it avoids duplication and maintains consistency across the codebase.

♻️ Proposed refactor to reuse existing utility
+from tensorrt_llm._torch.pyexecutor._util import ceil_div
 ...
-        def ceil_div(a, b):
-            return (a + b - 1) // b
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/model_config.py` around lines 659 - 660, Remove the local
duplicate ceil_div function in model_config and import the existing typed
utility from tensorrt_llm._torch.pyexecutor._util; replace the local definition
with "from tensorrt_llm._torch.pyexecutor._util import ceil_div" and ensure any
callers in model_config continue to use ceil_div (preserving type
compatibility), so the code reuses the single canonical implementation instead
of duplicating it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/model_config.py`:
- Around line 662-666: The CI formatting hooks failed for
tensorrt_llm/_torch/model_config.py; run the project's formatters locally and
commit the fixes: run the pre-commit formatter (pre-commit run --files
tensorrt_llm/_torch/model_config.py) or run yapf/ruff-format directly to
reformat the occurrences around ceil_div usage for num_heads and hidden_size
(references: ceil_div, num_heads, hidden_size, self.pretrained_config,
attn_tp_size, attn_cp_size) so the file matches the project's style and passes
CI.

In `@tests/unittest/_torch/test_model_config.py`:
- Around line 73-77: CI failed due to formatting in
tests/unittest/_torch/test_model_config.py; run the configured formatters and
apply their fixes: execute the pre-commit hook (pre-commit run --files
tests/unittest/_torch/test_model_config.py) or run yapf/ruff-format manually,
then ensure the block around the attn_tp_size/attn_cp_size assignments and the
ceil_div function follows project style (correct spacing, line breaks, and
indentation) so that function ceil_div and the surrounding lines satisfy yapf
and ruff-format checks before committing.

---

Outside diff comments:
In `@tensorrt_llm/_torch/model_config.py`:
- Around line 740-748: The assertion hidden_size % num_heads == 0 can fail when
attn_cp_size > 1 because num_heads and hidden_size were divided by different
parallelism factors; replace the assert with a safe computation that derives
head_size = ceil_div(hidden_size, num_heads) (or equivalent integer ceiling
division already used elsewhere) and assign that to head_size while logging a
warning about rounding instead of crashing; update the block that sets head_size
(referencing head_size, hidden_size, num_heads and the existing ceil_div helper)
to compute and use the rounded head size rather than asserting exact
divisibility.

---

Nitpick comments:
In `@tensorrt_llm/_torch/model_config.py`:
- Around line 659-660: Remove the local duplicate ceil_div function in
model_config and import the existing typed utility from
tensorrt_llm._torch.pyexecutor._util; replace the local definition with "from
tensorrt_llm._torch.pyexecutor._util import ceil_div" and ensure any callers in
model_config continue to use ceil_div (preserving type compatibility), so the
code reuses the single canonical implementation instead of duplicating it.

In `@tests/unittest/_torch/test_model_config.py`:
- Around line 76-77: The test defines a duplicate helper ceil_div which
duplicates the implementation in production (model_config.py and
pyexecutor/_util.py); remove the local ceil_div in
tests/unittest/_torch/test_model_config.py and import the canonical function
from the production utility (e.g., import ceil_div from pyexecutor._util or the
public function in model_config) so the test reuses the single source of truth
and stays in sync with production code.
- Around line 56-67: The current test
test_get_bindings_model_config_attention_dp_attn_tp_override only uses
num_attention_heads that divide evenly by the product of parallelism factors, so
it doesn't catch ceiling-division bugs; update the test to parametrically
exercise non-evenly-divisible head counts by adding cases (e.g.,
num_attention_heads=10 and 9) alongside the existing 16 and compute the expected
per-shard head count (expected_sharded_heads = ceil(num_attention_heads /
(tp_size * cp_size))) and assert the resulting Mapping/make_pretrained_config
behavior produces that expected_sharded_heads (referencing Mapping,
make_pretrained_config and
test_get_bindings_model_config_attention_dp_attn_tp_override to locate where to
add the new parametrized inputs and assertions).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2305cc2e-d696-43ca-8760-b0f2d3c3e314

📥 Commits

Reviewing files that changed from the base of the PR and between 124746e and 74a6a7d.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/model_config.py
  • tests/unittest/_torch/test_model_config.py

Comment thread tensorrt_llm/_torch/model_config.py Outdated
Comment thread tests/unittest/_torch/test_model_config.py
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39895 [ run ] triggered by Bot. Commit: 74a6a7d Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39895 [ run ] completed with state FAILURE. Commit: 74a6a7d
/LLM/main/L0_MergeRequest_PR pipeline #31064 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…config

Replace Python floor division (//) with ceil_div when computing
num_heads, hidden_size, num_kv_heads, and mlp_hidden_size in
get_bindings_model_config. This ensures correct sharding for models
whose head counts are not evenly divisible by the parallelism factors.

Signed-off-by: Xiwen Yu <[email protected]>
@VALLIS-NERIA VALLIS-NERIA force-pushed the user/xiweny/ceil_div_model_config branch from 74a6a7d to 27ef0bf Compare March 23, 2026 07:26
@VALLIS-NERIA
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39899 [ run ] triggered by Bot. Commit: 27ef0bf Link to invocation

@VALLIS-NERIA VALLIS-NERIA enabled auto-merge (squash) March 23, 2026 10:29
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #39899 [ run ] completed with state SUCCESS. Commit: 27ef0bf
/LLM/main/L0_MergeRequest_PR pipeline #31068 completed with status: 'SUCCESS'

CI Report

Link to invocation

@VALLIS-NERIA VALLIS-NERIA merged commit ca3d6a1 into NVIDIA:main Mar 23, 2026
6 of 7 checks passed
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants