[None][fix] Revert "[None][chore] KV Connector Refactor (#11078)" by jthomson04 · Pull Request #11872 · NVIDIA/TensorRT-LLM

jthomson04 · 2026-03-03T19:40:25Z

This reverts commit 0338bb2.

Summary by CodeRabbit

Release Notes

Refactor
- Simplified KV cache batch preparation logic by consolidating scheduling paths.
- Updated sequence management to remove boolean signaling and use assertion-based checks instead.
New Features
- Added support for asynchronous KV cache loading with request tracking and state management.
Tests
- Updated integration and unit tests to verify asynchronous loading behavior and timing adjustments.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

pcastonguay · 2026-03-03T19:43:29Z

/bot run --disable-fail-fast

coderabbitai · 2026-03-03T19:49:05Z

📝 Walkthrough

Walkthrough

Changes the return type of addSequence from bool to void across KV cache manager implementations, refactors asynchronous KV loading state tracking in KvCacheConnectorManager, and simplifies the scheduling path in PyExecutor while making sequence allocation conditional on first-context-chunk status.

Changes

Cohort / File(s)	Summary
KV Cache Manager API `cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`, `cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`, `cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp`	Changed `addSequence` return type from `bool` to `void` in base class, implementation, and Python binding. C++ implementation now uses `TLLM_CHECK` for assertion on emplacement success instead of returning false.
Async KV Loading State Management `tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py`	Introduced `finished_async_loading_requests` dictionary for tracking completed async loads. Added `should_add_sequence()` helper and refactored `take_scheduled_requests_pending_load()` to replace `mark_ready_requests`, improving state transitions for asynchronously loading requests.
Scheduler Flow Simplification `tensorrt_llm/_torch/pyexecutor/py_executor.py`, `tensorrt_llm/_torch/pyexecutor/resource_manager.py`	Simplified KV cache scheduling by using `take_scheduled_requests_pending_load()` in batch preparation. Made `add_sequence` calls conditional on `is_first_context_chunk` status and delegated to `should_add_sequence()`. Added explicit calls to `build_scheduler_output()` in resource managers after cache allocation.
Test Updates `tests/integration/defs/llmapi/test_llm_api_connector.py`, `tests/unittest/_torch/test_connector.py`	Adjusted token count expectations for overlap scheduler behavior, introduced `generate_and_sleep()` wrapper for timing control in integration tests, and added new unit test `test_connector_manager_take_scheduled_requests` covering the refactored scheduling flow.

Sequence Diagram

sequenceDiagram
    participant ResourceMgr as ResourceManager
    participant KVConnMgr as KvCacheConnectorManager
    participant KVCacheMgr as KVCacheManager
    participant Executor as PyExecutor

    Note over ResourceMgr,Executor: Async KV Loading Flow

    Executor->>ResourceMgr: prepare_resources(scheduled_batch)
    activate ResourceMgr
    
    loop For each request in batch
        alt is_first_context_chunk
            ResourceMgr->>KVConnMgr: should_add_sequence(request)
            activate KVConnMgr
            alt request in finished_async_loading_requests
                KVConnMgr-->>ResourceMgr: true
            else
                KVConnMgr-->>ResourceMgr: false
            end
            deactivate KVConnMgr
            
            alt should_add == true
                ResourceMgr->>KVCacheMgr: add_sequence(req_id, input_len, beam_width)
                activate KVCacheMgr
                KVCacheMgr->>KVCacheMgr: assert(emplace successful)
                KVCacheMgr-->>ResourceMgr: void (no return)
                deactivate KVCacheMgr
            end
        end
    end
    
    ResourceMgr->>KVConnMgr: build_scheduler_output(scheduled_batch)
    activate KVConnMgr
    KVConnMgr->>KVConnMgr: process pending async loads
    KVConnMgr-->>ResourceMgr: scheduler output
    deactivate KVConnMgr
    
    ResourceMgr-->>Executor: prepared batch
    deactivate ResourceMgr

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

[None][chore] KV Connector Refactor #11078: Directly related as both modify the addSequence return type and implementation in kvCacheManager across multiple files.

Suggested reviewers

pcastonguay
shaharmor98

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.68% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	⚠️ Warning	The PR description is incomplete and does not follow the required template. Only a commit revert statement is provided without filling any required sections.	Fill in the Description section explaining why this revert is necessary. Complete the Test Coverage section listing relevant tests. Address PR Checklist items and confirm all checks are reviewed. Remove the template boilerplate and provide substantive explanations.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly indicates this is a revert of a previous KV Connector Refactor commit, accurately describing the primary change.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py`:
- Around line 461-472: The docstring for take_scheduled_requests_pending_load
incorrectly states it returns filtered scheduled requests while the method
actually mutates scheduled_requests.context_requests in place and returns None;
update the docstring for take_scheduled_requests_pending_load to state that it
modifies the passed ScheduledRequests object in-place (removing context requests
that are pending load) and does not return a value, and include a brief note
about the side-effect on scheduled_requests.context_requests so callers don't
expect a new object or to use the return value.

In `@tests/integration/defs/llmapi/test_llm_api_connector.py`:
- Around line 435-442: The third call to model.generate can race with async
connector cleanup; replace the raw synchronous call model.generate([2] * 110,
sampling_params=sampling_params) with a sleep-wrapped await to yield the event
loop first (e.g., await asyncio.sleep(0) followed by await model.generate([2] *
110, sampling_params=sampling_params)) so the connector work can complete before
admission; keep references to model.generate and sampling_params when making the
change.

In `@tests/unittest/_torch/test_connector.py`:
- Around line 140-145: The request mocks (req0, req1) used in the test cause
get_num_new_matched_tokens to raise because request.is_generation_only_request
is not set and MagicMock evaluates truthy; explicitly set
req0.is_generation_only_request = False and req1.is_generation_only_request =
False in the test before calling get_num_new_matched_tokens (or construct the
mocks with that attribute) so the boolean guard behaves correctly and the
RuntimeError is not raised.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d348ab and bb101f6.

📒 Files selected for processing (8)

cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py
tensorrt_llm/_torch/pyexecutor/py_executor.py
tensorrt_llm/_torch/pyexecutor/resource_manager.py
tests/integration/defs/llmapi/test_llm_api_connector.py
tests/unittest/_torch/test_connector.py

tensorrt-cicd · 2026-03-03T19:52:10Z

PR_Github #37550 [ run ] triggered by Bot. Commit: bb101f6 Link to invocation

tensorrt-cicd · 2026-03-04T01:43:56Z

PR_Github #37550 [ run ] completed with state SUCCESS. Commit: bb101f6
/LLM/main/L0_MergeRequest_PR pipeline #29052 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

pcastonguay · 2026-03-04T14:12:56Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-04T14:19:33Z

PR_Github #37709 [ run ] triggered by Bot. Commit: 1975371 Link to invocation

tensorrt-cicd · 2026-03-04T19:17:56Z

PR_Github #37709 [ run ] completed with state SUCCESS. Commit: 1975371
/LLM/main/L0_MergeRequest_PR pipeline #29188 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

pcastonguay · 2026-03-04T19:21:27Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-04T19:31:32Z

PR_Github #37749 [ run ] triggered by Bot. Commit: 1975371 Link to invocation

tensorrt-cicd · 2026-03-04T22:40:43Z

PR_Github #37749 [ run ] completed with state SUCCESS. Commit: 1975371
/LLM/main/L0_MergeRequest_PR pipeline #29219 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

eopXD

LGTM since its a direct revert.

pcastonguay · 2026-03-05T18:37:40Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-05T18:44:30Z

PR_Github #37900 [ run ] triggered by Bot. Commit: 1975371 Link to invocation

tensorrt-cicd · 2026-03-05T20:37:13Z

PR_Github #37900 [ run ] completed with state SUCCESS. Commit: 1975371
/LLM/main/L0_MergeRequest_PR pipeline #29346 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

jthomson04 · 2026-03-06T00:59:55Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-06T01:06:03Z

PR_Github #37929 [ run ] triggered by Bot. Commit: 5390cf1 Link to invocation

tensorrt-cicd · 2026-03-06T05:28:55Z

PR_Github #37929 [ run ] completed with state SUCCESS. Commit: 5390cf1
/LLM/main/L0_MergeRequest_PR pipeline #29373 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

jthomson04 · 2026-03-07T20:04:42Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-07T20:12:04Z

PR_Github #38109 [ run ] triggered by Bot. Commit: d80c3b7 Link to invocation

tensorrt-cicd · 2026-03-08T02:37:43Z

PR_Github #38109 [ run ] completed with state SUCCESS. Commit: d80c3b7
/LLM/main/L0_MergeRequest_PR pipeline #29520 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

jthomson04 · 2026-03-09T17:44:49Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-09T17:51:52Z

PR_Github #38308 [ run ] triggered by Bot. Commit: d80c3b7 Link to invocation

tensorrt-cicd · 2026-03-09T20:48:24Z

PR_Github #38308 [ run ] completed with state SUCCESS. Commit: d80c3b7
/LLM/main/L0_MergeRequest_PR pipeline #29685 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: jthomson04 <[email protected]>

jthomson04 · 2026-03-10T16:48:51Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-10T16:56:39Z

PR_Github #38468 [ run ] triggered by Bot. Commit: ef58317 Link to invocation

tensorrt-cicd · 2026-03-11T00:34:25Z

PR_Github #38468 [ run ] completed with state SUCCESS. Commit: ef58317
/LLM/main/L0_MergeRequest_PR pipeline #29822 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

jthomson04 · 2026-03-11T02:24:52Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-11T02:31:30Z

PR_Github #38520 [ run ] triggered by Bot. Commit: ef58317 Link to invocation

tensorrt-cicd · 2026-03-11T03:59:43Z

PR_Github #38520 [ run ] completed with state SUCCESS. Commit: ef58317
/LLM/main/L0_MergeRequest_PR pipeline #29868 completed with status: 'SUCCESS'

CI Report

Link to invocation

…" (NVIDIA#11872) Signed-off-by: jthomson04 <[email protected]>

jthomson04 requested review from a team as code owners March 3, 2026 19:40

jthomson04 requested review from dongxuy04 and pcastonguay March 3, 2026 19:40

github-actions Bot assigned jthomson04 Mar 3, 2026

pcastonguay approved these changes Mar 3, 2026

View reviewed changes

coderabbitai Bot reviewed Mar 3, 2026

View reviewed changes

Comment thread tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py

Comment thread tests/integration/defs/llmapi/test_llm_api_connector.py

Comment thread tests/unittest/_torch/test_connector.py

jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from bb101f6 to 1975371 Compare March 3, 2026 22:31

eopXD approved these changes Mar 5, 2026

View reviewed changes

jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from 1975371 to 5390cf1 Compare March 6, 2026 00:59

jthomson04 enabled auto-merge (squash) March 6, 2026 01:11

jthomson04 closed this Mar 7, 2026

auto-merge was automatically disabled March 7, 2026 19:49
Pull request was closed

jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from 5390cf1 to 6b04973 Compare March 7, 2026 19:49

jthomson04 reopened this Mar 7, 2026

jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from dc413b7 to d80c3b7 Compare March 7, 2026 20:04

jthomson04 enabled auto-merge (squash) March 7, 2026 20:04

Revert

ef58317

Signed-off-by: jthomson04 <[email protected]>

jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from d80c3b7 to ef58317 Compare March 10, 2026 16:46

jthomson04 disabled auto-merge March 11, 2026 02:24

jthomson04 merged commit 71ffd8b into NVIDIA:main Mar 11, 2026
5 checks passed

limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026

[None][fix] Revert "[None][chore] KV Connector Refactor (NVIDIA#11078)…

33a7527

…" (NVIDIA#11872) Signed-off-by: jthomson04 <[email protected]>

longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026

[None][fix] Revert "[None][chore] KV Connector Refactor (NVIDIA#11078)…

3e67bce

…" (NVIDIA#11872) Signed-off-by: jthomson04 <[email protected]>

Conversation

jthomson04 commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

pcastonguay commented Mar 3, 2026

Uh oh!

coderabbitai Bot commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tensorrt-cicd commented Mar 3, 2026

Uh oh!

tensorrt-cicd commented Mar 4, 2026

Uh oh!

pcastonguay commented Mar 4, 2026

Uh oh!

tensorrt-cicd commented Mar 4, 2026

Uh oh!

tensorrt-cicd commented Mar 4, 2026

Uh oh!

pcastonguay commented Mar 4, 2026

Uh oh!

tensorrt-cicd commented Mar 4, 2026

Uh oh!

tensorrt-cicd commented Mar 4, 2026

Uh oh!

eopXD left a comment

Choose a reason for hiding this comment

Uh oh!

pcastonguay commented Mar 5, 2026

Uh oh!

tensorrt-cicd commented Mar 5, 2026

Uh oh!

tensorrt-cicd commented Mar 5, 2026

Uh oh!

jthomson04 commented Mar 6, 2026

Uh oh!

tensorrt-cicd commented Mar 6, 2026

Uh oh!

tensorrt-cicd commented Mar 6, 2026

Uh oh!

jthomson04 commented Mar 7, 2026

Uh oh!

tensorrt-cicd commented Mar 7, 2026

Uh oh!

tensorrt-cicd commented Mar 8, 2026

Uh oh!

jthomson04 commented Mar 9, 2026

Uh oh!

tensorrt-cicd commented Mar 9, 2026

Uh oh!

tensorrt-cicd commented Mar 9, 2026

Uh oh!

jthomson04 commented Mar 10, 2026

Uh oh!

tensorrt-cicd commented Mar 10, 2026

Uh oh!

tensorrt-cicd commented Mar 11, 2026

Uh oh!

jthomson04 commented Mar 11, 2026

Uh oh!

tensorrt-cicd commented Mar 11, 2026

jthomson04 commented Mar 3, 2026 •

edited

Loading

coderabbitai Bot commented Mar 3, 2026 •

edited

Loading