Skip to content

[None][fix] Revert "[None][chore] KV Connector Refactor (#11078)"#11872

Merged
jthomson04 merged 1 commit into
NVIDIA:mainfrom
jthomson04:jthomson04/revert-kv-connector-refactor
Mar 11, 2026
Merged

[None][fix] Revert "[None][chore] KV Connector Refactor (#11078)"#11872
jthomson04 merged 1 commit into
NVIDIA:mainfrom
jthomson04:jthomson04/revert-kv-connector-refactor

Conversation

@jthomson04
Copy link
Copy Markdown
Collaborator

@jthomson04 jthomson04 commented Mar 3, 2026

This reverts commit 0338bb2.

Summary by CodeRabbit

Release Notes

  • Refactor

    • Simplified KV cache batch preparation logic by consolidating scheduling paths.
    • Updated sequence management to remove boolean signaling and use assertion-based checks instead.
  • New Features

    • Added support for asynchronous KV cache loading with request tracking and state management.
  • Tests

    • Updated integration and unit tests to verify asynchronous loading behavior and timing adjustments.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@pcastonguay
Copy link
Copy Markdown
Collaborator

/bot run --disable-fail-fast

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

Changes the return type of addSequence from bool to void across KV cache manager implementations, refactors asynchronous KV loading state tracking in KvCacheConnectorManager, and simplifies the scheduling path in PyExecutor while making sequence allocation conditional on first-context-chunk status.

Changes

Cohort / File(s) Summary
KV Cache Manager API
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h, cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
Changed addSequence return type from bool to void in base class, implementation, and Python binding. C++ implementation now uses TLLM_CHECK for assertion on emplacement success instead of returning false.
Async KV Loading State Management
tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py
Introduced finished_async_loading_requests dictionary for tracking completed async loads. Added should_add_sequence() helper and refactored take_scheduled_requests_pending_load() to replace mark_ready_requests, improving state transitions for asynchronously loading requests.
Scheduler Flow Simplification
tensorrt_llm/_torch/pyexecutor/py_executor.py, tensorrt_llm/_torch/pyexecutor/resource_manager.py
Simplified KV cache scheduling by using take_scheduled_requests_pending_load() in batch preparation. Made add_sequence calls conditional on is_first_context_chunk status and delegated to should_add_sequence(). Added explicit calls to build_scheduler_output() in resource managers after cache allocation.
Test Updates
tests/integration/defs/llmapi/test_llm_api_connector.py, tests/unittest/_torch/test_connector.py
Adjusted token count expectations for overlap scheduler behavior, introduced generate_and_sleep() wrapper for timing control in integration tests, and added new unit test test_connector_manager_take_scheduled_requests covering the refactored scheduling flow.

Sequence Diagram

sequenceDiagram
    participant ResourceMgr as ResourceManager
    participant KVConnMgr as KvCacheConnectorManager
    participant KVCacheMgr as KVCacheManager
    participant Executor as PyExecutor

    Note over ResourceMgr,Executor: Async KV Loading Flow

    Executor->>ResourceMgr: prepare_resources(scheduled_batch)
    activate ResourceMgr
    
    loop For each request in batch
        alt is_first_context_chunk
            ResourceMgr->>KVConnMgr: should_add_sequence(request)
            activate KVConnMgr
            alt request in finished_async_loading_requests
                KVConnMgr-->>ResourceMgr: true
            else
                KVConnMgr-->>ResourceMgr: false
            end
            deactivate KVConnMgr
            
            alt should_add == true
                ResourceMgr->>KVCacheMgr: add_sequence(req_id, input_len, beam_width)
                activate KVCacheMgr
                KVCacheMgr->>KVCacheMgr: assert(emplace successful)
                KVCacheMgr-->>ResourceMgr: void (no return)
                deactivate KVCacheMgr
            end
        end
    end
    
    ResourceMgr->>KVConnMgr: build_scheduler_output(scheduled_batch)
    activate KVConnMgr
    KVConnMgr->>KVConnMgr: process pending async loads
    KVConnMgr-->>ResourceMgr: scheduler output
    deactivate KVConnMgr
    
    ResourceMgr-->>Executor: prepared batch
    deactivate ResourceMgr
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • pcastonguay
  • shaharmor98
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.68% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The PR description is incomplete and does not follow the required template. Only a commit revert statement is provided without filling any required sections. Fill in the Description section explaining why this revert is necessary. Complete the Test Coverage section listing relevant tests. Address PR Checklist items and confirm all checks are reviewed. Remove the template boilerplate and provide substantive explanations.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly indicates this is a revert of a previous KV Connector Refactor commit, accurately describing the primary change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py`:
- Around line 461-472: The docstring for take_scheduled_requests_pending_load
incorrectly states it returns filtered scheduled requests while the method
actually mutates scheduled_requests.context_requests in place and returns None;
update the docstring for take_scheduled_requests_pending_load to state that it
modifies the passed ScheduledRequests object in-place (removing context requests
that are pending load) and does not return a value, and include a brief note
about the side-effect on scheduled_requests.context_requests so callers don't
expect a new object or to use the return value.

In `@tests/integration/defs/llmapi/test_llm_api_connector.py`:
- Around line 435-442: The third call to model.generate can race with async
connector cleanup; replace the raw synchronous call model.generate([2] * 110,
sampling_params=sampling_params) with a sleep-wrapped await to yield the event
loop first (e.g., await asyncio.sleep(0) followed by await model.generate([2] *
110, sampling_params=sampling_params)) so the connector work can complete before
admission; keep references to model.generate and sampling_params when making the
change.

In `@tests/unittest/_torch/test_connector.py`:
- Around line 140-145: The request mocks (req0, req1) used in the test cause
get_num_new_matched_tokens to raise because request.is_generation_only_request
is not set and MagicMock evaluates truthy; explicitly set
req0.is_generation_only_request = False and req1.is_generation_only_request =
False in the test before calling get_num_new_matched_tokens (or construct the
mocks with that attribute) so the boolean guard behaves correctly and the
RuntimeError is not raised.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d348ab and bb101f6.

📒 Files selected for processing (8)
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
  • tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/resource_manager.py
  • tests/integration/defs/llmapi/test_llm_api_connector.py
  • tests/unittest/_torch/test_connector.py

Comment thread tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py
Comment thread tests/integration/defs/llmapi/test_llm_api_connector.py
Comment thread tests/unittest/_torch/test_connector.py
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37550 [ run ] triggered by Bot. Commit: bb101f6 Link to invocation

@jthomson04 jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from bb101f6 to 1975371 Compare March 3, 2026 22:31
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37550 [ run ] completed with state SUCCESS. Commit: bb101f6
/LLM/main/L0_MergeRequest_PR pipeline #29052 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@pcastonguay
Copy link
Copy Markdown
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37709 [ run ] triggered by Bot. Commit: 1975371 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37709 [ run ] completed with state SUCCESS. Commit: 1975371
/LLM/main/L0_MergeRequest_PR pipeline #29188 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@pcastonguay
Copy link
Copy Markdown
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37749 [ run ] triggered by Bot. Commit: 1975371 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37749 [ run ] completed with state SUCCESS. Commit: 1975371
/LLM/main/L0_MergeRequest_PR pipeline #29219 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Copy link
Copy Markdown
Collaborator

@eopXD eopXD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM since its a direct revert.

@pcastonguay
Copy link
Copy Markdown
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37900 [ run ] triggered by Bot. Commit: 1975371 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37900 [ run ] completed with state SUCCESS. Commit: 1975371
/LLM/main/L0_MergeRequest_PR pipeline #29346 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@jthomson04 jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from 1975371 to 5390cf1 Compare March 6, 2026 00:59
@jthomson04
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37929 [ run ] triggered by Bot. Commit: 5390cf1 Link to invocation

@jthomson04 jthomson04 enabled auto-merge (squash) March 6, 2026 01:11
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #37929 [ run ] completed with state SUCCESS. Commit: 5390cf1
/LLM/main/L0_MergeRequest_PR pipeline #29373 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@jthomson04 jthomson04 closed this Mar 7, 2026
auto-merge was automatically disabled March 7, 2026 19:49

Pull request was closed

@jthomson04 jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from 5390cf1 to 6b04973 Compare March 7, 2026 19:49
@jthomson04 jthomson04 reopened this Mar 7, 2026
@jthomson04 jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from dc413b7 to d80c3b7 Compare March 7, 2026 20:04
@jthomson04
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@jthomson04 jthomson04 enabled auto-merge (squash) March 7, 2026 20:04
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38109 [ run ] triggered by Bot. Commit: d80c3b7 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38109 [ run ] completed with state SUCCESS. Commit: d80c3b7
/LLM/main/L0_MergeRequest_PR pipeline #29520 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@jthomson04
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38308 [ run ] triggered by Bot. Commit: d80c3b7 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38308 [ run ] completed with state SUCCESS. Commit: d80c3b7
/LLM/main/L0_MergeRequest_PR pipeline #29685 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: jthomson04 <[email protected]>
@jthomson04 jthomson04 force-pushed the jthomson04/revert-kv-connector-refactor branch from d80c3b7 to ef58317 Compare March 10, 2026 16:46
@jthomson04
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38468 [ run ] triggered by Bot. Commit: ef58317 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38468 [ run ] completed with state SUCCESS. Commit: ef58317
/LLM/main/L0_MergeRequest_PR pipeline #29822 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@jthomson04 jthomson04 disabled auto-merge March 11, 2026 02:24
@jthomson04
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38520 [ run ] triggered by Bot. Commit: ef58317 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #38520 [ run ] completed with state SUCCESS. Commit: ef58317
/LLM/main/L0_MergeRequest_PR pipeline #29868 completed with status: 'SUCCESS'

CI Report

Link to invocation

@jthomson04 jthomson04 merged commit 71ffd8b into NVIDIA:main Mar 11, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants