[None][fix] Revert "[None][chore] KV Connector Refactor (#11078)"#11872
Conversation
|
/bot run --disable-fail-fast |
📝 WalkthroughWalkthroughChanges the return type of Changes
Sequence DiagramsequenceDiagram
participant ResourceMgr as ResourceManager
participant KVConnMgr as KvCacheConnectorManager
participant KVCacheMgr as KVCacheManager
participant Executor as PyExecutor
Note over ResourceMgr,Executor: Async KV Loading Flow
Executor->>ResourceMgr: prepare_resources(scheduled_batch)
activate ResourceMgr
loop For each request in batch
alt is_first_context_chunk
ResourceMgr->>KVConnMgr: should_add_sequence(request)
activate KVConnMgr
alt request in finished_async_loading_requests
KVConnMgr-->>ResourceMgr: true
else
KVConnMgr-->>ResourceMgr: false
end
deactivate KVConnMgr
alt should_add == true
ResourceMgr->>KVCacheMgr: add_sequence(req_id, input_len, beam_width)
activate KVCacheMgr
KVCacheMgr->>KVCacheMgr: assert(emplace successful)
KVCacheMgr-->>ResourceMgr: void (no return)
deactivate KVCacheMgr
end
end
end
ResourceMgr->>KVConnMgr: build_scheduler_output(scheduled_batch)
activate KVConnMgr
KVConnMgr->>KVConnMgr: process pending async loads
KVConnMgr-->>ResourceMgr: scheduler output
deactivate KVConnMgr
ResourceMgr-->>Executor: prepared batch
deactivate ResourceMgr
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py`:
- Around line 461-472: The docstring for take_scheduled_requests_pending_load
incorrectly states it returns filtered scheduled requests while the method
actually mutates scheduled_requests.context_requests in place and returns None;
update the docstring for take_scheduled_requests_pending_load to state that it
modifies the passed ScheduledRequests object in-place (removing context requests
that are pending load) and does not return a value, and include a brief note
about the side-effect on scheduled_requests.context_requests so callers don't
expect a new object or to use the return value.
In `@tests/integration/defs/llmapi/test_llm_api_connector.py`:
- Around line 435-442: The third call to model.generate can race with async
connector cleanup; replace the raw synchronous call model.generate([2] * 110,
sampling_params=sampling_params) with a sleep-wrapped await to yield the event
loop first (e.g., await asyncio.sleep(0) followed by await model.generate([2] *
110, sampling_params=sampling_params)) so the connector work can complete before
admission; keep references to model.generate and sampling_params when making the
change.
In `@tests/unittest/_torch/test_connector.py`:
- Around line 140-145: The request mocks (req0, req1) used in the test cause
get_num_new_matched_tokens to raise because request.is_generation_only_request
is not set and MagicMock evaluates truthy; explicitly set
req0.is_generation_only_request = False and req1.is_generation_only_request =
False in the test before calling get_num_new_matched_tokens (or construct the
mocks with that attribute) so the boolean guard behaves correctly and the
RuntimeError is not raised.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.hcpp/tensorrt_llm/batch_manager/kvCacheManager.cppcpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpptensorrt_llm/_torch/pyexecutor/kv_cache_connector.pytensorrt_llm/_torch/pyexecutor/py_executor.pytensorrt_llm/_torch/pyexecutor/resource_manager.pytests/integration/defs/llmapi/test_llm_api_connector.pytests/unittest/_torch/test_connector.py
|
PR_Github #37550 [ run ] triggered by Bot. Commit: |
bb101f6 to
1975371
Compare
|
PR_Github #37550 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #37709 [ run ] triggered by Bot. Commit: |
|
PR_Github #37709 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #37749 [ run ] triggered by Bot. Commit: |
|
PR_Github #37749 [ run ] completed with state
|
eopXD
left a comment
There was a problem hiding this comment.
LGTM since its a direct revert.
|
/bot run --disable-fail-fast |
|
PR_Github #37900 [ run ] triggered by Bot. Commit: |
|
PR_Github #37900 [ run ] completed with state
|
1975371 to
5390cf1
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #37929 [ run ] triggered by Bot. Commit: |
|
PR_Github #37929 [ run ] completed with state
|
Pull request was closed
5390cf1 to
6b04973
Compare
dc413b7 to
d80c3b7
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #38109 [ run ] triggered by Bot. Commit: |
|
PR_Github #38109 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #38308 [ run ] triggered by Bot. Commit: |
|
PR_Github #38308 [ run ] completed with state
|
Signed-off-by: jthomson04 <[email protected]>
d80c3b7 to
ef58317
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #38468 [ run ] triggered by Bot. Commit: |
|
PR_Github #38468 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #38520 [ run ] triggered by Bot. Commit: |
|
PR_Github #38520 [ run ] completed with state |
…" (NVIDIA#11872) Signed-off-by: jthomson04 <[email protected]>
…" (NVIDIA#11872) Signed-off-by: jthomson04 <[email protected]>
This reverts commit 0338bb2.
Summary by CodeRabbit
Release Notes
Refactor
New Features
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.