[None][feat] Add support for FlexKV by pcastonguay · Pull Request #12512 · NVIDIA/TensorRT-LLM

pcastonguay · 2026-03-24T17:52:23Z

Summary by CodeRabbit

New Features
- Added ability to query request completion status.
- Enhanced KV cache connector with initialization synchronization and metadata handling.
- Extended request tracking to include connector-matched token metrics.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: scutizhang <[email protected]>

Signed-off-by: Patrice Castonguay <[email protected]>

coderabbitai · 2026-03-24T17:59:33Z

📝 Walkthrough

Introduced isFinishedWithoutError() method to detect whether requests finished without error conditions; enhanced KV cache connector with initialization blocking and metadata handling; integrated metadata handling into executor loops with state tracking.
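
To make the intent of `isFinishedWithoutError()` concrete, here is a minimal Python sketch of a check over per-beam finish reasons. The `FinishReason` enum below is a hypothetical stand-in, and the choice of which reasons count as "without error" is an assumption; the PR page does not show the actual C++ logic.

```python
from enum import Enum, auto
from typing import Sequence


class FinishReason(Enum):
    # Hypothetical stand-in for the library's finish reasons (assumption).
    NOT_FINISHED = auto()
    END_ID = auto()
    STOP_WORDS = auto()
    LENGTH = auto()
    TIMED_OUT = auto()
    CANCELLED = auto()


# Assumption: these reasons are treated as normal, error-free completion.
_CLEAN_REASONS = frozenset(
    {FinishReason.END_ID, FinishReason.STOP_WORDS, FinishReason.LENGTH}
)


def is_finished_without_error(beam_reasons: Sequence[FinishReason]) -> bool:
    """Return True only if every beam finished with a clean reason."""
    # An empty list means there is nothing to report as finished.
    if not beam_reasons:
        return False
    return all(reason in _CLEAN_REASONS for reason in beam_reasons)
```

Under these assumptions, a request with one beam still running, cancelled, or timed out would not count as finished without error.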

Changes

Cohort / File(s)	Summary
C++ API Additions: `cpp/include/tensorrt_llm/batch_manager/llmRequest.h`, `cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp`	Added `isFinishedWithoutError()` method to check finish reasons across beams; exposed as read-only Python property `is_finished_without_error`.
KV Cache Connector Enhancements: `tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py`	Added `wait_for_initialization()` blocking method to scheduler and manager; updated `get_num_new_matched_tokens()` to persist matched token count to request; updated `handle_metadata()` to early-return when scheduler output is unavailable.
Executor Integration: `tensorrt_llm/_torch/pyexecutor/py_executor.py`, `tensorrt_llm/_torch/pyexecutor/llm_request.py`	Added blocking KV connector initialization call after hook registration; integrated `handle_metadata()` calls before batch loading in executor loops; added `py_num_connector_matched_tokens` state tracking to requests.
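
The connector changes summarized in the table can be illustrated with a small, self-contained sketch. `ToyRequest`, `ToyConnectorScheduler`, `finish_setup`, and the `timeout` parameter are hypothetical names introduced only for illustration; the real classes in `kv_cache_connector.py` and their signatures may differ. Only the method names `wait_for_initialization` and `get_num_new_matched_tokens` and the attribute `py_num_connector_matched_tokens` come from the summary above.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyRequest:
    # Hypothetical request type; only the attribute name is from the PR summary.
    request_id: int
    py_num_connector_matched_tokens: int = 0


class ToyConnectorScheduler:
    """Toy scheduler-side connector, assuming an external cache that is
    populated asynchronously during startup."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._cached_tokens: Dict[int, int] = {}

    def finish_setup(self, cached_tokens: Dict[int, int]) -> None:
        # Hypothetical hook called once the external cache is usable.
        self._cached_tokens = dict(cached_tokens)
        self._ready.set()

    def wait_for_initialization(self, timeout: Optional[float] = None) -> bool:
        # Block the caller until setup has completed (or the timeout expires).
        return self._ready.wait(timeout)

    def get_num_new_matched_tokens(self, request: ToyRequest,
                                   num_computed_tokens: int) -> int:
        cached = self._cached_tokens.get(request.request_id, 0)
        matched = max(0, cached - num_computed_tokens)
        # Persist the count on the request so later stages can read it.
        request.py_num_connector_matched_tokens = matched
        return matched
```

In this sketch the executor would call `wait_for_initialization()` once after registering its hooks, so that no request is scheduled against a connector that is not ready yet.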

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	⚠️ Warning	The PR description is largely incomplete and uses the template placeholder text without providing actual implementation details, rationale, or test coverage information.	Replace template placeholders with concrete details: explain the FlexKV feature being added, why it's needed, what test coverage exists, and confirm all checklist items are properly addressed.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly indicates the main feature being added (FlexKV support) and is concise and specific.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py`:
- Around line 498-500: The early return when self._scheduler_output is None
allows stale _metadata to persist; update handle_metadata() so that when
self._scheduler_output is None it explicitly calls _clear_connector_meta()
before returning (or alternatively ensure start_load_kv() implementations check
for and handle missing metadata), i.e., modify the flow around
bind_connector_meta(), _scheduler_output, and _clear_connector_meta() so
_metadata is cleared whenever _scheduler_output is absent to avoid passing stale
metadata into start_load_kv().
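
The fix suggested in this comment can be sketched as follows. This is a simplified toy model, not the code under review: `ToyWorker` and `ToyManager` are hypothetical, the metadata is a plain dict, and resetting `_scheduler_output` after use is an assumption. Only the method and attribute names mirror those mentioned in the comment.

```python
from typing import Any, Optional


class ToyWorker:
    """Hypothetical worker-side connector holding the bound metadata."""

    def __init__(self) -> None:
        self._metadata: Optional[dict] = None

    def bind_connector_meta(self, metadata: dict) -> None:
        self._metadata = metadata

    def _clear_connector_meta(self) -> None:
        self._metadata = None

    def start_load_kv(self) -> Optional[dict]:
        # Toy behavior: return whatever metadata the load would act on.
        return self._metadata


class ToyManager:
    """Hypothetical manager that forwards scheduler output to the worker."""

    def __init__(self, worker: ToyWorker) -> None:
        self.worker = worker
        self._scheduler_output: Optional[Any] = None

    def handle_metadata(self) -> None:
        if self._scheduler_output is None:
            # Clear instead of returning silently, so the previous
            # iteration's metadata cannot leak into start_load_kv().
            self.worker._clear_connector_meta()
            return
        self.worker.bind_connector_meta({"scheduled": self._scheduler_output})
        # Assumption: scheduler output is consumed once per iteration.
        self._scheduler_output = None
```

With the clearing step in place, a second `handle_metadata()` call made without new scheduler output leaves the worker with no metadata, rather than with the metadata from the previous iteration.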

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c77cd546-3774-4188-8ba2-4c2b9cb7e6a3

📥 Commits

Reviewing files that changed from the base of the PR and between 73a02ee and 800372a.

📒 Files selected for processing (5)

cpp/include/tensorrt_llm/batch_manager/llmRequest.h
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py
tensorrt_llm/_torch/pyexecutor/llm_request.py
tensorrt_llm/_torch/pyexecutor/py_executor.py

pcastonguay · 2026-03-24T18:00:00Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-24T18:06:44Z

PR_Github #40158 [ run ] triggered by Bot. Commit: c5b1399 Link to invocation

pcastonguay · 2026-03-24T18:07:56Z

Same changes as in #9698, moving to my fork to merge more quickly.

tensorrt-cicd · 2026-03-25T03:32:51Z

PR_Github #40158 [ run ] completed with state SUCCESS. Commit: c5b1399
/LLM/main/L0_MergeRequest_PR pipeline #31303 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

pcastonguay · 2026-03-25T12:44:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-25T12:50:40Z

PR_Github #40324 [ run ] triggered by Bot. Commit: c5b1399 Link to invocation

tensorrt-cicd · 2026-03-25T22:02:01Z

PR_Github #40324 [ run ] completed with state SUCCESS. Commit: c5b1399
/LLM/main/L0_MergeRequest_PR pipeline #31434 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

pcastonguay · 2026-03-25T23:31:21Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-25T23:39:09Z

PR_Github #40387 [ run ] triggered by Bot. Commit: c5b1399 Link to invocation

tensorrt-cicd · 2026-03-26T02:06:54Z

PR_Github #40387 [ run ] completed with state FAILURE. Commit: c5b1399
/LLM/main/L0_MergeRequest_PR pipeline #31484 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

pcastonguay · 2026-03-26T14:03:36Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-26T14:10:10Z

PR_Github #40452 [ run ] triggered by Bot. Commit: c5b1399 Link to invocation

tensorrt-cicd · 2026-03-26T15:58:07Z

PR_Github #40452 [ run ] completed with state SUCCESS. Commit: c5b1399
/LLM/main/L0_MergeRequest_PR pipeline #31542 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

pcastonguay · 2026-03-27T00:43:02Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-27T00:49:04Z

PR_Github #40469 [ run ] triggered by Bot. Commit: c5b1399 Link to invocation

tensorrt-cicd · 2026-03-27T02:32:32Z

PR_Github #40469 [ run ] completed with state SUCCESS. Commit: c5b1399
/LLM/main/L0_MergeRequest_PR pipeline #31558 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

pcastonguay · 2026-03-27T15:58:47Z

/bot skip --comment "Flaky multi-GPU Nemotron test"

tensorrt-cicd · 2026-03-27T16:04:34Z

PR_Github #40511 [ skip ] triggered by Bot. Commit: c5b1399 Link to invocation

tensorrt-cicd · 2026-03-27T16:10:55Z

PR_Github #40511 [ skip ] completed with state SUCCESS. Commit: c5b1399
Skipping testing for commit c5b1399

Link to invocation

pcastonguay requested a review from a team as a code owner March 24, 2026 17:52

pcastonguay requested a review from achartier March 24, 2026 17:52

github-actions Bot assigned pcastonguay Mar 24, 2026

pcastonguay requested review from Shixiaowei02 and jthomson04 March 24, 2026 17:52

axxx03 and others added 2 commits March 24, 2026 10:58

Add flexkv support

e6e44db

Signed-off-by: scutizhang <[email protected]>

Fixing pre-commit

c5b1399

Signed-off-by: Patrice Castonguay <[email protected]>

coderabbitai Bot reviewed Mar 24, 2026

Comment thread tensorrt_llm/_torch/pyexecutor/kv_cache_connector.py

pcastonguay force-pushed the feature/support_flexkv branch from 800372a to c5b1399 Compare March 24, 2026 17:59

jthomson04 approved these changes Mar 24, 2026

Funatiq approved these changes Mar 25, 2026

pcastonguay enabled auto-merge (squash) March 25, 2026 20:50

pcastonguay merged commit 789494f into NVIDIA:main Mar 27, 2026
5 checks passed

pcastonguay mentioned this pull request Mar 27, 2026

[None][feat] Support using FlexKV as anothor KV Cache Offloading option. #9698

Closed
