[TRTLLM-11579][feat] VisualGen batch inference support in serve module by JunyiXu-nv · Pull Request #12350 · NVIDIA/TensorRT-LLM

JunyiXu-nv · 2026-03-19T09:00:53Z

- Add MediaStorage.save_images() for batch image saving from (B,H,W,C) tensors
- Add MediaStorage.save_videos() for batch video saving from (B,T,H,W,C) tensors
- Fix openai_image_generation endpoint to split batch output tensors and return all generated images in the response data list
- Fix openai_image_edit endpoint with same batch handling
- Fix openai_video_generation_sync to save all videos via save_videos()
- Fix _generate_video_background (async) to save all videos via save_videos()
- Map VideoGenerationRequest.n to params.num_images_per_prompt in parse_visual_gen_params (was previously only handled for image requests)
- Update MockVisualGen to be batch-aware: expands single tensors to (N,...) when params.num_images_per_prompt > 1
- Add batch tests for save_images/save_videos in test_media_storage.py
- Add batch endpoint tests for n>1 in test_trtllm_serve_endpoints.py
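
To illustrate the `n` to `num_images_per_prompt` mapping listed above, here is a minimal sketch. The two dataclasses are simplified stand-ins that only borrow the names `VideoGenerationRequest` and `VisualGenParams`; the helper `apply_output_count` and the `n < 1` check are assumptions for illustration, not the code in `parse_visual_gen_params`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoGenerationRequest:
    """Stand-in for the request model; only the fields used here."""
    prompt: str
    n: Optional[int] = None  # number of videos requested by the client


@dataclass
class VisualGenParams:
    """Stand-in for the generation parameters object."""
    num_images_per_prompt: int = 1


def apply_output_count(request: VideoGenerationRequest,
                       params: VisualGenParams) -> VisualGenParams:
    """Copy the requested output count onto the generation parameters."""
    # Leave the default untouched when the client did not send `n`.
    if request.n is not None:
        # Rejecting non-positive values is an assumption of this sketch.
        if request.n < 1:
            raise ValueError("n must be a positive integer")
        params.num_images_per_prompt = request.n
    return params
```

With this shape, a request carrying `n=2` would ask the generator for two outputs per prompt, while a request without `n` keeps the default of one.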

Summary by CodeRabbit

Release Notes

New Features
- Added batch processing support for image and video generation, enabling multiple images/videos to be generated in a single request.
- Introduced n parameter for video generation requests to control output quantity, matching existing image generation behavior.
- Enhanced media storage with batch-aware APIs for handling multiple images and videos simultaneously.
Tests
- Added comprehensive test coverage for batch image and video operations.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-03-19T09:07:17Z

📝 Walkthrough

Walkthrough

This PR adds batch support for media storage in the TensorRT-LLM serving stack. Two new batch APIs (save_images, save_videos) are introduced in MediaStorage to handle multiple images or videos with filename indexing. The OpenAI server endpoints for image generation, image editing, and video generation are updated to use these new batch APIs. Parameter mapping for the n field is extended to video generation, and comprehensive batch-aware tests are added.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Media Storage Batch APIs `tensorrt_llm/serve/media_storage.py` | Added `save_images()` and `save_videos()` batch methods that normalize input tensor shapes, derive output filename extensions from format parameter, iterate over batch dimension, and return lists of saved file paths. |
| OpenAI Server Integration `tensorrt_llm/serve/openai_server.py` | Updated image generation, image editing, and video generation endpoints to normalize output tensors into lists and call the new batch save APIs (`save_images`, `save_videos`) instead of single-file operations; changed control flow to derive response paths from returned lists. |
| Parameter Mapping `tensorrt_llm/serve/visual_gen_utils.py` | Extended `parse_visual_gen_params` to map the `n` field from `VideoGenerationRequest` to `VisualGenParams.num_images_per_prompt` for video generation parity with image generation. |
| Batch Storage Tests `tests/unittest/_torch/visual_gen/test_media_storage.py` | Added unit tests validating batch behavior of `save_images()` and `save_videos()` with various input shapes, format parameters, and audio pairing scenarios. |
| Endpoint Batch Tests `tests/unittest/_torch/visual_gen/test_trtllm_serve_endpoints.py` | Extended mock visual generator with batch awareness and `_maybe_batch()` helper; added/strengthened endpoint tests for `n > 1` scenarios in image generation, image editing, and video generation (both sync and async). |
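
The shape handling described in the table can be sketched on plain shape tuples, without any tensor library. Both function names below are hypothetical, and the exact rules used by `save_images()`, `save_videos()`, and the mock's `_maybe_batch()` helper may differ; this only shows one plausible reading of "normalize input tensor shapes" and "expand single tensors to (N,...)".

```python
from typing import Tuple


def normalize_batch_shape(shape: Tuple[int, ...],
                          item_ndim: int) -> Tuple[int, ...]:
    """Return a shape with an explicit leading batch dimension.

    item_ndim is 3 for images (H, W, C) and 4 for videos (T, H, W, C).
    """
    if len(shape) == item_ndim:
        return (1, *shape)  # single item: prepend a batch dimension of 1
    if len(shape) == item_ndim + 1:
        return tuple(shape)  # already batched
    raise ValueError(
        f"expected {item_ndim} or {item_ndim + 1} dims, got {len(shape)}")


def expand_for_mock(shape: Tuple[int, ...], n: int) -> Tuple[int, ...]:
    """Mimic a batch-aware mock: (H, W, C) becomes (N, H, W, C) when n > 1."""
    return (n, *shape) if n > 1 else tuple(shape)
```

Under these assumptions, a mock output expanded for `n=2` and a single unbatched output both end up with a leading batch dimension before the save loop iterates over it.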

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server as OpenAI Server
    participant VisualGen as Visual Generator
    participant Storage as MediaStorage
    participant Disk as File System

    Client->>Server: Request with n=2 (images/videos)
    Server->>VisualGen: generate(num_images_per_prompt=2)
    VisualGen->>VisualGen: Generate batched output
    VisualGen-->>Server: Return (batch, H, W, C) tensor
    Server->>Storage: save_images/save_videos(batched_tensor, prefix, format)
    loop For each batch item
        Storage->>Disk: Write {prefix}_0.{ext}, {prefix}_1.{ext}, ...
    end
    Disk-->>Storage: File paths written
    Storage-->>Server: Return [path_0, path_1, ...]
    Server->>Server: Construct response with paths
    Server-->>Client: Return list of images/videos

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the JIRA ticket, feature type, and main objective of adding batch inference support for VisualGen in the serve module. |
| Description check | ✅ Passed | The PR description provides comprehensive coverage of changes across all affected files with clear, bullet-point explanations of what was added and fixed, plus test coverage details. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.29% which is sufficient. The required threshold is 80.00%. |

coderabbitai

🧹 Nitpick comments (2)

tensorrt_llm/serve/openai_server.py (2)
1920-1933: Only first video path stored in job metadata.

When n > 1, all videos are saved to disk but only saved_paths[0] is stored in job.output_path. The other generated videos exist on disk (as {video_id}_1.mp4, etc.) but aren't tracked in the job metadata.

For MVP this is acceptable, but consider storing all paths (e.g., as a list) if clients need to retrieve all generated videos via the /v1/videos/{video_id}/content endpoint.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/openai_server.py` around lines 1920 - 1933, The job
metadata only stores the first saved video path (saved_paths[0]) after calling
MediaStorage.save_videos, so when multiple videos are generated the rest are not
tracked; update the code that handles VIDEO_STORE.get(video_id) to assign all
returned paths to the job (e.g., set job.output_path to the full saved_paths
list or a new job.output_paths field) and persist the change (ensure
VIDEO_STORE.save/update is called if required), referencing
MediaStorage.save_videos, saved_paths, VIDEO_STORE, job.output_path (or add
job.output_paths) and video_id to locate where to change the assignment and
storage logic.
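
One way to track every saved file on the job, as this comment suggests, is sketched below. `VideoJob` here is a reduced, hypothetical record, `VIDEO_STORE` is a plain dictionary standing in for the server's job store, and `record_outputs` is an invented helper; none of this is the server's actual code.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VideoJob:
    """Hypothetical job record; the real model has more fields."""
    video_id: str
    output_paths: List[str] = field(default_factory=list)

    @property
    def output_path(self) -> Optional[str]:
        # First file, kept for callers that expect a single path.
        return self.output_paths[0] if self.output_paths else None


# Stand-in for the server's job store (assumed to behave like a mapping).
VIDEO_STORE: Dict[str, VideoJob] = {}


def record_outputs(video_id: str, saved_paths: List[str]) -> None:
    """Attach every saved path to the job, not only the first one."""
    job = VIDEO_STORE[video_id]
    job.output_paths = list(saved_paths)


def delete_video(video_id: str) -> int:
    """Remove every file tracked for the job; return how many were deleted."""
    job = VIDEO_STORE.pop(video_id)
    removed = 0
    for path in job.output_paths:
        if os.path.exists(path):
            os.remove(path)
            removed += 1
    return removed
```

Keeping `output_path` as a read-only view of the first entry is a design choice of this sketch: existing single-file callers keep working while deletion can clean up the whole batch.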
1738-1748: Batch videos saved but only first returned to client.

save_videos() persists all videos in the batch, but FileResponse returns only saved_paths[0]. This is the expected behavior for the sync endpoint (returning a single file), but clients requesting n > 1 might expect multiple videos.

Consider documenting this behavior or providing an alternative response format (e.g., JSON with URLs) when n > 1 is requested in the synchronous endpoint.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/openai_server.py` around lines 1738 - 1748, The sync
endpoint currently saves all batch videos via MediaStorage.save_videos
(saved_paths) but always returns only the first file (actual_output_path) to the
client; update the handler in openai_server.py to detect when the request asked
for n > 1 and either (a) return a JSON response containing the list of
saved_paths (or accessible URLs) instead of a single FileResponse, or (b)
document the current behavior clearly and add an optional query flag to force
single-file vs multi-file responses; make the change around the block that calls
MediaStorage.save_videos and constructs the FileResponse so you either build and
return a JSON payload with all saved_paths when request.n > 1 or keep
FileResponse for single outputs.
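
Option (a) from this comment could look roughly like the sketch below, which only decides the response shape and leaves the web framework out. The function name and the dictionary layout are hypothetical. Note that the later port commit in this thread states that the sync endpoint still returns only the first file, so this shows the reviewer's suggestion rather than the merged behavior.

```python
import os
from typing import Any, Dict, List


def build_sync_response(saved_paths: List[str], n: int) -> Dict[str, Any]:
    """Describe what a sync endpoint could return for a batch of videos."""
    if not saved_paths:
        raise ValueError("no videos were saved")
    if n > 1:
        # Multi-output case: list every file instead of streaming one.
        return {
            "type": "json",
            "data": [{"filename": os.path.basename(p)} for p in saved_paths],
        }
    # Single-output case: keep returning the file itself.
    return {"type": "file", "path": saved_paths[0]}
```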

📥 Commits

Reviewing files that changed from the base of the PR and between 19e2030 and 750afe2.

📒 Files selected for processing (5)

tensorrt_llm/serve/media_storage.py
tensorrt_llm/serve/openai_server.py
tensorrt_llm/serve/visual_gen_utils.py
tests/unittest/_torch/visual_gen/test_media_storage.py
tests/unittest/_torch/visual_gen/test_trtllm_serve_endpoints.py

JunyiXu-nv · 2026-03-19T09:28:32Z

/bot run

tensorrt-cicd · 2026-03-19T09:34:20Z

PR_Github #39584 [ run ] triggered by Bot. Commit: dac8f78 Link to invocation

tensorrt-cicd · 2026-03-19T12:53:09Z

PR_Github #39584 [ run ] completed with state SUCCESS. Commit: dac8f78
/LLM/main/L0_MergeRequest_PR pipeline #30797 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JunyiXu-nv · 2026-03-20T03:57:22Z

/bot run

tensorrt-cicd · 2026-03-20T04:04:27Z

PR_Github #39686 [ run ] triggered by Bot. Commit: 6922a16 Link to invocation

tensorrt-cicd · 2026-03-20T08:16:52Z

PR_Github #39686 [ run ] completed with state SUCCESS. Commit: 6922a16
/LLM/main/L0_MergeRequest_PR pipeline #30884 completed with status: 'FAILURE'

JunyiXu-nv · 2026-03-24T02:29:22Z

/bot run

tensorrt-cicd · 2026-03-24T02:35:11Z

PR_Github #40021 [ run ] triggered by Bot. Commit: d559788 Link to invocation

tensorrt-cicd · 2026-03-24T05:17:49Z

PR_Github #40021 [ run ] completed with state SUCCESS. Commit: d559788
/LLM/main/L0_MergeRequest_PR pipeline #31178 completed with status: 'FAILURE'

JunyiXu-nv · 2026-03-24T07:37:53Z

/bot run

tensorrt-cicd · 2026-03-24T07:43:21Z

PR_Github #40084 [ run ] triggered by Bot. Commit: d559788 Link to invocation

tensorrt-cicd · 2026-03-24T11:59:11Z

PR_Github #40084 [ run ] completed with state SUCCESS. Commit: d559788
/LLM/main/L0_MergeRequest_PR pipeline #31237 completed with status: 'FAILURE'

JunyiXu-nv · 2026-03-25T05:00:12Z

/bot run

tensorrt-cicd · 2026-03-25T05:06:01Z

PR_Github #40240 [ run ] triggered by Bot. Commit: 4bb8fd6 Link to invocation

tensorrt-cicd · 2026-03-25T11:42:33Z

PR_Github #40240 [ run ] completed with state SUCCESS. Commit: 4bb8fd6
/LLM/main/L0_MergeRequest_PR pipeline #31372 completed with status: 'FAILURE'

JunyiXu-nv · 2026-03-26T02:13:05Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-26T02:18:43Z

PR_Github #40405 [ run ] triggered by Bot. Commit: 4bb8fd6 Link to invocation

tensorrt-cicd · 2026-04-14T15:57:30Z

PR_Github #43154 [ run ] completed with state SUCCESS. Commit: 311165d
/LLM/main/L0_MergeRequest_PR pipeline #33784 completed with status: 'FAILURE'

JunyiXu-nv · 2026-04-15T02:12:48Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-15T02:18:36Z

PR_Github #43334 [ run ] triggered by Bot. Commit: f4feee0 Link to invocation

tensorrt-cicd · 2026-04-15T07:17:42Z

PR_Github #43334 [ run ] completed with state SUCCESS. Commit: f4feee0
/LLM/main/L0_MergeRequest_PR pipeline #33874 completed with status: 'FAILURE'

JunyiXu-nv · 2026-04-21T02:53:19Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-21T02:59:53Z

PR_Github #44594 [ run ] triggered by Bot. Commit: 7e2d420 Link to invocation

tensorrt-cicd · 2026-04-21T11:44:29Z

PR_Github #44594 [ run ] completed with state SUCCESS. Commit: 7e2d420
/LLM/main/L0_MergeRequest_PR pipeline #34980 completed with status: 'FAILURE'

JunyiXu-nv · 2026-04-22T02:52:37Z

/bot run

tensorrt-cicd · 2026-04-22T02:58:10Z

PR_Github #44857 [ run ] triggered by Bot. Commit: 7e2d420 Link to invocation

tensorrt-cicd · 2026-04-22T03:42:13Z

PR_Github #44857 [ run ] completed with state SUCCESS. Commit: 7e2d420
/LLM/main/L0_MergeRequest_PR pipeline #35198 completed with status: 'FAILURE'

JunyiXu-nv · 2026-04-23T01:29:26Z

/bot run

tensorrt-cicd · 2026-04-23T01:35:04Z

PR_Github #45057 [ run ] triggered by Bot. Commit: 7e2d420 Link to invocation

tensorrt-cicd · 2026-04-23T04:14:07Z

PR_Github #45057 [ run ] completed with state SUCCESS. Commit: 7e2d420
/LLM/main/L0_MergeRequest_PR pipeline #35360 completed with status: 'FAILURE'

Port the batch-inference support from the original PR onto the new media/encoding + openai_video_routes layout introduced by NVIDIA#13635:

- tensorrt_llm/media/encoding.py: add save_images() and save_videos() free functions plus a shared _resolve_batch_paths() helper. Both accept either a path prefix or an explicit List[str] of per-item paths.
- tensorrt_llm/serve/openai_video_routes.py: sync and async video endpoints now call save_videos(). The async background task records every saved path on VideoJob.output_paths; delete_video() removes all of them. The sync endpoint still returns only the first file (OpenAI Videos API has no multi-file response yet).
- tensorrt_llm/serve/openai_protocol.py: add VideoJob.output_paths for the multi-output case.
- tensorrt_llm/serve/visual_gen_utils.py: map VideoGenerationRequest.n to VisualGenParams.num_images_per_prompt (already done for image requests on main).
- tests: cover save_images/save_videos in tests/unittest/media/test_encoding.py and add n=2 batch tests for sync and async video endpoints.

Signed-off-by: JunyiXu-nv <[email protected]>
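
The "path prefix or explicit list" behavior described in this commit message might be sketched as follows. The function name mirrors `_resolve_batch_paths()` without the underscore to signal that it is a hypothetical stand-in; the signature, the `{prefix}_{i}.{ext}` naming (taken from the sequence diagram earlier in the thread), and the length check are assumptions.

```python
from typing import List, Sequence, Union


def resolve_batch_paths(target: Union[str, Sequence[str]],
                        batch_size: int,
                        ext: str) -> List[str]:
    """Turn a prefix or an explicit path list into one path per batch item."""
    if isinstance(target, str):
        # Prefix form: generate indexed names such as clip_0.mp4, clip_1.mp4.
        return [f"{target}_{i}.{ext}" for i in range(batch_size)]
    # Explicit form: the caller must supply exactly one path per item.
    paths = list(target)
    if len(paths) != batch_size:
        raise ValueError(
            f"got {len(paths)} paths for a batch of {batch_size} items")
    return paths
```

Checking `str` first matters in this sketch because a string is itself a sequence of characters; testing for the list form first would misread a prefix as a list of one-character paths.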

JunyiXu-nv · 2026-05-11T03:05:51Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-11T03:11:50Z

PR_Github #47650 [ run ] triggered by Bot. Commit: 9d54687 Link to invocation

tensorrt-cicd · 2026-05-11T13:06:01Z

PR_Github #47650 [ run ] completed with state SUCCESS. Commit: 9d54687
/LLM/main/L0_MergeRequest_PR pipeline #37554 completed with status: 'FAILURE'

JunyiXu-nv · 2026-05-12T02:30:43Z

/bot run

tensorrt-cicd · 2026-05-12T02:36:15Z

PR_Github #47834 [ run ] triggered by Bot. Commit: 9d54687 Link to invocation

tensorrt-cicd · 2026-05-12T05:36:51Z

PR_Github #47834 [ run ] completed with state SUCCESS. Commit: 9d54687
/LLM/main/L0_MergeRequest_PR pipeline #37717 completed with status: 'SUCCESS'

CI Report

Link to invocation

JunyiXu-nv requested review from a team as code owners March 19, 2026 09:00

JunyiXu-nv requested a review from Superjomn March 19, 2026 09:00

github-actions Bot assigned JunyiXu-nv Mar 19, 2026

coderabbitai Bot reviewed Mar 19, 2026

JunyiXu-nv requested review from QiJune and zhenhuaw-me and removed request for Superjomn March 19, 2026 09:25

zhenhuaw-me reviewed Mar 20, 2026

Comment thread tensorrt_llm/serve/media_storage.py Outdated

Comment thread tensorrt_llm/serve/media_storage.py Outdated

JunyiXu-nv requested a review from a team as a code owner March 20, 2026 03:56

JunyiXu-nv requested a review from nv-guomingz March 20, 2026 03:56

JunyiXu-nv force-pushed the user/junyix/fix-trtllm-11579 branch from 6922a16 to d559788 Compare March 24, 2026 02:22

JunyiXu-nv force-pushed the user/junyix/fix-trtllm-11579 branch from 311165d to f4feee0 Compare April 15, 2026 02:06

JunyiXu-nv requested a review from a team as a code owner April 21, 2026 02:51

JunyiXu-nv requested review from moraxu and tijyojwad April 21, 2026 02:51

moraxu approved these changes May 8, 2026

JunyiXu-nv force-pushed the user/junyix/fix-trtllm-11579 branch 2 times, most recently from 3b062df to fd6a9c1 Compare May 11, 2026 02:45

JunyiXu-nv force-pushed the user/junyix/fix-trtllm-11579 branch from fd6a9c1 to 9d54687 Compare May 11, 2026 02:56

JunyiXu-nv merged commit defc515 into NVIDIA:main May 12, 2026
6 checks passed

yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026

[TRTLLM-11579][feat] VisualGen batch inference support in serve module (NVIDIA#12350)

Signed-off-by: JunyiXu-nv <[email protected]>

18a48b8
