[TRTLLM-11579][feat] VisualGen batch inference support in serve module #12350

Merged: JunyiXu-nv merged 1 commit into NVIDIA:main from JunyiXu-nv:user/junyix/fix-trtllm-11579 on May 12, 2026

@JunyiXu-nv
Collaborator

@JunyiXu-nv JunyiXu-nv commented Mar 19, 2026

  • Add MediaStorage.save_images() for batch image saving from (B,H,W,C) tensors
  • Add MediaStorage.save_videos() for batch video saving from (B,T,H,W,C) tensors
  • Fix openai_image_generation endpoint to split batch output tensors and return all generated images in the response data list
  • Fix openai_image_edit endpoint with the same batch handling
  • Fix openai_video_generation_sync to save all videos via save_videos()
  • Fix _generate_video_background (async) to save all videos via save_videos()
  • Map VideoGenerationRequest.n to params.num_images_per_prompt in parse_visual_gen_params (was previously only handled for image requests)
  • Update MockVisualGen to be batch-aware: expands single tensors to (N,...) when params.num_images_per_prompt > 1
  • Add batch tests for save_images/save_videos in test_media_storage.py
  • Add batch endpoint tests for n>1 in test_trtllm_serve_endpoints.py
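
The `n` mapping described in the list above can be pictured with a minimal sketch. The two dataclasses are hypothetical stand-ins, not the real `VideoGenerationRequest` or `VisualGenParams` (their full field sets are not shown in this PR); only the copy from `n` to `num_images_per_prompt` comes from the description, and the "leave the default when `n` is unset" behavior is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeVideoRequest:
    """Hypothetical stand-in for VideoGenerationRequest."""
    prompt: str
    n: Optional[int] = None  # number of outputs the client asked for


@dataclass
class FakeVisualGenParams:
    """Hypothetical stand-in for VisualGenParams."""
    num_images_per_prompt: int = 1


def apply_n(request: FakeVideoRequest,
            params: FakeVisualGenParams) -> FakeVisualGenParams:
    # Assumption: copy n only when the client set it, so the default of 1
    # is preserved for requests that omit the field.
    if request.n is not None:
        params.num_images_per_prompt = request.n
    return params


params = apply_n(FakeVideoRequest(prompt="a cat", n=2), FakeVisualGenParams())
print(params.num_images_per_prompt)  # 2
```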

Summary by CodeRabbit

Release Notes

  • New Features

    • Added batch processing support for image and video generation, enabling multiple images/videos to be generated in a single request.
    • Introduced n parameter for video generation requests to control output quantity, matching existing image generation behavior.
    • Enhanced media storage with batch-aware APIs for handling multiple images and videos simultaneously.
  • Tests

    • Added comprehensive test coverage for batch image and video operations.

@JunyiXu-nv JunyiXu-nv requested review from a team as code owners March 19, 2026 09:00
@JunyiXu-nv JunyiXu-nv requested a review from Superjomn March 19, 2026 09:00
@coderabbitai
Contributor

coderabbitai Bot commented Mar 19, 2026

Walkthrough

This PR adds batch support for media storage in the TensorRT-LLM serving stack. Two new batch APIs (save_images, save_videos) are introduced in MediaStorage to handle multiple images or videos with filename indexing. The OpenAI server endpoints for image generation, image editing, and video generation are updated to use these new batch APIs. Parameter mapping for the n field is extended to video generation, and comprehensive batch-aware tests are added.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Media Storage Batch APIs (tensorrt_llm/serve/media_storage.py) | Added save_images() and save_videos() batch methods that normalize input tensor shapes, derive output filename extensions from the format parameter, iterate over the batch dimension, and return lists of saved file paths. |
| OpenAI Server Integration (tensorrt_llm/serve/openai_server.py) | Updated image generation, image editing, and video generation endpoints to normalize output tensors into lists and call the new batch save APIs (save_images, save_videos) instead of single-file operations; changed control flow to derive response paths from the returned lists. |
| Parameter Mapping (tensorrt_llm/serve/visual_gen_utils.py) | Extended parse_visual_gen_params to map the n field from VideoGenerationRequest to VisualGenParams.num_images_per_prompt, giving video generation parity with image generation. |
| Batch Storage Tests (tests/unittest/_torch/visual_gen/test_media_storage.py) | Added unit tests validating batch behavior of save_images() and save_videos() with various input shapes, format parameters, and audio pairing scenarios. |
| Endpoint Batch Tests (tests/unittest/_torch/visual_gen/test_trtllm_serve_endpoints.py) | Extended the mock visual generator with batch awareness and a _maybe_batch() helper; added/strengthened endpoint tests for n > 1 scenarios in image generation, image editing, and video generation (both sync and async). |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server as OpenAI Server
    participant VisualGen as Visual Generator
    participant Storage as MediaStorage
    participant Disk as File System

    Client->>Server: Request with n=2 (images/videos)
    Server->>VisualGen: generate(num_images_per_prompt=2)
    VisualGen->>VisualGen: Generate batched output
    VisualGen-->>Server: Return (batch, H, W, C) tensor
    Server->>Storage: save_images/save_videos(batched_tensor, prefix, format)
    loop For each batch item
        Storage->>Disk: Write {prefix}_0.{ext}, {prefix}_1.{ext}, ...
    end
    Disk-->>Storage: File paths written
    Storage-->>Server: Return [path_0, path_1, ...]
    Server->>Server: Construct response with paths
    Server-->>Client: Return list of images/videos
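
The flow in the diagram can be traced with a small, self-contained sketch. Everything here is a stand-in: `fake_generate`, `fake_save_batch`, and `handle_request` are hypothetical names, the payloads are plain bytes rather than encoded images, and the response shape is only illustrative. The `{prefix}_{i}.{ext}` naming follows the diagram.

```python
import os
import tempfile
from typing import List


def fake_generate(prompt: str, num_images_per_prompt: int) -> List[bytes]:
    """Stand-in for the visual generator: one payload per batch item."""
    return [f"{prompt}-{i}".encode() for i in range(num_images_per_prompt)]


def fake_save_batch(items: List[bytes], prefix: str, ext: str) -> List[str]:
    """Stand-in for save_images/save_videos: writes {prefix}_{i}.{ext}."""
    paths = []
    for i, payload in enumerate(items):
        path = f"{prefix}_{i}.{ext}"
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    return paths


def handle_request(prompt: str, n: int, out_dir: str) -> dict:
    """Server step: generate a batch, persist it, return one entry per item."""
    batch = fake_generate(prompt, num_images_per_prompt=n)
    paths = fake_save_batch(batch, os.path.join(out_dir, "img"), "png")
    return {"data": [{"path": p} for p in paths]}


with tempfile.TemporaryDirectory() as tmp:
    response = handle_request("a cat", n=2, out_dir=tmp)
    print([os.path.basename(d["path"]) for d in response["data"]])
    # ['img_0.png', 'img_1.png']
```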

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the JIRA ticket, feature type, and main objective of adding batch inference support for VisualGen in the serve module. |
| Description check | ✅ Passed | The PR description provides comprehensive coverage of changes across all affected files with clear, bullet-point explanations of what was added and fixed, plus test coverage details. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.29%, which is sufficient. The required threshold is 80.00%. |

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
tensorrt_llm/serve/openai_server.py (2)

1920-1933: Only first video path stored in job metadata.

When n > 1, all videos are saved to disk but only saved_paths[0] is stored in job.output_path. The other generated videos exist on disk (as {video_id}_1.mp4, etc.) but aren't tracked in the job metadata.

For MVP this is acceptable, but consider storing all paths (e.g., as a list) if clients need to retrieve all generated videos via the /v1/videos/{video_id}/content endpoint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/openai_server.py` around lines 1920 - 1933, The job
metadata only stores the first saved video path (saved_paths[0]) after calling
MediaStorage.save_videos, so when multiple videos are generated the rest are not
tracked; update the code that handles VIDEO_STORE.get(video_id) to assign all
returned paths to the job (e.g., set job.output_path to the full saved_paths
list or a new job.output_paths field) and persist the change (ensure
VIDEO_STORE.save/update is called if required), referencing
MediaStorage.save_videos, saved_paths, VIDEO_STORE, job.output_path (or add
job.output_paths) and video_id to locate where to change the assignment and
storage logic.
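
One way to picture the suggestion is a job record that keeps every saved path while still exposing the first one for existing readers. `SketchVideoJob` and `record_outputs` are hypothetical; the real `VideoJob` schema is not shown here, and keeping `output_path` alongside a list is an assumption about how compatibility might be preserved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchVideoJob:
    """Hypothetical job record; not the real VideoJob schema."""
    video_id: str
    output_path: Optional[str] = None  # first file, for existing readers
    output_paths: List[str] = field(default_factory=list)  # every file in the batch


def record_outputs(job: SketchVideoJob, saved_paths: List[str]) -> None:
    """Store all saved paths on the job instead of only saved_paths[0]."""
    if not saved_paths:
        raise ValueError("save step returned no paths")
    job.output_paths = list(saved_paths)
    job.output_path = saved_paths[0]


job = SketchVideoJob(video_id="vid123")
record_outputs(job, ["vid123_0.mp4", "vid123_1.mp4"])
print(job.output_path)   # vid123_0.mp4
print(job.output_paths)  # ['vid123_0.mp4', 'vid123_1.mp4']
```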

1738-1748: Batch videos saved but only first returned to client.

save_videos() persists all videos in the batch, but FileResponse returns only saved_paths[0]. This is the expected behavior for the sync endpoint (returning a single file), but clients requesting n > 1 might expect multiple videos.

Consider documenting this behavior or providing an alternative response format (e.g., JSON with URLs) when n > 1 is requested in the synchronous endpoint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/openai_server.py` around lines 1738 - 1748, The sync
endpoint currently saves all batch videos via MediaStorage.save_videos
(saved_paths) but always returns only the first file (actual_output_path) to the
client; update the handler in openai_server.py to detect when the request asked
for n > 1 and either (a) return a JSON response containing the list of
saved_paths (or accessible URLs) instead of a single FileResponse, or (b)
document the current behavior clearly and add an optional query flag to force
single-file vs multi-file responses; make the change around the block that calls
MediaStorage.save_videos and constructs the FileResponse so you either build and
return a JSON payload with all saved_paths when request.n > 1 or keep
FileResponse for single outputs.
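
Option (a) above amounts to a branch on the requested count. The sketch below returns a plain dict describing the response rather than building a framework object, so it stays independent of the server code; `build_sync_response` and the `kind`/`path`/`paths` keys are hypothetical, and this illustrates the reviewer's suggestion rather than the behavior of the endpoint.

```python
from typing import Dict, List, Union


def build_sync_response(saved_paths: List[str],
                        n: int) -> Dict[str, Union[str, List[str]]]:
    """Pick a response shape for the sync endpoint based on n."""
    if not saved_paths:
        raise ValueError("no videos were saved")
    if n > 1:
        # Multi-output request: list every saved file in a JSON-style payload.
        return {"kind": "json", "paths": list(saved_paths)}
    # Single-output request: serve the one file directly.
    return {"kind": "file", "path": saved_paths[0]}


print(build_sync_response(["v_0.mp4"], n=1))
# {'kind': 'file', 'path': 'v_0.mp4'}
print(build_sync_response(["v_0.mp4", "v_1.mp4"], n=2))
# {'kind': 'json', 'paths': ['v_0.mp4', 'v_1.mp4']}
```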

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3259c7d1-a87a-4759-8edd-bee7d9e2bde7

📥 Commits

Reviewing files that changed from the base of the PR and between 19e2030 and 750afe2.

📒 Files selected for processing (5)
  • tensorrt_llm/serve/media_storage.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/serve/visual_gen_utils.py
  • tests/unittest/_torch/visual_gen/test_media_storage.py
  • tests/unittest/_torch/visual_gen/test_trtllm_serve_endpoints.py

@JunyiXu-nv JunyiXu-nv requested review from QiJune and zhenhuaw-me and removed request for Superjomn March 19, 2026 09:25
@JunyiXu-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39584 [ run ] triggered by Bot. Commit: dac8f78 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39584 [ run ] completed with state SUCCESS. Commit: dac8f78
/LLM/main/L0_MergeRequest_PR pipeline #30797 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Comment thread tensorrt_llm/serve/media_storage.py Outdated
Comment thread tensorrt_llm/serve/media_storage.py Outdated
@JunyiXu-nv JunyiXu-nv requested a review from a team as a code owner March 20, 2026 03:56
@JunyiXu-nv JunyiXu-nv requested a review from nv-guomingz March 20, 2026 03:56
@JunyiXu-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39686 [ run ] triggered by Bot. Commit: 6922a16 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39686 [ run ] completed with state SUCCESS. Commit: 6922a16
/LLM/main/L0_MergeRequest_PR pipeline #30884 completed with status: 'FAILURE'

@JunyiXu-nv JunyiXu-nv force-pushed the user/junyix/fix-trtllm-11579 branch from 6922a16 to d559788 Compare March 24, 2026 02:22
@JunyiXu-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40021 [ run ] triggered by Bot. Commit: d559788 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40021 [ run ] completed with state SUCCESS. Commit: d559788
/LLM/main/L0_MergeRequest_PR pipeline #31178 completed with status: 'FAILURE'

@JunyiXu-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40084 [ run ] triggered by Bot. Commit: d559788 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40084 [ run ] completed with state SUCCESS. Commit: d559788
/LLM/main/L0_MergeRequest_PR pipeline #31237 completed with status: 'FAILURE'

@JunyiXu-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40240 [ run ] triggered by Bot. Commit: 4bb8fd6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40240 [ run ] completed with state SUCCESS. Commit: 4bb8fd6
/LLM/main/L0_MergeRequest_PR pipeline #31372 completed with status: 'FAILURE'

@JunyiXu-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40405 [ run ] triggered by Bot. Commit: 4bb8fd6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43154 [ run ] completed with state SUCCESS. Commit: 311165d
/LLM/main/L0_MergeRequest_PR pipeline #33784 completed with status: 'FAILURE'

@JunyiXu-nv JunyiXu-nv force-pushed the user/junyix/fix-trtllm-11579 branch from 311165d to f4feee0 Compare April 15, 2026 02:06
@JunyiXu-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43334 [ run ] triggered by Bot. Commit: f4feee0 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43334 [ run ] completed with state SUCCESS. Commit: f4feee0
/LLM/main/L0_MergeRequest_PR pipeline #33874 completed with status: 'FAILURE'

@JunyiXu-nv JunyiXu-nv requested a review from a team as a code owner April 21, 2026 02:51
@JunyiXu-nv JunyiXu-nv requested review from moraxu and tijyojwad April 21, 2026 02:51
@JunyiXu-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44594 [ run ] triggered by Bot. Commit: 7e2d420 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44594 [ run ] completed with state SUCCESS. Commit: 7e2d420
/LLM/main/L0_MergeRequest_PR pipeline #34980 completed with status: 'FAILURE'

@JunyiXu-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44857 [ run ] triggered by Bot. Commit: 7e2d420 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44857 [ run ] completed with state SUCCESS. Commit: 7e2d420
/LLM/main/L0_MergeRequest_PR pipeline #35198 completed with status: 'FAILURE'

@JunyiXu-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45057 [ run ] triggered by Bot. Commit: 7e2d420 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45057 [ run ] completed with state SUCCESS. Commit: 7e2d420
/LLM/main/L0_MergeRequest_PR pipeline #35360 completed with status: 'FAILURE'

@JunyiXu-nv JunyiXu-nv force-pushed the user/junyix/fix-trtllm-11579 branch 2 times, most recently from 3b062df to fd6a9c1 Compare May 11, 2026 02:45
Port the batch-inference support from the original PR onto the new
media/encoding + openai_video_routes layout introduced by NVIDIA#13635:

- tensorrt_llm/media/encoding.py: add save_images() and save_videos()
  free functions plus a shared _resolve_batch_paths() helper. Both
  accept either a path prefix or an explicit List[str] of per-item paths.
- tensorrt_llm/serve/openai_video_routes.py: sync and async video
  endpoints now call save_videos(). The async background task records
  every saved path on VideoJob.output_paths; delete_video() removes all
  of them. The sync endpoint still returns only the first file (OpenAI
  Videos API has no multi-file response yet).
- tensorrt_llm/serve/openai_protocol.py: add VideoJob.output_paths for
  the multi-output case.
- tensorrt_llm/serve/visual_gen_utils.py: map VideoGenerationRequest.n
  to VisualGenParams.num_images_per_prompt (already done for image
  requests on main).
- tests: cover save_images/save_videos in tests/unittest/media/test_encoding.py
  and add n=2 batch tests for sync and async video endpoints.

Signed-off-by: JunyiXu-nv <[email protected]>
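
The "path prefix or explicit list" contract and the delete-all cleanup described in this commit message can be sketched as follows. The real `_resolve_batch_paths()` signature is not shown here, so `resolve_batch_paths` and `delete_outputs` are hypothetical analogues; the `{prefix}_{i}.{ext}` naming and the length check on an explicit list are assumptions.

```python
import os
import tempfile
from typing import List, Union


def resolve_batch_paths(output: Union[str, List[str]], batch_size: int,
                        ext: str) -> List[str]:
    """Hypothetical analogue of _resolve_batch_paths (real signature unknown)."""
    if isinstance(output, str):
        # A prefix expands to one indexed path per batch item.
        return [f"{output}_{i}.{ext}" for i in range(batch_size)]
    # Assumption: an explicit list must supply exactly one path per item.
    if len(output) != batch_size:
        raise ValueError(f"expected {batch_size} paths, got {len(output)}")
    return list(output)


def delete_outputs(paths: List[str]) -> int:
    """Remove every recorded file, skipping ones already gone."""
    removed = 0
    for path in paths:
        try:
            os.remove(path)
            removed += 1
        except FileNotFoundError:
            pass
    return removed


with tempfile.TemporaryDirectory() as tmp:
    paths = resolve_batch_paths(os.path.join(tmp, "vid"), batch_size=2, ext="mp4")
    for p in paths:
        open(p, "wb").close()  # create empty placeholder files
    print(delete_outputs(paths))  # 2
```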
@JunyiXu-nv JunyiXu-nv force-pushed the user/junyix/fix-trtllm-11579 branch from fd6a9c1 to 9d54687 Compare May 11, 2026 02:56
@JunyiXu-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47650 [ run ] triggered by Bot. Commit: 9d54687 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47650 [ run ] completed with state SUCCESS. Commit: 9d54687
/LLM/main/L0_MergeRequest_PR pipeline #37554 completed with status: 'FAILURE'

@JunyiXu-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47834 [ run ] triggered by Bot. Commit: 9d54687 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47834 [ run ] completed with state SUCCESS. Commit: 9d54687
/LLM/main/L0_MergeRequest_PR pipeline #37717 completed with status: 'SUCCESS'

CI Report

Link to invocation

@JunyiXu-nv JunyiXu-nv merged commit defc515 into NVIDIA:main May 12, 2026
6 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026

4 participants