[TRTLLM-11579][feat] VisualGen batch inference support in serve module#12350
Conversation
📝 WalkthroughWalkthroughThis PR adds batch support for media storage in the TensorRT-LLM serving stack. Two new batch APIs ( Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Server as OpenAI Server
participant VisualGen as Visual Generator
participant Storage as MediaStorage
participant Disk as File System
Client->>Server: Request with n=2 (images/videos)
Server->>VisualGen: generate(num_images_per_prompt=2)
VisualGen->>VisualGen: Generate batched output
VisualGen-->>Server: Return (batch, H, W, C) tensor
Server->>Storage: save_images/save_videos(batched_tensor, prefix, format)
loop For each batch item
Storage->>Disk: Write {prefix}_0.{ext}, {prefix}_1.{ext}, ...
end
Disk-->>Storage: File paths written
Storage-->>Server: Return [path_0, path_1, ...]
Server->>Server: Construct response with paths
Server-->>Client: Return list of images/videos
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip You can customize the tone of the review comments and chat replies.Configure the |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tensorrt_llm/serve/openai_server.py (2)
1920-1933: Only first video path stored in job metadata.When
n > 1, all videos are saved to disk but onlysaved_paths[0]is stored injob.output_path. The other generated videos exist on disk (as{video_id}_1.mp4, etc.) but aren't tracked in the job metadata.For MVP this is acceptable, but consider storing all paths (e.g., as a list) if clients need to retrieve all generated videos via the
/v1/videos/{video_id}/contentendpoint.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/serve/openai_server.py` around lines 1920 - 1933, The job metadata only stores the first saved video path (saved_paths[0]) after calling MediaStorage.save_videos, so when multiple videos are generated the rest are not tracked; update the code that handles VIDEO_STORE.get(video_id) to assign all returned paths to the job (e.g., set job.output_path to the full saved_paths list or a new job.output_paths field) and persist the change (ensure VIDEO_STORE.save/update is called if required), referencing MediaStorage.save_videos, saved_paths, VIDEO_STORE, job.output_path (or add job.output_paths) and video_id to locate where to change the assignment and storage logic.
1738-1748: Batch videos saved but only first returned to client.
save_videos()persists all videos in the batch, butFileResponsereturns onlysaved_paths[0]. This is the expected behavior for the sync endpoint (returning a single file), but clients requestingn > 1might expect multiple videos.Consider documenting this behavior or providing an alternative response format (e.g., JSON with URLs) when
n > 1is requested in the synchronous endpoint.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/serve/openai_server.py` around lines 1738 - 1748, The sync endpoint currently saves all batch videos via MediaStorage.save_videos (saved_paths) but always returns only the first file (actual_output_path) to the client; update the handler in openai_server.py to detect when the request asked for n > 1 and either (a) return a JSON response containing the list of saved_paths (or accessible URLs) instead of a single FileResponse, or (b) document the current behavior clearly and add an optional query flag to force single-file vs multi-file responses; make the change around the block that calls MediaStorage.save_videos and constructs the FileResponse so you either build and return a JSON payload with all saved_paths when request.n > 1 or keep FileResponse for single outputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tensorrt_llm/serve/openai_server.py`:
- Around line 1920-1933: The job metadata only stores the first saved video path
(saved_paths[0]) after calling MediaStorage.save_videos, so when multiple videos
are generated the rest are not tracked; update the code that handles
VIDEO_STORE.get(video_id) to assign all returned paths to the job (e.g., set
job.output_path to the full saved_paths list or a new job.output_paths field)
and persist the change (ensure VIDEO_STORE.save/update is called if required),
referencing MediaStorage.save_videos, saved_paths, VIDEO_STORE, job.output_path
(or add job.output_paths) and video_id to locate where to change the assignment
and storage logic.
- Around line 1738-1748: The sync endpoint currently saves all batch videos via
MediaStorage.save_videos (saved_paths) but always returns only the first file
(actual_output_path) to the client; update the handler in openai_server.py to
detect when the request asked for n > 1 and either (a) return a JSON response
containing the list of saved_paths (or accessible URLs) instead of a single
FileResponse, or (b) document the current behavior clearly and add an optional
query flag to force single-file vs multi-file responses; make the change around
the block that calls MediaStorage.save_videos and constructs the FileResponse so
you either build and return a JSON payload with all saved_paths when request.n >
1 or keep FileResponse for single outputs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3259c7d1-a87a-4759-8edd-bee7d9e2bde7
📒 Files selected for processing (5)
tensorrt_llm/serve/media_storage.pytensorrt_llm/serve/openai_server.pytensorrt_llm/serve/visual_gen_utils.pytests/unittest/_torch/visual_gen/test_media_storage.pytests/unittest/_torch/visual_gen/test_trtllm_serve_endpoints.py
|
/bot run |
|
PR_Github #39584 [ run ] triggered by Bot. Commit: |
|
PR_Github #39584 [ run ] completed with state
|
|
/bot run |
|
PR_Github #39686 [ run ] triggered by Bot. Commit: |
|
PR_Github #39686 [ run ] completed with state
|
6922a16 to
d559788
Compare
|
/bot run |
|
PR_Github #40021 [ run ] triggered by Bot. Commit: |
|
PR_Github #40021 [ run ] completed with state
|
|
/bot run |
|
PR_Github #40084 [ run ] triggered by Bot. Commit: |
|
PR_Github #40084 [ run ] completed with state
|
|
/bot run |
|
PR_Github #40240 [ run ] triggered by Bot. Commit: |
|
PR_Github #40240 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #40405 [ run ] triggered by Bot. Commit: |
|
PR_Github #43154 [ run ] completed with state
|
311165d to
f4feee0
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #43334 [ run ] triggered by Bot. Commit: |
|
PR_Github #43334 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #44594 [ run ] triggered by Bot. Commit: |
|
PR_Github #44594 [ run ] completed with state
|
|
/bot run |
|
PR_Github #44857 [ run ] triggered by Bot. Commit: |
|
PR_Github #44857 [ run ] completed with state
|
|
/bot run |
|
PR_Github #45057 [ run ] triggered by Bot. Commit: |
|
PR_Github #45057 [ run ] completed with state
|
3b062df to
fd6a9c1
Compare
Port the batch-inference support from the original PR onto the new media/encoding + openai_video_routes layout introduced by NVIDIA#13635: - tensorrt_llm/media/encoding.py: add save_images() and save_videos() free functions plus a shared _resolve_batch_paths() helper. Both accept either a path prefix or an explicit List[str] of per-item paths. - tensorrt_llm/serve/openai_video_routes.py: sync and async video endpoints now call save_videos(). The async background task records every saved path on VideoJob.output_paths; delete_video() removes all of them. The sync endpoint still returns only the first file (OpenAI Videos API has no multi-file response yet). - tensorrt_llm/serve/openai_protocol.py: add VideoJob.output_paths for the multi-output case. - tensorrt_llm/serve/visual_gen_utils.py: map VideoGenerationRequest.n to VisualGenParams.num_images_per_prompt (already done for image requests on main). - tests: cover save_images/save_videos in tests/unittest/media/test_encoding.py and add n=2 batch tests for sync and async video endpoints. Signed-off-by: JunyiXu-nv <[email protected]>
fd6a9c1 to
9d54687
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #47650 [ run ] triggered by Bot. Commit: |
|
PR_Github #47650 [ run ] completed with state
|
|
/bot run |
|
PR_Github #47834 [ run ] triggered by Bot. Commit: |
|
PR_Github #47834 [ run ] completed with state |
NVIDIA#12350) Signed-off-by: JunyiXu-nv <[email protected]>
Summary by CodeRabbit
Release Notes
New Features
nparameter for video generation requests to control output quantity, matching existing image generation behavior.Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.