Workshop on Video Generative Models: Benchmarks and Evaluation

The rapid advancement of video generative models underscores the critical need for robust evaluation methodologies capable of rigorously assessing instruction adherence, physical plausibility, human fidelity, and creativity. However, prevailing metrics and benchmarks remain constrained, predominantly prioritizing semantic alignment while often overlooking subtle yet critical artifacts, such as structural distortions, unnatural motion dynamics, and weak temporal coherence, that persist even in state-of-the-art systems.

Therefore, the VGBE workshop seeks to pioneer next-generation evaluation methodologies that are fine-grained, physically grounded, and aligned with human perception. By establishing multi-dimensional, explainable, and standardized benchmarks, we aim to bridge the gap between generation and assessment, thereby accelerating the maturation of video generative models and facilitating their reliable deployment in real-world applications.

Topics

🏆 Workshop Paper Awards

Best Paper Award + $400

Best Paper Runner-Up Award + $300

Recognizing outstanding contributions in workshop paper submissions.

Novel Metrics and Evaluation Methods

Spatiotemporal & Causal Integrity: Quantifying motion realism, object permanence, and causal logic consistency over time.
Perceptual Quality Assessment: Learning-based metrics for detecting visual artifacts, hallucinations, and alignment with human subjectivity.
Explainable Automated Judges: Leveraging Multimodal LLMs (VLMs) for scalable, fine-grained, and interpretable critique.
Instruction Adherence Metrics: Rigorous evaluation of prompt fidelity, spatial conditioning, and complex constraint satisfaction.

Datasets and Benchmarks

Narrative & Multi-Shot Suites: Curated datasets assessing character persistence, scene transitions, and long-horizon consistency.
Physics-Grounded Challenge Sets: Scenarios isolating fluid dynamics, collisions, and kinematic anomalies to stress-test "World Simulators."
Human Preference Data: Large-scale, fine-grained annotations capturing multi-dimensional judgments (e.g., aesthetics vs. realism).
Standardized Protocols: Unified data splits and reproducible frameworks to ensure transparent and comparable benchmarking.

Developing Video Generative Applications in Vertical Domains

Domain Adaptation & Personalization: Efficient fine-tuning and Low-Rank Adaptation (LoRA) strategies for specialized verticals (e.g., medical, cinematic).
Simulation for Embodied AI: Leveraging video generative models as world simulators for robotics perception, planning, and Sim2Real transfer.
Interactive & Human-in-the-Loop: User-centric frameworks incorporating iterative feedback for creative workflows and gaming.
Immersive 4D Generation: Lifting video diffusion priors to synthesize spatially consistent scenes and dynamic assets for AR/VR environments.
Deployment Efficiency: Optimizing inference latency, memory footprint, and cost for scalable industrial applications.

Challenges

Submissions will be evaluated on the test set using the metrics defined in the associated paper, with human evaluation conducted for each task as needed.

Image-to-Video Consistent Generation

Objective: Generate a video from an input image and text prompt while preserving the image's visual content and maintaining spatiotemporal consistency.
Awards:
- 1st Place: $1,000 + Certificate
- 2nd Place: $600 + Certificate
- 3rd Place: $300 + Certificate
Data Usage: Please follow the Dataset License for data access and usage.

Competition Timeline

Competition starts	February 19, 2026
Results and Code Submission deadline	April 05, 2026 (extended from April 01, 2026)

Generic Instructional Video Editing | Website (for more details)

Objective: Edit input videos according to natural language instructions while preserving quality and fidelity.
Awards:
- Highest Score Award: $500 + Certificate
- Innovation Award: $500 + Certificate

Competition Timeline

Competition starts	February 20, 2026
Results Submission deadline	April 05, 2026 (extended from March 25, 2026)

Physics-aware Video Instance Removal | Website (for more details)

Objective: Remove target instances and restore realistic environment dynamics with minimal artifacts.
Awards:
- Highest Score Award: $500 + Certificate
- Innovation Award: $500 + Certificate

Competition Timeline

Competition starts	February 20, 2026
Results Submission deadline	April 05, 2026 (extended from March 25, 2026)

Keynote Speakers

Organizers

Schedule

Date: Thursday, June 4, 2026
Location: Colorado Convention Center, Mile High 3B
Virtual Link: TBD

Time	Session	Speaker / Host	Topic / Notes
9:00 - 9:10 AM	Morning Opening	Organizing Committee	Welcome & Workshop Overview
9:10 - 9:25 AM	Oral Presentation	-	Inferring Dynamic Physical Properties from Video Foundation Models
9:30 - 10:00 AM	Keynote	Mike Shou	Video World Model for Robot Learning
10:00 - 10:30 AM	Keynote	Yan Wang	TBD
10:30 - 10:50 AM	Coffee Break	-	-
10:50 - 11:20 AM	Keynote	Yaoyao Liu	Enable Explicit 3D/4D Controls for Pre-trained Generative Models
11:20 - 11:30 AM	Challenge Winners Ceremony	-	-
11:30 - 11:50 AM	Challenge Winner Solutions	-	TBD
11:50 AM - 1:30 PM	Lunch Break	-	-
1:30 - 1:40 PM	Afternoon Opening	Organizing Committee	-
1:40 - 2:10 PM	Keynote	Alan Bovik	Two Experiments on the Perception of GenAI Pictures
2:10 - 2:40 PM	Keynote	Zhuang Liu	Building and Evaluating Fully Open Generative Models
2:40 - 3:00 PM	Coffee Break	-	-
3:00 - 3:30 PM	Keynote	Ming-Hsuan Yang	Toward World Models: Geometry, View Synthesis, and Visual Reasoning
3:30 - 4:00 PM	Keynote	Jiajun Wu	TBD
4:00 - 4:15 PM	Oral Presentation	-	Physics-Aware Video Instance Removal Benchmark
4:15 - 4:30 PM	Oral Presentation	-	Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos
4:30 - 4:45 PM	Oral Presentation	-	Risk-Controllable Multi-View Diffusion for Driving Scenario Generation
4:45 - 5:00 PM	Oral Presentation	-	TBD
5:00 - 5:15 PM	Paper Awards Ceremony	Organizing Committee	-
5:15 - 5:30 PM	Closing Remarks & Group Photo	Organizing Committee	-

Accepted Papers

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
Distilling Geometry Priors for 3D-Consistent Video Generation
Inferring Dynamic Physical Properties from Video Foundation Models
Physics-Aware Video Instance Removal Benchmark
VideoASMR-Bench: Can AI-Generated ASMR Videos Fool VLMs and Humans?
AIGVE-MACS: Unified Multi-Aspect Commenting and Scoring Model for AI-Generated Video Evaluation
Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don’t Know Galileo’s Principle…for now
Tempered Self-Similarity Alignment for Physically Plausible Video Generation
V-PartSwap: Motion-Consistent Facial Part Transfer in Videos via Alignment-Aware Diffusion
Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos
Risk-Controllable Multi-View Diffusion for Driving Scenario Generation
Test-Time Domain Adaptation for Interactive Video Generation
The Evaluation Imperative for Video Generative Models: A Survey on Metrics, Benchmarks, and Trustworthiness

The First Workshop on Video Generative Models: Benchmarks and Evaluation