CVPR 2026 Tutorial


Building GenAI-based Simulation Environment for
End-to-End Autonomous Driving

Half-day Tutorial | In-person



Overview


End-to-end autonomous driving systems require simulation environments capable of exposing models to diverse, realistic, and safety-critical long-tail events that rarely appear in real-world data. Traditional simulators—relying on scripted scenarios, simplified traffic logic, and static 3D assets—capture only a narrow slice of real traffic complexity and fail to exercise modern data-driven AV stacks in a meaningful, system-level manner. As end-to-end policies blur the boundaries between perception, prediction, and planning, new generative, data-first, closed-loop simulation workflows are needed to bridge the gap between real-world distributions and synthetic environments.

This tutorial demonstrates how generative AI and world models can be used to build end-to-end simulation pipelines that directly support learning-based AV systems. We focus on practical, reproducible methods involving city-scale digital twins, data-driven traffic behavior models, generative corner-case synthesis, and sensor-level simulation tailored for perception and end-to-end policies. Participants will gain both conceptual understanding and hands-on entry points—code, tools, datasets, and minimal templates—to design or extend their own generative simulation systems.

This tutorial walks through the complete pipeline for generative end-to-end AV simulation. We begin by defining what distinguishes end-to-end simulation from classical AV simulators and how policy-driven requirements reshape simulation design. We then introduce world modeling and city-scale digital twins, covering data-driven reconstruction of road layouts, traffic rules, and naturalistic human driving behavior. Next, we discuss generative modeling of rare and adversarial scenarios derived from crash reports, regulations, or textual descriptions. We then turn to sensor and video simulation, comparing graphics engines, neural rendering, and video foundation models for producing realistic, multi-view, and temporally consistent sensor data. Finally, we integrate these components into a full pipeline and discuss system-level evaluation, failure analysis, and open challenges in validating generative simulation and aligning it with safety standards.




Speakers


To be announced.




Schedule


Title | Speaker | Time
Introduction & Motivation | TBD | TBD
Module 1: World Modeling & Digital Twins | TBD | TBD
Module 2: Generative Corner-Case & Scenario Synthesis | TBD | TBD
Break | - | TBD
Module 3: Sensor Simulation, Video Generation & End-to-End Pipelines | TBD | TBD
Module 4: Testing Open-Source AV Stacks | TBD | TBD
Closing Discussion | TBD | TBD

Organizers


Henry Liu
University of Michigan
Howie Sun
SaferDrive AI
Jun Gao
University of Michigan / NVIDIA
Shuo Feng
Tsinghua University
Xintao Yan
University of Hong Kong
Jiawei Wang
University of Michigan

Related Publications & Resources