Merge branch 'main' of https://huggingface.co/AnthonyGosselin/Ctrl-Crash

Files changed (6)

.gitattributes +4 -0
README.md +69 -3
architecture_figure.png +3 -0
etc/genvid_57_11_04453.gif +3 -0
etc/genvid_64_48_08386.gif +3 -0
etc/genvid_87_21_08924.gif +3 -0

.gitattributes CHANGED

@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+architecture_figure.png filter=lfs diff=lfs merge=lfs -text
+etc/genvid_57_11_04453.gif filter=lfs diff=lfs merge=lfs -text
+etc/genvid_64_48_08386.gif filter=lfs diff=lfs merge=lfs -text
+etc/genvid_87_21_08924.gif filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,69 @@
----
-license: cc-by-4.0
----

+---
+license: cc-by-4.0
+base_model:
+- stabilityai/stable-video-diffusion-img2vid-xt-1-1
+---
+# Model Card for Ctrl-Crash
+Generate car crash videos from an initial frame, using bounding-box and crash type control signals.
+<p align="center">
+  <table cellspacing="0" cellpadding="0">
+    <tr>
+      <td><img src="etc/genvid_57_11_04453.gif" width="512"></td>
+      <td><img src="etc/genvid_64_48_08386.gif" width="512"></td>
+      <td><img src="etc/genvid_87_21_08924.gif" width="512"></td>
+    </tr>
+  </table>
+</p>
+(Above) Examples of generated crashes
+## Model Details
+<p align="left">
+<img src="architecture_figure.png" width=800>
+</p>
+<!-- TODO: Provide a longer summary of what this model is. -->
+- Visit the **project page** for demos: https://anthonygosselin.github.io/Ctrl-Crash-ProjectPage/
+- Visit the **repository** to get started: https://github.com/AnthonyGosselin/Ctrl-Crash
+- Read the **paper** for more details: https://arxiv.org/abs/2506.00227
+This model uses the Stability AI Image-to-Video model (SVD 1.1) as a base model: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+<!-- TODO: Here we can describe the different operation modes (Reconstruction, Prediction and counterfactuals) -->
+Ctrl-Crash supports three task settings, each enabled by varying which control signals are provided:
+- **(1) Crash Reconstruction**: Given an initial image, a full bounding-box sequence, and a crash type, the model reconstructs a consistent video that combines the visual context of the initial frame with agent motion derived from the bounding boxes.
+- **(2) Crash Prediction**: Given the initial frame and only the first few bounding-box frames (e.g., frames 0–9), the model predicts the future motion of agents in a way that aligns with the target crash type.
+- **(3) Crash Counterfactuals**: Extending the prediction task, this mode varies the crash type signal while keeping the other inputs fixed, which yields multiple plausible outcomes for the same scene and supports counterfactual safety reasoning.
+## Bias, Risks, and Limitations
+Despite its strong performance, our approach has several limitations, which motivate future work in this direction.
+- Counterfactual outcomes can be hard to generate when the initial scene conditions conflict with the desired crash type.
+- The model relies heavily on bounding boxes, which makes it sensitive to tracking errors, especially in fully conditioned reconstruction.
+- Without bounding-box conditioning, motion direction can be ambiguous, and 2D boxes struggle to capture rotation or orientation, which limits realism in behaviors such as spinouts.
+- The model does not support text conditioning.
+**BibTeX:**
+```bibtex
+@misc{gosselin2025ctrlcrashcontrollablediffusionrealistic,
+      title={Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes},
+      author={Anthony Gosselin and Ge Ya Luo and Luis Lara and Florian Golemo and Derek Nowrouzezahrai and Liam Paull and Alexia Jolicoeur-Martineau and Christopher Pal},
+      year={2025},
+      eprint={2506.00227},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2506.00227},
+}
+```

architecture_figure.png ADDED

Git LFS Details

SHA256: bc7fb8e3c488eaca947d612cb381338219d993330518db454ab18ae889e8513b
Pointer size: 131 Bytes
Size of remote file: 332 kB

etc/genvid_57_11_04453.gif ADDED

Git LFS Details

SHA256: 0af33f0855619ecea683552fdedb2f66df9ae6d953e9421e1e4dd03ffe8cde4a
Pointer size: 132 Bytes
Size of remote file: 1.88 MB

etc/genvid_64_48_08386.gif ADDED

Git LFS Details

SHA256: 2df6a977385d2dd05e6c624ae0d13f05ddfb832c302e44b4de97d5d347611a94
Pointer size: 132 Bytes
Size of remote file: 1.5 MB

etc/genvid_87_21_08924.gif ADDED

Git LFS Details

SHA256: bfb4e57880ce7d208a401eb1d78537173b60b20882d59a8313fe7e96ab1c4861
Pointer size: 132 Bytes
Size of remote file: 1.39 MB
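
The "Pointer size" values above (131 and 132 bytes) are consistent with the Git LFS v1 pointer format, in which the repository stores a three-line text file in place of each binary. The following is a minimal sketch of that format, not the Git LFS implementation itself. The helper name `lfs_pointer_text` is hypothetical, and the byte counts passed to it are placeholders: the listing only shows rounded sizes (332 kB, 1.88 MB), so the values are chosen only to have the right number of digits.

```python
def lfs_pointer_text(oid_sha256: str, size_bytes: int) -> str:
    """Build the text of a Git LFS v1 pointer file for one object.

    Hypothetical helper for illustration; Git LFS writes these files itself.
    """
    if len(oid_sha256) != 64:
        raise ValueError("expected a 64-character hex SHA-256 digest")
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid_sha256}\n"
        f"size {size_bytes}\n"
    )


# SHA-256 digests are copied from the listing above.
# The sizes are PLACEHOLDERS (assumed), not the real byte counts.
png_pointer = lfs_pointer_text(
    "bc7fb8e3c488eaca947d612cb381338219d993330518db454ab18ae889e8513b",
    332_000,  # assumed 6-digit size for the ~332 kB figure
)
gif_pointer = lfs_pointer_text(
    "0af33f0855619ecea683552fdedb2f66df9ae6d953e9421e1e4dd03ffe8cde4a",
    1_880_000,  # assumed 7-digit size for the ~1.88 MB GIF
)

print(len(png_pointer.encode("ascii")))  # 131
print(len(gif_pointer.encode("ascii")))  # 132
```

Under this format the pointer length depends only on the number of digits in the size field, which would explain why the sub-megabyte PNG has a 131-byte pointer while the three GIFs each have 132-byte pointers.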