AnthonyGosselin committed on
Commit 1bf151a · 2 Parent(s): 6568aa9 e5e49d4

Merge branch 'main' of https://huggingface.co/AnthonyGosselin/Ctrl-Crash

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ architecture_figure.png filter=lfs diff=lfs merge=lfs -text
+ etc/genvid_57_11_04453.gif filter=lfs diff=lfs merge=lfs -text
+ etc/genvid_64_48_08386.gif filter=lfs diff=lfs merge=lfs -text
+ etc/genvid_87_21_08924.gif filter=lfs diff=lfs merge=lfs -text
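
The `.gitattributes` entries above route matching paths through Git LFS. As a rough illustration of which files they cover, the sketch below approximates the matching with Python's `fnmatch`. This is a simplification and not Git's actual matcher: it assumes a pattern without `/` is compared against the file name only, and a pattern with `/` against the full repository-relative path. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes hunk above (context plus added lines).
LFS_PATTERNS = [
    "*.zip",
    "*.zst",
    "*tfevents*",
    "architecture_figure.png",
    "etc/genvid_57_11_04453.gif",
    "etc/genvid_64_48_08386.gif",
    "etc/genvid_87_21_08924.gif",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check of whether `path` matches one of LFS_PATTERNS.

    Simplification (not Git's real matcher): a pattern without "/" is
    compared against the file name only, a pattern with "/" against the
    whole repository-relative path.
    """
    posix_path = PurePosixPath(path)
    for pattern in LFS_PATTERNS:
        target = str(posix_path) if "/" in pattern else posix_path.name
        if fnmatchcase(target, pattern):
            return True
    return False


for candidate in ["architecture_figure.png", "etc/genvid_57_11_04453.gif", "README.md"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under these assumptions the two media files report `True` and `README.md` reports `False`, which is consistent with the README being stored as a regular Git blob while the figure and GIFs are stored as LFS pointers.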
README.md CHANGED
@@ -1,3 +1,69 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-4.0
+ base_model:
+ - stabilityai/stable-video-diffusion-img2vid-xt-1-1
+ ---
+
+ # Model Card for Ctrl-Crash
+
+ Generate car crash videos from an initial frame, using bounding-box and crash type control signals.
+
+ <p align="center">
+ <table cellspacing="0" cellpadding="0">
+ <tr>
+ <td><img src="etc/genvid_57_11_04453.gif" width="512"></td>
+ <td><img src="etc/genvid_64_48_08386.gif" width="512"></td>
+ <td><img src="etc/genvid_87_21_08924.gif" width="512"></td>
+ </tr>
+ </table>
+ </p>
+ (Above) Examples of generated crashes
+
+
+ ## Model Details
+
+ <p align="left">
+ <img src="architecture_figure.png" width=800>
+ </p>
+
+ <!-- TODO: Provide a longer summary of what this model is. -->
+
+ - Visit the **project page** for demos: https://anthonygosselin.github.io/Ctrl-Crash-ProjectPage/
+ - Visit the **repository** to get started: https://github.com/AnthonyGosselin/Ctrl-Crash
+ - Read the **paper** for more details: https://arxiv.org/abs/2506.00227
+
+
+ This model uses the Stability AI Image-to-Video model (SVD 1.1) as a base model: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ <!-- TODO: Here we can describe the different operation modes (Reconstruction, Prediction and counterfactuals) -->
+
+ Ctrl-Crash supports different task settings, each enabled by varying the available control signals, namely:
+ - **(1) Crash Reconstruction**: Given an initial image, full bounding box sequence, and a crash type, the model reconstructs a consistent video combining the visual context of the initial frame with agent motion derived from the bounding boxes.
+ - **(2) Crash Prediction**: Given the initial frame and only a few initial bounding box frames (e.g., 0–9), the model predicts the future motion of agents in a way that aligns with the target crash type.
+ - **(3) Crash Counterfactuals**: Extending the prediction task, this mode varies the crash type signal while keeping other inputs fixed, enabling the generation of multiple plausible outcomes for the same scene, supporting counterfactual safety reasoning.
+
+ ## Bias, Risks, and Limitations
+
+ Despite its strong performance, our approach has several limitations, which motivate future work in this direction.
+
+ - Counterfactual outcomes can be hard to generate when initial scene conditions conflict with the desired crash type.
+ - The model relies heavily on bounding boxes, making it sensitive to tracking errors, especially in fully conditioned reconstruction.
+ - Without bounding-box conditioning, motion direction can be ambiguous, and 2D boxes struggle to capture rotation or orientation, limiting realism in behaviors like spinouts.
+ - The model does not support text conditioning.
+
+ **BibTeX:**
+ ```bibtex
+ @misc{gosselin2025ctrlcrashcontrollablediffusionrealistic,
+ title={Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes},
+ author={Anthony Gosselin and Ge Ya Luo and Luis Lara and Florian Golemo and Derek Nowrouzezahrai and Liam Paull and Alexia Jolicoeur-Martineau and Christopher Pal},
+ year={2025},
+ eprint={2506.00227},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2506.00227},
+ }
+ ```
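
The three task settings listed under "Uses" in the README above differ only in which control signals are supplied. The sketch below is one way to picture that, and it is not taken from the Ctrl-Crash code base: the `ControlSignals` container, the function names, the integer crash-type labels, and the frame count of 25 are all assumptions made for illustration. It models bounding-box conditioning as a per-frame boolean mask.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ControlSignals:
    """Hypothetical container for the control signals of one generation run."""
    bbox_frame_mask: List[bool]  # True where a bounding-box frame is supplied
    crash_type: int              # placeholder label; the real encoding is not given here


def make_controls(num_frames: int, num_bbox_frames: int, crash_type: int) -> ControlSignals:
    """Condition on the first `num_bbox_frames` bounding-box frames only."""
    if not 0 <= num_bbox_frames <= num_frames:
        raise ValueError("num_bbox_frames must lie in [0, num_frames]")
    mask = [i < num_bbox_frames for i in range(num_frames)]
    return ControlSignals(mask, crash_type)


def reconstruction(num_frames: int, crash_type: int) -> ControlSignals:
    # (1) Full bounding-box sequence plus a crash type.
    return make_controls(num_frames, num_frames, crash_type)


def prediction(num_frames: int, num_bbox_frames: int, crash_type: int) -> ControlSignals:
    # (2) Only the first few bounding-box frames plus a target crash type.
    return make_controls(num_frames, num_bbox_frames, crash_type)


def counterfactuals(
    num_frames: int, num_bbox_frames: int, crash_types: Sequence[int]
) -> List[ControlSignals]:
    # (3) Same partial conditioning, one run per crash type.
    return [prediction(num_frames, num_bbox_frames, ct) for ct in crash_types]


NUM_FRAMES = 25  # assumed clip length, for illustration only

print(sum(reconstruction(NUM_FRAMES, crash_type=1).bbox_frame_mask))  # 25
print(sum(prediction(NUM_FRAMES, 9, crash_type=1).bbox_frame_mask))   # 9
print(len(counterfactuals(NUM_FRAMES, 9, crash_types=[0, 1, 2])))     # 3
```

In this framing, counterfactual generation is prediction repeated with the mask held fixed and only the crash-type label changed, which mirrors the README's description of varying the crash type while keeping other inputs fixed.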
architecture_figure.png ADDED

Git LFS Details

  • SHA256: bc7fb8e3c488eaca947d612cb381338219d993330518db454ab18ae889e8513b
  • Pointer size: 131 Bytes
  • Size of remote file: 332 kB
etc/genvid_57_11_04453.gif ADDED

Git LFS Details

  • SHA256: 0af33f0855619ecea683552fdedb2f66df9ae6d953e9421e1e4dd03ffe8cde4a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.88 MB
etc/genvid_64_48_08386.gif ADDED

Git LFS Details

  • SHA256: 2df6a977385d2dd05e6c624ae0d13f05ddfb832c302e44b4de97d5d347611a94
  • Pointer size: 132 Bytes
  • Size of remote file: 1.5 MB
etc/genvid_87_21_08924.gif ADDED

Git LFS Details

  • SHA256: bfb4e57880ce7d208a401eb1d78537173b60b20882d59a8313fe7e96ab1c4861
  • Pointer size: 132 Bytes
  • Size of remote file: 1.39 MB
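
The pointer sizes listed above (131 bytes for the PNG, 132 bytes for each GIF) can be checked against the Git LFS pointer format, which stores a version line, a SHA-256 object ID, and the file size in bytes. A minimal sketch follows. The SHA-256 values are copied from the details above, but the exact byte sizes are placeholders, since the page only shows rounded sizes (332 kB, 1.88 MB); the function name `lfs_pointer` is hypothetical.

```python
def lfs_pointer(oid_sha256: str, size_bytes: int) -> str:
    """Build the text of a Git LFS pointer file (spec v1 layout)."""
    if len(oid_sha256) != 64:
        raise ValueError("expected a 64-character hex SHA-256 digest")
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid_sha256}\n"
        f"size {size_bytes}\n"
    )


# Object IDs as listed in the Git LFS details above.
ARCH_OID = "bc7fb8e3c488eaca947d612cb381338219d993330518db454ab18ae889e8513b"
GIF_OID = "0af33f0855619ecea683552fdedb2f66df9ae6d953e9421e1e4dd03ffe8cde4a"

# Placeholder sizes: only the digit count matters for the pointer length.
print(len(lfs_pointer(ARCH_OID, 332_000).encode("ascii")))   # 131
print(len(lfs_pointer(GIF_OID, 1_880_000).encode("ascii")))  # 132
```

The fixed part of the pointer is 125 bytes, so a six-digit size gives 131 bytes and a seven-digit size gives 132 bytes, matching the values reported for the figure and the GIFs.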