rammurmu committed on
Commit 047a4ce · verified · 1 Parent(s): b7c1a41

Update README.md

Add custom content

Files changed (1)
  1. README.md +34 -28
README.md CHANGED
@@ -1,32 +1,38 @@
  ---
- emoji: 🎥
- title: 'Self Forcing Wan 2.1 '
- short_description: Real-time video generation
  sdk: gradio
  ---
  <p align="center">
- <h1 align="center">Self Forcing</h1>
  <h3 align="center">Bridging the Train-Test Gap in Autoregressive Video Diffusion</h3>
  </p>
  <p align="center">
  <p align="center">
- <a href="https://www.xunhuang.me/">Xun Huang</a><sup>1</sup>
  ·
- <a href="https://zhengqili.github.io/">Zhengqi Li</a><sup>1</sup>
  ·
- <a href="https://guandehe.github.io/">Guande He</a><sup>2</sup>
  ·
- <a href="https://mingyuanzhou.github.io/">Mingyuan Zhou</a><sup>2</sup>
  ·
- <a href="https://research.adobe.com/person/eli-shechtman/">Eli Shechtman</a><sup>1</sup><br>
- <sup>1</sup>Adobe Research <sup>2</sup>UT Austin
  </p>
  <h3 align="center"><a href="https://arxiv.org/abs/2506.08009">Paper</a> | <a href="https://self-forcing.github.io">Website</a> | <a href="https://huggingface.co/gdhe17/Self-Forcing/tree/main">Models (HuggingFace)</a></h3>
  </p>

  ---

- Self Forcing trains autoregressive video diffusion models by **simulating the inference process during training**, performing autoregressive rollout with KV caching. It resolves the train-test distribution mismatch and enables **real-time, streaming video generation on a single RTX 4090** while matching the quality of state-of-the-art diffusion models.

  ---

@@ -45,8 +51,8 @@ Other hardware setup could also work but hasn't been tested.
  ## Installation
  Create a conda environment and install dependencies:
  ```
- conda create -n self_forcing python=3.10 -y
- conda activate self_forcing
  pip install -r requirements.txt
  pip install flash-attn --no-build-isolation
  python setup.py develop
@@ -56,7 +62,7 @@ python setup.py develop
  ### Download checkpoints
  ```
  huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir-use-symlinks False --local-dir wan_models/Wan2.1-T2V-1.3B
- huggingface-cli download gdhe17/Self-Forcing checkpoints/self_forcing_dmd.pt --local-dir .
  ```

  ### GUI demo
@@ -72,9 +78,9 @@ Note:
  Example inference script using the chunk-wise autoregressive checkpoint trained with DMD:
  ```
  python inference.py \
- --config_path configs/self_forcing_dmd.yaml \
- --output_folder videos/self_forcing_dmd \
- --checkpoint_path checkpoints/self_forcing_dmd.pt \
  --data_path prompts/MovieGenVideoBench_extended.txt \
  --use_ema
  ```
@@ -83,33 +89,33 @@ Other config files and corresponding checkpoints can be found in [configs](confi
  ## Training
  ### Download text prompts and ODE initialized checkpoint
  ```
- huggingface-cli download gdhe17/Self-Forcing checkpoints/ode_init.pt --local-dir .
- huggingface-cli download gdhe17/Self-Forcing vidprom_filtered_extended.txt --local-dir prompts
  ```
- Note: Our training algorithm (except for the GAN version) is data-free (**no video data is needed**). For now, we directly provide the ODE initialization checkpoint and will add more instructions on how to perform ODE initialization in the future (which is identical to the process described in the [CausVid](https://github.com/tianweiy/CausVid) repo).

- ### Self Forcing Training with DMD
  ```
  torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
  --rdzv_backend=c10d \
  --rdzv_endpoint $MASTER_ADDR \
  train.py \
- --config_path configs/self_forcing_dmd.yaml \
- --logdir logs/self_forcing_dmd \
  --disable-wandb
  ```
  Our training run uses 600 iterations and completes in under 2 hours using 64 H100 GPUs. By implementing gradient accumulation, it should be possible to reproduce the results in less than 16 hours using 8 H100 GPUs.

  ## Acknowledgements
- This codebase is built on top of the open-source implementation of [CausVid](https://github.com/tianweiy/CausVid) by [Tianwei Yin](https://tianweiy.github.io/) and the [Wan2.1](https://github.com/Wan-Video/Wan2.1) repo.

  ## Citation
  If you find this codebase useful for your research, please kindly cite our paper:
  ```
- @article{huang2025selfforcing,
- title={Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion},
- author={Huang, Xun and Li, Zhengqi and He, Guande and Zhou, Mingyuan and Shechtman, Eli},
- journal={arXiv preprint arXiv:2506.08009},
  year={2025}
  }
  ```
 
  ---
+ emoji: 👀
+ title: 'RunAsh AI Live Video Streaming'
+ short_description: Real-time video generation
  sdk: gradio
+ license: apache-2.0
+ colorFrom: red
+ colorTo: yellow
+ pinned: true
+ thumbnail: >-
+   https://cdn-uploads.huggingface.co/production/uploads/6799f4b5a2b48413dd18a8dd/VC40nrxiqjcoyZISss85V.jpeg
  ---
  <p align="center">
+ <h1 align="center">RunAsh AI Live Video Streaming</h1>
  <h3 align="center">Bridging the Train-Test Gap in Autoregressive Video Diffusion</h3>
  </p>
  <p align="center">
  <p align="center">
+ <a href="https://github.com/rammurmu">Ram Murmu</a><sup>1</sup>
  ·
+ <a href="https://.github.io/">Vaibhav Murmu</a><sup>1</sup><br>
+ <sup>1</sup>RunAsh AI Research
  </p>
  <h3 align="center"><a href="https://arxiv.org/abs/2506.08009">Paper</a> | <a href="https://self-forcing.github.io">Website</a> | <a href="https://huggingface.co/gdhe17/Self-Forcing/tree/main">Models (HuggingFace)</a></h3>
  </p>

  ---

+ RunAsh AI trains autoregressive video diffusion models by **simulating the inference process during training**, performing autoregressive rollout with KV caching. It resolves the train-test distribution mismatch and enables **real-time, streaming video generation on a single RTX 4090** while matching the quality of state-of-the-art diffusion models.

  ---
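The rollout-with-KV-caching idea above can be pictured with a toy sketch: each generated frame attends to a growing key/value cache of past frames instead of re-encoding the full history at every step. This is only an illustration (toy attention over random latents), not this repo's actual model, cache layout, or denoising loop.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy feature dimension

def attend(q, K, V):
    """Single-query softmax attention over all cached keys/values."""
    scores = K @ q / np.sqrt(D)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def rollout(num_frames):
    """Autoregressive rollout: each step appends its key/value to the
    cache once, then every later frame reuses the cache instead of
    recomputing attention inputs for the whole history."""
    K_cache = np.empty((0, D))
    V_cache = np.empty((0, D))
    frames = []
    x = rng.normal(size=D)  # initial toy latent
    for _ in range(num_frames):
        K_cache = np.vstack([K_cache, x[None]])  # cache grows by one row
        V_cache = np.vstack([V_cache, x[None]])
        x = attend(x, K_cache, V_cache)  # next frame conditioned on history
        frames.append(x)
    return frames, K_cache

frames, K_cache = rollout(5)
print(len(frames), K_cache.shape)  # 5 (5, 8)
```

During training, Self Forcing-style methods run this same cached rollout so the model sees its own outputs, which is what closes the train-test gap described above.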
 
 
  ## Installation
  Create a conda environment and install dependencies:
  ```
+ conda create -n runash_ai python=3.10 -y
+ conda activate runash_ai
  pip install -r requirements.txt
  pip install flash-attn --no-build-isolation
  python setup.py develop
 
  ### Download checkpoints
  ```
  huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir-use-symlinks False --local-dir wan_models/Wan2.1-T2V-1.3B
+ huggingface-cli download gdhe17/runash-ai checkpoints/runash_ai_dmd.pt --local-dir .
  ```

  ### GUI demo
 
  Example inference script using the chunk-wise autoregressive checkpoint trained with DMD:
  ```
  python inference.py \
+ --config_path configs/runash_ai_dmd.yaml \
+ --output_folder videos/runash_ai_dmd \
+ --checkpoint_path checkpoints/runash_ai_dmd.pt \
  --data_path prompts/MovieGenVideoBench_extended.txt \
  --use_ema
  ```
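To run inference on your own prompts, `--data_path` can point at a custom text file. Assuming one prompt per line (mirroring the bundled `prompts/*.txt` files; the exact format should be confirmed against `inference.py`), such a file could be prepared like this:

```python
from pathlib import Path
import tempfile

# Hypothetical custom prompts; the one-prompt-per-line layout is an
# assumption based on the bundled prompt files, not a documented format.
prompts = [
    "A chef flambeing a dessert in a busy kitchen",
    "A drone shot of waves crashing on a rocky coast at sunset",
]
prompt_file = Path(tempfile.mkdtemp()) / "custom_prompts.txt"
prompt_file.write_text("\n".join(prompts) + "\n")

# inference.py would then be invoked with:
#   --data_path <path-to>/custom_prompts.txt
loaded = [line.strip() for line in prompt_file.read_text().splitlines() if line.strip()]
print(len(loaded))  # 2
```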
 
  ## Training
  ### Download text prompts and ODE initialized checkpoint
  ```
+ huggingface-cli download gdhe17/runash-ai checkpoints/ode_init.pt --local-dir .
+ huggingface-cli download gdhe17/runash-ai vidprom_filtered_extended.txt --local-dir prompts
  ```
+ Note: Our training algorithm (except for the GAN version) is data-free (**no video data is needed**). For now, we directly provide the ODE initialization checkpoint and will add more instructions on how to perform ODE initialization in the future (which is identical to the process described in the [RunAsh](https://github.com/) repo).

+ ### RunAsh AI Training with DMD
  ```
  torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
  --rdzv_backend=c10d \
  --rdzv_endpoint $MASTER_ADDR \
  train.py \
+ --config_path configs/runash_ai_dmd.yaml \
+ --logdir logs/runash_ai_dmd \
  --disable-wandb
  ```
  Our training run uses 600 iterations and completes in under 2 hours using 64 H100 GPUs. By implementing gradient accumulation, it should be possible to reproduce the results in less than 16 hours using 8 H100 GPUs.
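The gradient-accumulation suggestion above amounts to bookkeeping: with 8 GPUs instead of 64, the effective batch size is preserved by accumulating 8 micro-batches before each optimizer step, so the 600 optimizer steps stay the same while wall-clock time grows. A minimal sketch of that schedule (plain Python with illustrative numbers, not this repo's training loop):

```python
def training_schedule(iterations, gpus_available, gpus_reference):
    """Count optimizer steps and micro-batches when emulating a larger
    data-parallel run on fewer GPUs via gradient accumulation."""
    accum_steps = gpus_reference // gpus_available  # micro-batches per update
    optimizer_steps = 0
    micro_batches = 0
    for _ in range(iterations):
        for _ in range(accum_steps):
            micro_batches += 1   # one forward/backward; gradients are summed
        optimizer_steps += 1     # single weight update per accumulated batch
    return optimizer_steps, micro_batches

steps, micros = training_schedule(iterations=600, gpus_available=8, gpus_reference=64)
print(steps, micros)  # 600 4800
```

The 8x increase in sequential micro-batches per step is roughly why the 8-GPU estimate is about 8x the 64-GPU wall-clock time.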

  ## Acknowledgements
+ This codebase is built on top of the open-source implementation of [RunAsh](https://github.com/runash-ai) by [Ram Murmu](https://github.com/rammurmu) and the [Wan2.1](https://github.com/Wan-Video/Wan2.1) repo.

  ## Citation
  If you find this codebase useful for your research, please kindly cite our paper:
  ```
+ @article{murmu2025runash,
+ title={RunAsh AI: Bridging the Train-Test Gap in Autoregressive Video Diffusion},
+ author={Murmu, Ram and Murmu, Vaibhav},
+ journal={arXiv preprint arXiv:},
  year={2025}
  }
  ```