Improve model card and add pipeline tag
This PR improves the model card by adding the `text-to-video` pipeline tag and organizing the content for better discoverability. It includes links to the official paper and GitHub repository, highlights the core methodology (Batch-Centered Noise Injection and Spectrum-Aware Contextual Noise), and provides the standard inference command as documented in the source repository.
README.md (CHANGED)

````diff
@@ -1,17 +1,44 @@
 ---
 license: mit
+pipeline_tag: text-to-video
 ---
 
 # CAT-LVDM Checkpoints
 
 This repository contains model checkpoints for **CAT-LVDM: Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation**.
 
+CAT-LVDM is a training framework designed to improve the robustness of latent video diffusion models against noisy conditioning through structured, data-aligned noise injection.
+
 ## Paper
 
-**Title**: Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation
+**Title**: [Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation](https://huggingface.co/papers/2505.21545)
 **Authors**: Chika Maduabuchi, Hao Chen, Yujin Han, Jindong Wang
 **arXiv**: [https://arxiv.org/abs/2505.21545](https://arxiv.org/abs/2505.21545)
 
+## Project Links
+
+- [Official Code & Project Page](https://github.com/chikap421/catlvdm)
+- [Model Weights](https://huggingface.co/Chikap421/catlvdm-checkpoints)
+- [Dataset (WebVid-10M)](https://huggingface.co/datasets/TempoFunk/webvid-10M)
+
+## Methodology
+
+CAT-LVDM introduces two key operators tailored for video diffusion to preserve temporal coherence and semantic fidelity:
+- **Batch-Centered Noise Injection (BCNI)**: Perturbs embeddings along intra-batch semantic axes.
+- **Spectrum-Aware Contextual Noise (SACN)**: Injects spectral noise aligned with principal low-frequency components.
+
+The checkpoints provided here cover various corruption settings (embedding-level and text-level) across multiple noise levels (2.5% to 20%).
+
+## Usage
+
+To run inference with these pre-trained CAT-LVDM checkpoints, please refer to the instructions in the [official GitHub repository](https://github.com/chikap421/catlvdm).
+
+### Basic Inference
+```bash
+# Run inference with deepspeed
+bash scripts/inference_deepspeed.sh
+```
+
 ## BibTeX
 
 ```bibtex
@@ -26,14 +53,8 @@ This repository contains model checkpoints for **CAT-LVDM: Corruption-Aware Trai
 }
 ```
 
-## Project Links
-
-- [Project Page](https://github.com/chikap421/catlvdm)
-- [Models](https://huggingface.co/Chikap421/catlvdm-checkpoints)
-- [Dataset](https://huggingface.co/datasets/TempoFunk/webvid-10M)
-
 ---
 
 ## Citation
 
-If you use this model, please cite the paper using the BibTeX above.
+If you use this model or code in your research, please cite the paper using the BibTeX above.
````