nielsr (HF Staff) committed
Commit a42a50f · verified · 1 parent: f2f4c8a

Improve model card and add pipeline tag

This PR improves the model card by adding the `text-to-video` pipeline tag and organizing the content for better discoverability. It includes links to the official paper and GitHub repository, highlights the core methodology (Batch-Centered Noise Injection and Spectrum-Aware Contextual Noise), and provides the standard inference command as documented in the source repository.

Files changed (1)
  1. README.md +29 -8
README.md CHANGED
@@ -1,17 +1,44 @@
---
license: mit
---

# CAT-LVDM Checkpoints

This repository contains model checkpoints for **CAT-LVDM: Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation**.

## 📄 Paper

- **Title**: Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation
**Authors**: Chika Maduabuchi, Hao Chen, Yujin Han, Jindong Wang
**arXiv**: [https://arxiv.org/abs/2505.21545](https://arxiv.org/abs/2505.21545)

## 📚 BibTeX

```bibtex
@@ -26,14 +53,8 @@ This repository contains model checkpoints for **CAT-LVDM: Corruption-Aware Trai
}
```

- ## 🔗 Project Links
-
- - 🔥 [Project Page](https://github.com/chikap421/catlvdm)
- - 🤗 [Models](https://huggingface.co/Chikap421/catlvdm-checkpoints)
- - 📁 [Dataset](https://huggingface.co/datasets/TempoFunk/webvid-10M)
-
---

## 📝 Citation

- If you use this model, please cite the paper using the BibTeX above.
 
---
license: mit
+ pipeline_tag: text-to-video
---

# CAT-LVDM Checkpoints

This repository contains model checkpoints for **CAT-LVDM: Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation**.

+ CAT-LVDM is a training framework designed to improve the robustness of latent video diffusion models against noisy conditioning through structured, data-aligned noise injection.
+
## 📄 Paper

+ **Title**: [Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation](https://huggingface.co/papers/2505.21545)
**Authors**: Chika Maduabuchi, Hao Chen, Yujin Han, Jindong Wang
**arXiv**: [https://arxiv.org/abs/2505.21545](https://arxiv.org/abs/2505.21545)

+ ## 🔗 Project Links
+
+ - 🔥 [Official Code & Project Page](https://github.com/chikap421/catlvdm)
+ - 🤗 [Model Weights](https://huggingface.co/Chikap421/catlvdm-checkpoints)
+ - 📁 [Dataset (WebVid-10M)](https://huggingface.co/datasets/TempoFunk/webvid-10M)
+
+ ## 🛠️ Methodology
+
+ CAT-LVDM introduces two key operators tailored for video diffusion to preserve temporal coherence and semantic fidelity:
+ - **Batch-Centered Noise Injection (BCNI)**: Perturbs embeddings along intra-batch semantic axes.
+ - **Spectrum-Aware Contextual Noise (SACN)**: Injects spectral noise aligned with principal low-frequency components.
+
+ The checkpoints provided here cover various corruption settings (embedding-level and text-level) across multiple noise levels (2.5% to 20%).
+
+ ## 🚀 Usage
+
+ To run inference with these pre-trained CAT-LVDM checkpoints, please refer to the instructions in the [official GitHub repository](https://github.com/chikap421/catlvdm).
+
+ ### Basic Inference
+ ```bash
+ # Run inference with DeepSpeed
+ bash scripts/inference_deepspeed.sh
+ ```
+
## 📚 BibTeX

```bibtex
}
```

---

## 📝 Citation

+ If you use this model or code in your research, please cite the paper using the BibTeX above.
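As a reading aid for the Methodology section added in this diff, the Batch-Centered Noise Injection (BCNI) idea — perturbing each conditioning embedding along its offset from the batch centroid rather than in an isotropic direction — can be sketched in plain Python. This is a hedged illustration only, not the repository's actual implementation: the function name, list-based embeddings, and the per-sample Gaussian scaling are assumptions.

```python
import random

def batch_centered_noise(embeddings, sigma=0.1):
    """Hypothetical sketch of Batch-Centered Noise Injection (BCNI).

    Each embedding is perturbed along its offset from the batch
    centroid (an intra-batch semantic axis), so the corruption stays
    aligned with the data rather than being isotropic noise.
    embeddings: list of equal-length lists of floats, one per sample.
    """
    n, dim = len(embeddings), len(embeddings[0])
    # Batch centroid (mean embedding across the batch).
    center = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    noisy = []
    for e in embeddings:
        scale = random.gauss(0.0, sigma)  # random magnitude per sample
        # Move the sample along its own batch-relative direction.
        noisy.append([e[d] + scale * (e[d] - center[d]) for d in range(dim)])
    return noisy
```

With `sigma=0.0` the embeddings pass through unchanged; larger values push each sample further along its own batch-relative axis, which is the "structured, data-aligned" corruption the PR description refers to.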