Improve model card and add pipeline tag
This PR improves the model card by adding the `text-to-video` pipeline tag and organizing the content for better discoverability. It includes links to the official paper and GitHub repository, highlights the core methodology (Batch-Centered Noise Injection and Spectrum-Aware Contextual Noise), and provides the standard inference command as documented in the source repository.
README.md (CHANGED)

````diff
@@ -1,17 +1,44 @@
 ---
 license: mit
+pipeline_tag: text-to-video
 ---
 
 # CAT-LVDM Checkpoints
 
 This repository contains model checkpoints for **CAT-LVDM: Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation**.
 
+CAT-LVDM is a training framework designed to improve the robustness of latent video diffusion models against noisy conditioning through structured, data-aligned noise injection.
+
 ## Paper
 
-**Title**: Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation
+**Title**: [Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation](https://huggingface.co/papers/2505.21545)
 **Authors**: Chika Maduabuchi, Hao Chen, Yujin Han, Jindong Wang
 **arXiv**: [https://arxiv.org/abs/2505.21545](https://arxiv.org/abs/2505.21545)
 
+## Project Links
+
+- [Official Code & Project Page](https://github.com/chikap421/catlvdm)
+- [Model Weights](https://huggingface.co/Chikap421/catlvdm-checkpoints)
+- [Dataset (WebVid-10M)](https://huggingface.co/datasets/TempoFunk/webvid-10M)
+
+## Methodology
+
+CAT-LVDM introduces two key operators tailored for video diffusion to preserve temporal coherence and semantic fidelity:
+- **Batch-Centered Noise Injection (BCNI)**: Perturbs embeddings along intra-batch semantic axes.
+- **Spectrum-Aware Contextual Noise (SACN)**: Injects spectral noise aligned with principal low-frequency components.
+
+The checkpoints provided here cover various corruption settings (embedding-level and text-level) across multiple noise levels (2.5% to 20%).
+
+## Usage
+
+To run inference with these pre-trained CAT-LVDM checkpoints, please refer to the instructions in the [official GitHub repository](https://github.com/chikap421/catlvdm).
+
+### Basic Inference
+```bash
+# Run inference with deepspeed
+bash scripts/inference_deepspeed.sh
+```
+
 ## BibTeX
 
 ```bibtex
@@ -26,14 +53,8 @@ This repository contains model checkpoints for **CAT-LVDM: Corruption-Aware Trai
 }
 ```
 
-## Project Links
-
-- [Project Page](https://github.com/chikap421/catlvdm)
-- [Models](https://huggingface.co/Chikap421/catlvdm-checkpoints)
-- [Dataset](https://huggingface.co/datasets/TempoFunk/webvid-10M)
-
 ---
 
 ## Citation
 
-If you use this model, please cite the paper using the BibTeX above.
+If you use this model or code in your research, please cite the paper using the BibTeX above.
````