Improve model card metadata and content

Hi! I'm Niels from the Hugging Face community science team.

This PR improves the model card for DecMem. Key changes include:
- Updating the `pipeline_tag` to `text-to-video` to ensure the model is correctly categorized and discoverable in the Hub's gallery.
- Ensuring the paper, project page, and code repository are clearly linked.
- Maintaining the installation and inference instructions provided in the official repository.
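The metadata change can be checked locally. The Python sketch below is a minimal illustration, not Hub tooling. `parse_front_matter` is a hypothetical helper that only understands the flat `key: value` pairs and `- item` lists used in this model card's YAML header; a real check would use a YAML parser such as PyYAML. Comparing the old and new headers from the diff shows that reordering the keys is cosmetic and that `pipeline_tag` is the only value that changes:

```python
# Minimal sketch: parse the YAML headers shown in the diff and report which
# keys changed. Assumption: the header only contains flat "key: value" pairs
# and "- item" lists, as in this model card; real tooling should use a YAML
# parser (e.g. PyYAML) instead of this hypothetical helper.

OLD_HEADER = """---
pipeline_tag: video-to-video
license: apache-2.0
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
---
"""

NEW_HEADER = """---
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
language:
- en
license: apache-2.0
pipeline_tag: text-to-video
---
"""


def parse_front_matter(text: str) -> dict:
    """Return the keys between the opening and closing '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of front matter
        if stripped.startswith("- ") and current_key is not None:
            # List item belonging to the most recent key.
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # An empty value means a "- item" list follows.
            meta[current_key] = value.strip() or []
    return meta


old_meta = parse_front_matter(OLD_HEADER)
new_meta = parse_front_matter(NEW_HEADER)

# Keys whose values differ between the two headers (key order is ignored).
changed = sorted(
    key for key in old_meta.keys() | new_meta.keys()
    if old_meta.get(key) != new_meta.get(key)
)
print(changed)
print(old_meta["pipeline_tag"], "->", new_meta["pipeline_tag"])
```

Running it prints `['pipeline_tag']` followed by `video-to-video -> text-to-video`.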

Files changed (1)

README.md +9 -7

README.md CHANGED

@@ -1,15 +1,15 @@
 ---
-pipeline_tag: video-to-video
-license: apache-2.0
-language:
-- en
 base_model:
 - Wan-AI/Wan2.1-T2V-1.3B
+language:
+- en
+license: apache-2.0
+pipeline_tag: text-to-video
 ---
 # DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory
-We propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation.
+DecMem is a decoupled memory architecture designed for consistent, long-horizon world generation. It employs **Sparse Global Memory** for efficient fine-grained access to global history and **Anchored Local Memory** for stable and high-quality extrapolation. This approach enables minute-level controllable long video generation with high fidelity and consistency.
 [**Project Page**](https://jeffreyyzh.github.io/DecMem-Page/) | [**Paper**](https://arxiv.org/abs/2605.31336) | [**Code**](https://github.com/KlingAIResearch/DecMem)
@@ -23,7 +23,7 @@ huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B \
     --local-dir wan_models/Wan2.1-T2V-1.3B
 ```
-Download DecMem trained checkpoints from HuggingFace:
+Download DecMem trained checkpoints:
 ```bash
 huggingface-cli download KlingTeam/DecMem --local-dir checkpoints
@@ -38,7 +38,9 @@ checkpoints/
 ## Quick start
-We provide the example video-pose pairs for quick inference. The inference is Block-by-block causal denoising manner with KV cache.
+We provide example video-pose pairs for quick inference. The inference is performed in a block-by-block causal denoising manner with KV cache.
+To run the inference, follow the installation instructions in the [official repository](https://github.com/KlingAIResearch/DecMem) and run:
 ```bash
 bash scripts/infer_example.sh
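The updated card says inference runs in a block-by-block causal denoising manner with a KV cache. The toy Python sketch below illustrates only that general pattern under stated assumptions. It is not DecMem's implementation and models neither Sparse Global Memory nor Anchored Local Memory. The sizes (`DIM`, `BLOCK_SIZE`, `NUM_STEPS`), the identity key/value projections, and `toy_denoise_step` (which simply pulls each token toward its attention context instead of running a trained network) are all hypothetical. Each block starts from noise and is refined for a fixed number of steps while attending to the cached keys/values of earlier blocks plus itself. It is then appended to the cache exactly once:

```python
import math
import random

DIM = 4          # hypothetical latent size per frame token
BLOCK_SIZE = 3   # hypothetical number of frame tokens per block
NUM_STEPS = 4    # hypothetical denoising steps per block


def attend(query, keys, values):
    """Single-head softmax attention of one query over lists of keys/values."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(DIM)]


def toy_denoise_step(block, cache_k, cache_v, step_size):
    """Placeholder denoiser (assumption): each token moves toward its context.

    Tokens attend to the cached past blocks plus the current block
    (full attention inside the block, nothing from future blocks).
    """
    keys = cache_k + block    # identity key projection (assumption)
    values = cache_v + block  # identity value projection (assumption)
    new_block = []
    for x in block:
        ctx = attend(x, keys, values)
        new_block.append([xi + step_size * (ci - xi) for xi, ci in zip(x, ctx)])
    return new_block


def generate(num_blocks, seed=0):
    """Generate blocks one after another, reusing a growing KV cache."""
    cache_k, cache_v = [], []  # KV cache: grows by one block at a time
    video = []
    for b in range(num_blocks):
        rng = random.Random(seed * 1000 + b)  # per-block starting noise
        block = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(BLOCK_SIZE)]
        for _ in range(NUM_STEPS):
            block = toy_denoise_step(block, cache_k, cache_v, step_size=0.5)
        # The block is finished: store its keys/values once so later blocks
        # read them from the cache instead of recomputing the whole history.
        cache_k.extend(block)
        cache_v.extend(block)
        video.append(block)
    return video, cache_k


video, cache = generate(3)
print(len(video), len(cache))
```

Because a block only reads the cache of blocks produced before it, the first two blocks of `generate(3)` are identical to the output of `generate(2)`. The cache grows by `BLOCK_SIZE` entries per block rather than being rebuilt over the full history at every step.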