Add comprehensive model card for Many-for-Many

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +115 -0
README.md ADDED
@@ -0,0 +1,115 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: any-to-any
+ library_name: diffusers
+ tags:
+ - many-for-many
+ - diffusion-model
+ - video-generation
+ - image-generation
+ - text-to-video
+ - image-to-video
+ - video-to-video
+ - image-manipulation
+ - video-manipulation
+ ---
+
+ # Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
+
+ <div align="center">
+ <img src="https://huggingface.co/LetsThink/MfM-Pipeline-8B/resolve/main/assets/MfM_logo.jpeg" alt="MfM-logo" width="50%">
+ </div>
+
+ [📚 Paper](https://huggingface.co/papers/2506.01758) | [🌐 Project Page](https://leeruibin.github.io/MfMPage/) | [💻 Code](https://github.com/SandAI-org/MAGI-1) | [🤗 Model](https://huggingface.co/LetsThink/MfM-Pipeline-8B)
+
+ **Many-for-Many (MfM)** is a unified framework that trains a single model to perform more than 10 different visual generation and manipulation tasks, covering both images and videos. It addresses the high cost of training strong text-to-video foundation models by leveraging diverse existing datasets across many tasks.
+
+ Specifically, MfM designs a lightweight adapter to unify the different conditions across tasks and employs a joint image-video learning strategy to progressively train the model from scratch. This yields a unified visual generation and manipulation model with improved video generation performance. In addition, depth maps are introduced as a condition to help the model better perceive 3D space in visual generation.
+
+ Two versions of the model are available (8B and 2B), each capable of performing a wide array of tasks. The 8B model demonstrates highly competitive video generation performance compared to open-source and even commercial engines.
+
+ ## ✨ Key Features
+ * **Unified Framework**: Trains a single model for over 10 different image and video generation and manipulation tasks.
+ * **Efficient Design**: Uses a lightweight adapter to unify diverse conditions and a joint image-video learning strategy for progressive training.
+ * **Depth-Aware Generation**: Incorporates depth maps as a condition to enhance the model's perception of 3D space.
+ * **Versatile Capabilities**: Supports tasks such as text-to-video (T2V), image-to-video (I2V), video-to-video (V2V), and various image and video manipulation tasks.
+ * **Competitive Performance**: The 8B model delivers highly competitive results in video generation.
+
+ ## 🔥 Latest News
+
+ - Inference code and model weights have been released. Have fun with MfM ⭐⭐.
+
+ ## 🚀 Inference
+
+ ### 1. Install the requirements
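+
+ The `requirements.txt` file and `infer_mfm_pipeline.py` live in the code repository linked above, so clone it first (the repository URL below simply follows that Code link) and run the commands that follow from inside the checkout:
+
+ ```bash
+ # Fetch the code release referenced by the Code link above
+ git clone https://github.com/SandAI-org/MAGI-1.git
+ cd MAGI-1
+ ```
+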
+ ```bash
+ pip install -r requirements.txt
+ ```
+ *Note: The `requirements.txt` file and `infer_mfm_pipeline.py` script can be found in the original [GitHub repository](https://github.com/SandAI-org/MAGI-1).*
+
+ ### 2. Download the pipeline from Hugging Face
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # For the 8B model:
+ snapshot_download(repo_id="LetsThink/MfM-Pipeline-8B", local_dir="your_local_path/MfM-Pipeline-8B")
+
+ # For the 2B model:
+ # snapshot_download(repo_id="LetsThink/MfM-Pipeline-2B", local_dir="your_local_path/MfM-Pipeline-2B")
+ ```
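+
+ Equivalently, the same snapshot can be fetched from the command line with the `huggingface-cli` tool that ships with `huggingface_hub` (the local directory below mirrors the placeholder used above):
+
+ ```bash
+ # Download the 8B pipeline into a local directory
+ huggingface-cli download LetsThink/MfM-Pipeline-8B --local-dir your_local_path/MfM-Pipeline-8B
+ ```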
+
+ ### 3. Run Inference
+
+ You can refer to the inference script in `scripts/inference.sh` from the cloned GitHub repository. Replace `PIPELINE_PATH` with the local directory where you downloaded the model.
+
+ Example for text-to-video (T2V) generation:
+ ```bash
+ PIPELINE_PATH=your_local_path/MfM-Pipeline-8B # or your_local_path/MfM-Pipeline-2B
+ OUTPUT_DIR=outputs
+ TASK=t2v # Change task for different applications (e.g., i2v, v2v, inpaint)
+
+ python infer_mfm_pipeline.py \
+     --pipeline_path $PIPELINE_PATH \
+     --output_dir $OUTPUT_DIR \
+     --task $TASK \
+     --crop_type keep_res \
+     --num_inference_steps 30 \
+     --guidance_scale 9 \
+     --motion_score 5 \
+     --num_samples 1 \
+     --upscale 4 \
+     --noise_aug_strength 0.0 \
+     --t2v_inputs your_prompt.txt # Path to a text file with your prompts
+ ```
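+
+ `--t2v_inputs` points to a plain-text prompt file. The exact format is defined by `infer_mfm_pipeline.py` in the code repository; assuming one prompt per line, a minimal `your_prompt.txt` could be created like this:
+
+ ```bash
+ # Write a minimal prompt file (one prompt per line is an assumption about the expected format)
+ cat > your_prompt.txt <<'EOF'
+ A golden retriever running across a sunlit meadow, slow motion, cinematic lighting
+ A timelapse of clouds drifting over a mountain lake at sunrise
+ EOF
+ ```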
+
+ ## 🖼️ Visual Results
+
+ <div align="center">
+ <img src='https://huggingface.co/LetsThink/MfM-Pipeline-8B/resolve/main/assets/visual_result.png' alt="Visual Results">
+ </div>
+
+ ## 📺 Demo Video
+
+ <div align="center">
+ <video src="https://github.com/user-attachments/assets/f1ddd1fd-1c2b-44e7-94dc-9f62963ab147" width="70%" controls> </video>
+ </div>
+
+ ## 📮 Architecture
+
+ <div align="center">
+ <img src='https://huggingface.co/LetsThink/MfM-Pipeline-8B/resolve/main/assets/arch.png' alt="Architecture Diagram">
+ </div>
+
+ ## ✍️ Citation
105
+
106
+ If you find our code or model useful in your research, please cite:
107
+
108
+ ```bibtex
109
+ @article{yang2025MfM,
110
+ title={Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks},
111
+ author={Tao Yang, Ruibin Li, Yangming Shi, Yuqi Zhang, Qide Dong, Haoran Cheng, Weiguo Feng, Shilei Wen, Bingyue Peng, Lei Zhang},
112
+ year={2025},
113
+ booktitle={arXiv preprint arXiv:2506.01758},
114
+ }
115
+ ```