---
license: apache-2.0
pipeline_tag: any-to-any
library_name: diffusers
tags:
  - many-for-many
  - diffusion-model
  - video-generation
  - image-generation
  - text-to-video
  - image-to-video
  - video-to-video
  - image-manipulation
  - video-manipulation
---

# Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks

MfM-logo

📚 Paper | 🌐 Project Page | 💻 Code | 🤗 Model

Many-for-Many (MfM) is a novel unified framework designed to train a single model capable of performing over 10 different visual generation and manipulation tasks, encompassing both images and videos. This approach addresses the high cost of training strong text-to-video foundation models by leveraging diverse existing datasets across various tasks.

Specifically, MfM introduces a lightweight adapter that unifies the different conditioning inputs across tasks, together with a joint image-video learning strategy that progressively trains the model from scratch. This yields a unified visual generation and manipulation model with improved video generation performance. In addition, depth maps are introduced as a condition to help the model better perceive 3D space during visual generation.
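
Conceptually, the adapter projects heterogeneous conditions (e.g., reference frames or depth maps) into a shared conditioning space consumed by the diffusion backbone. The snippet below is a minimal, hypothetical PyTorch sketch of that idea only; the module name, layer layout, and dimensions are assumptions for illustration and do not reflect the released MfM implementation.

```python
import torch
import torch.nn as nn

class ConditionAdapter(nn.Module):
    """Illustrative sketch (not the released code): map a task-specific
    condition such as an RGB frame or a depth map into a shared sequence
    of conditioning tokens."""

    def __init__(self, in_channels: int, cond_dim: int = 1024):
        super().__init__()
        # A small conv encoder keeps the adapter lightweight relative to the backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(256, cond_dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, condition: torch.Tensor) -> torch.Tensor:
        # condition: (batch, in_channels, H, W)
        feat = self.encoder(condition)          # (batch, cond_dim, H/4, W/4)
        return feat.flatten(2).transpose(1, 2)  # (batch, tokens, cond_dim)

# Separate lightweight adapters for an image condition and a depth condition,
# both producing tokens in the same conditioning space.
image_adapter = ConditionAdapter(in_channels=3)
depth_adapter = ConditionAdapter(in_channels=1)
image_tokens = image_adapter(torch.randn(1, 3, 64, 64))
depth_tokens = depth_adapter(torch.randn(1, 1, 64, 64))
print(image_tokens.shape, depth_tokens.shape)  # both: torch.Size([1, 256, 1024])
```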

Two versions of the model are available (8B and 2B), each capable of performing a wide array of tasks. The 8B model demonstrates highly competitive performance in video generation tasks compared to open-source and even commercial engines.

## ✨ Key Features

- Unified Framework: Trains a single model for over 10 different image and video generation and manipulation tasks.
- Efficient Design: Uses a lightweight adapter to unify diverse conditions and a joint image-video learning strategy for progressive training.
- Depth-Aware Generation: Incorporates depth maps as a condition to enhance the model's perception of 3D space.
- Versatile Capabilities: Supports text-to-video (T2V), image-to-video (I2V), video-to-video (V2V), and various image and video manipulation tasks.
- Competitive Performance: The 8B model delivers highly competitive results in video generation.

## 🔥 Latest News

- Inference code and model weights have been released. Have fun with MfM ⭐⭐.

## 🚀 Inference

### 1. Install the requirements

```bash
pip install -r requirements.txt
```

Note: The requirements.txt file and infer_mfm_pipeline.py script can be found in the original GitHub repository.
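
If you have not cloned that repository yet, a typical setup looks like the sketch below. The repository URL and the folder name are placeholders (use the Code link above), not verified paths.

```bash
# Setup sketch: <mfm-repo-url> is a placeholder for the GitHub URL linked above.
git clone <mfm-repo-url> MfM
cd MfM
pip install -r requirements.txt
```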

### 2. Download the pipeline from Hugging Face

```python
from huggingface_hub import snapshot_download

# For the 8B model:
snapshot_download(repo_id="LetsThink/MfM-Pipeline-8B", local_dir="your_local_path/MfM-Pipeline-8B")

# For the 2B model:
# snapshot_download(repo_id="LetsThink/MfM-Pipeline-2B", local_dir="your_local_path/MfM-Pipeline-2B")
```
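
Alternatively, the same download can be done from the command line with the `huggingface-cli` tool that ships with `huggingface_hub` (assuming a reasonably recent version):

```bash
# Download the 2B pipeline into a local directory (adjust the repo id / path as needed).
huggingface-cli download LetsThink/MfM-Pipeline-2B --local-dir your_local_path/MfM-Pipeline-2B
```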

### 3. Run Inference

You can refer to the inference script `scripts/inference.sh` in the cloned GitHub repository. Replace `PIPELINE_PATH` with the local directory where you downloaded the model.

Example for text-to-video (T2V) generation:

```bash
PIPELINE_PATH=your_local_path/MfM-Pipeline-8B  # or your_local_path/MfM-Pipeline-2B
OUTPUT_DIR=outputs
TASK=t2v  # Change the task for different applications (e.g., i2v, v2v, inpaint)

python infer_mfm_pipeline.py \
        --pipeline_path $PIPELINE_PATH \
        --output_dir $OUTPUT_DIR \
        --task $TASK \
        --crop_type keep_res \
        --num_inference_steps 30 \
        --guidance_scale 9 \
        --motion_score 5 \
        --num_samples 1 \
        --upscale 4 \
        --noise_aug_strength 0.0 \
        --t2v_inputs your_prompt.txt  # Path to a text file with your prompts
```
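
The exact format of `your_prompt.txt` is not documented in this card; a plain text file with one prompt per line is assumed in this illustrative example:

```text
A golden retriever running through a sunlit meadow, slow motion, cinematic lighting.
A timelapse of storm clouds rolling over a mountain ridge at dusk.
```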

## 🖼️ Visual Results


## 📺 Demo Video

## 📮 Architecture

Architecture Diagram

## ✍️ Citation

If you find our code or model useful in your research, please cite:

```bibtex
@article{yang2025MfM,
  title={Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks},
  author={Tao Yang and Ruibin Li and Yangming Shi and Yuqi Zhang and Qide Dong and Haoran Cheng and Weiguo Feng and Shilei Wen and Bingyue Peng and Lei Zhang},
  journal={arXiv preprint arXiv:2506.01758},
  year={2025}
}
```