---
pipeline_tag: any-to-any
library_name: diffusers
license: apache-2.0
---

# Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
## Usage
You can load the model with the `diffusers` library and use it for a variety of generation tasks.
First, install the required dependencies (the `requirements.txt` file is provided in the official GitHub repository):
```bash
pip install -r requirements.txt
```
Then, you can download the pipeline from Hugging Face Hub and use it for inference:
```python
from huggingface_hub import snapshot_download
from diffusers import DiffusionPipeline
import torch
import os
# Define a local directory to download the model
local_dir = "./MfM-Pipeline-8B"
# Download the pipeline from Hugging Face Hub
# You can use "LetsThink/MfM-Pipeline-2B" for the 2B version
snapshot_download(repo_id="LetsThink/MfM-Pipeline-8B", local_dir=local_dir)
# Load the pipeline. Since MfMPipeline is a custom class, we need trust_remote_code=True.
pipe = DiffusionPipeline.from_pretrained(local_dir, torch_dtype=torch.float16, trust_remote_code=True)
pipe.to("cuda") # or your preferred device like "cpu"
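# Optional: if GPU memory is tight, model offloading may help. This assumes
# the custom pipeline inherits diffusers' standard offloading hooks; if it
# does not, keep pipe.to("cuda") above.
# pipe.enable_model_cpu_offload()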
# Example: Text-to-Video generation (task="t2v")
prompt = "A majestic eagle flying over snow-capped mountains."
output_dir = "outputs"
task = "t2v" # The model supports multiple tasks like "t2v", "i2v", "i2i", etc.
# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)
# Run inference.
# Parameters such as num_frames, num_inference_steps, guidance_scale, and
# motion_score are task-dependent; refer to the official GitHub repository
# for recommended values and detailed usage per task.
# Note: the reference script infer_mfm_pipeline.py reads prompts from a file
# (t2v_inputs); here we pass a single prompt directly, so you may need to
# adapt this call for full functionality.
video_frames = pipe(
    prompt=prompt,
    task=task,
    crop_type="keep_res",
    num_inference_steps=30,
    guidance_scale=9,
    motion_score=5,
    num_samples=1,
    upscale=4,
    noise_aug_strength=0.0,
).images[0]  # the pipeline returns a list of results; take the first one
# You can save the generated frames as a GIF or MP4 with a library such as
# imageio (install with: pip install imageio imageio-ffmpeg):
# import imageio
# output_video_path = os.path.join(output_dir, "generated_video.mp4")
# imageio.mimsave(output_video_path, video_frames, fps=8)
# print(f"Generated video saved to {output_video_path}")
```
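
The same pipeline handles image-conditioned tasks such as `"i2v"` by switching the `task` argument. Continuing from the snippet above, the sketch below shows what an image-to-video call might look like; the `image` keyword is an assumption based on common `diffusers` conventions, so check `infer_mfm_pipeline.py` in the official repository for the exact argument names.

```python
from PIL import Image

# Hypothetical image-to-video (i2v) call. The `image` parameter name is an
# assumption; consult the official repository for the exact interface.
init_image = Image.open("input.png").convert("RGB")

i2v_frames = pipe(
    prompt="The eagle takes off and soars over the ridge.",
    image=init_image,        # conditioning image (assumed keyword)
    task="i2v",
    crop_type="keep_res",
    num_inference_steps=30,
    guidance_scale=9,
    motion_score=5,
    num_samples=1,
    upscale=4,
    noise_aug_strength=0.0,  # as in the t2v example above
).images[0]
```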
## Citation
If you find our code or model useful in your research, please cite:
```bibtex
@article{yang2025MfM,
  title={Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks},
  author={Yang, Tao and Li, Ruibin and Shi, Yangming and Zhang, Yuqi and Dong, Qide and Cheng, Haoran and Feng, Weiguo and Wen, Shilei and Peng, Bingyue and Zhang, Lei},
  journal={arXiv preprint arXiv:2506.01758},
  year={2025}
}
```