---
title: MIMO - Character Video Synthesis
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
license: apache-2.0
python_version: "3.10"
---

# MIMO - Controllable Character Video Synthesis

🎬 Complete implementation, optimized for Hugging Face Spaces

Transform character images into animated videos with controllable motion and advanced video editing capabilities.

## 🚀 Quick Start

1. Set up models: Click the "Setup Models" button to download the required models.
2. Load the model: Click the "Load Model" button to initialize the MIMO pipeline.
3. Upload an image: Provide a character image (person, anime, cartoon, etc.).
4. Choose a template (optional): Select a motion template, or use the reference image only.
5. Generate: Create the animated video.

> Note on templates: Video templates are optional. See [TEMPLATES_SETUP.md](TEMPLATES_SETUP.md) for how to add custom templates.

## ⚡ Why This Approach?

To avoid build timeouts on Hugging Face Spaces, the app loads progressively:
- Only minimal dependencies are installed at startup, so the build finishes quickly.
- Heavy packages (TensorFlow, OpenCV) are installed at runtime.
- All features become available after a one-time setup.
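
The runtime-installation step could look roughly like the sketch below. It is a minimal illustration, not this Space's actual code: the helper names `ensure_package`, `setup_heavy_dependencies`, and `HEAVY_PACKAGES` are hypothetical, and installing OpenCV from the `opencv-python-headless` distribution (which skips GUI libraries that servers rarely have) is an assumption.

```python
import importlib
import importlib.util
import subprocess
import sys
from typing import Optional


def ensure_package(import_name: str, pip_name: Optional[str] = None) -> bool:
    """Make `import_name` importable, installing `pip_name` with pip if needed.

    Returns True if the package was already available, False if it was installed.
    Raises subprocess.CalledProcessError if pip fails.
    """
    if importlib.util.find_spec(import_name) is not None:
        return True
    # Install into the same interpreter that is running the app.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--quiet", pip_name or import_name]
    )
    importlib.invalidate_caches()  # let the import system see the new package
    importlib.import_module(import_name)  # fail loudly if it is still missing
    return False


# Hypothetical mapping from import name to pip distribution for the heavy packages.
HEAVY_PACKAGES = {
    "tensorflow": "tensorflow",
    "cv2": "opencv-python-headless",
}


def setup_heavy_dependencies() -> None:
    """Install the heavy packages on demand, e.g. from a setup button handler."""
    for import_name, pip_name in HEAVY_PACKAGES.items():
        ensure_package(import_name, pip_name)
```

Calling `setup_heavy_dependencies()` from a button handler rather than at import time keeps the build step fast and defers the install cost until the features are first needed.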

## Features

### 🎭 Character Animation Mode
- Simple character animation driven by motion templates
- Based on `run_animate.py` from the original repository
- Fast generation (512x512, 20 denoising steps)

### 🎬 Video Character Editing Mode
- Advanced editing with background preservation
- Human segmentation and occlusion handling
- Based on `run_edit.py` from the original repository
- High quality (784x784, 25 denoising steps)
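
The two modes differ mainly in their upstream script, output resolution, and step count. One hypothetical way to keep these presets in one place is sketched below; the values come from the feature list above, while the names `ModePreset`, `PRESETS`, and `preset_for` are illustrative and not taken from the app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePreset:
    script: str      # upstream MIMO script the mode is based on
    resolution: int  # square output size in pixels
    steps: int       # number of denoising steps


# Values from the feature list; the dictionary keys are illustrative mode names.
PRESETS = {
    "animate": ModePreset(script="run_animate.py", resolution=512, steps=20),
    "edit": ModePreset(script="run_edit.py", resolution=784, steps=25),
}


def preset_for(mode: str) -> ModePreset:
    """Look up a mode preset, with a readable error for unknown modes."""
    try:
        return PRESETS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(PRESETS)}"
        ) from None
```

Keeping the presets frozen and centralized means the UI and the pipeline cannot drift apart on resolution or step count.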

## Available Templates

- Sports: `basketball_gym`, `nba_dunk`, `nba_pass`, `football`
- Action: `kungfu_desert`, `kungfu_match`, `parkour`, `BruceLee`
- Dance: `dance_indoor`, `irish_dance`
- Synthetic: `syn_basketball`, `syn_dancing`

## Technical Details

- Models: Stable Diffusion v1.5 + 3D UNet + Pose Guider
- GPU: Automatic detection (T4/A10G/A100), running in FP16 or FP32
- Resolution: 512x512 (Animation), 784x784 (Editing)
- Processing time: 2-5 minutes per generation, depending on the template
- Video I/O: PyAV (`av` pip package) for frame decoding and encoding
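
One way to implement the GPU auto-detection is to separate the precision decision from the hardware query, as sketched below. It assumes PyTorch is the backend (it uses only `torch.cuda.is_available()` and `torch.cuda.get_device_name()`); the policy of FP16 on any CUDA GPU and FP32 on CPU, and the function names, are assumptions rather than the app's actual rules.

```python
from typing import Optional, Tuple

# GPU families named in this README; anything else is reported as "other".
KNOWN_GPUS = ("A100", "A10G", "T4")


def select_device_and_precision(device_name: Optional[str]) -> Tuple[str, str, str]:
    """Return (device, precision, gpu_family); pass None when no GPU is present."""
    if device_name is None:
        return "cpu", "fp32", "none"  # FP16 is poorly supported on CPU
    family = next((g for g in KNOWN_GPUS if g in device_name), "other")
    return "cuda", "fp16", family  # T4, A10G and A100 all support FP16


def detect() -> Tuple[str, str, str]:
    """Query the hardware through PyTorch, falling back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return select_device_and_precision(None)
    name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    return select_device_and_precision(name)
```

Keeping `select_device_and_precision` free of any hardware calls makes the policy easy to test on a machine without a GPU.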

## Credits

- Paper: [MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling](https://arxiv.org/abs/2409.16160)
- Authors: Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo (Alibaba Group)
- Conference: CVPR 2025
- Code: [GitHub](https://github.com/menyifang/MIMO)