---
title: MIMO - Character Video Synthesis
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
license: apache-2.0
python_version: "3.10"
---

# MIMO - Controllable Character Video Synthesis

**🎬 Complete Implementation - Optimized for HuggingFace Spaces**

Transform character images into animated videos with controllable motion and advanced video editing capabilities.

## Quick Start

1. **Setup Models**: Click the "Setup Models" button (downloads the required models)
2. **Load Model**: Click the "Load Model" button (initializes the MIMO pipeline)
3. **Upload Image**: Provide a character image (person, anime, cartoon, etc.)
4. **Choose Template** (Optional): Select a motion template, or use the reference image only
5. **Generate**: Create the animated video

> **Note on Templates**: Video templates are optional. See [TEMPLATES_SETUP.md](TEMPLATES_SETUP.md) for adding custom templates.
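
The ordering of the Quick Start steps matters: the model cannot be loaded before its weights are downloaded, and generation needs a loaded model plus an image. A minimal sketch of that gating logic follows. It is not the code in `app.py`; the class name `QuickStartSession`, its methods, and the status messages are all hypothetical.

```python
from typing import Optional


class QuickStartSession:
    """Hypothetical helper that tracks which Quick Start steps are done."""

    def __init__(self) -> None:
        self.models_downloaded = False
        self.model_loaded = False

    def setup_models(self) -> str:
        # Step 1: a real app would download the model weights here.
        self.models_downloaded = True
        return "Models downloaded"

    def load_model(self) -> str:
        # Step 2 only makes sense after step 1 has finished.
        if not self.models_downloaded:
            return 'Click "Setup Models" first'
        self.model_loaded = True
        return "Model loaded"

    def generate(self, image: Optional[str], template: Optional[str] = None) -> str:
        # Steps 3-5: an image is required, a motion template is optional.
        if not self.model_loaded:
            return 'Click "Load Model" first'
        if image is None:
            return "Upload a character image first"
        mode = f"template {template!r}" if template else "reference image only"
        return f"Generating video ({mode})"
```

Each method returns a status string, which is the kind of value a Gradio button callback could write to a status textbox.
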

## ⚡ Why This Approach?

To avoid build timeouts on HuggingFace Spaces, we use **progressive loading**:

- **Minimal dependencies** at startup (fast build)
- **Runtime installation** of heavy packages (TensorFlow, OpenCV)
- **Full features** available after a one-time setup
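
The runtime-installation idea can be sketched as below. This is an illustration under stated assumptions, not the Space's actual code: the helper names `ensure_package` and `pip_install` and the `installer` parameter are hypothetical. The only library calls used are `importlib.import_module`, `importlib.invalidate_caches`, and `python -m pip install` run through `subprocess`.

```python
import importlib
import subprocess
import sys
from typing import Callable, Optional


def pip_install(pip_name: str) -> None:
    """Install a package into the running interpreter's environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name])


def ensure_package(
    module_name: str,
    pip_name: Optional[str] = None,
    installer: Callable[[str], None] = pip_install,
):
    """Import module_name, installing it on first use if it is missing.

    pip_name covers packages whose import name differs from the pip name
    (OpenCV, for example, is imported as `cv2`).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        installer(pip_name or module_name)
        importlib.invalidate_caches()  # make the fresh install importable
        return importlib.import_module(module_name)
```

A call such as `cv2 = ensure_package("cv2", "opencv-python-headless")` would then defer the heavy install until the user triggers setup; which OpenCV distribution the Space actually installs is an assumption here.
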

## Features

### Character Animation Mode

- Simple character animation with motion templates
- Based on `run_animate.py` from the original repository
- Fast generation (512x512, 20 steps)

### 🎬 Video Character Editing Mode

- Advanced editing with background preservation
- Human segmentation and occlusion handling
- Based on `run_edit.py` from the original repository
- High quality (784x784, 25 steps)
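
The two modes differ mainly in resolution and step count, which can be captured in a small preset table. The sketch below uses the numbers from the lists above; the names `ModePreset`, `PRESETS`, `get_preset`, and the keys `"animate"` and `"edit"` are illustrative and not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePreset:
    width: int
    height: int
    steps: int  # step count as listed for each mode


# Values come from the feature lists; the key names are assumptions.
PRESETS = {
    "animate": ModePreset(width=512, height=512, steps=20),
    "edit": ModePreset(width=784, height=784, steps=25),
}


def get_preset(mode: str) -> ModePreset:
    """Return the preset for a mode, with a readable error for typos."""
    try:
        return PRESETS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(PRESETS)}"
        ) from None
```
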

## Available Templates

- **Sports:** basketball_gym, nba_dunk, nba_pass, football
- **Action:** kungfu_desert, kungfu_match, parkour, BruceLee
- **Dance:** dance_indoor, irish_dance
- **Synthetic:** syn_basketball, syn_dancing
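
For illustration, the template names above can be kept in a category map and looked up by name. The names `TEMPLATES` and `find_category` are hypothetical; how the Space actually discovers templates on disk is not shown here.

```python
from typing import Optional

# Template names exactly as listed above, grouped by category.
TEMPLATES = {
    "Sports": ["basketball_gym", "nba_dunk", "nba_pass", "football"],
    "Action": ["kungfu_desert", "kungfu_match", "parkour", "BruceLee"],
    "Dance": ["dance_indoor", "irish_dance"],
    "Synthetic": ["syn_basketball", "syn_dancing"],
}


def find_category(template: str) -> Optional[str]:
    """Return the category a template belongs to, or None if it is unknown."""
    for category, names in TEMPLATES.items():
        if template in names:
            return category
    return None
```

Returning `None` for an unknown name lets the caller fall back to the reference-image-only path, since templates are optional.
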

## Technical Details

- **Models:** Stable Diffusion v1.5 + 3D UNet + Pose Guider
- **GPU:** Auto-detection (T4/A10G/A100) with FP16/FP32
- **Resolution:** 512x512 (Animation), 784x784 (Editing)
- **Processing:** 2-5 minutes, depending on the template
- **Video I/O:** PyAV (`av` pip package) for frame decoding and encoding
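
One common way to choose between FP16 and FP32 is to use half precision only when a CUDA GPU is present. The sketch below assumes that rule; the Space's real detection logic may differ, and the function names `pick_precision` and `detect_precision` are hypothetical. `torch.cuda.is_available()` is the only PyTorch call used, and the import is guarded so the snippet also runs without PyTorch installed.

```python
def pick_precision(cuda_available: bool) -> str:
    """Assumed rule: FP16 on a CUDA GPU, FP32 otherwise (e.g. on CPU)."""
    return "fp16" if cuda_available else "fp32"


def detect_precision() -> str:
    """Query PyTorch for a CUDA device, falling back to FP32 without it."""
    try:
        import torch
    except ImportError:
        return pick_precision(False)
    return pick_precision(torch.cuda.is_available())
```

The returned string would typically be mapped to `torch.float16` or `torch.float32` when the pipeline is loaded.
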

## Credits

- **Paper:** [MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling](https://arxiv.org/abs/2409.16160)
- **Authors:** Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo (Alibaba Group)
- **Conference:** CVPR 2025
- **Code:** [GitHub](https://github.com/menyifang/MIMO)