Add model card and metadata
#4 by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,5 +1,45 @@
 ---
-license: agpl-3.0
 base_model:
 - Wan-AI/Wan2.1-T2V-14B
----
+license: agpl-3.0
+pipeline_tag: image-to-video
+---
+
+# MoCha: End-to-End Video Character Replacement without Structural Guidance
+
+[**Paper**](https://arxiv.org/abs/2601.08587) | [**Project Page**](https://orange-3dv-team.github.io/MoCha) | [**GitHub**](https://github.com/Orange-3DV-Team/MoCha)
+
+MoCha is a framework for controllable video character replacement: it replaces a character in a video with a provided identity using only a single mask on one arbitrary frame.
+
+Unlike prior reconstruction-based methods, MoCha requires neither per-frame segmentation masks nor explicit structural guidance such as skeletons or depth maps. This makes it more robust in complex scenarios involving occlusions, unusual poses, or challenging illumination.
+
+## Key Features
+- **End-to-end replacement**: Bypasses the need for per-frame masks and structural guidance.
+- **Identity preservation**: Uses a condition-aware RoPE and RL-based post-training to enhance facial identity and adapt multi-modal inputs.
+- **Robustness**: Handles character-object interactions and complex scenarios better than previous state-of-the-art methods.
+- **Data construction**: Trained on specialized high-fidelity datasets, including UE5-rendered videos and expression-driven portrait animations.
+
+## Usage
+
+To use MoCha, please refer to the official [GitHub repository](https://github.com/Orange-3DV-Team/MoCha) for environment setup and inference scripts.
+
+The basic inference workflow requires:
+1. **Source video**: the original video containing the character to be replaced.
+2. **Designation mask**: a mask for the first frame marking that character.
+3. **Reference images**: images of the new character identity.
+
+```shell
+python inference_mocha.py --data_path path/to/your/data.csv
+```
+
+## Citation
+If you find MoCha helpful for your research, please cite:
+```bibtex
+@article{orange2025mocha,
+  title={MoCha: End-to-End Video Character Replacement without Structural Guidance},
+  author={Xu, Zhengbo and Ma, Jie and Wang, Ziheng and Peng, Zhan and Liang, Jun and Li, Jing},
+  journal={arXiv preprint arXiv:2601.08587},
+  year={2026},
+  url={https://github.com/Orange-3DV-Team/MoCha}
+}
+```