Add model card and metadata
#4 by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,5 +1,45 @@
 ---
-license: agpl-3.0
 base_model:
 - Wan-AI/Wan2.1-T2V-14B
----
+license: agpl-3.0
+pipeline_tag: image-to-video
+---
+
+# MoCha: End-to-End Video Character Replacement without Structural Guidance
+
+[**Paper**](https://arxiv.org/abs/2601.08587) | [**Project Page**](https://orange-3dv-team.github.io/MoCha) | [**GitHub**](https://github.com/Orange-3DV-Team/MoCha)
+
+MoCha is a framework for controllable video character replacement: it replaces a character in a video with a provided identity using only a single mask on one arbitrary frame.
+
+Unlike prior reconstruction-based methods, MoCha requires neither per-frame segmentation masks nor explicit structural guidance such as skeletons or depth maps. This makes it more robust in complex scenarios involving occlusions, unusual poses, or challenging illumination.
+
+## Key Features
+- **End-to-end replacement**: Bypasses the need for per-frame masks and structural guidance.
+- **Identity preservation**: Uses a condition-aware RoPE and RL-based post-training to enhance facial identity and adapt multi-modal inputs.
+- **Robustness**: Handles character-object interactions and complex scenarios better than previous state-of-the-art methods.
+- **Data construction**: Trained on specialized high-fidelity datasets, including UE5-rendered videos and expression-driven portrait animations.
+
+## Usage
+
+To use MoCha, please refer to the official [GitHub repository](https://github.com/Orange-3DV-Team/MoCha) for environment setup and inference scripts.
+
+The basic inference workflow requires:
+1. **Source video**: the original video containing the character to be replaced.
+2. **Designation mask**: a mask for the first frame marking that character.
+3. **Reference images**: images of the new character identity.
+
+```shell
+python inference_mocha.py --data_path path/to/your/data.csv
+```
+
+## Citation
+If you find MoCha helpful for your research, please cite:
+```bibtex
+@article{orange2025mocha,
+  title={MoCha: End-to-End Video Character Replacement without Structural Guidance},
+  author={Xu, Zhengbo and Ma, Jie and Wang, Ziheng and Peng, Zhan and Liang, Jun and Li, Jing},
+  journal={arXiv preprint arXiv:2601.08587},
+  year={2026},
+  url={https://github.com/Orange-3DV-Team/MoCha}
+}
+```