Add model card and metadata

#4
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +42 -2
README.md CHANGED
@@ -1,5 +1,45 @@
1
  ---
2
- license: agpl-3.0
3
  base_model:
4
  - Wan-AI/Wan2.1-T2V-14B
5
- ---
 
1
  ---
 
2
  base_model:
3
  - Wan-AI/Wan2.1-T2V-14B
4
+ license: agpl-3.0
5
+ pipeline_tag: image-to-video
6
+ ---
7
+
8
+ # MoCha: End-to-End Video Character Replacement without Structural Guidance
9
+
10
+ [**Paper**](https://arxiv.org/abs/2601.08587) | [**Project Page**](https://orange-3dv-team.github.io/MoCha) | [**GitHub**](https://github.com/Orange-3DV-Team/MoCha)
11
+
12
+ MoCha is a framework for controllable video character replacement: it replaces a character in a video with a provided identity, using only a mask on a single, arbitrary frame.
13
+
14
+ Unlike prior reconstruction-based methods, MoCha does not require per-frame segmentation masks or explicit structural guidance such as skeletons or depth maps. This makes it more robust in complex scenarios involving occlusions, unusual poses, or challenging illumination.
15
+
16
+ ## Key Features
17
+ - **End-to-End Replacement**: Bypasses the need for per-frame masks and structural guidance.
18
+ - **Identity Preservation**: Uses a condition-aware RoPE and RL-based post-training to enhance facial identity and adapt multi-modal inputs.
19
+ - **Robustness**: Handles character-object interactions and complex scenarios better than previous state-of-the-art methods.
20
+ - **Data Construction**: Trained on specialized high-fidelity datasets including UE5-rendered videos and expression-driven portrait animations.
21
+
22
+ ## Usage
23
+
24
+ To use MoCha, please refer to the official [GitHub repository](https://github.com/Orange-3DV-Team/MoCha) for environment setup and inference scripts.
25
+
26
+ The basic inference workflow requires:
27
+ 1. **Source Video**: The original video with the character to be replaced.
28
+ 2. **Designation Mask**: A mask for the first frame marking the character.
29
+ 3. **Reference Images**: Images of the new character identity.
30
+
31
+ ```shell
32
+ python inference_mocha.py --data_path path/to/your/data.csv
33
+ ```
34
+
35
+ ## Citation
36
+ If you find MoCha helpful for your research, please cite:
37
+ ```bibtex
38
+ @article{orange2025mocha,
39
+ title={MoCha: End-to-End Video Character Replacement without Structural Guidance},
40
+ author={Zhengbo Xu and Jie Ma and Ziheng Wang and Zhan Peng and Jun Liang and Jing Li},
41
+ journal={arXiv preprint arXiv:2601.08587},
42
+ year={2026},
43
+ url={https://github.com/Orange-3DV-Team/MoCha}
44
+ }
45
+ ```
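
The Usage section lists three inputs and passes a CSV file to `inference_mocha.py`, but this diff does not show what that CSV looks like. The sketch below shows one way such a file could be assembled. Everything about the schema is an assumption made for illustration: the column names (`source_video`, `first_frame_mask`, `reference_images`), the `;` separator for multiple reference images, and the example paths are hypothetical. Check the GitHub repository for the real format before relying on it.

```python
import csv
from pathlib import Path

# Hypothetical column names; the real schema is defined by the MoCha repository.
FIELDNAMES = ["source_video", "first_frame_mask", "reference_images"]


def write_manifest(rows, out_path):
    """Write one CSV row per replacement job and return the path written."""
    out_path = Path(out_path)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for row in rows:
            writer.writerow({
                "source_video": row["source_video"],
                "first_frame_mask": row["first_frame_mask"],
                # Joining several reference images with ';' is an assumed convention.
                "reference_images": ";".join(row["reference_images"]),
            })
    return out_path


# Example paths are placeholders, not files shipped with the model.
jobs = [
    {
        "source_video": "data/clip_001.mp4",
        "first_frame_mask": "data/clip_001_mask.png",
        "reference_images": ["data/ref_a.png", "data/ref_b.png"],
    }
]
manifest = write_manifest(jobs, "data.csv")
print(manifest.read_text(encoding="utf-8"))
```

If the repository's schema matches, the resulting file would be passed as `--data_path data.csv`; otherwise only `FIELDNAMES` and the row mapping need to change.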