Human2Robot Finetune (Final)
This is the final version of the human-to-robot video generation model, finetuned to output multi-view robot manipulation videos.
Model Description
- Input: Human hand manipulation video + text prompt (3 prompt styles supported)
- Output: 3-view robot manipulation video
- Base model: Wan2.2-TI2V-5B
Best Checkpoint
step=21100.ckpt is the best-performing checkpoint.
Checkpoints
| Checkpoint | Steps | Note |
|---|---|---|
| step=20000.ckpt | 20000 | |
| step=20100.ckpt | 20100 | |
| ... | ... | |
| step=21100.ckpt | 21100 | Best |
| step=21200.ckpt | 21200 | |
| step=21300.ckpt | 21300 | |
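
The checkpoint names in the table follow a `step=<N>.ckpt` pattern, so the step count can be read straight from the filename. The snippet below is a minimal sketch of that idea, not part of the released code: the helper names `parse_step` and `list_checkpoints` are hypothetical, and it assumes every checkpoint sits directly inside a single directory such as `checkpoints/`. It only inspects filenames and does not load any weights.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Filename convention taken from the table above: step=<N>.ckpt
CKPT_PATTERN = re.compile(r"^step=(\d+)\.ckpt$")


def parse_step(filename: str) -> Optional[int]:
    """Return the training step encoded in a checkpoint filename, or None."""
    match = CKPT_PATTERN.match(filename)
    return int(match.group(1)) if match else None


def list_checkpoints(ckpt_dir: str) -> List[Tuple[int, Path]]:
    """Hypothetical helper: list (step, path) pairs in ckpt_dir, sorted by step."""
    found = []
    for path in Path(ckpt_dir).iterdir():
        step = parse_step(path.name)
        if step is not None:  # skip files that do not match the pattern
            found.append((step, path))
    return sorted(found, key=lambda item: item[0])
```

For example, `parse_step("step=21100.ckpt")` yields the step of the checkpoint marked Best, and `list_checkpoints("human2robot_finetune/checkpoints")` would enumerate whatever checkpoints are actually present on disk.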
Directory Structure
human2robot_finetune/
├── checkpoints/ # Model checkpoints (step=20000 ~ step=21300)
├── val_samples/ # Validation sample videos
│   └── keyframe_comparison/ # Keyframe comparison images
└── README.md