Human2Robot Finetune (Final)

This is the final version of the human-to-robot video generation model, finetuned to generate multi-view robot manipulation videos.

Model Description

Input: Human hand manipulation video + text prompt (3 prompt styles supported)
Output: 3-view robot manipulation video
Base model: Wan2.2-TI2V-5B

Best Checkpoint

step=21100.ckpt is the best-performing checkpoint.

Checkpoints

Checkpoint	Steps	Note
step=20000.ckpt	20000
step=20100.ckpt	20100
...	...
step=21100.ckpt	21100	Best
step=21200.ckpt	21200
step=21300.ckpt	21300
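
The checkpoint filenames above follow a step=<N>.ckpt pattern, so the step count can be read back from the name. The snippet below is a minimal sketch of listing the checkpoints in order and locating the best one. The function names (parse_step, list_checkpoints, best_checkpoint) and the BEST_STEP constant are illustrative and not part of this release; the sketch only inspects filenames and does not load any weights.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Filename pattern taken from the table above, e.g. "step=21100.ckpt".
CKPT_PATTERN = re.compile(r"^step=(\d+)\.ckpt$")

# Step of the checkpoint marked "Best" in this README.
BEST_STEP = 21100


def parse_step(filename: str) -> Optional[int]:
    """Return the training step encoded in a checkpoint filename, or None."""
    match = CKPT_PATTERN.match(filename)
    return int(match.group(1)) if match else None


def list_checkpoints(ckpt_dir: str) -> List[Tuple[int, Path]]:
    """Return (step, path) pairs for every checkpoint file, sorted by step."""
    found = []
    for path in Path(ckpt_dir).iterdir():
        step = parse_step(path.name)
        if step is not None:
            found.append((step, path))
    return sorted(found, key=lambda pair: pair[0])


def best_checkpoint(ckpt_dir: str) -> Path:
    """Return the path of the best checkpoint, failing loudly if it is absent."""
    path = Path(ckpt_dir) / f"step={BEST_STEP}.ckpt"
    if not path.is_file():
        raise FileNotFoundError(f"Expected checkpoint not found: {path}")
    return path
```

For example, best_checkpoint("human2robot_finetune/checkpoints") would return the path to step=21100.ckpt when run from the directory that contains this folder.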

Directory Structure

human2robot_finetune/
├── checkpoints/          # Model checkpoints (step=20000 through step=21300)
├── val_samples/          # Validation sample videos
│   └── keyframe_comparison/  # Keyframe comparison images
└── README.md
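
After downloading, the layout above can be checked with a short script. This is a sketch only: the EXPECTED_ENTRIES list mirrors the tree shown here, and missing_entries is a hypothetical helper name, not something shipped with the model.

```python
from pathlib import Path
from typing import List

# Entries relative to human2robot_finetune/, copied from the tree above.
EXPECTED_ENTRIES = [
    "checkpoints",
    "val_samples",
    "val_samples/keyframe_comparison",
    "README.md",
]


def missing_entries(root: str) -> List[str]:
    """Return the expected entries that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_ENTRIES if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("human2robot_finetune")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Directory layout matches the README.")
```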