
Human2Robot Finetune (Final)

This is the final version of the human-to-robot video generation model, finetuned for multi-view robot manipulation output.

Model Description

  • Input: Human hand manipulation video + text prompt (3 prompt styles supported)
  • Output: 3-view robot manipulation video
  • Base model: Wan2.2-TI2V-5B

Best Checkpoint

step=21100.ckpt is the best-performing checkpoint.

Checkpoints

Checkpoint        Steps   Note
step=20000.ckpt   20000
step=20100.ckpt   20100
...               ...
step=21100.ckpt   21100   Best
step=21200.ckpt   21200
step=21300.ckpt   21300
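
The checkpoint names encode the training step, so a local copy of the checkpoints can be enumerated and ordered by step with a few lines of Python. The following is a minimal sketch, not part of this repository: the helper names (`parse_step`, `list_checkpoints`) and the `BEST_STEP` constant are hypothetical, and it assumes only that files follow the `step=<N>.ckpt` naming shown above.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Matches file names such as "step=21100.ckpt" and captures the step number.
CKPT_PATTERN = re.compile(r"^step=(\d+)\.ckpt$")

# Step of the checkpoint this README marks as "Best".
BEST_STEP = 21100


def parse_step(filename: str) -> Optional[int]:
    """Return the training step encoded in a checkpoint file name, or None."""
    match = CKPT_PATTERN.match(filename)
    return int(match.group(1)) if match else None


def list_checkpoints(ckpt_dir: str) -> List[Tuple[int, Path]]:
    """Return (step, path) pairs for every checkpoint in ckpt_dir, sorted by step."""
    found = []
    for path in Path(ckpt_dir).iterdir():
        step = parse_step(path.name)
        if step is not None:  # skip files that are not checkpoints
            found.append((step, path))
    return sorted(found, key=lambda item: item[0])
```

For example, `dict(list_checkpoints("checkpoints"))[BEST_STEP]` would return the path of the best checkpoint, assuming the directory has been downloaded to `checkpoints/`. How the checkpoint is then loaded depends on the training code, which this README does not describe.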

Directory Structure

human2robot_finetune/
├── checkpoints/          # Model checkpoints (step=20000 ~ step=21300)
├── val_samples/          # Validation sample videos
│   └── keyframe_comparison/  # Keyframe comparison images
└── README.md