Wan2.1 LoRA SFT for Robot Video Generation

Fine-tuned from Wan2.1-I2V-14B-480P with the VideoX-Fun framework on the RoboTwin dataset.

Training Details

Parameter	Value
Framework	VideoX-Fun
Base Model	Wan2.1-I2V-14B-480P
Training Data	RoboTwin aloha-agilex_clean_50 (2,500 videos, 50 tasks)
LoRA Rank	32
LoRA Alpha	16
LoRA Targets	q, k, v, ffn.0, ffn.2
Learning Rate	1e-4 (constant with warmup)
Warmup Steps	100
Precision	bf16
Resolution	640 × 640
Frames	81 per video
Batch Size	1 per GPU (2 GPUs)
Total Steps	12,500 (planned), 2,200 (completed)
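LoRA Rank and LoRA Alpha together set how strongly the adapter modifies the base model. Under the standard LoRA convention, each targeted projection (q, k, v, ffn.0, ffn.2) gets its own low-rank pair A and B, and its weight becomes W + (alpha / rank) * (B @ A). With rank 32 and alpha 16 that gives a scale of 0.5. This card does not say that VideoX-Fun uses exactly this formula, so treat the following as an assumption. The pure-Python sketch below shows that update. `merge_lora` and its `multiplier` argument are hypothetical names used only for illustration. They are not VideoX-Fun's API.

```python
# Minimal sketch of the standard LoRA update: W' = W + multiplier * (alpha / rank) * (B @ A).
# Assumption: this follows the common LoRA convention; VideoX-Fun's own code may differ.

def lora_scale(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank product B @ A."""
    return alpha / rank


def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    """Plain-Python matrix product of two row-major matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha, rank, multiplier=1.0):
    """Return the merged weight W + multiplier * (alpha / rank) * (B @ A).

    w: d_out x d_in base weight
    a: rank x d_in down-projection (LoRA "A")
    b: d_out x rank up-projection (LoRA "B")
    multiplier: hypothetical inference-time strength knob (1.0 = as trained)
    """
    s = multiplier * lora_scale(alpha, rank)
    delta = matmul(b, a)
    return [[w_ij + s * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Scale implied by this card's settings (rank 32, alpha 16).
print(lora_scale(16, 32))  # 0.5

# Toy 2x2 example with rank 2 and alpha 1, i.e. the same 0.5 ratio.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[2.0, 0.0], [0.0, 2.0]]
print(merge_lora(w, a, b, alpha=1, rank=2))  # [[2.0, 0.0], [0.0, 2.0]]
```

Because alpha is half the rank, each adapter's learned update is applied at half strength. Raising the rank without also raising alpha would further shrink the effective step taken by each unit of B @ A.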

File	Training Steps	Description
checkpoint-1200.safetensors	1,200	~1 epoch of training
checkpoint-2200.safetensors	2,200	Latest checkpoint; used for the 1,000-video inference run
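The "~1 epoch" label for checkpoint-1200 follows from the training table. There are 2,500 videos, a batch size of 1 per GPU, and 2 GPUs, so one epoch takes 1,250 optimizer steps. The sketch below works through that arithmetic. It assumes no gradient accumulation and that every video is seen exactly once per epoch. The card does not state either of these points, but both are consistent with the "~1 epoch" description.

```python
# Sketch: convert optimizer steps into epochs for this run.
# Assumptions: effective batch size = per-GPU batch * number of GPUs
# (no gradient accumulation), and each of the 2,500 videos is seen once per epoch.

NUM_VIDEOS = 2_500
PER_GPU_BATCH = 1
NUM_GPUS = 2


def steps_per_epoch(num_videos: int, per_gpu_batch: int, num_gpus: int) -> float:
    """Optimizer steps needed to see every training video once."""
    return num_videos / (per_gpu_batch * num_gpus)


def epochs_at(step: int) -> float:
    """Number of passes over the dataset completed after `step` optimizer steps."""
    return step / steps_per_epoch(NUM_VIDEOS, PER_GPU_BATCH, NUM_GPUS)


for name, step in [("checkpoint-1200", 1_200), ("checkpoint-2200", 2_200), ("planned", 12_500)]:
    print(f"{name}: {epochs_at(step):.2f} epochs")
# checkpoint-1200: 0.96 epochs
# checkpoint-2200: 1.76 epochs
# planned: 10.00 epochs
```

Under these assumptions, the planned 12,500 steps correspond to 10 epochs. The released checkpoints therefore cover less than a fifth of the planned schedule.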

Each checkpoint has a ComfyUI-compatible version.

Inference with VideoX-Fun:

Relative to the baseline, SFT-Wan2.1 improves background consistency (+20.8), subject consistency (+22.4), and flow score (+5.1).
