	---
	license: apache-2.0
	library_name: cosmos-predict2
	tags:
	- video-generation
	- world-model
	- cosmos-predict2
	- lora
	- robotics
	base_model: nvidia/Cosmos-Predict2-2B-Video2World
	---

	# cosmos-task1-task2

	LoRA fine-tunes of [Cosmos-Predict2-2B-Video2World](https://huggingface.co/nvidia/Cosmos-Predict2-2B-Video2World) on the `push-that-thing` task1 and task2 datasets.

	The checkpoints in this repo are unfused — the LoRA adapters (`lora_A`, `lora_B`) are still stored separately from the base layer weights. You must run the one-shot fusion step below before using them with the `run_video2world.py` inference pipeline.

	## Repository layout

	Each saved training iteration writes four parallel files, one per folder. They are uploaded as-is:

	```
	model/iter_<NNNNNNNNN>.pt # net.* + net_ema.* weights with LoRA adapters
	optim/iter_<NNNNNNNNN>.pt # optimizer state (resume only)
	scheduler/iter_<NNNNNNNNN>.pt # LR scheduler state (resume only)
	trainer/iter_<NNNNNNNNN>.pt # grad scaler + iteration counter (resume only)
	```

	For inference you only need `model/iter_<NNNNNNNNN>.pt`. The other three folders are only required to resume training from this iteration.
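The layout above can be expressed as a small path helper. This is a minimal sketch, assuming the iteration number is zero-padded to nine digits (as in `iter_000002500.pt`); `checkpoint_files` and the `./ckpts` root are illustrative names, not part of this repo.

```python
from pathlib import Path

# The four per-iteration folders listed in the repository layout.
FOLDERS = ("model", "optim", "scheduler", "trainer")


def checkpoint_files(iteration: int, root: str = "./ckpts") -> dict[str, Path]:
    """Map each folder name to its checkpoint path for one iteration.

    Assumes nine-digit zero padding, matching `iter_<NNNNNNNNN>.pt`.
    """
    name = f"iter_{iteration:09d}.pt"
    return {folder: Path(root) / folder / name for folder in FOLDERS}


files = checkpoint_files(2500)
inference_only = files["model"]  # the only file inference needs
resume_only = [files[f] for f in FOLDERS if f != "model"]  # resume-only state
```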

	## Inference: fuse, then run

	The inference pipeline does not apply LoRA adapters at load time, so an unfused checkpoint will load but produce unusable outputs. Fuse it first.
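Because an unfused checkpoint loads silently, a quick key check before inference may help. The sketch below only inspects key names; the example keys are hypothetical, and obtaining the keys from a real file (for instance via `torch.load(path, map_location="cpu").keys()`) assumes the checkpoint is a flat state dict.

```python
def has_unfused_lora(keys) -> bool:
    """Return True if any state-dict key still names a LoRA adapter tensor."""
    return any("lora_A" in key or "lora_B" in key for key in keys)


# Hypothetical key names, for illustration only.
unfused_keys = [
    "net.blocks.0.attn.base_layer.weight",
    "net.blocks.0.attn.lora_A.weight",
    "net.blocks.0.attn.lora_B.weight",
]
fused_keys = ["net.blocks.0.attn.base_layer.weight"]

needs_fusion = has_unfused_lora(unfused_keys)
ready_for_inference = not has_unfused_lora(fused_keys)
```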

	### 1. Download

	```bash
	ITER=000002500 # set to the iteration you want
	hf download push-that-thing/cosmos-task1-task2 \
	  model/iter_${ITER}.pt --local-dir ./ckpts
	```

	### 2. Fuse

	`fuse_lora_ckpt.py` lives in [push-that-thing/pdt-mimic](https://github.com/push-that-thing/pdt-mimic) under the `mimic-video` submodule. Clone with submodules first:

	```bash
	git clone --recurse-submodules https://github.com/push-that-thing/pdt-mimic.git
	# or, if already cloned without submodules (run from the directory containing pdt-mimic):
	git -C pdt-mimic submodule update --init --recursive

	python pdt-mimic/mimic-video/model/scripts/fuse_lora_ckpt.py \
	  ./ckpts/model/iter_${ITER}.pt
	# writes ./ckpts/model/iter_${ITER}_fused.pt
	```

	Fusion is deterministic: it walks every key, finds matching `lora_A` / `lora_B` pairs, computes `base + (alpha / rank) * B @ A`, and replaces the `base_layer` entry with the merged tensor. Both `net.` (regular) and `net_ema.` (EMA) weights are fused in the same pass.
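The merge described above can be sketched as follows. This is not the actual `fuse_lora_ckpt.py`: the key names (`<layer>.base_layer.weight`, `<layer>.lora_A.weight`, `<layer>.lora_B.weight`) are assumed for illustration, and nested Python lists stand in for tensors so the example runs without PyTorch. With real tensors the product would simply be `B @ A`.

```python
def matmul(b, a):
    """Multiply two matrices stored as nested lists (stand-in for `B @ A`)."""
    return [[sum(b[i][k] * a[k][j] for k in range(len(a)))
             for j in range(len(a[0]))] for i in range(len(b))]


def fuse_lora(state, alpha):
    """Return a copy of `state` with each LoRA pair merged into its base weight."""
    suffix = ".base_layer.weight"
    fused = {}
    for key, value in state.items():
        if ".lora_A." in key or ".lora_B." in key:
            continue  # adapter tensors are dropped from the fused result
        if key.endswith(suffix):
            layer = key[: -len(suffix)]
            a = state.get(f"{layer}.lora_A.weight")  # shape (rank, in_features)
            b = state.get(f"{layer}.lora_B.weight")  # shape (out_features, rank)
            if a is not None and b is not None:
                scale = alpha / len(a)  # alpha / rank
                delta = matmul(b, a)
                value = [[w + scale * d for w, d in zip(w_row, d_row)]
                         for w_row, d_row in zip(value, delta)]
        fused[key] = value
    return fused


# Tiny rank-1 example; the same pass covers both `net.` and `net_ema.` keys.
state = {}
for prefix in ("net", "net_ema"):
    state[f"{prefix}.blk.base_layer.weight"] = [[1.0, 0.0], [0.0, 1.0]]
    state[f"{prefix}.blk.lora_A.weight"] = [[1.0, 2.0]]
    state[f"{prefix}.blk.lora_B.weight"] = [[0.5], [0.0]]

fused = fuse_lora(state, alpha=2)
```

In this toy case `alpha / rank` is 2, `B @ A` is `[[0.5, 1.0], [0.0, 0.0]]`, and the merged weight becomes `[[2.0, 2.0], [0.0, 1.0]]` for both prefixes.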

	### 3. Run video2world

	```bash
	python pdt-mimic/mimic-video/model/scripts/run_video2world.py \
	  --dit_path ./ckpts/model/iter_${ITER}_fused.pt \
	  --input_path /path/to/conditioning.mp4 \
	  --num_conditional_frames 5 \
	  --prompt "Push the white object to the right into the goal white circle." \
	  --save_path ./out.mp4
	```

	## Important: ALPHA must match training

	`fuse_lora_ckpt.py` hardcodes `ALPHA = 32`. These checkpoints were trained with the same value, so the default works as-is. If you retrain with a different LoRA alpha, update that constant before fusing; otherwise the LoRA update is scaled by the wrong factor and the merged weights will not match the trained model.
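To make the scaling concern concrete, here is a small arithmetic sketch. The rank of 16 and the retrained alpha of 16 are hypothetical values chosen for illustration, since this README does not state the rank; only `ALPHA = 32` comes from the text above.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Scale applied to `B @ A` during fusion: alpha / rank."""
    return alpha / rank


RANK = 16         # hypothetical rank, not stated in this README
TRAIN_ALPHA = 16  # hypothetical alpha used in a later retraining run
FUSE_ALPHA = 32   # the constant hardcoded in fuse_lora_ckpt.py

correct_scale = lora_scale(TRAIN_ALPHA, RANK)  # what training assumed
applied_scale = lora_scale(FUSE_ALPHA, RANK)   # what fusion would apply
error_factor = applied_scale / correct_scale   # rank cancels: 32 / 16
```

Under these assumed values the LoRA update would be merged at twice its trained strength, and the factor is the same for any rank.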

	## Resuming training

	To resume training from a given iteration, download all four files for that iteration and place them at `<job_dir>/checkpoints/{model,optim,scheduler,trainer}/iter_<NNNNNNNNN>.pt`. Then write the file name `iter_<NNNNNNNNN>.pt` into `<job_dir>/checkpoints/latest_checkpoint.txt`. The Cosmos `Checkpointer` will pick it up automatically.

	Do not resume from a fused checkpoint — fusion deletes the `lora_A`/`lora_B` keys that the optimizer state references.
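The resume layout described in this section could be staged with a short script like the one below. It is a sketch under assumptions: `stage_resume` is a hypothetical helper, the download directory is assumed to mirror the repo folders, and the marker file is written with the bare file name and no trailing newline.

```python
import shutil
from pathlib import Path

FOLDERS = ("model", "optim", "scheduler", "trainer")


def stage_resume(download_dir: str, job_dir: str, iter_tag: str) -> Path:
    """Copy one iteration's four files into `<job_dir>/checkpoints` and mark it latest.

    `iter_tag` is the zero-padded iteration string, e.g. "000002500".
    Returns the path of the written `latest_checkpoint.txt`.
    """
    name = f"iter_{iter_tag}.pt"
    src_root = Path(download_dir)
    ckpt_root = Path(job_dir) / "checkpoints"

    # Refuse to stage a partial checkpoint: all four files are needed to resume.
    missing = [f for f in FOLDERS if not (src_root / f / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{name} missing from: {', '.join(missing)}")

    for folder in FOLDERS:
        dest = ckpt_root / folder / name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_root / folder / name, dest)

    marker = ckpt_root / "latest_checkpoint.txt"
    marker.write_text(name)
    return marker
```

A call such as `stage_resume("./ckpts", "/path/to/job_dir", "000002500")` would then prepare the job directory; both paths here are placeholders.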