---
license: apache-2.0
library_name: cosmos-predict2
tags:
- video-generation
- world-model
- cosmos-predict2
- lora
- robotics
base_model: nvidia/Cosmos-Predict2-2B-Video2World
---

# cosmos-task1-task2

LoRA fine-tunes of [Cosmos-Predict2-2B-Video2World](https://huggingface.co/nvidia/Cosmos-Predict2-2B-Video2World) on the `push-that-thing` task1 and task2 datasets.

The checkpoints in this repo are **unfused**: the LoRA adapters (`lora_A`, `lora_B`) are still stored separately from the base layer weights. You must run the one-shot fusion step below before using them with the `run_video2world.py` inference pipeline.

## Repository layout

Each saved training iteration produces four parallel files, which are uploaded as-is:

```
model/iter_.pt       # net.* + net_ema.* weights with LoRA adapters
optim/iter_.pt       # optimizer state (resume only)
scheduler/iter_.pt   # LR scheduler state (resume only)
trainer/iter_.pt     # grad scaler + iteration counter (resume only)
```

For **inference** you only need `model/iter_.pt`. The other three folders are required only to resume training from that iteration.

## Inference: fuse, then run

The inference pipeline does not apply LoRA adapters at load time, so an unfused checkpoint will still load but will produce garbage outputs. Fuse it first.

### 1. Download

```bash
ITER=000002500  # set to the iteration you want
hf download push-that-thing/cosmos-task1-task2 \
    model/iter_${ITER}.pt --local-dir ./ckpts
```

The later commands reuse `ITER`, so run them in the same shell session.

### 2. Fuse

`fuse_lora_ckpt.py` lives in [push-that-thing/pdt-mimic](https://github.com/push-that-thing/pdt-mimic) under the `mimic-video` submodule.
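
Because an unfused checkpoint loads without complaint, it can help to confirm which kind of file you have before fusing or running inference. The following is a minimal sketch, not part of `pdt-mimic`: it assumes the `.pt` file deserializes to a (possibly nested) dict whose key paths contain `lora_A`/`lora_B` while the adapters are unfused. The helper names (`iter_key_paths`, `lora_key_counts`, `is_unfused`, `load_state`) are hypothetical.

```python
"""Sketch: detect whether a checkpoint still carries unfused LoRA adapters.

Assumption (hypothetical): the .pt file deserializes to a dict, possibly nested,
whose leaf key paths contain "lora_A" / "lora_B" while the adapters are unfused.
"""
from collections.abc import Mapping
from typing import Any, Iterator


def iter_key_paths(obj: Any, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths to every non-mapping leaf of a nested mapping."""
    if not isinstance(obj, Mapping):
        return
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            yield from iter_key_paths(value, path)
        else:
            yield path


def lora_key_counts(state: Any) -> dict[str, int]:
    """Count adapter keys under net.* (regular) and net_ema.* (EMA) weights."""
    counts = {"net": 0, "net_ema": 0, "other": 0}
    for path in iter_key_paths(state):
        if "lora_A" not in path and "lora_B" not in path:
            continue
        parts = path.split(".")
        if "net_ema" in parts:
            counts["net_ema"] += 1
        elif "net" in parts:
            counts["net"] += 1
        else:
            counts["other"] += 1
    return counts


def is_unfused(state: Any) -> bool:
    """True if any LoRA adapter key is still present."""
    return any(lora_key_counts(state).values())


def load_state(path: str) -> Any:
    """Load a checkpoint on CPU; PyTorch is imported lazily (assumed installed)."""
    import torch

    # Newer PyTorch releases default to weights_only=True; a checkpoint holding
    # non-tensor objects may need weights_only=False (only for trusted files).
    return torch.load(path, map_location="cpu")
```

If the key-naming assumption holds, `is_unfused(load_state("./ckpts/model/iter_000002500.pt"))` should be `True` for a file downloaded from this repo and `False` for the `_fused.pt` file produced by fusion, since fusion removes the adapter keys.
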

Clone the repository with its submodules, then run the fusion script:

```bash
git clone --recurse-submodules https://github.com/push-that-thing/pdt-mimic.git
# or, if you already cloned without submodules:
git -C pdt-mimic submodule update --init --recursive

python pdt-mimic/mimic-video/model/scripts/fuse_lora_ckpt.py \
    ./ckpts/model/iter_${ITER}.pt
# writes ./ckpts/model/iter_${ITER}_fused.pt
```

Fusion is deterministic: the script walks every key, finds matching `lora_A`/`lora_B` pairs, computes `base + (alpha / rank) * B @ A`, and replaces the `base_layer` entry with the merged tensor. Both `net.*` (regular) and `net_ema.*` (EMA) weights are fused in the same pass.

### 3. Run video2world

```bash
python pdt-mimic/mimic-video/model/scripts/run_video2world.py \
    --dit_path ./ckpts/model/iter_${ITER}_fused.pt \
    --input_path /path/to/conditioning.mp4 \
    --num_conditional_frames 5 \
    --prompt "Push the white object to the right into the goal white circle." \
    --save_path ./out.mp4
```

## Important: ALPHA must match training

`fuse_lora_ckpt.py` hardcodes `ALPHA = 32`. These checkpoints were trained with the same value, so the default works as-is. If you re-train with a different LoRA alpha, update that constant before fusing; otherwise the adapter update is multiplied by the wrong `alpha / rank` factor and the merged weights will be scaled incorrectly.

## Resuming training

To resume training from a given iteration, download that iteration's file from each of the four folders and place them under `/checkpoints/{model,optim,scheduler,trainer}/iter_.pt`, then write `iter_.pt` into `/checkpoints/latest_checkpoint.txt`. The Cosmos `Checkpointer` will pick it up automatically.

Do **not** resume from a fused checkpoint: fusion deletes the `lora_A`/`lora_B` keys that the optimizer state references.
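
To make the merge rule from "2. Fuse" concrete, here is a minimal sketch of `base + (alpha / rank) * B @ A` in plain Python. It is not `fuse_lora_ckpt.py`: the key names (`<prefix>.base_layer.weight`, `<prefix>.lora_A.weight`, `<prefix>.lora_B.weight`), the output key `<prefix>.weight`, and reading the rank from the row count of `lora_A` are all assumptions, and nested lists stand in for tensors so it runs without PyTorch. Because it visits every key, `net.*` and `net_ema.*` entries are merged in one pass, and dropping the adapter keys illustrates why a fused file can no longer be paired with the saved optimizer state.

```python
"""Sketch of LoRA fusion: W = W0 + (alpha / r) * B @ A.

Assumptions (hypothetical, not taken from fuse_lora_ckpt.py):
  * adapters are stored as "<prefix>.lora_A.weight" / "<prefix>.lora_B.weight"
    next to "<prefix>.base_layer.weight";
  * the merged tensor is written as "<prefix>.weight";
  * the rank r is the number of rows of lora_A (shape r x in_features).
Nested lists stand in for tensors so the sketch runs without PyTorch.
"""
Matrix = list[list[float]]

ALPHA = 32  # must equal the LoRA alpha used at training time


def matmul(b: Matrix, a: Matrix) -> Matrix:
    """Return b @ a for row-major nested lists (b: out x r, a: r x in)."""
    rank, cols = len(a), len(a[0])
    return [
        [sum(b[i][k] * a[k][j] for k in range(rank)) for j in range(cols)]
        for i in range(len(b))
    ]


def fuse_lora(state: dict[str, Matrix], alpha: float = ALPHA) -> dict[str, Matrix]:
    """Merge every matched adapter pair; returns a new dict, input is untouched."""
    fused = dict(state)
    suffix = ".lora_A.weight"
    for key_a in [k for k in state if k.endswith(suffix)]:
        prefix = key_a[: -len(suffix)]  # e.g. "net.blocks.0.q" or "net_ema.blocks.0.q"
        key_b = f"{prefix}.lora_B.weight"
        key_base = f"{prefix}.base_layer.weight"
        if key_b not in state or key_base not in state:
            continue  # unmatched adapter: leave it as-is
        lora_a, lora_b, base = state[key_a], state[key_b], state[key_base]
        scale = alpha / len(lora_a)  # alpha / rank
        delta = matmul(lora_b, lora_a)
        fused[f"{prefix}.weight"] = [
            [w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)
        ]
        # The adapter keys disappear, so optimizer state keyed on them no longer matches.
        for key in (key_a, key_b, key_base):
            del fused[key]
    return fused
```

With a rank-1 adapter and `alpha=2`, for example, the scale factor is `2 / 1 = 2`. Passing the wrong alpha changes only that factor, which is exactly the mis-scaling the ALPHA section warns about.
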