Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,79 @@
---
license: apache-2.0
library_name: cosmos-predict2
tags:
- video-generation
- world-model
- cosmos-predict2
- lora
- robotics
base_model: nvidia/Cosmos-Predict2-2B-Video2World
---

# cosmos-task1-task2

LoRA fine-tunes of [Cosmos-Predict2-2B-Video2World](https://huggingface.co/nvidia/Cosmos-Predict2-2B-Video2World) on the `push-that-thing` task1 and task2 datasets.

The checkpoints in this repo are **unfused**: the LoRA adapters (`lora_A`, `lora_B`) are still stored separately from the base layer weights. You must run the one-shot fusion step below before using them with the `run_video2world.py` inference pipeline.

## Repository layout

Each saved training iteration writes four parallel files, which are uploaded as-is:

```
model/iter_<NNNNNNNNN>.pt       # net.* + net_ema.* weights with LoRA adapters
optim/iter_<NNNNNNNNN>.pt       # optimizer state (resume only)
scheduler/iter_<NNNNNNNNN>.pt   # LR scheduler state (resume only)
trainer/iter_<NNNNNNNNN>.pt     # grad scaler + iteration counter (resume only)
```

For **inference** you only need `model/iter_<NNNNNNNNN>.pt`. The other three folders (`optim/`, `scheduler/`, `trainer/`) are needed only to resume training from this iteration.
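
To check which iterations have all four files before downloading anything, the layout above can be scanned mechanically. The following is a minimal sketch, not part of this repo: `complete_iterations` is a hypothetical helper, and the 9-digit iteration width is inferred from the `iter_<NNNNNNNNN>.pt` pattern.

```python
import re
from collections import defaultdict

PARTS = ("model", "optim", "scheduler", "trainer")

# Matches e.g. "model/iter_000002500.pt"; the 9-digit width is an assumption
# taken from the layout shown in this README.
_PATTERN = re.compile(r"^(model|optim|scheduler|trainer)/iter_(\d{9})\.pt$")


def complete_iterations(paths):
    """Return the sorted iteration numbers that have all four checkpoint files."""
    seen = defaultdict(set)
    for path in paths:
        match = _PATTERN.match(path)
        if match:
            part, iteration = match.groups()
            seen[int(iteration)].add(part)
    return sorted(it for it, parts in seen.items() if parts == set(PARTS))
```

The path list could come, for instance, from `huggingface_hub.list_repo_files("push-that-thing/cosmos-task1-task2")`.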

## Inference: fuse, then run

The inference pipeline does not apply LoRA adapters at load time, so an unfused checkpoint loads without error but produces garbage output. Fuse it first.

### 1. Download

```bash
ITER=000002500   # set to the iteration you want
hf download push-that-thing/cosmos-task1-task2 \
  model/iter_${ITER}.pt --local-dir ./ckpts
```
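
The same download can be done from Python. This is a sketch under the assumption that `huggingface_hub` is installed; `checkpoint_filename` and `download_checkpoint` are hypothetical helper names, not part of this repo.

```python
def checkpoint_filename(iteration: int, part: str = "model") -> str:
    """Build the repo-relative path, zero-padding the iteration to 9 digits."""
    return f"{part}/iter_{iteration:09d}.pt"


def download_checkpoint(iteration: int, local_dir: str = "./ckpts") -> str:
    """Download one model checkpoint and return its local path."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="push-that-thing/cosmos-task1-task2",
        filename=checkpoint_filename(iteration),
        local_dir=local_dir,
    )
```

Calling `download_checkpoint(2500)` should place the file under `./ckpts/model/`, matching the CLI command above.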

### 2. Fuse

`fuse_lora_ckpt.py` lives in [push-that-thing/pdt-mimic](https://github.com/push-that-thing/pdt-mimic) under the `mimic-video` submodule. Clone with submodules first:

```bash
git clone --recurse-submodules https://github.com/push-that-thing/pdt-mimic.git
# or, if already cloned without submodules:
git -C pdt-mimic submodule update --init --recursive

python pdt-mimic/mimic-video/model/scripts/fuse_lora_ckpt.py \
  ./ckpts/model/iter_${ITER}.pt
# writes ./ckpts/model/iter_${ITER}_fused.pt
```

Fusion is deterministic: it walks every key, finds matching `lora_A` / `lora_B` pairs, computes `base + (alpha / rank) * B @ A`, and replaces the `base_layer` entry with the merged tensor. Both `net.*` (regular) and `net_ema.*` (EMA) weights are fused in the same pass.
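
To make the merge formula concrete, here is a minimal sketch of the same computation on a plain dict of NumPy arrays. It is not the actual `fuse_lora_ckpt.py`: the key naming (`<prefix>.base_layer.weight`, `<prefix>.lora_A.weight`, `<prefix>.lora_B.weight`) and the choice to keep the `base_layer` key name after merging are assumptions made for illustration, and the real checkpoints hold PyTorch tensors.

```python
import numpy as np

ALPHA = 32  # the value this README says the fusion script hardcodes


def fuse_lora(state, alpha=ALPHA):
    """Return a new dict with each LoRA pair merged into its base_layer weight."""
    fused = {}
    for key, value in state.items():
        if ".lora_A." in key or ".lora_B." in key:
            continue  # adapter tensors are dropped once merged
        if ".base_layer." in key:
            # Hypothetical naming: sibling keys differ only in the middle segment.
            a = state.get(key.replace(".base_layer.", ".lora_A."))
            b = state.get(key.replace(".base_layer.", ".lora_B."))
            if a is not None and b is not None:
                rank = a.shape[0]  # A is (rank, in_features), B is (out_features, rank)
                value = value + (alpha / rank) * (b @ a)
        fused[key] = value
    return fused
```

Because the function only matches on key substrings, `net.*` and `net_ema.*` entries are handled in the same pass, as in the description above.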

### 3. Run video2world

```bash
python pdt-mimic/mimic-video/model/scripts/run_video2world.py \
  --dit_path ./ckpts/model/iter_${ITER}_fused.pt \
  --input_path /path/to/conditioning.mp4 \
  --num_conditional_frames 5 \
  --prompt "Push the white object to the right into the goal white circle." \
  --save_path ./out.mp4
```

## Important: ALPHA must match training

`fuse_lora_ckpt.py` hardcodes `ALPHA = 32`. These checkpoints were trained with the same value, so the default works as-is. If you ever re-train with a different LoRA alpha, you must update that constant before fusing, or the merged weights will be scaled incorrectly.

## Resuming training

To resume training from a given iteration, download all four files for that iteration and place them under `<job_dir>/checkpoints/{model,optim,scheduler,trainer}/iter_<NNNNNNNNN>.pt`, then write `iter_<NNNNNNNNN>.pt` into `<job_dir>/checkpoints/latest_checkpoint.txt`. The Cosmos `Checkpointer` will pick it up automatically.
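
The staging steps above can be scripted. This is a rough sketch, assuming the four files were already downloaded into a local directory that mirrors the repo layout; `stage_for_resume` is a hypothetical helper, and writing the file name without a trailing newline is an assumption about what the `Checkpointer` accepts.

```python
import shutil
from pathlib import Path

PARTS = ("model", "optim", "scheduler", "trainer")


def stage_for_resume(download_dir, job_dir, iteration):
    """Copy the four files into <job_dir>/checkpoints and update latest_checkpoint.txt."""
    name = f"iter_{iteration:09d}.pt"
    sources = {part: Path(download_dir) / part / name for part in PARTS}

    # Check everything up front so a missing file never leaves a partial copy.
    missing = [str(path) for path in sources.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"resume needs all four files, missing: {missing}")

    ckpt_root = Path(job_dir) / "checkpoints"
    for part, src in sources.items():
        dst = ckpt_root / part / name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)

    # Written last so the pointer never names a partially staged iteration.
    (ckpt_root / "latest_checkpoint.txt").write_text(name)
    return ckpt_root
```

For example, `stage_for_resume("./ckpts", "/path/to/job_dir", 2500)` would stage iteration `000002500`.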

Do **not** resume from a fused checkpoint: fusion deletes the `lora_A`/`lora_B` keys that the optimizer state references.
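
A quick guard against mixing the two kinds of checkpoint is to look for adapter keys before resuming or running inference. This is a sketch only: `has_lora_adapters` is a hypothetical helper, and it assumes the key names contain the literal substrings `lora_A` / `lora_B`.

```python
def has_lora_adapters(keys):
    """Return True if any key names a LoRA adapter tensor (checkpoint is unfused)."""
    return any("lora_A" in key or "lora_B" in key for key in keys)
```

Assuming the `.pt` file is a flat state dict, the keys could be obtained with `torch.load(path, map_location="cpu").keys()`; an unfused checkpoint should return `True`, a fused one `False`.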