---
license: apache-2.0
library_name: cosmos-predict2
tags:
- video-generation
- world-model
- cosmos-predict2
- lora
- robotics
base_model: nvidia/Cosmos-Predict2-2B-Video2World
---

# cosmos-task1-task2

LoRA fine-tunes of [Cosmos-Predict2-2B-Video2World](https://huggingface.co/nvidia/Cosmos-Predict2-2B-Video2World) on the `push-that-thing` task1 and task2 datasets.

The checkpoints in this repo are **unfused**: the LoRA adapters (`lora_A`, `lora_B`) are still stored separately from the base layer weights. You must run the one-shot fusion step below before using them with the `run_video2world.py` inference pipeline.

## Repository layout

Each saved training iteration consists of four parallel files, uploaded as-is:

```
model/iter_<NNNNNNNNN>.pt      # net.* + net_ema.* weights with LoRA adapters
optim/iter_<NNNNNNNNN>.pt      # optimizer state (resume only)
scheduler/iter_<NNNNNNNNN>.pt  # LR scheduler state (resume only)
trainer/iter_<NNNNNNNNN>.pt    # grad scaler + iteration counter (resume only)
```

For **inference** you only need `model/iter_<NNNNNNNNN>.pt`. The other three folders are only required to resume training from that iteration.

## Inference: fuse, then run

The inference pipeline does not apply LoRA adapters at load time, so an unfused checkpoint will load but produce garbage outputs. Fuse it first.

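Before running inference, it can help to confirm that no adapter keys survive in the checkpoint. This is a minimal sketch with hypothetical helper names (not part of `pdt-mimic`); it assumes adapter keys contain `.lora_A.` or `.lora_B.` as a dotted path segment, which matches the names used in this README but may not match the exact key format inside the files.

```python
# Hypothetical helpers (not part of pdt-mimic). Assumes adapter keys contain
# ".lora_A." or ".lora_B." as a dotted path segment.
LORA_MARKERS = (".lora_A.", ".lora_B.")


def lora_keys(keys):
    """Return every key that still belongs to a LoRA adapter."""
    return [k for k in keys if any(marker in k for marker in LORA_MARKERS)]


def assert_fused(keys):
    """Raise ValueError if any LoRA adapter key is left, i.e. the checkpoint is unfused."""
    leftover = lora_keys(keys)
    if leftover:
        raise ValueError(
            f"{len(leftover)} LoRA adapter keys found (e.g. {leftover[0]!r}); "
            "run fuse_lora_ckpt.py before inference"
        )


# Toy key lists standing in for the keys of a loaded checkpoint.
unfused = ["net.blocks.0.proj.base_layer.weight", "net.blocks.0.proj.lora_A.weight"]
fused = ["net.blocks.0.proj.weight", "net_ema.blocks.0.proj.weight"]
print(lora_keys(unfused))  # ['net.blocks.0.proj.lora_A.weight']
assert_fused(fused)  # passes silently
```

With a real file you would pass the keys of the loaded state dict, e.g. from `torch.load(path, map_location="cpu")`, assuming the `.pt` holds a flat dict of tensors; if the weights are nested under another key, pass that inner dict's keys instead.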
### 1. Download

```bash
ITER=000002500  # set to the iteration you want (zero-padded to 9 digits)
hf download push-that-thing/cosmos-task1-task2 \
  model/iter_${ITER}.pt --local-dir ./ckpts
```

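The same download can be scripted from Python with `huggingface_hub.hf_hub_download`. This is a minimal sketch: `checkpoint_filename` and `download_iteration` are hypothetical helper names, and the 9-digit zero padding is inferred from the `iter_<NNNNNNNNN>` pattern above.

```python
from pathlib import Path

REPO_ID = "push-that-thing/cosmos-task1-task2"
FOLDERS = ("model", "optim", "scheduler", "trainer")


def checkpoint_filename(folder: str, iteration: int) -> str:
    """Repo-relative path for one iteration, e.g. ("model", 2500) -> "model/iter_000002500.pt"."""
    if folder not in FOLDERS:
        raise ValueError(f"unknown folder {folder!r}; expected one of {FOLDERS}")
    return f"{folder}/iter_{iteration:09d}.pt"


def download_iteration(iteration: int, folders=("model",), local_dir="./ckpts") -> list[Path]:
    """Download the given folders' files for one iteration (needs network access)."""
    # Imported here so checkpoint_filename works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    paths = []
    for folder in folders:
        local = hf_hub_download(
            repo_id=REPO_ID,
            filename=checkpoint_filename(folder, iteration),
            local_dir=local_dir,
        )
        paths.append(Path(local))
    return paths
```

For inference, `download_iteration(2500)` fetches only the model file; to resume training, pass `folders=FOLDERS` to fetch all four.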
### 2. Fuse

`fuse_lora_ckpt.py` lives in [push-that-thing/pdt-mimic](https://github.com/push-that-thing/pdt-mimic) under the `mimic-video` submodule. Clone with submodules first:

```bash
git clone --recurse-submodules https://github.com/push-that-thing/pdt-mimic.git
# or, if pdt-mimic is already cloned without submodules:
git -C pdt-mimic submodule update --init --recursive

python pdt-mimic/mimic-video/model/scripts/fuse_lora_ckpt.py \
  ./ckpts/model/iter_${ITER}.pt
# writes ./ckpts/model/iter_${ITER}_fused.pt
```

Fusion is deterministic: it walks every key, finds matching `lora_A` / `lora_B` pairs, computes `base + (alpha / rank) * B @ A`, replaces the `base_layer` entry with the merged tensor, and drops the adapter keys. Both `net.*` (regular) and `net_ema.*` (EMA) weights are fused in the same pass.

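A minimal sketch of that merge rule on an in-memory state dict. It is not `fuse_lora_ckpt.py` itself: it assumes PEFT-style key names (`<prefix>.base_layer.weight`, `<prefix>.lora_A.weight`, `<prefix>.lora_B.weight`), `lora_A` shaped `(rank, in_features)` and `lora_B` shaped `(out_features, rank)`, and writes the merged tensor under the hypothetical key `<prefix>.weight`. Because it scans every key, `net.*` and `net_ema.*` entries are handled in one pass.

```python
import numpy as np

# Assumed (PEFT-style) key suffixes; the real checkpoint may differ.
BASE = ".base_layer.weight"
LORA_A = ".lora_A.weight"
LORA_B = ".lora_B.weight"


def fuse_state_dict(state_dict, alpha=32):
    """Return a copy of state_dict with W = W0 + (alpha / rank) * B @ A for each LoRA pair.

    Only uses `@`, `*`, `+` and `.shape`, so NumPy arrays and torch tensors both work.
    """
    fused = dict(state_dict)
    for key in state_dict:
        if not key.endswith(BASE):
            continue
        prefix = key[: -len(BASE)]
        a_key, b_key = prefix + LORA_A, prefix + LORA_B
        if a_key not in state_dict or b_key not in state_dict:
            continue  # no adapter pair for this layer: leave the entry untouched
        lora_a = fused.pop(a_key)  # (rank, in_features)
        lora_b = fused.pop(b_key)  # (out_features, rank)
        rank = lora_a.shape[0]
        base = fused.pop(key)
        fused[prefix + ".weight"] = base + (alpha / rank) * (lora_b @ lora_a)
        # Move an optional bias out of the base_layer namespace too (hypothetical key).
        bias_key = prefix + ".base_layer.bias"
        if bias_key in fused:
            fused[prefix + ".bias"] = fused.pop(bias_key)
    return fused


# Toy state dict: one rank-2 adapted layer under both the net.* and net_ema.* prefixes.
rng = np.random.default_rng(0)
toy = {}
for root in ("net", "net_ema"):
    p = f"{root}.blocks.0.proj"
    toy[p + BASE] = rng.standard_normal((4, 3))
    toy[p + LORA_A] = rng.standard_normal((2, 3))
    toy[p + LORA_B] = rng.standard_normal((4, 2))

merged = fuse_state_dict(toy, alpha=32)  # scale = 32 / 2 = 16
print(sorted(merged))  # ['net.blocks.0.proj.weight', 'net_ema.blocks.0.proj.weight']
```

The function should also accept torch tensors from a loaded checkpoint, assuming it is a flat dict of tensors; treat it as an illustration of the arithmetic rather than a replacement for the script.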
### 3. Run video2world

```bash
python pdt-mimic/mimic-video/model/scripts/run_video2world.py \
  --dit_path ./ckpts/model/iter_${ITER}_fused.pt \
  --input_path /path/to/conditioning.mp4 \
  --num_conditional_frames 5 \
  --prompt "Push the white object to the right into the goal white circle." \
  --save_path ./out.mp4
```

## Important: ALPHA must match training

`fuse_lora_ckpt.py` hardcodes `ALPHA = 32`. These checkpoints were trained with the same value, so the default works as-is. If you re-train with a different LoRA alpha, update that constant before fusing; otherwise the merged weights will be scaled incorrectly.

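Why a mismatched constant matters: write the update learned in training as $\Delta W = \frac{\alpha}{r} BA$, the same `alpha / rank` scaling the fusion step uses, with $r$ the LoRA rank. Fusing with a different constant $\alpha'$ rescales the entire update, because $r$ cancels:

$$
W_{\text{fused}} = W_0 + \frac{\alpha'}{r}\,BA = W_0 + \frac{\alpha'}{\alpha}\,\Delta W
$$

For a hypothetical adapter trained with $\alpha = 16$ but fused with the default `ALPHA = 32`, $\alpha'/\alpha = 2$, so the adapter's contribution would be doubled.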
## Resuming training

To resume training from a given iteration, download the files from all four folders for that iteration and place them at `<job_dir>/checkpoints/{model,optim,scheduler,trainer}/iter_<NNNNNNNNN>.pt`, then write `iter_<NNNNNNNNN>.pt` into `<job_dir>/checkpoints/latest_checkpoint.txt`. The Cosmos `Checkpointer` will then pick it up automatically.

Do **not** resume from a fused checkpoint: fusion deletes the `lora_A`/`lora_B` keys that the optimizer state references.

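The file placement for resuming can be scripted. This is a minimal sketch, assuming the four files were already downloaded into `<download_dir>/<folder>/iter_<NNNNNNNNN>.pt` (the layout `hf download --local-dir` produces). `stage_resume` is a hypothetical helper; it refuses to start if any of the four files is missing, copies them into `<job_dir>/checkpoints/`, and writes the iteration file name to `latest_checkpoint.txt`. It does not check whether the model file has been fused.

```python
import shutil
from pathlib import Path

FOLDERS = ("model", "optim", "scheduler", "trainer")


def stage_resume(download_dir, job_dir, iteration: int) -> Path:
    """Copy one iteration's four files into <job_dir>/checkpoints/ and mark it as latest."""
    name = f"iter_{iteration:09d}.pt"
    sources = {folder: Path(download_dir) / folder / name for folder in FOLDERS}
    missing = [str(src) for src in sources.values() if not src.is_file()]
    if missing:
        # Resuming needs model, optimizer, scheduler and trainer state together.
        raise FileNotFoundError(f"missing checkpoint files: {missing}")

    ckpt_root = Path(job_dir) / "checkpoints"
    for folder, src in sources.items():
        dst = ckpt_root / folder / name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)

    # The Cosmos Checkpointer reads the iteration file name from this file.
    (ckpt_root / "latest_checkpoint.txt").write_text(name)
    return ckpt_root
```

For example, `stage_resume("./ckpts", "/path/to/job_dir", 2500)` stages iteration `000002500`.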