pravsels committed on
Commit e966896 · verified · 1 Parent(s): 9fa0a1c

Upload LAM fine-tuned checkpoint (epoch 17, best val_loss=9.68e-5)

Files changed (3)
  1. README.md +56 -0
  2. best.pt +3 -0
  3. lam_finetune_isambard.yaml +30 -0
README.md ADDED
@@ -0,0 +1,56 @@
+ # LAM Fine-Tuned on Bin-Pick-Pack
+
+ Fine-tuned [DreamDojo LAM](https://arxiv.org/abs/2504.02024) (Latent Action Model, 710M params) on the [bin_pick_pack_coffee_capsules](https://huggingface.co/datasets/villekuosmanen/bin_pick_pack_coffee_capsules) manipulation dataset.
+
+ ## Training Details
+
+ - **Base model**: LAM_400k.ckpt (pre-trained on GR1 humanoid data)
+ - **Dataset**: villekuosmanen/bin_pick_pack_coffee_capsules (42,846 train pairs, 4,819 val pairs)
+ - **Resolution**: 240x320 (HxW)
+ - **Epochs**: 57 of 100 completed (best at epoch 17; run stopped early)
+ - **Batch size**: 32
+ - **Learning rate**: 1e-5
+ - **Weight decay**: 0.01
+ - **KL beta**: 1e-6
+ - **Gradient clipping**: 0.3
+ - **Hardware**: NVIDIA GH200 (Isambard HPC)
+ - **Training time**: ~12h
+
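+ A minimal sketch of how these settings map onto a single optimisation step. AdamW, the combined MSE + KL objective, and the `model(batch)` forward signature are assumptions for illustration, not the exact DreamDojo training code:
+
+ ```python
+ import torch
+
+ # `model` is assumed to be an instantiated LAM (see Usage below).
+ # Assumed optimiser; lr / weight decay taken from the list above.
+ optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
+ beta = 1e-6  # KL weight
+
+ def training_step(batch):
+     mse, kl = model(batch)  # hypothetical forward returning (reconstruction MSE, KL)
+     loss = mse + beta * kl
+     optimizer.zero_grad()
+     loss.backward()
+     torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.3)  # grad clip 0.3
+     optimizer.step()
+     return loss.item()
+ ```
+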
+ ## Results
+
+ | Metric | Epoch 0 | Epoch 17 (best) | Epoch 56 (final) |
+ |--------|---------|-----------------|------------------|
+ | train_loss | 0.000154 | 0.000076 | 0.000058 |
+ | val_loss | 0.000137 | 0.000097 | 0.000105 |
+ | val_mse | 0.000107 | 0.000080 | 0.000092 |
+ | val_kl | 29.35 | 16.57 | 12.83 |
+
+ Validation loss improved until epoch 17, then plateaued around 1.0e-4 while training loss kept decreasing: mild overfitting, but no divergence.
+
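+ As a sanity check, the table is internally consistent with `val_loss = val_mse + beta * val_kl` (beta = 1e-6), up to rounding in the last digit:
+
+ ```python
+ beta = 1e-6
+ for epoch, mse, kl in [(0, 1.07e-4, 29.35), (17, 0.80e-4, 16.57), (56, 0.92e-4, 12.83)]:
+     print(epoch, f"{mse + beta * kl:.6f}")  # 0.000136, 0.000097, 0.000105
+ ```
+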
+ ## Checkpoint
+
+ - **File**: `best.pt` (2.84 GB; model weights only, no optimizer state)
+ - **Contents**: `model_state_dict`, `epoch`, `step`, `best_loss`
+ - **SHA-256**: `72e746704080266c7c6aa265035de3bd2132b9ad2783dbfe8d9fc82670a838dc`
+
+ Verify with:
+
+ ```bash
+ sha256sum best.pt
+ ```
+
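+ Or from Python, using only the standard library:
+
+ ```python
+ import hashlib
+
+ h = hashlib.sha256()
+ with open("best.pt", "rb") as f:
+     for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
+         h.update(chunk)
+ assert h.hexdigest() == "72e746704080266c7c6aa265035de3bd2132b9ad2783dbfe8d9fc82670a838dc"
+ ```
+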
+ ## Usage
+
+ ```python
+ import torch
+
+ # Instantiate the LAM architecture first (not shown here; see the DreamDojo
+ # codebase), then load the fine-tuned weights into it.
+ ckpt = torch.load("best.pt", map_location="cpu", weights_only=False)
+ model.load_state_dict(ckpt["model_state_dict"])
+ ```
+
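+ Since the checkpoint stores only tensors and plain scalars (`epoch`, `step`, `best_loss`), `weights_only=True` should also work on recent PyTorch versions, and is the safer default when loading checkpoints you did not produce yourself.
+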
+ ## Config
+
+ See `lam_finetune_isambard.yaml` for the full training configuration.
+
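+ A quick way to inspect it (assumes PyYAML is installed):
+
+ ```python
+ import yaml
+
+ with open("lam_finetune_isambard.yaml") as f:
+     cfg = yaml.safe_load(f)
+ print(cfg["training"]["learning_rate"])  # 1e-05
+ ```
+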
+ ## W&B
+
+ Training curves: [wandb.ai/pravsels/lam-finetune/runs/afu3164m](https://wandb.ai/pravsels/lam-finetune/runs/afu3164m)
best.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:72e746704080266c7c6aa265035de3bd2132b9ad2783dbfe8d9fc82670a838dc
+ size 2839391882
lam_finetune_isambard.yaml ADDED
@@ -0,0 +1,30 @@
+ # LAM fine-tuning config for Isambard GH200.
+ # Heavy artifacts live on scratch; repo code stays under /home.
+
+ dataset:
+   repo_id: villekuosmanen/bin_pick_pack_coffee_capsules
+   root: /scratch/u6cr/pravsels.u6cr/rsl_rl_rwm/data/lerobot
+   hf_home: /scratch/u6cr/pravsels.u6cr/rsl_rl_rwm/huggingface
+
+ training:
+   ckpt_path: /scratch/u6cr/pravsels.u6cr/rsl_rl_rwm/checkpoints/LAM_400k.ckpt
+   resolution_h: 240
+   resolution_w: 320
+   batch_size: 32
+   max_epochs: 100
+   learning_rate: 0.00001
+   weight_decay: 0.01
+   beta: 0.000001
+   grad_clip: 0.3
+   val_ratio: 0.1
+   split_seed: 0
+   num_workers: 8
+   device: cuda
+   output_dir: /scratch/u6cr/pravsels.u6cr/rsl_rl_rwm/runs/lam_finetune
+   save_every_n_epochs: 10
+   log_every_n_steps: 50
+ wandb:
+   enabled: true
+   project: lam-finetune
+   entity: pravsels
+   mode: offline