gpudad committed on
Commit a019f2a · verified · 1 Parent(s): a9b3348

Upload README.md with huggingface_hub

Files changed (1): README.md (+96, -0)
---
tags:
- robotics
- imitation-learning
- xvla
- so101
- pick-and-place
license: apache-2.0
---

# X-VLA SO-101 Phase II - All Checkpoints

Fine-tuned X-VLA model checkpoints for the SO-101 robot arm pick-and-place task.

## Model Details

- **Base model:** [lerobot/xvla-base](https://huggingface.co/lerobot/xvla-base)
- **Training steps:** 200,000 total
- **Task:** Pick up a cube and place it in a bin
- **Robot:** SO-101 single arm
- **Action space:** Delta position control (4D: x, y, z, gripper)
- **Domain ID:** 0 (WidowX-compatible)

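Because the action space is 4D delta position control, each policy output nudges the end-effector rather than commanding an absolute pose. A minimal sketch of how such an action might be applied — the helper name, step scaling, and 0.5 gripper threshold are illustrative assumptions, not part of LeRobot:

```python
import numpy as np

def apply_delta_action(xyz, action, step_scale=1.0):
    """Apply a 4D action (dx, dy, dz, gripper) to the current end-effector
    position. Returns the new XYZ target and a binary gripper command.
    Hypothetical helper; the real controller lives in the robot driver."""
    new_xyz = np.asarray(xyz, dtype=float) + step_scale * np.asarray(action[:3], dtype=float)
    gripper_closed = action[3] > 0.5  # BCE-trained head: threshold the probability
    return new_xyz, gripper_closed
```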
## Available Checkpoints

| Checkpoint | Steps | Path |
|------------|-------|------|
| 020000 | 20,000 | `020000/pretrained_model/` |
| 040000 | 40,000 | `040000/pretrained_model/` |
| 060000 | 60,000 | `060000/pretrained_model/` |
| 080000 | 80,000 | `080000/pretrained_model/` |
| 100000 | 100,000 | `100000/pretrained_model/` |
| 120000 | 120,000 | `120000/pretrained_model/` |
| 140000 | 140,000 | `140000/pretrained_model/` |
| 160000 | 160,000 | `160000/pretrained_model/` |
| 180000 | 180,000 | `180000/pretrained_model/` |
| 200000 | 200,000 | `200000/pretrained_model/` |

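The checkpoint directories follow a fixed zero-padded naming scheme, so subfolder paths can be generated rather than typed out. A small sketch — `checkpoint_path` is a hypothetical helper, not part of this repo or LeRobot:

```python
def checkpoint_path(steps: int) -> str:
    """Return the repo subfolder for a checkpoint saved at `steps`,
    following the zero-padded naming scheme in the table above."""
    return f"{steps:06d}/pretrained_model"

# Checkpoints were saved every 20,000 steps up to 200,000:
all_paths = [checkpoint_path(s) for s in range(20_000, 200_001, 20_000)]
```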
## Training Configuration

- **Frozen:** Vision encoder, language encoder
- **Trained:** Policy transformer, soft prompts, action heads
- **Loss:** L1 for XYZ, binary cross-entropy (BCE) for gripper
- **LR:** 1e-4 → 1e-5 with warmup

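The "1e-4 → 1e-5 with warmup" schedule can be sketched as a warmup phase followed by a decay to the floor. The warmup length and linear decay shape below are assumptions; the actual training config may use cosine decay or a different horizon:

```python
def lr_at_step(step, warmup_steps=1_000, total_steps=200_000,
               peak_lr=1e-4, final_lr=1e-5):
    """Linear warmup to peak_lr, then linear decay to final_lr.
    Illustrative only; not the exact schedule used in training."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    frac = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr + frac * (final_lr - peak_lr)
```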
## Best Checkpoint

The **200000** checkpoint is recommended. It achieves:

| Phase | Status |
|-------|--------|
| Approach cube | ✅ Works |
| Grasp cube | ✅ Works |
| Place in bin | ⚠️ Partial |

## Usage

```python
from lerobot.common.policies.xvla.modeling_xvla import XVLAPolicy

# Load the best checkpoint (200k)
policy = XVLAPolicy.from_pretrained(
    "gpudad/xvla-so101-phase2-checkpoints",
    subfolder="200000/pretrained_model",
)

# Or load an earlier checkpoint
policy = XVLAPolicy.from_pretrained(
    "gpudad/xvla-so101-phase2-checkpoints",
    subfolder="100000/pretrained_model",
)
```

## Evaluation Tips

- Use `n_action_steps=4` for faster re-querying (better performance)
- The model works best with 128x128 images (front + wrist cameras)
- Language instruction: "pick up the cube and place it in the bin"

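Because the model expects 128x128 inputs, raw camera frames usually need resizing before inference. A dependency-light nearest-neighbor sketch; a production pipeline would more likely use OpenCV or torchvision:

```python
import numpy as np

def resize_nearest(img: np.ndarray, size: int = 128) -> np.ndarray:
    """Nearest-neighbor resize of an HxWxC frame to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

# Example: 480x640 front and wrist frames down to 128x128
frame = np.zeros((480, 640, 3), dtype=np.uint8)
obs = {"front": resize_nearest(frame), "wrist": resize_nearest(frame)}
```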
## Files Structure

```
├── 020000/
│   └── pretrained_model/
│       ├── model.safetensors
│       ├── config.json
│       └── ...
├── 040000/
│   └── pretrained_model/
├── ...
└── 200000/
    └── pretrained_model/
```

## Citation

Based on X-VLA from LeRobot.