ThomasTheMaker committed · verified · Commit 3a71a0f · 1 Parent(s): a7eb19f

Upload README.md with huggingface_hub

---
license: mit
tags:
- robotics
- vjepa2
- dm_control
- world-model
- teach-by-showing
---

# V-JEPA 2 Robot Multi-Task Dataset & Models

Vision-based robot control data using **V-JEPA 2** (ViT-L) latent representations
from DeepMind Control Suite environments.

## 📊 Dataset

| Task | Episodes | Transitions | Latent Dim | Action Dim | Success Rate |
|------|----------|-------------|------------|------------|--------------|
| reacher_easy | 1,000 | 200,000 | 1024 | 2 | 28.9% |
| point_mass_easy | 1,000 | 200,000 | 1024 | 2 | 0.6% |
| cartpole_swingup | 1,000 | 200,000 | 1024 | 1 | 0.0% |

Each `.npz` file contains:
- `z_t` — V-JEPA 2 latent state embeddings (N × 1024)
- `a_t` — actions taken (N × action_dim)
- `z_next` — next-state latent embeddings (N × 1024)
- `rewards` — per-step rewards (N,)

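A minimal sketch of reading one episode file with NumPy. The filename and dummy contents below are illustrative (a round-trip with zeros); real files come from this dataset and follow the shapes listed above, with N the number of transitions:

```python
import numpy as np

# Write a tiny stand-in file with the documented keys and shapes.
N, latent_dim, action_dim = 8, 1024, 2  # action_dim is 2 for reacher/point_mass, 1 for cartpole
np.savez(
    "episode_000.npz",  # hypothetical filename
    z_t=np.zeros((N, latent_dim), dtype=np.float32),
    a_t=np.zeros((N, action_dim), dtype=np.float32),
    z_next=np.zeros((N, latent_dim), dtype=np.float32),
    rewards=np.zeros((N,), dtype=np.float32),
)

# Load it back and inspect the per-key shapes.
data = np.load("episode_000.npz")
shapes = {k: data[k].shape for k in ("z_t", "a_t", "z_next", "rewards")}
```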
## 🤖 Models

For each task, we provide:
- **5× Dynamics Ensemble** — `dyn_0.pt` to `dyn_4.pt` (MLP: z + a → z_next, ~1.58M params each)
- **1× Reward Model** — `reward.pt` (MLP: z + a → reward, ~329K params)

### Architecture
- Dynamics: `Linear(1024+a_dim, 512) → LayerNorm → ReLU`, two further `Linear(512, 512) → LayerNorm → ReLU` blocks, then `Linear(512, 1024)` with a residual connection
- Reward: `Linear(1024+a_dim, 256) → ReLU`, one further `Linear(256, 256) → ReLU` block, then `Linear(256, 1)`
- Ensemble diversity (pairwise weight cosine similarity): ~0.60

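A hedged PyTorch sketch of the two architectures. Class and argument names are ours, not the repo's, and the block counts are our reading of the description; with `action_dim=2` this reading reproduces the stated ~1.58M and ~329K parameter counts exactly:

```python
import torch
import torch.nn as nn

class Dynamics(nn.Module):
    """z + a -> z_next via three Linear/LayerNorm/ReLU blocks plus a residual."""
    def __init__(self, latent_dim=1024, action_dim=2, hidden=512):
        super().__init__()
        blocks, in_dim = [], latent_dim + action_dim
        for _ in range(3):
            blocks += [nn.Linear(in_dim, hidden), nn.LayerNorm(hidden), nn.ReLU()]
            in_dim = hidden
        blocks.append(nn.Linear(hidden, latent_dim))
        self.net = nn.Sequential(*blocks)

    def forward(self, z, a):
        # Residual connection: the MLP effectively predicts a latent delta.
        return z + self.net(torch.cat([z, a], dim=-1))

class Reward(nn.Module):
    """z + a -> scalar reward via two Linear/ReLU blocks."""
    def __init__(self, latent_dim=1024, action_dim=2, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1)).squeeze(-1)

def n_params(model):
    return sum(p.numel() for p in model.parameters())
```

With `action_dim=2`, `n_params(Dynamics())` gives 1,579,520 (~1.58M) and `n_params(Reward())` gives 328,961 (~329K), matching the counts above.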
## 🏗️ How It Was Built

1. Expert policies collect episodes in dm_control environments
2. Each frame rendered at 224×224, encoded with V-JEPA 2 ViT-L (8-frame sliding windows)
3. Dynamics ensemble trained with random data splits + different seeds
4. Reward model trained to predict per-step rewards from z_t + a_t

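Step 3 can be sketched as follows. This is a toy stand-in (tiny network, fixed loop), not the repo's actual training script; the real splits, hyperparameters, and architecture differ:

```python
import torch
import torch.nn as nn

def train_ensemble(z_t, a_t, z_next, n_models=5, epochs=5, lr=1e-3):
    """Train an ensemble on different random splits with different seeds."""
    models, n = [], len(z_t)
    for seed in range(n_models):
        torch.manual_seed(seed)                   # different init per member
        idx = torch.randperm(n)[: int(0.9 * n)]   # different random split per member
        net = nn.Sequential(                      # toy stand-in for the dynamics MLP
            nn.Linear(z_t.shape[1] + a_t.shape[1], 32), nn.ReLU(),
            nn.Linear(32, z_t.shape[1]),
        )
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(epochs):
            # Residual prediction: z_next ≈ z_t + f(z_t, a_t), trained with MSE.
            pred = z_t[idx] + net(torch.cat([z_t[idx], a_t[idx]], dim=-1))
            loss = nn.functional.mse_loss(pred, z_next[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
        models.append(net)
    return models
```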
## 📈 Training Details

- **GPU:** NVIDIA A100-SXM4-80GB (Prime Intellect)
- **Total time:** 5.4 hours
- **Total cost:** ~$7
- **Dynamics val loss:** ~0.0008 (reacher, point_mass), ~0.0002 (cartpole)
- **Temporal coherence:** >0.998 for all tasks

## 🎯 Purpose

These world models are designed for **"teach-by-showing"**: demonstrating a task via video,
then using the learned dynamics + CEM planning to reproduce the shown behavior.
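The planner can be sketched as a plain cross-entropy method (CEM) over action sequences. Everything here is an illustrative assumption, including the `dynamics` callable (in practice the learned ensemble) and the latent-distance cost to a goal embedding taken from the demonstration video:

```python
import numpy as np

def cem_plan(z0, z_goal, dynamics, action_dim, horizon=10,
             pop=64, n_elite=8, iters=5, seed=0):
    """Return the first action of the best action sequence found by CEM."""
    rng = np.random.default_rng(seed)
    mu = np.zeros((horizon, action_dim))
    sigma = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action sequences and roll them out in latent space.
        seqs = rng.normal(mu, sigma, size=(pop, horizon, action_dim))
        costs = np.empty(pop)
        for i, seq in enumerate(seqs):
            z = z0
            for a in seq:
                z = dynamics(z, a)
            costs[i] = np.linalg.norm(z - z_goal)  # distance to the shown goal latent
        # Refit the sampling distribution to the lowest-cost elites.
        elite = seqs[np.argsort(costs)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]  # execute only the first action (MPC-style replanning)
```

In a real loop this would be called at every control step, re-encoding the current frame to get `z0` and replanning from there.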