Commit 0265070 by Ryukijano · verified · 1 Parent(s): 4b509f1

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +71 -0
  2. config.json +36 -0
  3. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ license: mit
+ tags:
+ - robotics
+ - v-jepa
+ - nested-learning
+ - flow-matching
+ - imitation-learning
+ - pytorch
+ - lerobot
+ ---
+
+ # V-JEPA Nested Agent (L40 Training)
+
+ This model is a **V-JEPA 2-based agent** equipped with **Nested Learning memory** and a **Flow Matching action head**, trained on the LeRobot `so101-table-cleanup` dataset.
+
+ ## Architecture
+
+ The agent consists of three main components:
+
+ 1. **Vision Encoder**: V-JEPA 2 (Video Joint Embedding Predictive Architecture)
+    - Processes video input `(B, T, C, H, W)`.
+    - Provides pretrained representations for robust visual features.
+    - ~327M parameters.
+
+ 2. **Memory**: Nested Learning Module
+    - Learns hierarchical temporal abstractions.
+    - Captures long-horizon dependencies in the task.
+    - ~813K parameters.
+
+ 3. **Action Head**: Flow Matching (diffusion-like)
+    - Conditional Flow Matching policy.
+    - Predicts action trajectories conditioned on visual and memory embeddings.
+    - Uses `max_state_dim=14` and `action_dim=7`.
+
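The README does not say which probability path or sampler the action head uses. As a rough illustration of conditional flow matching in general, the sketch below assumes the common straight-line path between a noise sample and a target action. The helper names `interpolate`, `target_velocity`, and `euler_sample` are hypothetical and do not come from the model's code, and the target action values are made up.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to action x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for a velocity network along that straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


ACTION_DIM = 7  # matches `action_dim` in the README
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
action = [0.1 * i for i in range(ACTION_DIM)]  # made-up target action


def oracle(x, t):
    # Stand-in for a trained network (assumption): the exact conditional
    # velocity for this particular (noise, action) pair.
    return target_velocity(noise, action)


sampled = euler_sample(oracle, noise, steps=10)
```

With the oracle velocity, Euler integration lands on the target action up to floating-point error. A trained policy would replace `oracle` with a network conditioned on the visual and memory embeddings.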
+ ## Training Details
+
+ - **Dataset**: `so101-table-cleanup` (LeRobot)
+ - **Hardware**: 2x NVIDIA L40 GPUs
+ - **Framework**: PyTorch, Hugging Face Trainer
+ - **Precision**: bfloat16
+
+ ## Usage
+
+ This model requires the custom `VJEPANestedAgent` code to load.
+
+ ```python
+ from safetensors.torch import load_file
+
+ from gr00t.model.vjepa_nested_pipeline import VJEPANestedAgent, VJEPANestedConfig
+
+ # Load the config from the Hub
+ config = VJEPANestedConfig.from_pretrained("cbjp404/vjepa-nested-agent-l40")
+
+ # Initialize the model from the config
+ model = VJEPANestedAgent(config)
+
+ # Load weights from a local copy of model.safetensors
+ state_dict = load_file("model.safetensors")
+ model.load_state_dict(state_dict)
+ ```
+
+ ## Inputs
+
+ The model expects a dictionary input with:
+ - `video`: `(B, T, C, H, W)`
+ - `state`: `(B, state_dim)` (padded to 14)
+ - `action`: `(B, T, action_dim)` (for training)
+ - `action_mask`: `(B, T, 1)`
+ - `embodiment_id`: `(B,)`
+
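The input layout listed in the README can be written down as a small shape check. This is a minimal sketch rather than the model's own validation: `expected_shapes`, `check_batch`, `pad_state`, and `Shaped` are hypothetical names, right-padding the state with zeros is an assumption, and the frame size of 224 is made up. The README uses `T` for both the video and the action tensors; the sketch keeps the two lengths as separate arguments because it is not clear that they are equal.

```python
def expected_shapes(batch, frames, channels, height, width,
                    action_len, state_dim=14, action_dim=7):
    """Shapes of the input dictionary as described in the README."""
    return {
        "video": (batch, frames, channels, height, width),
        "state": (batch, state_dim),
        "action": (batch, action_len, action_dim),
        "action_mask": (batch, action_len, 1),
        "embodiment_id": (batch,),
    }


def check_batch(batch_dict, spec):
    """Return a list of shape problems (empty if everything matches)."""
    problems = []
    for key, want in spec.items():
        if key not in batch_dict:
            problems.append(f"missing key: {key}")
            continue
        got = tuple(batch_dict[key].shape)
        if got != want:
            problems.append(f"{key}: expected {want}, got {got}")
    return problems


def pad_state(state, max_state_dim=14):
    """Right-pad one state vector with zeros (padding scheme is an assumption)."""
    if len(state) > max_state_dim:
        raise ValueError("state is longer than max_state_dim")
    return list(state) + [0.0] * (max_state_dim - len(state))


class Shaped:
    """Tiny stand-in for a tensor: only carries a .shape attribute."""
    def __init__(self, shape):
        self.shape = shape


spec = expected_shapes(batch=2, frames=8, channels=3, height=224, width=224,
                       action_len=16)
batch = {name: Shaped(shape) for name, shape in spec.items()}
problems = check_batch(batch, spec)
padded = pad_state([0.5] * 6)
```

`check_batch` works with any object that exposes a `.shape`, so the same function could be pointed at real tensors before calling the model.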
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "action_dim": 7,
+   "action_head_cfg": {
+     "action_dim": 7,
+     "action_horizon": 16,
+     "backbone_embedding_dim": 256,
+     "diffusion_model_cfg": {
+       "attention_head_dim": 64,
+       "cross_attention_dim": 256,
+       "num_attention_heads": 4,
+       "output_dim": 256
+     },
+     "hidden_size": 256,
+     "input_embedding_dim": 256,
+     "max_state_dim": 14,
+     "use_vlln": false,
+     "vl_self_attention_cfg": {}
+   },
+   "action_horizon": 16,
+   "architectures": [
+     "VJEPANestedAgent"
+   ],
+   "backbone_cfg": {},
+   "compute_dtype": "bfloat16",
+   "dtype": "float32",
+   "encoder_dim": 256,
+   "model_type": "vjepa_nested_agent",
+   "nested_hidden_dim": 256,
+   "nested_max_episodes": 100,
+   "nested_memory_dim": 128,
+   "nested_num_skills": 32,
+   "nested_skill_dim": 64,
+   "transformers_version": "4.57.3",
+   "use_nested_learning": true,
+   "vjepa_checkpoint": "/scratch/cbjp404/AI-Endo/vjepa2/vitl.pt"
+ }
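Several values in this config appear at more than one level, so a quick consistency check can catch mismatches after hand edits. The sketch below copies the relevant fields from the config above into a plain dictionary. Which of these equalities the model code actually requires is an assumption; the checks only record relationships that happen to hold in this file.

```python
# Relevant fields copied from config.json (subset only).
cfg = {
    "action_dim": 7,
    "action_horizon": 16,
    "encoder_dim": 256,
    "action_head_cfg": {
        "action_dim": 7,
        "action_horizon": 16,
        "backbone_embedding_dim": 256,
        "hidden_size": 256,
        "diffusion_model_cfg": {
            "attention_head_dim": 64,
            "cross_attention_dim": 256,
            "num_attention_heads": 4,
            "output_dim": 256,
        },
    },
}

head = cfg["action_head_cfg"]
dit = head["diffusion_model_cfg"]

# Each entry maps a description to whether the relationship holds.
checks = {
    "top-level action_dim equals head action_dim":
        cfg["action_dim"] == head["action_dim"],
    "top-level action_horizon equals head action_horizon":
        cfg["action_horizon"] == head["action_horizon"],
    "encoder_dim equals backbone_embedding_dim":
        cfg["encoder_dim"] == head["backbone_embedding_dim"],
    "heads * head_dim equals output_dim":
        dit["num_attention_heads"] * dit["attention_head_dim"] == dit["output_dim"],
}

failed = [name for name, ok in checks.items() if not ok]
```

For the values shown here, `failed` is empty: 4 attention heads of dimension 64 give 256, which matches `output_dim`, `cross_attention_dim`, and `hidden_size`.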
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee97afe2e38279940385990c1cb7c4ff8234e8459b14ac7e0d93379dece8530d
+ size 1316056820
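The `size` field in the LFS pointer allows a rough plausibility check on the parameter count. The arithmetic below assumes every tensor is stored as float32 (the `dtype` in `config.json`) and ignores the safetensors header, so the result is an upper-bound estimate rather than an exact count.

```python
SIZE_BYTES = 1316056820   # "size" from the Git LFS pointer
BYTES_PER_PARAM = 4       # float32 storage (assumption based on "dtype")

# Upper-bound estimate: the file also contains a small metadata header.
approx_params = SIZE_BYTES // BYTES_PER_PARAM
approx_millions = approx_params / 1e6
size_gb = SIZE_BYTES / 1e9

print(f"{size_gb:.2f} GB, about {approx_millions:.1f}M float32 parameters")
```

The README quotes roughly 327M parameters for the vision encoder and 813K for the memory module, so an estimate near 329M for the whole checkpoint is in the same range.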