Ryukijano/vjepa-nested-agent-l40

---
license: mit
tags:
- robotics
- v-jepa
- nested-learning
- flow-matching
- imitation-learning
- pytorch
- lerobot
---
# V-JEPA Nested Agent (L40 Training)

This model is a **V-JEPA 2-based agent** with a **Nested Learning memory** module and a **Flow Matching action head**, trained on the LeRobot `so101-table-cleanup` dataset.
## Architecture

The agent consists of three main components:

1.  **Vision encoder**: V-JEPA 2 (Video Joint-Embedding Predictive Architecture)
    -   Takes video input of shape `(B, T, C, H, W)`.
    -   Provides pretrained, self-supervised video representations that serve as robust visual features.
    -   ~327M parameters; `vjepa_checkpoint` in `config.json` points to a ViT-L (`vitl.pt`) checkpoint.
2.  **Memory**: Nested Learning module
    -   Learns hierarchical temporal abstractions intended to capture long-horizon dependencies in the task.
    -   Configured with `nested_num_skills=32`, `nested_skill_dim=64`, `nested_memory_dim=128`, `nested_hidden_dim=256`, and `nested_max_episodes=100`.
    -   ~813K parameters.
3.  **Action head**: Flow Matching (diffusion-style)
    -   Conditional Flow Matching policy.
    -   Predicts action chunks of `action_horizon=16` steps, conditioned on the visual and memory embeddings.
    -   Uses `max_state_dim=14` and `action_dim=7`.
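
The card does not spell out the flow-matching objective. A common formulation (the linear-path, rectified-flow variant) trains a network to predict the velocity that carries Gaussian noise to the ground-truth action chunk, then integrates that velocity with a few Euler steps at inference. The NumPy sketch below assumes that variant: the time convention (noise at `t=0`), the masked MSE loss, the four Euler steps and all function names are illustrative assumptions rather than this repository's code, and a fixed "oracle" velocity stands in for the learned network conditioned on visual and memory embeddings.

```python
import numpy as np

ACTION_HORIZON, ACTION_DIM = 16, 7  # from config.json


def interpolate(noise, actions, t):
    """Point on the straight path from noise (t=0) to actions (t=1)."""
    return (1.0 - t) * noise + t * actions


def target_velocity(noise, actions):
    """d/dt of interpolate(); constant along the straight path."""
    return actions - noise


def flow_matching_loss(pred_velocity, noise, actions, action_mask):
    """Masked mean squared error between predicted and target velocities.

    action_mask has shape (ACTION_HORIZON, 1) and broadcasts over ACTION_DIM.
    """
    sq_err = (pred_velocity - target_velocity(noise, actions)) ** 2
    num_valid = action_mask.sum() * sq_err.shape[-1]
    return float((sq_err * action_mask).sum() / num_valid)


def euler_sample(velocity_fn, noise, num_steps=4):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = noise.copy(), 1.0 / num_steps
    for step in range(num_steps):
        x = x + dt * velocity_fn(x, step * dt)
    return x


rng = np.random.default_rng(0)
actions = rng.standard_normal((ACTION_HORIZON, ACTION_DIM))
noise = rng.standard_normal((ACTION_HORIZON, ACTION_DIM))
mask = np.ones((ACTION_HORIZON, 1))


def oracle_velocity(x, t):
    # Stand-in for the learned network: the exact velocity for this one pair
    return target_velocity(noise, actions)


print(flow_matching_loss(oracle_velocity(noise, 0.0), noise, actions, mask))  # 0.0
print(np.allclose(euler_sample(oracle_velocity, noise), actions))  # True
```

Because the target velocity is constant along a straight path, even a handful of Euler steps recovers the action chunk when the velocity is predicted well, which is one reason few-step sampling is popular for flow-matching policies.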
## Training Details

-   **Dataset**: `so101-table-cleanup` (LeRobot)
-   **Hardware**: 2x NVIDIA L40 GPUs
-   **Framework**: PyTorch with the Hugging Face Transformers `Trainer`
-   **Precision**: bfloat16 compute (`compute_dtype`); weights stored in float32 (`dtype`)
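
As a rough cross-check of these precision settings, the Git LFS pointer for `model.safetensors` in this repository reports a size of 1,316,056,820 bytes. Assuming every stored tensor is float32 (as `"dtype": "float32"` in `config.json` suggests), that is about 329M values, roughly consistent with the ~327M-parameter encoder plus the smaller memory and action-head modules. The helper below is a hypothetical back-of-the-envelope sketch; it ignores the small safetensors JSON header, so the count is an upper bound.

```python
# Size in bytes reported by the Git LFS pointer for model.safetensors
SAFETENSORS_BYTES = 1_316_056_820
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def approx_param_count(num_bytes, dtype="float32"):
    """Upper-bound parameter count; ignores the safetensors JSON header."""
    return num_bytes // BYTES_PER_PARAM[dtype]


params = approx_param_count(SAFETENSORS_BYTES, "float32")
bf16_gb = params * BYTES_PER_PARAM["bfloat16"] / 1e9

print(f"~{params / 1e6:.1f}M parameters")  # ~329.0M parameters
print(f"~{bf16_gb:.2f} GB of weights in bfloat16")  # ~0.66 GB of weights in bfloat16
```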
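## Usage

Loading this model requires the custom `VJEPANestedAgent` and `VJEPANestedConfig` classes from `gr00t.model.vjepa_nested_pipeline`; they are not part of the `transformers` library.

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from gr00t.model.vjepa_nested_pipeline import VJEPANestedAgent, VJEPANestedConfig

# Repo id as given in this card; adjust if needed (this page is Ryukijano/vjepa-nested-agent-l40)
repo_id = "cbjp404/vjepa-nested-agent-l40"

# Load the custom config. Note: `vjepa_checkpoint` is a local path from the
# training machine and may need overriding if model construction reads it.
config = VJEPANestedConfig.from_pretrained(repo_id)

# Initialize the model from the config
model = VJEPANestedAgent(config)

# Download the safetensors weights from the Hub and load them
weights_path = hf_hub_download(repo_id=repo_id, filename="model.safetensors")
state_dict = load_file(weights_path)
model.load_state_dict(state_dict)
model.eval()
```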
## Inputs

The model expects a dictionary input with:

-   `video`: `(B, T, C, H, W)`
-   `state`: `(B, state_dim)`, padded to `max_state_dim=14`
-   `action`: `(B, T_a, action_dim)`, training only; `T_a` is the action chunk length (presumably `action_horizon=16` from `config.json`)
-   `action_mask`: `(B, T_a, 1)`
-   `embodiment_id`: `(B,)`
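
To make these shapes concrete, the sketch below builds a random batch with NumPy (convert each array with `torch.from_numpy` before passing it to the PyTorch model). Several details are assumptions, not taken from this card: `state` is zero-padded on the right, the action length equals `action_horizon`, pixels are floats in `[0, 1)`, and the frame count, resolution, `state_dim=6` and `embodiment_id=0` are hypothetical placeholders. `pad_state` and `make_dummy_batch` are illustrative helpers, not part of the model code.

```python
import numpy as np

# Dimensions from config.json
MAX_STATE_DIM, ACTION_DIM, ACTION_HORIZON = 14, 7, 16


def pad_state(state, max_dim=MAX_STATE_DIM):
    """Zero-pad a (B, state_dim) array on the right to (B, max_dim)."""
    batch, dim = state.shape
    if dim > max_dim:
        raise ValueError(f"state_dim={dim} exceeds max_state_dim={max_dim}")
    padded = np.zeros((batch, max_dim), dtype=state.dtype)
    padded[:, :dim] = state
    return padded


def make_dummy_batch(batch_size=2, num_frames=8, height=256, width=256,
                     state_dim=6, embodiment_id=0, seed=0):
    """Random inputs with the documented shapes; all sizes are placeholders."""
    rng = np.random.default_rng(seed)
    return {
        # (B, T, C, H, W), pixel values in [0, 1)
        "video": rng.random((batch_size, num_frames, 3, height, width), dtype=np.float32),
        # (B, max_state_dim)
        "state": pad_state(rng.standard_normal((batch_size, state_dim), dtype=np.float32)),
        # (B, action_horizon, action_dim), only needed for training
        "action": rng.standard_normal((batch_size, ACTION_HORIZON, ACTION_DIM), dtype=np.float32),
        # (B, action_horizon, 1), 1 marks a valid action step
        "action_mask": np.ones((batch_size, ACTION_HORIZON, 1), dtype=np.float32),
        # (B,)
        "embodiment_id": np.full((batch_size,), embodiment_id, dtype=np.int64),
    }


batch = make_dummy_batch()
for name, value in batch.items():
    print(name, value.shape)
```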

config.json

{
  "action_dim": 7,
  "action_head_cfg": {
    "action_dim": 7,
    "action_horizon": 16,
    "backbone_embedding_dim": 256,
    "diffusion_model_cfg": {
      "attention_head_dim": 64,
      "cross_attention_dim": 256,
      "num_attention_heads": 4,
      "output_dim": 256
    },
    "hidden_size": 256,
    "input_embedding_dim": 256,
    "max_state_dim": 14,
    "use_vlln": false,
    "vl_self_attention_cfg": {}
  },
  "action_horizon": 16,
  "architectures": [
    "VJEPANestedAgent"
  ],
  "backbone_cfg": {},
  "compute_dtype": "bfloat16",
  "dtype": "float32",
  "encoder_dim": 256,
  "model_type": "vjepa_nested_agent",
  "nested_hidden_dim": 256,
  "nested_max_episodes": 100,
  "nested_memory_dim": 128,
  "nested_num_skills": 32,
  "nested_skill_dim": 64,
  "transformers_version": "4.57.3",
  "use_nested_learning": true,
  "vjepa_checkpoint": "/scratch/cbjp404/AI-Endo/vjepa2/vitl.pt"
}
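
The `config.json` above repeats `action_dim` and `action_horizon` both at the top level and inside `action_head_cfg`, so a loader can cheaply verify that the two copies agree. The sketch below embeds the relevant values from that file as a string so it runs offline (in practice you would `json.load` the downloaded file). `check_config` is a hypothetical helper, not part of the model code; it also reports the diffusion transformer's attention width, `num_attention_heads * attention_head_dim = 4 * 64 = 256`.

```python
import json

# Relevant values copied from this repository's config.json
CONFIG_JSON = """
{
  "action_dim": 7,
  "action_horizon": 16,
  "action_head_cfg": {
    "action_dim": 7,
    "action_horizon": 16,
    "hidden_size": 256,
    "diffusion_model_cfg": {
      "attention_head_dim": 64,
      "num_attention_heads": 4
    }
  }
}
"""


def check_config(cfg):
    """Return (problems, attention_width) for a loaded config dict."""
    head = cfg["action_head_cfg"]
    dit = head["diffusion_model_cfg"]
    # Duplicated fields must agree between the top level and the action head
    problems = [
        f"{key}: top-level {cfg[key]} != action_head_cfg {head[key]}"
        for key in ("action_dim", "action_horizon")
        if cfg[key] != head[key]
    ]
    attention_width = dit["num_attention_heads"] * dit["attention_head_dim"]
    return problems, attention_width


problems, width = check_config(json.loads(CONFIG_JSON))
print(problems, width)  # [] 256
```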

model.safetensors

version https://git-lfs.github.com/spec/v1
oid sha256:ee97afe2e38279940385990c1cb7c4ff8234e8459b14ac7e0d93379dece8530d
size 1316056820