calebescobedo committed commit 2cbc7b0 (verified) · 1 parent: 5bb85ec

Add model card with inputs, outputs, and usage instructions
# Sensor Diffusion Policy - Epoch 220

A diffusion policy model trained on proximity-sensor data together with table-camera images.

## Model Details

- **Model Type**: Diffusion Policy
- **Training Epochs**: 220/300
- **Horizon**: 16 steps
- **Observation Steps**: 1
- **Action Steps**: 8
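
With one observation step, a horizon of 16, and 8 action steps, each policy inference denoises a 16-step action sequence and, in the usual diffusion-policy rollout, only the first 8 actions are executed before the policy is queried again. A plain-Python sketch of that receding-horizon loop (`run_episode` and the episode length are illustrative, not from this repo):

```python
HORIZON = 16        # actions predicted per denoising pass
N_ACTION_STEPS = 8  # actions executed before re-planning

def run_episode(episode_len):
    """Count the denoising passes needed to act for episode_len steps."""
    inference_calls = 0
    t = 0
    while t < episode_len:
        inference_calls += 1                       # one pass -> HORIZON actions
        t += min(N_ACTION_STEPS, episode_len - t)  # execute the first chunk only
    return inference_calls

print(run_episode(100))  # 13 passes for a 100-step episode
```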

## Inputs

The model expects the following inputs:

### 1. `observation.state` (STATE)
- **Shape**: `(batch, 1, 7)`
- **Description**: Joint positions for the 7-DOF arm
- **Normalization**: Min-max normalized using dataset statistics

### 2. `observation.goal` (STATE)
- **Shape**: `(batch, 1, 3)`
- **Description**: Goal Cartesian position (X, Y, Z, in meters)
- **Normalization**: Min-max normalized using dataset statistics

### 3. `observation.images.table_camera` (VISUAL)
- **Shape**: `(batch, 1, 3, 480, 640)`
- **Description**: RGB images from the table camera
- **Normalization**: Scaled to [0, 1], then mean-std normalized

### 4. `observation.proximity` (STATE)
- **Shape**: `(batch, 1, 128)`
- **Description**: Encoded proximity-sensor latent (37 sensors × 8×8 depth maps → 128-dim)
- **Normalization**: Min-max normalized using dataset statistics
- **Note**: Requires the pretrained ProximityAutoencoder encoder
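
A note on the min-max normalization: the usual LeRobot convention maps each feature into [-1, 1] using per-dimension dataset statistics (I'm assuming that convention here; the `dataset_stats` in `config.json` are authoritative). In plain Python:

```python
def min_max_normalize(x, stat_min, stat_max):
    """Map x from the dataset range [stat_min, stat_max] into [-1, 1]."""
    return (x - stat_min) / (stat_max - stat_min) * 2.0 - 1.0

# a joint position of 0.5 rad with a dataset range of [-1.0, 2.0]
print(min_max_normalize(0.5, -1.0, 2.0))  # 0.0
```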

## Outputs

### `action` (ACTION)
- **Shape**: `(batch, 7)`
- **Description**: Joint positions (7-DOF) for the next timestep
- **Type**: Joint positions (not velocities); these are the next positions the robot should move to
- **Normalization**: The output is unnormalized (raw joint positions in radians)
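
The unnormalization applied to the output is just the inverse of the min-max map, from [-1, 1] back to the raw joint range (again assuming the [-1, 1] convention; this is a sketch, not the library code):

```python
def min_max_unnormalize(a, stat_min, stat_max):
    """Map a normalized value in [-1, 1] back to [stat_min, stat_max]."""
    return (a + 1.0) / 2.0 * (stat_max - stat_min) + stat_min

# a normalized action of 0.0 with a joint range of [-1.0, 2.0] rad
print(min_max_unnormalize(0.0, -1.0, 2.0))  # 0.5
```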

## Usage

```python
import torch

from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.policies.factory import make_pre_post_processors

# Load the model and its pre-/post-processors
repo_id = "calebescobedo/sensor-diffusion-policy-table-camera-epoch220"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

policy = DiffusionPolicy.from_pretrained(repo_id)
policy.eval()
policy.to(device)

preprocessor, postprocessor = make_pre_post_processors(
    policy_cfg=policy.config,
    pretrained_path=repo_id,
)

# Prepare the input batch.
# Note: encode the proximity sensors with the ProximityAutoencoder first.
batch = {
    "observation.state": torch.tensor([...]),                # (batch, 1, 7)
    "observation.goal": torch.tensor([...]),                 # (batch, 1, 3)
    "observation.images.table_camera": torch.tensor([...]),  # (batch, 1, 3, 480, 640)
    "observation.proximity": torch.tensor([...]),            # (batch, 1, 128), pre-encoded
}

# Predict an action
with torch.no_grad():
    batch_processed = preprocessor(batch)            # normalizes inputs
    actions = policy.select_action(batch_processed)  # returns normalized actions
    actions = postprocessor(actions)                 # unnormalizes to raw joint positions

# actions shape: (batch, 7) - joint positions in radians
```
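
The table-camera input must arrive as a float CHW image scaled to [0, 1] (per the normalization note above), whereas cameras usually deliver HWC uint8 frames. The layout conversion, shown in plain Python on a tiny 2×2 image purely for illustration (in practice use something like `torch.from_numpy(img).permute(2, 0, 1).float() / 255.0`):

```python
def hwc_uint8_to_chw_float(img):
    """Convert an H x W x C uint8 nested list to C x H x W floats in [0, 1]."""
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [
        [[img[i][j][k] / 255.0 for j in range(w)] for i in range(h)]
        for k in range(c)
    ]

# tiny 2x2 RGB frame
img = [[[255, 0, 0], [0, 255, 0]],
       [[0, 0, 255], [255, 255, 255]]]
chw = hwc_uint8_to_chw_float(img)
print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 2
```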

## Proximity Sensor Encoding

The proximity sensor readings must be encoded before use, via the pretrained ProximityAutoencoder:

```python
import torch

from architectures.proximity_autoencoder import ProximityAutoencoder

# Load the proximity encoder
encoder_path = "path/to/proximity_autoencoder.pth"
ae_model = ProximityAutoencoder(
    num_sensors=37, depth_channels=1, latent_dim=128, use_attention=True
)
ae_model.load_state_dict(torch.load(encoder_path, map_location="cpu"))
proximity_encoder = ae_model.encoder
proximity_encoder.eval()

# Encode the proximity sensors (37 sensors × 8×8 depth maps)
# raw_proximity shape: (batch, 37, 8, 8)
encoded_proximity = proximity_encoder(raw_proximity)  # shape: (batch, 128)
```

## Dataset Statistics

Dataset statistics are stored in `config.json` under the `dataset_stats` key. They are used for normalization/unnormalization and were computed from the training dataset:

- `/home/caleb/datasets/sensor/roboset_20260117_014645/*.h5` (20 files, ~500 trajectories)
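
If you need the statistics outside the provided processors, they can be read straight from a local copy of `config.json`. The exact per-feature layout under `dataset_stats` (e.g. `min`/`max` arrays) is an assumption here, so inspect the file first:

```python
import json

def load_dataset_stats(config_path):
    """Return the dataset_stats dict stored in the policy's config.json."""
    with open(config_path) as f:
        return json.load(f).get("dataset_stats", {})

# stats = load_dataset_stats("config.json")
# e.g. stats["observation.state"]["min"] / ["max"] would feed min-max normalization
```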

## Training Details

- **Dataset**: Sensor dataset with proximity sensors and a table camera
- **Training Loss**: ~0.003-0.004 at epoch 220
- **Optimizer**: Adam (LeRobot preset)
- **Learning Rate**: From the LeRobot optimizer preset
- **Mixed Precision**: Enabled (AMP)
- **Data Augmentation**: State noise (30% probability, scale 0.005), action noise (30% probability, scale 0.0005), goal noise (30% probability)

## Model Architecture

- **Vision Backbone**: ResNet18
- **Diffusion Steps**: 100
- **Noise Scheduler**: DDPM with a `squaredcos_cap_v2` beta schedule
- **Total Parameters**: ~261M
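
`squaredcos_cap_v2` is the squared-cosine beta schedule (capped at 0.999) from improved DDPM. A plain-Python sketch of how it is commonly defined (mirroring, to my understanding, the `betas_for_alpha_bar` helper in diffusers):

```python
import math

def squaredcos_cap_v2_betas(num_steps, max_beta=0.999):
    """beta_i = 1 - alpha_bar((i+1)/T) / alpha_bar(i/T), capped at max_beta."""
    def alpha_bar(t):
        # cumulative signal-retention fraction at normalized time t in [0, 1]
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    return [
        min(1.0 - alpha_bar((i + 1) / num_steps) / alpha_bar(i / num_steps), max_beta)
        for i in range(num_steps)
    ]

betas = squaredcos_cap_v2_betas(100)  # 100 diffusion steps, matching the config above
```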

## Citation

If you use this model, please cite:

```bibtex
@misc{sensor-diffusion-policy-epoch220,
  author       = {Caleb Escobedo},
  title        = {Sensor Diffusion Policy - Epoch 220},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/calebescobedo/sensor-diffusion-policy-table-camera-epoch220}}
}
```