# Sensor Diffusion Policy - Epoch 220

Diffusion policy model trained on proximity sensor data with table camera images.
## Model Details

- **Model Type**: Diffusion Policy
- **Training Epochs**: 220/300
- **Horizon**: 16 steps
- **Observation Steps**: 1
- **Action Steps**: 8
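The horizon/action-step split means the policy denoises a 16-step action sequence per inference call, but only the first 8 actions are executed before re-planning (LeRobot's usual receding-horizon scheme; treat the details as an assumption from this card's numbers). A minimal sketch in plain Python, with a stubbed `predict` standing in for the diffusion policy:

```python
from collections import deque

HORIZON = 16        # steps denoised per inference call
N_ACTION_STEPS = 8  # steps actually executed before re-planning

def predict(obs):
    # Stub: a real call would run the diffusion policy and return a
    # HORIZON-step action sequence; here we return placeholder integers.
    return [obs + i for i in range(HORIZON)]

def control_loop(n_ticks):
    queue = deque()
    executed = []
    for t in range(n_ticks):
        if not queue:
            # Re-plan: keep only the first N_ACTION_STEPS of the horizon
            queue.extend(predict(t)[:N_ACTION_STEPS])
        executed.append(queue.popleft())
    return executed

executed = control_loop(20)  # re-plans at ticks 0, 8, and 16
```

This is what `policy.select_action` manages internally via its action queue, so the usage code below does not need to implement it by hand.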
## Inputs

The model expects the following inputs:

### 1. `observation.state` (STATE)
- **Shape**: `(batch, 1, 7)`
- **Description**: Joint positions for the 7-DOF arm
- **Normalization**: Min-max normalized using dataset statistics

### 2. `observation.goal` (STATE)
- **Shape**: `(batch, 1, 3)`
- **Description**: Goal Cartesian position (X, Y, Z in meters)
- **Normalization**: Min-max normalized using dataset statistics

### 3. `observation.images.table_camera` (VISUAL)
- **Shape**: `(batch, 1, 3, 480, 640)`
- **Description**: RGB images from the table camera
- **Normalization**: Scaled to [0, 1], then mean-std normalized

### 4. `observation.proximity` (STATE)
- **Shape**: `(batch, 1, 128)`
- **Description**: Encoded proximity sensor latent (37 sensors × 8×8 depth maps → 128-dim)
- **Normalization**: Min-max normalized using dataset statistics
- **Note**: Requires the pretrained ProximityAutoencoder encoder
## Outputs

### `action` (ACTION)
- **Shape**: `(batch, 7)`
- **Description**: Joint positions (7-DOF) for the next timestep
- **Type**: Joint positions (not velocities); these are the next positions the robot should move to
- **Normalization**: Unnormalized by the postprocessor to raw joint positions in radians
## Usage

```python
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.policies.factory import make_pre_post_processors
import torch

# Load model and processors
repo_id = "calebescobedo/sensor-diffusion-policy-table-camera-epoch220"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
policy = DiffusionPolicy.from_pretrained(repo_id)
policy.eval()
policy.to(device)

preprocessor, postprocessor = make_pre_post_processors(
    policy_cfg=policy.config,
    pretrained_path=repo_id,
)

# Prepare input batch
# Note: encode the proximity sensors with the ProximityAutoencoder first (see below)
batch = {
    "observation.state": torch.tensor([...]),                # Shape: (batch, 1, 7)
    "observation.goal": torch.tensor([...]),                 # Shape: (batch, 1, 3)
    "observation.images.table_camera": torch.tensor([...]),  # Shape: (batch, 1, 3, 480, 640)
    "observation.proximity": torch.tensor([...]),            # Shape: (batch, 1, 128) - encoded
}
batch = {k: v.to(device) for k, v in batch.items()}  # move inputs to the model's device

# Predict action
with torch.no_grad():
    batch_processed = preprocessor(batch)            # Normalizes inputs
    actions = policy.select_action(batch_processed)  # Returns normalized actions
    actions = postprocessor(actions)                 # Unnormalizes to raw joint positions

# actions shape: (batch, 7) - joint positions in radians
```
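For reference, the table-camera image must arrive as a float tensor of shape `(batch, 1, 3, 480, 640)` scaled to [0, 1] before the preprocessor applies mean-std normalization. A sketch of that conversion from a raw HWC uint8 camera frame (shown with NumPy; `torch.from_numpy` then yields the tensor the batch expects):

```python
import numpy as np

# Raw camera frame: height x width x channels, uint8 in [0, 255]
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# Scale to [0, 1] and reorder to channels-first (3, 480, 640)
img = frame.astype(np.float32) / 255.0
img = np.transpose(img, (2, 0, 1))

# Add batch and observation-step dims -> (1, 1, 3, 480, 640)
img = img[np.newaxis, np.newaxis]

# In the usage snippet above this would become:
# batch["observation.images.table_camera"] = torch.from_numpy(img)
```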
## Proximity Sensor Encoding

Raw proximity readings must be encoded into the 128-dim latent before use. Load the pretrained ProximityAutoencoder and use its encoder:

```python
from architectures.proximity_autoencoder import ProximityAutoencoder
import torch

# Load the proximity encoder
encoder_path = "path/to/proximity_autoencoder.pth"
ae_model = ProximityAutoencoder(num_sensors=37, depth_channels=1, latent_dim=128, use_attention=True)
ae_model.load_state_dict(torch.load(encoder_path, map_location="cpu"))
proximity_encoder = ae_model.encoder
proximity_encoder.eval()

# Encode proximity sensors (37 sensors × 8×8 depth maps)
# raw_proximity shape: (batch, 37, 8, 8)
with torch.no_grad():
    encoded_proximity = proximity_encoder(raw_proximity)  # Shape: (batch, 128)
encoded_proximity = encoded_proximity.unsqueeze(1)        # Shape: (batch, 1, 128), as the model expects
```
## Dataset Statistics

Dataset statistics are included in `config.json` under the `dataset_stats` key. They are used for normalization/unnormalization and were computed from the training dataset:

- `/home/caleb/datasets/sensor/roboset_20260117_014645/*.h5` (20 files, ~500 trajectories)
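For illustration, min-max normalization with those statistics maps each feature from `[min, max]` to `[-1, 1]` via `2 * (x - min) / (max - min) - 1`, the convention LeRobot uses for MIN_MAX features. A sketch with a hypothetical stats excerpt (the exact `dataset_stats` key layout in `config.json` may differ):

```python
import json

# Hypothetical excerpt of the stats stored in config.json
config = json.loads("""
{
  "dataset_stats": {
    "observation.state": {"min": [-3.14], "max": [3.14]}
  }
}
""")

def min_max_normalize(x, stats):
    # Map each value from [min, max] to [-1, 1]
    return [2.0 * (v - lo) / (hi - lo) - 1.0
            for v, lo, hi in zip(x, stats["min"], stats["max"])]

stats = config["dataset_stats"]["observation.state"]
normalized = min_max_normalize([0.0], stats)  # midpoint of the range -> [0.0]
```

In practice the `preprocessor` returned by `make_pre_post_processors` applies this for you; the sketch only shows what the stored statistics are for.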
## Training Details

- **Dataset**: Sensor dataset with proximity sensors and table camera
- **Training Loss**: ~0.003-0.004 (at epoch 220)
- **Optimizer**: Adam (LeRobot preset)
- **Learning Rate**: From the LeRobot optimizer preset
- **Mixed Precision**: Enabled (AMP)
- **Data Augmentation**: State noise (30% prob, scale=0.005), action noise (30% prob, scale=0.0005), goal noise (30% prob)
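The noise augmentation amounts to: with some probability per sample, add zero-mean Gaussian noise at a small scale. A minimal sketch in plain Python (the actual training code and the goal-noise scale are not given in this card, so the specifics here are assumptions):

```python
import random

def augment(values, prob=0.3, scale=0.005, rng=random):
    # With probability `prob`, perturb every element with zero-mean
    # Gaussian noise of std `scale`; otherwise pass the sample through.
    if rng.random() >= prob:
        return list(values)
    return [v + rng.gauss(0.0, scale) for v in values]

random.seed(0)
state = [0.1] * 7
noisy = augment(state)  # same length, values near the originals
```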
## Model Architecture

- **Vision Backbone**: ResNet18
- **Diffusion Steps**: 100
- **Noise Scheduler**: DDPM with squaredcos_cap_v2 beta schedule
- **Total Parameters**: ~261M
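The squaredcos_cap_v2 schedule derives the betas from a squared-cosine alpha-bar curve, capped at 0.999. A sketch following the convention used in Hugging Face diffusers' `betas_for_alpha_bar`; treat it as illustrative rather than this model's exact training configuration:

```python
import math

def squaredcos_cap_v2_betas(num_steps=100, max_beta=0.999):
    # alpha_bar(t): squared-cosine decay of the cumulative signal level
    def alpha_bar(t):
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_steps):
        t1, t2 = i / num_steps, (i + 1) / num_steps
        # Per-step beta, capped to keep the chain numerically stable
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas

betas = squaredcos_cap_v2_betas()  # 100 values, small early, near-1 late
```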
## Citation

If you use this model, please cite:

```bibtex
@misc{sensor-diffusion-policy-epoch220,
  author = {Caleb Escobedo},
  title = {Sensor Diffusion Policy - Epoch 220},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/calebescobedo/sensor-diffusion-policy-table-camera-epoch220}}
}
```