# Sensor Diffusion Policy - Epoch 220

Diffusion policy model trained on proximity sensor data with table camera images.

## Model Details

- **Model Type**: Diffusion Policy
- **Training Epochs**: 220/300
- **Horizon**: 16 steps
- **Observation Steps**: 1
- **Action Steps**: 8

## Inputs

The model expects the following inputs:

### 1. `observation.state` (STATE)

- **Shape**: `(batch, 1, 7)`
- **Description**: Joint positions for 7-DOF arm
- **Normalization**: Min-max normalized using dataset statistics

### 2. `observation.goal` (STATE)

- **Shape**: `(batch, 1, 3)`
- **Description**: Goal cartesian position (X, Y, Z in meters)
- **Normalization**: Min-max normalized using dataset statistics

### 3. `observation.images.table_camera` (VISUAL)

- **Shape**: `(batch, 1, 3, 480, 640)`
- **Description**: RGB images from table camera
- **Normalization**: Mean-std normalized (normalized to [0, 1] then mean-std)

### 4. `observation.proximity` (STATE)

- **Shape**: `(batch, 1, 128)`
- **Description**: Encoded proximity sensor latent (37 sensors × 8×8 depth maps → 128-dim)
- **Normalization**: Min-max normalized using dataset statistics
- **Note**: Requires pretrained ProximityAutoencoder encoder

## Outputs

### `action` (ACTION)

- **Shape**: `(batch, 7)`
- **Description**: Joint positions (7-DOF) for the next timestep
- **Type**: Joint positions (not velocities) - these are the next positions the robot should move to
- **Normalization**: Output is unnormalized (raw joint positions in radians)

## Usage

```python
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.policies.factory import make_pre_post_processors
import torch

# Load model and processors
repo_id = "calebescobedo/sensor-diffusion-policy-table-camera-epoch220"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
policy = DiffusionPolicy.from_pretrained(repo_id)
policy.eval()
policy.to(device)

preprocessor, postprocessor = make_pre_post_processors(
    policy_cfg=policy.config, pretrained_path=repo_id
)

# Prepare input batch
# Note: You need to encode proximity sensors using the ProximityAutoencoder first
batch = {
    "observation.state": torch.tensor([...]),  # Shape: (batch, 1, 7)
    "observation.goal": torch.tensor([...]),  # Shape: (batch, 1, 3)
    "observation.images.table_camera": torch.tensor([...]),  # Shape: (batch, 1, 3, 480, 640)
    "observation.proximity": torch.tensor([...]),  # Shape: (batch, 1, 128) - encoded
}

# Predict action
with torch.no_grad():
    batch_processed = preprocessor(batch)  # Normalizes inputs
    actions = policy.select_action(batch_processed)  # Returns normalized actions
    actions = postprocessor(actions)  # Unnormalizes to raw joint positions

# actions shape: (batch, 7) - joint positions in radians
```

## Proximity Sensor Encoding

The proximity sensors must be encoded before use. You need to load the ProximityAutoencoder:

```python
from architectures.proximity_autoencoder import ProximityAutoencoder
import torch

# Load proximity encoder
encoder_path = "path/to/proximity_autoencoder.pth"
ae_model = ProximityAutoencoder(num_sensors=37, depth_channels=1, latent_dim=128, use_attention=True)
ae_model.load_state_dict(torch.load(encoder_path, map_location='cpu'))
proximity_encoder = ae_model.encoder
proximity_encoder.eval()

# Encode proximity sensors (37 sensors × 8×8 depth maps)
# raw_proximity shape: (batch, 37, 8, 8)
encoded_proximity = proximity_encoder(raw_proximity)  # Shape: (batch, 128)
```

## Dataset Statistics

Dataset statistics are included in `config.json` under the `dataset_stats` key. These are used for normalization/unnormalization and were computed from the training dataset:

- `/home/caleb/datasets/sensor/roboset_20260117_014645/*.h5` (20 files, ~500 trajectories)

## Training Details

- **Dataset**: Sensor dataset with proximity sensors and table camera
- **Training Loss**: ~0.003-0.004 (at epoch 220)
- **Optimizer**: Adam (LeRobot preset)
- **Learning Rate**: From LeRobot optimizer preset
- **Mixed Precision**: Enabled (AMP)
- **Data Augmentation**: State noise (30% prob, scale=0.005), Action noise (30% prob, scale=0.0005), Goal noise (30% prob)

## Model Architecture

- **Vision Backbone**: ResNet18
- **Diffusion Steps**: 100
- **Noise Scheduler**: DDPM with squaredcos_cap_v2 beta schedule
- **Total Parameters**: ~261M

## Citation

If you use this model, please cite:

```bibtex
@misc{sensor-diffusion-policy-epoch220,
  author = {Caleb Escobedo},
  title = {Sensor Diffusion Policy - Epoch 220},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/calebescobedo/sensor-diffusion-policy-table-camera-epoch220}}
}
```
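
## Appendix: Checking Input Shapes

The Inputs section lists four tensors with fixed per-sample shapes, and a mismatch (for example, passing the proximity latent as `(batch, 128)` instead of `(batch, 1, 128)`) is easy to make. The snippet below is a minimal sketch of a sanity check that could be run before calling the preprocessor. The names `EXPECTED_SHAPES` and `check_batch_shapes` are hypothetical helpers introduced for illustration; they are not part of LeRobot or of this repository. The check only assumes each value exposes a `.shape` attribute, as torch tensors and NumPy arrays do.

```python
# Hypothetical helper, not part of LeRobot: validates the input batch against
# the per-sample shapes listed in the Inputs section of this model card.

# Expected shapes without the leading batch dimension.
EXPECTED_SHAPES = {
    "observation.state": (1, 7),
    "observation.goal": (1, 3),
    "observation.images.table_camera": (1, 3, 480, 640),
    "observation.proximity": (1, 128),
}


def check_batch_shapes(batch):
    """Return a list of problems found in `batch`; an empty list means none.

    `batch` maps input names to objects with a `.shape` attribute
    (for example torch tensors or NumPy arrays).
    """
    problems = []
    batch_sizes = set()
    for key, expected in EXPECTED_SHAPES.items():
        if key not in batch:
            problems.append(f"missing key: {key}")
            continue
        shape = tuple(batch[key].shape)
        # The first dimension is the batch size; the rest must match exactly.
        if len(shape) != len(expected) + 1 or shape[1:] != expected:
            want = ", ".join(str(d) for d in expected)
            problems.append(f"{key}: expected (batch, {want}), got {shape}")
            continue
        batch_sizes.add(shape[0])
    # All inputs should agree on the batch size.
    if len(batch_sizes) > 1:
        problems.append(f"inconsistent batch sizes: {sorted(batch_sizes)}")
    return problems
```

One way to use it is to call `check_batch_shapes(batch)` right after building the batch and raise an error if the returned list is non-empty, so shape mistakes surface before the diffusion model runs.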