# Sensor Diffusion Policy - Epoch 220

Diffusion policy model trained on proximity sensor data with table camera images.

## Model Details

- **Model Type**: Diffusion Policy
- **Training Epochs**: 220/300
- **Horizon**: 16 steps
- **Observation Steps**: 1
- **Action Steps**: 8

## Inputs

The model expects the following inputs:

### 1. `observation.state` (STATE)
- **Shape**: `(batch, 1, 7)`
- **Description**: Joint positions for 7-DOF arm
- **Normalization**: Min-max normalized using dataset statistics

### 2. `observation.goal` (STATE)
- **Shape**: `(batch, 1, 3)`
- **Description**: Goal cartesian position (X, Y, Z in meters)
- **Normalization**: Min-max normalized using dataset statistics

### 3. `observation.images.table_camera` (VISUAL)
- **Shape**: `(batch, 1, 3, 480, 640)`
- **Description**: RGB images from table camera
- **Normalization**: Pixels scaled to [0, 1], then mean-std normalized using dataset statistics

### 4. `observation.proximity` (STATE)
- **Shape**: `(batch, 1, 128)`
- **Description**: Encoded proximity sensor latent (37 sensors × 8×8 depth maps → 128-dim)
- **Normalization**: Min-max normalized using dataset statistics
- **Note**: Requires pretrained ProximityAutoencoder encoder
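
The min-max normalization above maps each state dimension into a fixed range using the per-dimension min/max stored in the dataset statistics. A minimal sketch of the idea, assuming a [-1, 1] target range (the function name and pure-Python form are illustrative, not the library's actual API):

```python
def min_max_normalize(values, stat_min, stat_max, eps=1e-8):
    # Map each dimension to [-1, 1] using per-dimension dataset min/max.
    # eps guards against zero-range (constant) dimensions.
    return [
        2.0 * (v - lo) / max(hi - lo, eps) - 1.0
        for v, lo, hi in zip(values, stat_min, stat_max)
    ]

# A value at the bottom of its observed range maps to -1.0,
# the midpoint to 0.0, and the top to 1.0.
```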

## Outputs

### `action` (ACTION)
- **Shape**: `(batch, 7)`
- **Description**: Joint positions (7-DOF) for the next timestep
- **Type**: Joint positions (not velocities) - these are the next positions the robot should move to
- **Normalization**: Output is unnormalized (raw joint positions in radians)

## Usage

```python
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
from lerobot.policies.factory import make_pre_post_processors
import torch

# Load model and processors
repo_id = "calebescobedo/sensor-diffusion-policy-table-camera-epoch220"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

policy = DiffusionPolicy.from_pretrained(repo_id)
policy.eval()
policy.to(device)

preprocessor, postprocessor = make_pre_post_processors(
    policy_cfg=policy.config,
    pretrained_path=repo_id
)

# Prepare input batch (dummy zero tensors shown; replace with real observations)
# Note: You need to encode proximity sensors using the ProximityAutoencoder first
batch = {
    "observation.state": torch.zeros(1, 1, 7),                          # (batch, 1, 7) joint positions
    "observation.goal": torch.zeros(1, 1, 3),                           # (batch, 1, 3) goal XYZ
    "observation.images.table_camera": torch.zeros(1, 1, 3, 480, 640),  # (batch, 1, 3, 480, 640) RGB
    "observation.proximity": torch.zeros(1, 1, 128),                    # (batch, 1, 128) encoded latent
}

# Predict action
with torch.no_grad():
    batch_processed = preprocessor(batch)  # Normalizes inputs
    actions = policy.select_action(batch_processed)  # Returns normalized actions
    actions = postprocessor(actions)  # Unnormalizes to raw joint positions

# actions shape: (batch, 7) - joint positions in radians
```
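With a horizon of 16 and `n_action_steps` of 8, `select_action` predicts a chunk of future actions and serves them one at a time, replanning once 8 have been consumed. The queueing behavior can be sketched in plain Python (a hypothetical helper illustrating the idea, not LeRobot's actual implementation):

```python
from collections import deque

def rollout(predict_chunk, num_steps, n_action_steps=8):
    # predict_chunk() returns a full predicted action sequence (e.g. 16 steps);
    # only the first n_action_steps are queued and executed before replanning.
    queue = deque()
    executed = []
    for _ in range(num_steps):
        if not queue:
            queue.extend(predict_chunk()[:n_action_steps])
        executed.append(queue.popleft())
    return executed
```

In a real control loop, `predict_chunk` would wrap the preprocessor, `policy.select_action`, and postprocessor calls shown above.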

## Proximity Sensor Encoding

The proximity sensors must be encoded before use. You need to load the ProximityAutoencoder:

```python
from architectures.proximity_autoencoder import ProximityAutoencoder
import torch

# Load proximity encoder
encoder_path = "path/to/proximity_autoencoder.pth"
ae_model = ProximityAutoencoder(num_sensors=37, depth_channels=1, latent_dim=128, use_attention=True)
ae_model.load_state_dict(torch.load(encoder_path, map_location='cpu'))
proximity_encoder = ae_model.encoder
proximity_encoder.eval()

# Encode proximity sensors (37 sensors × 8×8 depth maps)
raw_proximity = torch.zeros(1, 37, 8, 8)  # dummy placeholder; use real sensor readings
encoded_proximity = proximity_encoder(raw_proximity)  # Shape: (batch, 128)
encoded_proximity = encoded_proximity.unsqueeze(1)    # Add time dim for the policy: (batch, 1, 128)
```

## Dataset Statistics

Dataset statistics are included in `config.json` under the `dataset_stats` key. These are used for normalization/unnormalization and were computed from the training dataset:
- `/home/caleb/datasets/sensor/roboset_20260117_014645/*.h5` (20 files, ~500 trajectories)

## Training Details

- **Dataset**: Sensor dataset with proximity sensors and table camera
- **Training Loss**: ~0.003-0.004 (at epoch 220)
- **Optimizer**: Adam (LeRobot preset)
- **Learning Rate**: From LeRobot optimizer preset
- **Mixed Precision**: Enabled (AMP)
- **Data Augmentation**: State noise (30% prob, scale=0.005), Action noise (30% prob, scale=0.0005), Goal noise (30% prob)
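
The noise augmentation applies, with a fixed per-sample probability, small zero-mean Gaussian perturbations to the state, action, and goal signals. A minimal sketch of the scheme (function name and pure-Python form are illustrative):

```python
import random

def maybe_add_noise(values, prob=0.30, scale=0.005):
    # With probability `prob`, perturb every element with zero-mean Gaussian
    # noise of standard deviation `scale`; otherwise pass through unchanged.
    if random.random() < prob:
        return [v + random.gauss(0.0, scale) for v in values]
    return list(values)
```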

## Model Architecture

- **Vision Backbone**: ResNet18
- **Diffusion Steps**: 100
- **Noise Scheduler**: DDPM with squaredcos_cap_v2 beta schedule
- **Total Parameters**: ~261M
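
The `squaredcos_cap_v2` schedule derives the per-step betas from a squared-cosine cumulative noise curve ᾱ(t), capping each beta at 0.999. A sketch of how such a schedule can be computed, following the convention used by the diffusers library (this is for illustration; the model loads its scheduler from its own config):

```python
import math

def squaredcos_cap_v2_betas(num_steps=100, max_beta=0.999):
    # alpha_bar(t) is the cumulative product of (1 - beta) as a function of
    # normalized time t in [0, 1], shaped as a squared cosine.
    def alpha_bar(t):
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_steps):
        t1, t2 = i / num_steps, (i + 1) / num_steps
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas
```

The cap prevents the final betas from reaching 1.0, which would destroy all signal in a single step.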

## Citation

If you use this model, please cite:
```bibtex
@misc{sensor-diffusion-policy-epoch220,
  author = {Caleb Escobedo},
  title = {Sensor Diffusion Policy - Epoch 220},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/calebescobedo/sensor-diffusion-policy-table-camera-epoch220}}
}
```