	# Sensor Diffusion Policy - Epoch 220

	Diffusion policy model trained on proximity sensor data with table camera images.

	## Model Details

	- Model Type: Diffusion Policy
	- Training Epochs: 220/300
	- Horizon: 16 steps
	- Observation Steps: 1
	- Action Steps: 8
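
How the horizon, observation steps, and action steps relate can be sketched as a receding-horizon loop. This is a minimal sketch that assumes the usual diffusion-policy behavior: each sampling pass produces a 16-step trajectory, and 8 of those actions are queued for execution before the next pass. `plan_chunk` is a hypothetical stand-in for the diffusion sampler, not part of this model.

```python
from collections import deque

HORIZON = 16        # steps predicted per sampling pass
N_OBS_STEPS = 1     # observations consumed per pass
N_ACTION_STEPS = 8  # predicted steps that are actually executed


def plan_chunk(step):
    """Hypothetical stand-in for one diffusion sampling pass."""
    return [f"plan@{step}+{k}" for k in range(HORIZON)]


def rollout(num_steps):
    """Execute queued actions and re-plan whenever the queue runs empty."""
    queue = deque()
    replans, executed = [], []
    for t in range(num_steps):
        if not queue:
            trajectory = plan_chunk(t)
            start = N_OBS_STEPS - 1  # first action aligned with the latest observation
            queue.extend(trajectory[start:start + N_ACTION_STEPS])
            replans.append(t)
        executed.append(queue.popleft())
    return replans, executed


replans, executed = rollout(20)
print(replans)  # [0, 8, 16]
```

Under these assumptions the sampler runs once every 8 control steps, and the second half of each predicted trajectory is discarded.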

	## Inputs

	The model expects the following inputs:

	### 1. `observation.state` (STATE)
	- Shape: `(batch, 1, 7)`
	- Description: Joint positions for 7-DOF arm
	- Normalization: Min-max normalized using dataset statistics
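
As an illustration of min-max normalization, the sketch below maps each joint value into [-1, 1] and back. The [-1, 1] target range is an assumption about how the preprocessor applies min-max scaling, and the joint limits are hypothetical; the real values come from the stored dataset statistics.

```python
def min_max_normalize(values, mins, maxs, eps=1e-8):
    """Map each value from [min, max] to [-1, 1]."""
    return [2.0 * (v - lo) / (hi - lo + eps) - 1.0
            for v, lo, hi in zip(values, mins, maxs)]


def min_max_unnormalize(values, mins, maxs, eps=1e-8):
    """Invert min_max_normalize."""
    return [(v + 1.0) / 2.0 * (hi - lo + eps) + lo
            for v, lo, hi in zip(values, mins, maxs)]


# Hypothetical per-joint limits in radians (not taken from the dataset).
joint_min = [-2.9, -1.8, -2.9, -3.1, -2.9, 0.0, -2.9]
joint_max = [2.9, 1.8, 2.9, 0.0, 2.9, 3.8, 2.9]

joints = [0.0, -0.9, 1.45, -1.55, 0.0, 1.9, 2.9]
normalized = min_max_normalize(joints, joint_min, joint_max)
restored = min_max_unnormalize(normalized, joint_min, joint_max)
```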

	### 2. `observation.goal` (STATE)
	- Shape: `(batch, 1, 3)`
	- Description: Goal cartesian position (X, Y, Z in meters)
	- Normalization: Min-max normalized using dataset statistics

	### 3. `observation.images.table_camera` (VISUAL)
	- Shape: `(batch, 1, 3, 480, 640)`
	- Description: RGB images from table camera
	- Normalization: Scaled to [0, 1], then mean-std normalized
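
The two-stage image normalization can be illustrated on a single RGB pixel. This is only a sketch: the mean and std below are the widely used ImageNet channel statistics, used here as placeholders, and the values the preprocessor actually applies are the ones stored with the model.

```python
def normalize_pixel(rgb_uint8, mean, std):
    """Scale 8-bit channels to [0, 1], then apply per-channel mean-std normalization."""
    scaled = [c / 255.0 for c in rgb_uint8]
    return [(s - m) / d for s, m, d in zip(scaled, mean, std)]


# Placeholder statistics (ImageNet); not read from this model's config.
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

pixel = normalize_pixel([255, 128, 0], mean, std)
```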

	### 4. `observation.proximity` (STATE)
	- Shape: `(batch, 1, 128)`
	- Description: Encoded proximity sensor latent (37 sensors × 8×8 depth maps → 128-dim)
	- Normalization: Min-max normalized using dataset statistics
	- Note: Requires pretrained ProximityAutoencoder encoder
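
Because shape mismatches are easy to introduce when assembling these four inputs, a small check like the one below may help before calling the policy. `check_batch_shapes` is a hypothetical helper written for this card, not part of LeRobot; it only compares shapes against the list above.

```python
# Expected shapes without the leading batch dimension, as listed above.
EXPECTED_SHAPES = {
    "observation.state": (1, 7),
    "observation.goal": (1, 3),
    "observation.images.table_camera": (1, 3, 480, 640),
    "observation.proximity": (1, 128),
}


def check_batch_shapes(shapes):
    """Return a list of problems; `shapes` maps each key to its full shape tuple."""
    problems = []
    batch_sizes = set()
    for key, trailing in EXPECTED_SHAPES.items():
        if key not in shapes:
            problems.append(f"missing key: {key}")
            continue
        shape = tuple(shapes[key])
        if len(shape) != len(trailing) + 1 or shape[1:] != trailing:
            expected = ", ".join(map(str, trailing))
            problems.append(f"{key}: expected (batch, {expected}), got {shape}")
            continue
        batch_sizes.add(shape[0])
    if len(batch_sizes) > 1:
        problems.append(f"inconsistent batch sizes: {sorted(batch_sizes)}")
    return problems


good = {
    "observation.state": (4, 1, 7),
    "observation.goal": (4, 1, 3),
    "observation.images.table_camera": (4, 1, 3, 480, 640),
    "observation.proximity": (4, 1, 128),
}
bad = dict(good, **{"observation.proximity": (4, 128)})  # missing the step dimension

print(check_batch_shapes(good))  # []
print(check_batch_shapes(bad))
```

With PyTorch tensors, the same helper can be called as `check_batch_shapes({k: v.shape for k, v in batch.items()})`.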

	## Outputs

	### `action` (ACTION)
	- Shape: `(batch, 7)`
	- Description: Target joint positions (7-DOF) for the next timestep
	- Type: Joint positions, not velocities; these are the positions the robot should move to next
	- Normalization: Unnormalized by the postprocessor, giving raw joint positions in radians

	## Usage

	```python
	from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy
	from lerobot.policies.factory import make_pre_post_processors
	import torch

	# Load model and processors
	repo_id = "calebescobedo/sensor-diffusion-policy-table-camera-epoch220"
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	policy = DiffusionPolicy.from_pretrained(repo_id)
	policy.eval()
	policy.to(device)

	preprocessor, postprocessor = make_pre_post_processors(
	    policy_cfg=policy.config,
	    pretrained_path=repo_id,
	)

	# Prepare input batch
	# Note: You need to encode proximity sensors using the ProximityAutoencoder first
	batch = {
	    "observation.state": torch.tensor([...]),  # Shape: (batch, 1, 7)
	    "observation.goal": torch.tensor([...]),  # Shape: (batch, 1, 3)
	    "observation.images.table_camera": torch.tensor([...]),  # Shape: (batch, 1, 3, 480, 640)
	    "observation.proximity": torch.tensor([...]),  # Shape: (batch, 1, 128) - encoded
	}

	# Predict action
	with torch.no_grad():
	    batch_processed = preprocessor(batch)  # Normalizes inputs
	    actions = policy.select_action(batch_processed)  # Returns normalized actions
	    actions = postprocessor(actions)  # Unnormalizes to raw joint positions

	# actions shape: (batch, 7) - joint positions in radians
	```

	## Proximity Sensor Encoding

	Raw proximity readings must be encoded into the 128-dim latent before they are passed to the policy. Load the pretrained ProximityAutoencoder and use its encoder:

	```python
	from architectures.proximity_autoencoder import ProximityAutoencoder
	import torch

	# Load proximity encoder
	encoder_path = "path/to/proximity_autoencoder.pth"
	ae_model = ProximityAutoencoder(num_sensors=37, depth_channels=1, latent_dim=128, use_attention=True)
	ae_model.load_state_dict(torch.load(encoder_path, map_location='cpu'))
	proximity_encoder = ae_model.encoder
	proximity_encoder.eval()

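	# Encode proximity sensors (37 sensors × 8×8 depth maps)
	# raw_proximity shape: (batch, 37, 8, 8)
	with torch.no_grad():
	    encoded_proximity = proximity_encoder(raw_proximity)  # Shape: (batch, 128)

	# Add the observation-step dimension expected by the policy: (batch, 1, 128)
	encoded_proximity = encoded_proximity.unsqueeze(1)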
	```

	## Dataset Statistics

	Dataset statistics are included in `config.json` under the `dataset_stats` key. These are used for normalization/unnormalization and were computed from the training dataset:
	- `/home/caleb/datasets/sensor/roboset_20260117_014645/*.h5` (20 files, ~500 trajectories)
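
To inspect the statistics directly, the `dataset_stats` entry can be read with the standard `json` module. The sketch below uses an inline excerpt so that it runs on its own. The nested layout (feature name, then `min` and `max` lists) and every number in it are assumptions for illustration, so check them against the actual `config.json`.

```python
import json

# Hypothetical excerpt of config.json; layout and values are illustrative only.
config_text = """
{
  "dataset_stats": {
    "observation.goal": {"min": [0.2, -0.4, 0.0], "max": [0.8, 0.4, 0.6]}
  }
}
"""


def load_min_max(text, feature):
    """Return the (min, max) lists stored for one feature under `dataset_stats`."""
    stats = json.loads(text)["dataset_stats"][feature]
    return stats["min"], stats["max"]


goal_min, goal_max = load_min_max(config_text, "observation.goal")
```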

	## Training Details

	- Dataset: Sensor dataset with proximity sensors and table camera
	- Training Loss: ~0.003-0.004 (at epoch 220)
	- Optimizer: Adam (LeRobot preset)
	- Learning Rate: From LeRobot optimizer preset
	- Mixed Precision: Enabled (AMP)
	- Data Augmentation: State noise (30% prob, scale=0.005), Action noise (30% prob, scale=0.0005), Goal noise (30% prob)
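
The noise augmentation listed above could look roughly like the following sketch. It assumes zero-mean Gaussian noise with standard deviation equal to the stated scale, applied to a whole sample with the stated probability. The training code may differ, and the scale of the goal noise is not given here.

```python
import random


def maybe_add_noise(values, prob, scale, rng):
    """With probability `prob`, add Gaussian noise (std=`scale`) to every element."""
    if rng.random() >= prob:
        return list(values)  # sample left unchanged
    return [v + rng.gauss(0.0, scale) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
state = [0.1] * 7
action = [0.2] * 7

noisy_state = maybe_add_noise(state, prob=0.3, scale=0.005, rng=rng)
noisy_action = maybe_add_noise(action, prob=0.3, scale=0.0005, rng=rng)
```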

	## Model Architecture

	- Vision Backbone: ResNet18
	- Diffusion Steps: 100
	- Noise Scheduler: DDPM with squaredcos_cap_v2 beta schedule
	- Total Parameters: ~261M
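
For reference, the `squaredcos_cap_v2` beta schedule can be computed in a few lines. This is a standalone sketch of the cosine schedule as it is commonly defined for DDPM schedulers (offset 0.008, betas capped at 0.999), evaluated for the 100 diffusion steps listed above. It is not extracted from this model's code.

```python
import math


def squaredcos_cap_v2_betas(num_steps=100, max_beta=0.999):
    """Cosine noise schedule: betas derived from a squared-cosine alpha_bar curve."""

    def alpha_bar(t):
        # Cumulative signal fraction at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = squaredcos_cap_v2_betas()
```

The schedule starts with very small betas and rises toward the cap, so most of the noise is added in the later diffusion steps.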

	## Citation

	If you use this model, please cite:
	```bibtex
	@misc{sensor-diffusion-policy-epoch220,
	author = {Caleb Escobedo},
	title = {Sensor Diffusion Policy - Epoch 220},
	year = {2025},
	publisher = {HuggingFace},
	howpublished = {\url{https://huggingface.co/calebescobedo/sensor-diffusion-policy-table-camera-epoch220}}
	}
	```