Hybrid ACT+Diffusion: ALOHA Single-Arm (Left), 13.4k steps

Custom HybridACTDiffusion policy: ACT visual encoder (ResNet18 + 4-layer Transformer, mean-pooled) feeds a Diffusion U-Net decoder (FiLM conditioning, DDPM training, DDIM 10-step inference). No VAE: diffusion handles multimodal action distributions directly.

This is the initial 13.4k-step Hybrid baseline (S002). For the longer 40k retrain, see JHeisler/aloha_solo_left_act_diffusion_40k.

Architecture

Images (cam_high, cam_left_wrist) + State (dim=9)
     │
     ▼
ACT Encoder (ResNet18 → 4-layer Transformer) → mean-pool → (B, 512) global cond vector
     │
     ▼
Diffusion U-Net (DiffusionConditionalUnet1d, FiLM modulation, down_dims=(256,512))
     │  DDPM training / DDIM 10-step inference
     ▼
Action chunks (chunk_size=100, action_dim=9)
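
The conditioning path in the diagram (mean-pool the encoder tokens, then use the pooled vector to modulate U-Net features via FiLM) can be illustrated with a small plain-Python sketch. This is not the project's code: the function names, the toy sizes, and the exact FiLM form (one scale and one shift per channel, produced by a single linear projection) are assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average encoder tokens (num_tokens x dim) into one global conditioning vector."""
    num_tokens = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / num_tokens for d in range(dim)]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Plain dense layer; weight has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(features: List[Vector], cond: Vector,
         weight: List[Vector], bias: Vector) -> List[Vector]:
    """FiLM: project cond to per-channel (scale, shift) and modulate the features.

    features has shape (channels, time); the projection outputs 2 * channels values,
    the first half used as scales and the second half as shifts (assumed layout).
    """
    channels = len(features)
    params = linear(cond, weight, bias)
    scale, shift = params[:channels], params[channels:]
    return [[scale[c] * v + shift[c] for v in features[c]] for c in range(channels)]


# Toy sizes for readability; the model card lists a 512-dim cond vector and 100-step chunks.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]       # 3 tokens, dim 2
cond = mean_pool(tokens)                             # one vector of dim 2
features = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]       # 2 channels, 3 time steps
# Projection rows, in order: scale_0, scale_1, shift_0, shift_1.
weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.5, -0.5]
modulated = film(features, cond, weight, bias)
print(cond, modulated)
```

Because the pooled vector is global, every time step of the action chunk receives the same scale and shift for a given channel; only the noisy action sequence itself varies along the time axis.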

Training Config

| Field | Value |
| --- | --- |
| Architecture | HybridACTDiffusion (ACT encoder + Diffusion U-Net); see lerobot/common/policies/hybrid_act_diffusion/ |
| Dataset | JHeisler/aloha_solo_left_4_6_26: 50 episodes, 29,785 samples, 30 fps |
| State / action dim | 9 / 9 |
| Cameras | cam_high, cam_left_wrist (3×480×640 each) |
| Steps | 13,400 |
| Batch size | 24 (DOE winner) |
| Learning rate | 3e-5 |
| Total samples seen | 321K (10.6 epochs) |
| AMP | enabled |
| torch.compile | enabled |
| Diffusion scheduler | DDPM training (100 timesteps, squaredcos_cap_v2), DDIM at inference (10 steps) |
| Final loss (DDPM noise-pred MSE) | 0.011–0.020 |
| Final grad norm | 0.2–0.7 |
| Wall clock | ~1h 16min on RTX A4500 |
| LeRobot pin | 96c7052777aca85d4e55dfba8f81586103ba8f61 (with custom hybrid_act_diffusion policy added) |
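
To make the scheduler row concrete, the sketch below builds a 100-step squaredcos_cap_v2 (cosine) beta schedule, picks 10 evenly spaced timesteps for DDIM, and applies one deterministic DDIM update. It is a stdlib-only illustration with hypothetical function names, not the policy's code. It assumes the usual cosine formula with a 0.008 offset and a 0.999 cap, "leading" timestep spacing, eta = 0, and a final cumulative alpha of 1.0; the project's actual scheduler settings may differ.

```python
import math
from typing import List


def squaredcos_cap_v2_betas(num_train_timesteps: int = 100,
                            max_beta: float = 0.999) -> List[float]:
    """Cosine beta schedule, with each beta capped at max_beta."""
    def alpha_bar(t: float) -> float:
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


def cumulative_alphas(betas: List[float]) -> List[float]:
    """Running product of (1 - beta), i.e. alpha_bar at each training timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def ddim_timesteps(num_train_timesteps: int = 100,
                   num_inference_steps: int = 10) -> List[int]:
    """Evenly spaced subset of training timesteps, highest first (assumed spacing)."""
    step = num_train_timesteps // num_inference_steps
    return [i * step for i in range(num_inference_steps)][::-1]


def ddim_step(x_t: float, eps: float,
              alpha_bar_t: float, alpha_bar_prev: float) -> float:
    """Deterministic DDIM update (eta = 0) for a single scalar action value."""
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps


betas = squaredcos_cap_v2_betas()
alphas_cumprod = cumulative_alphas(betas)
timesteps = ddim_timesteps()
print(timesteps)
```

At inference the U-Net would be called once per selected timestep to predict eps, so 10 DDIM steps cost 10 network evaluations per action chunk instead of the 100 a full DDPM reverse pass would need.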

Project Lineage

| Workstream | Model | Steps | Samples | HF |
| --- | --- | --- | --- | --- |
| S001 | ACT | 13,400 | 640K | act_left |
| S002 | Hybrid ACT+Diffusion | 13,400 | 321K | this repo |
| S003 | ACT (shipped) | 40,000 | 1.92M | act_left_40k |
| S004 | Hybrid ACT+Diffusion | 40,000 | 1.12M | act_diffusion_40k |
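
The Samples column follows from steps × batch size. The short sketch below reproduces the S002 figure from the stated batch size of 24 and, for the other workstreams, backs out the batch size implied by the rounded totals. Those implied values are an inference from rounded numbers, not settings stated in this card, and the helper names are hypothetical.

```python
def samples_seen(steps: int, batch_size: int) -> int:
    """Total training samples drawn over a run."""
    return steps * batch_size


def implied_batch_size(samples: float, steps: int) -> float:
    """Batch size implied by a (rounded) sample total and a step count."""
    return samples / steps


# S002: the only workstream whose batch size is stated in this card.
s002 = samples_seen(13_400, 24)

# (steps, rounded samples) as listed in the lineage table.
lineage = {
    "S001": (13_400, 640e3),
    "S002": (13_400, 321e3),
    "S003": (40_000, 1.92e6),
    "S004": (40_000, 1.12e6),
}
implied = {
    name: round(implied_batch_size(samples, steps), 1)
    for name, (steps, samples) in lineage.items()
}
print(s002, implied)
```

The point of the exercise is that equal step counts do not mean equal data exposure: runs with the same number of steps can differ substantially in samples seen.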

Notes on loss comparability

DDPM noise-prediction MSE (this model) and ACT's combined L1 + KL loss (S001/S003) are different objectives, so absolute loss values are not directly comparable across architectures. A fairer comparison is offline action L1 error on held-out episodes or real-robot rollout success rate.
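
As an illustration of the offline metric, the sketch below computes mean absolute (L1) error between predicted and recorded action chunks and averages it over held-out episodes. It is a minimal stdlib-only sketch with hypothetical function names and made-up toy values; how held-out episodes are selected and how predictions are generated is left to the evaluation setup.

```python
from typing import List, Sequence, Tuple

Chunk = Sequence[Sequence[float]]  # (time steps) x (action_dim)


def action_l1(pred: Chunk, target: Chunk) -> float:
    """Mean absolute error over all time steps and action dims of one chunk."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of time steps")
    total, count = 0.0, 0
    for p_step, t_step in zip(pred, target):
        if len(p_step) != len(t_step):
            raise ValueError("action dimensions do not match")
        for p, t in zip(p_step, t_step):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("empty chunk")
    return total / count


def mean_l1_over_episodes(pairs: List[Tuple[Chunk, Chunk]]) -> float:
    """Average the per-chunk L1 error over (prediction, ground truth) pairs."""
    if not pairs:
        raise ValueError("no episodes given")
    return sum(action_l1(p, t) for p, t in pairs) / len(pairs)


# Toy values only: 2 time steps, action_dim 2 (the real model uses 100 x 9).
pred = [[0.0, 1.0], [2.0, 3.0]]
target = [[0.5, 1.0], [2.0, 2.0]]
single = action_l1(pred, target)
overall = mean_l1_over_episodes([(pred, target), (target, target)])
print(single, overall)
```

Because both architectures emit actions in the same space, this number can be compared directly between the ACT and Hybrid models, unlike their training losses.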

Usage

The custom policy class lives in this project's LeRobot fork. To use:

# Requires lerobot pinned to 96c7052 with hybrid_act_diffusion policy package added
from lerobot.common.policies.hybrid_act_diffusion.modeling_hybrid_act_diffusion import HybridACTDiffusionPolicy
policy = HybridACTDiffusionPolicy.from_pretrained("JHeisler/aloha_solo_left_act_diffusion")

Citation / Course

EN.525.681 school project, JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.

Code reference: HuggingFace LeRobot at commit 96c7052 with custom hybrid policy package.
