---
tags:
- robotics
- anima
- haptos
- tactile
- robot-flow-labs
- masked-autoencoder
- force-prediction
library_name: pytorch
pipeline_tag: robotics
license: apache-2.0
---
# HAPTOS: General Tactile Representation Learning

Part of the ANIMA Perception Suite by Robot Flow Labs.

## Paper

**AnyTouch 2: General Optical Tactile Representation Learning for Dynamic Tactile Perception** (arXiv:2602.09617, GeWu-Lab)
## Architecture

ViT-Base Masked Autoencoder (MAE) for tactile image representation learning:

- Encoder: 12 layers, 12 heads, dim 768 (106.4M params total)
- Decoder: 6 layers, 8 heads, dim 512 (paper-matched)
- Force head: MLP from CLS token -> 3D force vector (fx, fy, fz)
- Mask ratio: 75%
- Input: 224x224 tactile images
- Loss: MSE reconstruction + frame-diff reconstruction + L1 force + delta-force + cross-sensor matching + action matching
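The 75% mask ratio means only a quarter of the 14x14 patch grid ever reaches the encoder. A minimal, dependency-free sketch of MAE-style random patch masking (an illustration of the arithmetic, not the model's actual implementation):

```python
import random

# A 224x224 input with 16x16 patches gives a 14x14 grid of 196 patch tokens
num_patches = (224 // 16) ** 2               # 196
mask_ratio = 0.75
num_masked = int(num_patches * mask_ratio)   # 147 patches hidden from the encoder

# Shuffle patch indices and split into masked / visible sets
indices = list(range(num_patches))
random.shuffle(indices)
masked_idx = sorted(indices[:num_masked])
visible_idx = sorted(indices[num_masked:])   # only these 49 tokens are encoded

print(len(visible_idx))  # 49
```

The decoder then reconstructs the 147 masked patches from the 49 visible ones, which is what makes the pretext task hard enough to learn useful tactile features.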
## Exported Formats

| Format | File | Size | Use Case |
|---|---|---|---|
| PyTorch (.pth) | `pytorch/haptos_v1.pth` | 376 MB | Training, fine-tuning |
| SafeTensors | `pytorch/haptos_v1.safetensors` | 376 MB | Fast loading, safe |
| ONNX | `onnx/haptos_v1.onnx` | 345 MB | Cross-platform inference |
| TensorRT FP16 | `tensorrt/haptos_v1_fp16.trt` | 175 MB | Edge deployment (Jetson/L4) |
| TensorRT FP32 | `tensorrt/haptos_v1_fp32.trt` | 345 MB | Full-precision inference |
| Checkpoint | `checkpoints/best.pth` | 1.1 GB | Resume training (optimizer + scheduler state) |
## Training Details
| Setting | Value |
|---|---|
| Hardware | 8x NVIDIA L4 (23.7 GB each) |
| VRAM Usage | 19.0 GB / 23.7 GB (80%) per GPU |
| Effective Batch | 192 (24/GPU x 8 GPUs) |
| Optimizer | AdamW (betas=0.9, 0.95) |
| Learning Rate | 3e-4 |
| LR Schedule | Warmup + Cosine Annealing with Warm Restarts (T0=28, T_mult=2) |
| Precision | bf16 mixed precision |
| Epochs | 40 |
| Best Val Loss | 0.0836 (epoch 52) |
| Test Loss | 0.0825 |
| Test Recon Loss | 0.0090 |
| Test Force Loss (L1) | 0.7347 |
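The warm-restart arithmetic behind the LR schedule above can be sketched as follows; this is a standalone re-derivation of the PyTorch-style cosine-annealing-with-warm-restarts formula with `T0=28`, `T_mult=2` (the warmup phase is omitted for brevity, and this is not the actual training code):

```python
import math

BASE_LR, ETA_MIN = 3e-4, 0.0
T0, T_MULT = 28, 2

def lr_at(epoch: int) -> float:
    """Cosine annealing with warm restarts: LR at a given epoch (warmup ignored)."""
    t_cur, t_i = epoch, T0
    # Walk past completed cycles: lengths T0, T0*T_MULT, T0*T_MULT**2, ...
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= T_MULT
    return ETA_MIN + (BASE_LR - ETA_MIN) * (1 + math.cos(math.pi * t_cur / t_i)) / 2

# LR starts at 3e-4, decays over the first 28 epochs, then restarts;
# with 40 scheduled epochs, training sits inside the second (56-epoch) cycle at the end
print(lr_at(0), lr_at(27), lr_at(28))
```

With these settings the first restart lands at epoch 28, and the next would not occur until epoch 84.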
## Usage

```python
import torch
from safetensors.torch import load_file

# Load weights
state_dict = load_file("pytorch/haptos_v1.safetensors")

# Build model
from anima_haptos.models.mae_cuda import TactileMAECuda

model = TactileMAECuda(
    img_size=224, patch_size=16, embed_dim=768,
    encoder_depth=12, num_heads=12,
    decoder_dim=384, decoder_depth=4, decoder_heads=6,
    mask_ratio=0.75, force_head=True, force_dim=3,
)
model.load_state_dict(state_dict)
model.eval()

# Extract features
img = torch.randn(1, 3, 224, 224)
features = model.get_encoder_features(img)  # [1, 768]
force = model.force_head(features)          # [1, 3] (fx, fy, fz)
```
## Capabilities
- Pixel-level: Masked reconstruction of tactile images
- Physical-level: 3D contact force estimation (fx, fy, fz) with L1 supervision
- Multi-sensor: Works across GelSight, DIGIT, DuraGel, Tac3D
- Temporal: Processes tactile frame sequences
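The L1 force supervision listed above is just mean absolute error over the three force components. A dependency-free sketch of that loss for a single sample (the training code itself presumably uses `torch.nn.functional.l1_loss`; this is only an illustration):

```python
def l1_force_loss(pred, target):
    """Mean absolute error over the (fx, fy, fz) components of one sample."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

# Hypothetical predicted vs. ground-truth contact force (newtons)
pred = (0.1, -0.2, 0.9)
target = (0.0, 0.0, 1.0)
loss = l1_force_loss(pred, target)   # (0.1 + 0.2 + 0.1) / 3
```

L1 is a common choice here because it is less sensitive than MSE to occasional large force outliers during contact events.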
## Checkpoint Contents

`best.pth` includes full state for resuming training:

- `model_state_dict`
- `optimizer_state_dict`
- `scheduler_state_dict`
- `early_stopping_state_dict`
- `scaler_state_dict`
- `epoch`, `global_step`, `val_loss`
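A minimal resume sketch using those key names. The tiny `Linear` model, the temp path, and the stored values are placeholders standing in for the real `TactileMAECuda` checkpoint (`epoch` and `val_loss` here echo the best-epoch numbers from the training table):

```python
import os
import tempfile

import torch

model = torch.nn.Linear(4, 3)   # placeholder for TactileMAECuda
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95))
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=28, T_mult=2)

# Save a checkpoint with the same keys as checkpoints/best.pth
path = os.path.join(tempfile.gettempdir(), "best_demo.pth")
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": opt.state_dict(),
    "scheduler_state_dict": sched.state_dict(),
    "epoch": 52,
    "val_loss": 0.0836,
}, path)

# Resume: restore all three state dicts, then continue from the next epoch
ckpt = torch.load(path, map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
opt.load_state_dict(ckpt["optimizer_state_dict"])
sched.load_state_dict(ckpt["scheduler_state_dict"])
start_epoch = ckpt["epoch"] + 1
```

Restoring the optimizer and scheduler state alongside the weights is what makes a resumed run continue the same LR trajectory rather than restarting the schedule from scratch.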
## Files

```
├── README.md
├── paper.pdf
├── pytorch/
│   ├── haptos_v1.pth
│   └── haptos_v1.safetensors
├── onnx/
│   └── haptos_v1.onnx
├── tensorrt/
│   ├── haptos_v1_fp16.trt
│   └── haptos_v1_fp32.trt
├── checkpoints/
│   └── best.pth
├── configs/
│   └── training.yaml
└── logs/
    └── training_history.json
```
## License

Apache 2.0, Robot Flow Labs / AIFLOW LABS LIMITED.