# ACT-MODIFIED - MetaWorld MT-1 Shelf-Place

## Model Description

This is a trained modified Action Chunking with Transformers (ACT) model for the MetaWorld MT-1 `shelf-place-v3` task.
## Architecture

The modified ACT uses image features in both the encoder and the decoder (visual conditioning):

- Encoder: image features + state (joints) + action history → latent distribution
- Decoder: image features + state + latent sample → action chunk
- Advantage: richer visual conditioning and a more expressive latent space (25.43M parameters)
- Hypothesis: should perform better as more training data becomes available
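The data flow above can be sketched as a minimal PyTorch module. This is an illustrative skeleton only, not the repository's implementation: the linear projections stand in for the real CNN backbone and transformer layers, and the class and method names are assumptions. Dimensions follow the configuration below.

```python
import torch
import torch.nn as nn

class ModifiedACTSketch(nn.Module):
    """Hypothetical sketch: images condition BOTH the CVAE encoder and the decoder."""

    def __init__(self, joint_dim=39, action_dim=4, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.img_proj = nn.Linear(64 * 64 * 3, hidden_dim)    # stand-in for a CNN backbone
        self.state_proj = nn.Linear(joint_dim, hidden_dim)
        self.action_proj = nn.Linear(action_dim, hidden_dim)
        # Encoder: image + state + action history -> latent Gaussian (mu, logvar)
        self.encoder = nn.Linear(hidden_dim * 3, latent_dim * 2)
        # Decoder: image + state + latent sample -> action prediction
        self.decoder = nn.Linear(hidden_dim * 2 + latent_dim, action_dim)

    def forward(self, img, state, actions):
        h_img = self.img_proj(img.flatten(1))                 # (B, hidden_dim)
        h_state = self.state_proj(state)
        h_act = self.action_proj(actions).mean(dim=1)         # pool the action chunk
        mu, logvar = self.encoder(
            torch.cat([h_img, h_state, h_act], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        # Image features re-enter here -- the decoder-side visual conditioning
        return self.decoder(torch.cat([h_img, h_state, z], dim=-1)), mu, logvar
```

In the standard ACT architecture only the decoder sees image features; feeding them to the encoder as well is what the "modified" variant refers to.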
## Training Details

- Task: MetaWorld MT-1 `shelf-place-v3`
  - Single-task manipulation (place a puck on a shelf)
  - Object positions randomized across episodes
- Observations:
  - State: 39-dimensional (joint positions, velocities, gripper info)
  - Images: 480×480 RGB (downsampled to 64×64 for processing)
- Action space: 4D continuous [Δx, Δy, Δz, gripper]
- Training:
  - Demonstrations: 10 expert episodes (100% success)
  - Training samples: 4,500
  - Epochs: 50
  - Batch size: 8
  - Learning rate: 1e-4
  - Chunk size: 100 steps
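The 480×480 → 64×64 image downsampling might look like the following. This is a hedged sketch, assuming bilinear interpolation and [0, 1] scaling; the repository may use a different resize method or normalization.

```python
import torch
import torch.nn.functional as F

def preprocess(img_uint8: torch.Tensor) -> torch.Tensor:
    """Downsample one camera frame for the model.

    img_uint8: (480, 480, 3) uint8 HWC frame. Returns a (1, 3, 64, 64)
    float tensor scaled to [0, 1]. Interpolation mode is an assumption.
    """
    x = img_uint8.permute(2, 0, 1).float() / 255.0  # HWC uint8 -> CHW float in [0, 1]
    x = F.interpolate(x.unsqueeze(0), size=(64, 64),
                      mode="bilinear", align_corners=False)
    return x
```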
## Performance

- Success rate: 0% (limited by the small amount of training data)
- Status: training converged; ready for re-evaluation once more demonstrations are collected
## Usage

### Installation

```bash
# Clone the repo and install dependencies
git clone https://huggingface.co/aryannzzz/act-metaworld-shelf-modified
pip install torch torchvision
```
### Loading the Model

```python
import torch

# Load the checkpoint onto the available device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load('model_modified.pt', map_location=device)

# The checkpoint contains:
# - model_state_dict: model weights
# - config: model architecture config
# - training_config: training hyperparameters
model_config = checkpoint['config']
print("Model configuration:", model_config)
```
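Given that checkpoint layout, rebuilding the model could look like the helper below. The model class itself is not named on this card, so `model_cls` is left as a parameter; pass whatever class the repository defines.

```python
import torch

def load_model(path, model_cls, device="cpu"):
    """Rebuild a model from the checkpoint layout described above.

    Assumes model_cls accepts the keys of checkpoint['config']['model']
    as keyword arguments; adapt this to the actual class signature.
    """
    ckpt = torch.load(path, map_location=device)
    model = model_cls(**ckpt["config"]["model"])      # architecture from saved config
    model.load_state_dict(ckpt["model_state_dict"])   # restore trained weights
    model.eval()                                      # inference mode
    return model, ckpt["training_config"]
```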
## Model Architecture Details

### Configuration

```json
{
  "dataset": {
    "batch_size": 8,
    "num_workers": 2,
    "val_split": 0.2
  },
  "model": {
    "joint_dim": 39,
    "action_dim": 4,
    "hidden_dim": 256,
    "latent_dim": 32,
    "n_encoder_layers": 4,
    "n_decoder_layers": 4,
    "n_heads": 8,
    "feedforward_dim": 1024,
    "dropout": 0.1
  },
  "chunking": {
    "chunk_size": 50,
    "temporal_ensemble_weight": 0.01
  },
  "training": {
    "epochs": 50,
    "learning_rate": 0.0001,
    "weight_decay": 0.0001,
    "kl_weight": 10.0,
    "grad_clip": 1.0
  },
  "env": {
    "task": "shelf-place-v3",
    "image_size": [480, 480],
    "action_space": 4,
    "state_space": 39
  },
  "logging": {
    "use_wandb": false,
    "log_every": 10,
    "save_every": 10
  }
}
```
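The `temporal_ensemble_weight` of 0.01 controls ACT-style temporal ensembling: because chunks overlap, several past chunks have each predicted an action for the current timestep, and those predictions are averaged with exponentially decaying weights. A sketch of that averaging step (the function name is illustrative):

```python
import numpy as np

def ensemble_actions(preds: np.ndarray, m: float = 0.01) -> np.ndarray:
    """Average overlapping chunk predictions for one timestep.

    preds: (k, action_dim) array; row 0 is the OLDEST prediction for this
    timestep. Weights follow w_i = exp(-m * i), so older predictions are
    weighted more, as in the ACT paper; m is temporal_ensemble_weight.
    """
    k = preds.shape[0]
    w = np.exp(-m * np.arange(k))             # w_0 = 1 for the oldest prediction
    return (w[:, None] * preds).sum(axis=0) / w.sum()
```

A small `m` (such as 0.01) weights the overlapping predictions nearly uniformly, which smooths the executed trajectory; a larger `m` makes the policy react faster to new observations.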
## Citation

If you use this model, please cite the original ACT paper:

```bibtex
@article{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
  journal={arXiv preprint arXiv:2304.13705},
  year={2023}
}
```
## License

Apache License 2.0

- Uploaded: 2025-12-11 22:12:29
- Variant: modified
- Repository: https://huggingface.co/aryannzzz/act-metaworld-shelf-modified