---
license: mit
tags:
- robotics
- pi-zero
- diffusion
- vision-language-action
- aloha
- manipulation
- bolt-nut-sorting
base_model: google/paligemma-3b-pt-224
library_name: openpi
pipeline_tag: robotics
---

# Pi-0 Bolt Nut Sort Model
This is a Pi-0 (Pi-Zero) model fine-tuned for bolt and nut sorting tasks using the OpenPI framework.
## Model Description

- **Architecture**: Pi-0 (diffusion-based vision-language-action model)
- **Base Model**: PaliGemma 3B with SigLIP vision encoder
- **Task**: Sorting bolts and nuts into separate baskets
- **Robot**: Dual-arm ALOHA setup
- **Action Space**: 14-DoF (7 per arm: 6 joints + 1 gripper)
- **Training Steps**: 9,999
- **Action Horizon**: 50 steps
- **Image Resolution**: 224x224
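As a concrete reading of the action space above, the sketch below unpacks a 14-dimensional action vector into per-arm joint and gripper components. The index layout (left arm first, gripper last within each arm) is an assumption for illustration only; it is not documented by this card.

```python
import numpy as np

# Hypothetical layout of the 14-DoF ALOHA action vector:
# [left arm: 6 joints + 1 gripper | right arm: 6 joints + 1 gripper].
# The ordering is assumed for illustration, not taken from the card.
action = np.arange(14, dtype=np.float32)

left_joints, left_gripper = action[0:6], action[6]
right_joints, right_gripper = action[7:13], action[13]

print(left_joints.shape, right_joints.shape)  # (6,) (6,)
```

Whatever the true index order, the 6+1 split per arm matches the "7 per arm" figure listed above.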
## Dataset

Trained on the `naungth/pi0_bolt_nut_sort` dataset with the task instruction:
"sort the bolts and the nuts into separate baskets"
## Usage

### With OpenPI

```python
from openpi.policies import policy_config
from openpi.training import config

# Load the model configuration
config_name = "pi0_bns"
train_config = config.get_config(config_name)

# Create policy from your local checkpoint
policy = policy_config.create_trained_policy(
    train_config,
    "path/to/checkpoint",
    default_prompt="sort the bolts and the nuts into separate baskets",
)

# Use for inference
observation = {
    "images": {
        "cam_high": image_array,               # [H, W, 3] uint8
        "cam_left_wrist": left_wrist_image,    # [H, W, 3] uint8
        "cam_right_wrist": right_wrist_image,  # [H, W, 3] uint8
    },
    "state": joint_positions,                  # [14] float32
    "prompt": "sort the bolts and the nuts into separate baskets",
}

actions = policy.infer(observation)["actions"]  # [50, 14]
```
### With Policy Server

Start the policy server:

```bash
uv run scripts/serve_policy.py policy:checkpoint \
  --policy.config=pi0_bns \
  --policy.dir=path/to/checkpoint
```

Then query it from a client:

```python
from openpi_client import websocket_client_policy

client = websocket_client_policy.WebsocketClientPolicy("localhost", 8000)
actions = client.infer(observation)
```
## Model Architecture

- **Vision Encoder**: SigLIP-So400m/14
- **Language Model**: Gemma 2B + Gemma 300M (action expert)
- **Training**: Diffusion-based action prediction
- **Input**: Multi-camera RGB + proprioception + language instruction
- **Output**: Future action sequence (50 timesteps)
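Because the model emits a 50-step action chunk per inference call, a common way to run it on a robot is receding-horizon execution: execute the first few actions of each chunk, then re-query the policy with a fresh observation. The sketch below shows the bookkeeping with a stand-in policy; `REPLAN_EVERY` is a hypothetical choice, not something this card prescribes.

```python
import numpy as np

ACTION_HORIZON, ACTION_DIM = 50, 14
REPLAN_EVERY = 25  # hypothetical replan interval, not specified by the card

def policy_infer_stub(observation):
    # Stand-in for policy.infer(...); returns a full [50, 14] action chunk.
    return {"actions": np.zeros((ACTION_HORIZON, ACTION_DIM), dtype=np.float32)}

executed = []
for _ in range(2):  # two inference/replan cycles
    chunk = policy_infer_stub({})["actions"]
    executed.extend(chunk[:REPLAN_EVERY])  # run only the first REPLAN_EVERY steps

print(len(executed))  # 50
```

Executing fewer steps per chunk reacts faster to new observations at the cost of more inference calls.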
## Training Details

- **Framework**: JAX/Flax with OpenPI
- **Optimizer**: AdamW
- **Base Checkpoint**: Pi-0 base model from Physical Intelligence
- **Fine-tuning**: Task-specific fine-tuning on bolt and nut sorting data
- **Normalization**: Dataset-specific state/action normalization
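The dataset-specific normalization mentioned above is typically a per-dimension z-score computed from training-set statistics. The sketch below shows the round trip on synthetic data; the actual statistics for this checkpoint are stored with it and are not reproduced here.

```python
import numpy as np

# Synthetic stand-in for recorded 14-DoF joint states.
rng = np.random.default_rng(0)
states = rng.normal(loc=0.5, scale=2.0, size=(1000, 14)).astype(np.float32)

# Per-dimension statistics, computed once over the training set.
mean = states.mean(axis=0)
std = states.std(axis=0) + 1e-6  # epsilon guards against zero variance

def normalize(x):
    return (x - mean) / std

def unnormalize(x):
    return x * std + mean

x = states[0]
assert np.allclose(unnormalize(normalize(x)), x, atol=1e-4)
```

At inference time the policy input is normalized with the same statistics and the predicted actions are unnormalized before being sent to the robot.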
## License

MIT License
## Citation

If you use this model, please cite:

```bibtex
@article{pi0,
  title={Pi-Zero: A Diffusion-Based Policy for Robot Manipulation},
  author={TODO: Add authors},
  year={2024}
}
```
## Acknowledgments

- Built using the [OpenPI](https://github.com/Physical-Intelligence/openpi) framework
- Based on the Pi-0 architecture
- Training data from bolt and nut sorting demonstrations