---
library_name: opentau
tags:
  - robotics
  - vla
  - pi05
  - robocasa
  - manipulation
  - flow-matching
  - pytorch
base_model: williamyue/pi05_base
license: apache-2.0
datasets:
  - robocasa/CloseMicrowave
  - robocasa/CloseFridge
  - robocasa/CloseCabinet
repo_url: https://github.com/TensorAuto/OpenTau
---

Robocasa_navigatekitchen

A π₀.₅ (pi0.5) Vision-Language-Action (VLA) model finetuned on the RoboCasa robotic manipulation/navigation benchmark using the OpenTau training framework. The model follows natural language instructions to perform manipulation and navigation tasks in a simulated kitchen environment.

For full documentation, evaluation results, and inference code, please visit the repository:
👉 https://github.com/TensorAuto/OpenTau


Model Details

Description

  • Model Type: Vision-Language-Action (VLA) Model
  • Base Architecture: π₀.₅ (pi0.5) by Physical Intelligence
  • Backbone: PaliGemma-3B (VLM) + Gemma-300M (Action Expert)
  • Training Data: RoboCasa Benchmark
  • Framework: OpenTau

Architecture

The π₀.₅ architecture uses a flow-matching-based policy designed for open-world generalization. It combines a Vision-Language Model (VLM) for high-level semantic understanding with a smaller "action expert" model that generates continuous joint trajectories (10-step action chunks) via flow matching.
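To make the flow-matching step concrete, here is a minimal PyTorch sketch of how an action chunk can be sampled by integrating a learned velocity field from Gaussian noise to actions. Everything here is illustrative: the `ActionExpert` MLP, the context dimension, the 7-D action space, and the Euler integrator are stand-in assumptions, not the actual OpenTau/π₀.₅ implementation (which conditions a Gemma-300M expert on PaliGemma features).

```python
# Hypothetical sketch of flow-matching action generation; the real OpenTau
# code differs in model, conditioning, and integration schedule.
import torch
import torch.nn as nn

ACTION_DIM = 7   # assumption: e.g. 6-DoF arm + gripper
CHUNK_LEN = 10   # 10-step action chunk, as described above
NUM_STEPS = 10   # Euler integration steps for the flow ODE

class ActionExpert(nn.Module):
    """Stand-in for the action expert: predicts the velocity field
    v(x_t, t | context) that transports noise toward an action chunk."""
    def __init__(self, ctx_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CHUNK_LEN * ACTION_DIM + ctx_dim + 1, 256),
            nn.ReLU(),
            nn.Linear(256, CHUNK_LEN * ACTION_DIM),
        )

    def forward(self, x, t, ctx):
        # Concatenate flattened noisy actions, context embedding, and time.
        inp = torch.cat([x.flatten(1), ctx, t], dim=-1)
        return self.net(inp)

@torch.no_grad()
def sample_action_chunk(expert, ctx):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (actions)."""
    b = ctx.shape[0]
    x = torch.randn(b, CHUNK_LEN, ACTION_DIM)  # start from Gaussian noise
    dt = 1.0 / NUM_STEPS
    for i in range(NUM_STEPS):
        t = torch.full((b, 1), i * dt)
        v = expert(x, t, ctx).view(b, CHUNK_LEN, ACTION_DIM)
        x = x + dt * v                          # forward Euler step
    return x

expert = ActionExpert()
ctx = torch.randn(1, 64)  # placeholder for the VLM's semantic embedding
actions = sample_action_chunk(expert, ctx)
print(actions.shape)  # torch.Size([1, 10, 7])
```

At deployment, the first few actions of each sampled chunk are typically executed before re-planning, which is why chunked prediction pairs well with a fast, small action expert.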


Training and Evaluation

Dataset

This model was finetuned on the RoboCasa benchmark dataset. The RoboCasa suite consists of human-teleoperated and MimicGen-generated demonstrations for manipulation and navigation, covering:

  • CloseMicrowave (Atomic)
  • CloseFridge (Atomic)
  • CloseCabinet (Atomic)

Results

Trained on 100 human demonstrations, our model achieves success rates of 98%, 80%, and 65% on the CloseMicrowave, CloseFridge, and CloseCabinet tasks, respectively. For detailed usage instructions, success rates, baseline comparisons, and evaluation protocols, please refer to the OpenTau GitHub repository.