Tags: Robotics, Safetensors, PyTorch, opentau, vla, pi05, robocasa, manipulation, flow-matching

Robocasa_navigatekitchen

A pi0.5 (π₀.₅) Vision-Language-Action (VLA) model fine-tuned on the Robocasa robotic manipulation and navigation benchmark using the OpenTau training framework. The model is designed to follow natural-language instructions to perform navigation tasks in a simulated kitchen environment.

For full documentation, evaluation results, and inference code, please visit the repository:
👉 https://github.com/TensorAuto/OpenTau


Model Details

Description

  • Model Type: Vision-Language-Action (VLA) Model
  • Base Architecture: π₀.₅ (pi0.5) by Physical Intelligence
  • Backbone: PaliGemma-3B (VLM) + Gemma-300M (Action Expert)
  • Training Data: Robocasa Benchmark
  • Framework: OpenTau

Architecture

The pi0.5 architecture uses a flow-matching policy designed for open-world generalization. It combines a Vision-Language Model (VLM) for high-level semantic understanding with a smaller "action expert" model that generates continuous joint trajectories, as 10-step action chunks, via flow matching.
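The flow-matching step described above can be sketched in a few lines. This is a minimal, generic rectified-flow sketch in pure Python, not the OpenTau or Physical Intelligence implementation. The time convention (noise at t = 0, actions at t = 1), the 7-dimensional action space, and the 10 Euler integration steps are assumptions chosen for illustration, and `velocity_fn` is a hypothetical stand-in for the action expert conditioned on the VLM's output. `flow_matching_pair` builds one training example; `sample_action_chunk` integrates a velocity field from noise to a 10-step action chunk.

```python
import random
from typing import Callable, List, Tuple

# A chunk is a list of `horizon` action vectors, each of length `action_dim`.
Chunk = List[List[float]]


def gaussian_chunk(horizon: int, action_dim: int, rng: random.Random) -> Chunk:
    """Draw a chunk of i.i.d. standard-normal noise."""
    return [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(horizon)]


def flow_matching_pair(actions: Chunk, noise: Chunk, t: float) -> Tuple[Chunk, Chunk]:
    """Training example under the assumed convention x_t = (1 - t) * noise + t * actions.

    Returns the noisy input x_t and the velocity regression target, which for
    this straight-line path is (actions - noise), independent of t.
    """
    x_t = [[(1.0 - t) * n + t * a for a, n in zip(ra, rn)] for ra, rn in zip(actions, noise)]
    target = [[a - n for a, n in zip(ra, rn)] for ra, rn in zip(actions, noise)]
    return x_t, target


def sample_action_chunk(
    velocity_fn: Callable[[Chunk, float], Chunk],
    horizon: int = 10,     # 10-step action chunk, as stated in the model card
    action_dim: int = 7,   # hypothetical action dimension
    num_steps: int = 10,   # hypothetical number of Euler steps
    seed: int = 0,
) -> Chunk:
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 (noise) to t = 1 (actions)."""
    rng = random.Random(seed)
    x = gaussian_chunk(horizon, action_dim, rng)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        # Explicit Euler step applied to every entry of the chunk.
        x = [[xi + dt * vi for xi, vi in zip(rx, rv)] for rx, rv in zip(x, v)]
    return x


# Demo: a stand-in "model" whose velocity points straight at a fixed target chunk.
# (a - x) / (1 - t) equals the constant velocity of the straight path that passes
# through x at time t and ends at a, so Euler integration lands on the target.
target = [[0.1 * (h + d) for d in range(7)] for h in range(10)]


def oracle_velocity(x: Chunk, t: float) -> Chunk:
    return [[(a - xi) / (1.0 - t) for xi, a in zip(rx, ra)] for rx, ra in zip(x, target)]


chunk = sample_action_chunk(oracle_velocity)
print(len(chunk), len(chunk[0]))  # prints "10 7"
```

In a real policy, `oracle_velocity` would be replaced by a network that also receives the observations and the language instruction. One motivation for the straight-line interpolation path is that it tends to need only a few integration steps at inference time.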


Training and Evaluation

Dataset

This model was fine-tuned on the Robocasa benchmark dataset. The Robocasa suite consists of human-teleoperated and MimicGen-generated demonstrations for manipulation and navigation, including the following tasks:

  • CloseToasterOvenDoor (Atomic)
  • CloseDishwasher (Atomic)
  • CloseOven (Atomic)

Results

Trained on 100 human demonstrations, our model achieves success rates of 70%, 90%, and 90% on the CloseToasterOvenDoor, CloseDishwasher, and CloseOven tasks, respectively. For detailed usage instructions, success rates, baseline comparisons, and evaluation protocols, please refer to the OpenTau GitHub repository.
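As a quick summary of the three reported numbers, the sketch below computes a macro-average of the per-task success rates, weighting each task equally. This average is not reported in the source; a per-episode average would also require the number of evaluation episodes per task, which is not given here.

```python
# Per-task success rates (%) as reported above.
success_rates = {
    "CloseToasterOvenDoor": 70,
    "CloseDishwasher": 90,
    "CloseOven": 90,
}

# Macro-average: every task counts equally, regardless of its episode count.
macro_avg = sum(success_rates.values()) / len(success_rates)
print(f"macro-average success rate: {macro_avg:.1f}%")  # prints "macro-average success rate: 83.3%"
```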

Format: Safetensors
Model size: 4B params
Tensor types: F32, BF16