moka_pot_RECAP_R1
This is a π₀ (pi0) RECAP Vision-Language-Action (VLA) model, finetuned on the LIBERO robotic manipulation benchmark using the OpenTau training framework. The model follows natural language instructions to perform manipulation tasks in a simulated tabletop environment. It achieves a ~90% success rate, measured over 320 episodes.
For full documentation, evaluation results, and inference code, please visit the repository:
👉 https://github.com/TensorAuto/OpenTau
Model Details
Description
- Model Type: Vision-Language-Action (VLA) Model
- Base Architecture: π₀ (pi0) by Physical Intelligence
- Backbone: PaliGemma-3B (VLM) + Gemma-300M (Action Expert) + RL indicator
- Training Data: Moka Pot Task on LIBERO (Lifelong Robot Learning) Benchmark
- Framework: OpenTau
Architecture
The π₀ (pi0) RECAP architecture combines a flow-matching policy with reinforcement learning (RL) and is designed for open-world generalization. It pairs a Vision-Language Model (VLM) for high-level semantic understanding with a smaller "action expert" model that generates continuous joint trajectories (10-step action chunks) via flow matching. RL allows the policy to learn from both good and bad episodes.
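To make the flow-matching step concrete, here is a minimal sketch of how an action chunk could be sampled by integrating a velocity field from Gaussian noise to actions with Euler steps. It is not OpenTau's inference code. The function names, the 7-dimensional action space, the number of integration steps, and the toy velocity field are all assumptions made for illustration; only the 10-step chunk length comes from the description above. A real action expert would also condition on images, the language instruction, and the advantage indicator, which this sketch omits.

```python
import numpy as np


def sample_action_chunk(velocity_fn, horizon=10, action_dim=7, num_steps=10, seed=0):
    """Integrate a velocity field from noise (t=0) to actions (t=1) with Euler steps.

    horizon=10 matches the 10-step action chunks described above;
    action_dim and num_steps are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal((horizon, action_dim))  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        # One Euler step along the predicted velocity.
        actions = actions + dt * velocity_fn(actions, t)
    return actions


# Toy stand-in for the learned action expert (hypothetical): a straight-line
# flow that moves every sample toward a fixed target chunk.
target = np.zeros((10, 7))


def toy_velocity(x, t):
    return (target - x) / (1.0 - t)


chunk = sample_action_chunk(toy_velocity)
print(chunk.shape)
```

With the toy straight-line field, the Euler integration lands on the target chunk at t=1, which is a convenient sanity check before swapping in a learned network.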
Training and Evaluation
The advantage indicator (I_t) was set to True for only 10% of datapoints.
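One way to obtain an indicator that is True for a fixed 10% of datapoints is to rank datapoints by an advantage estimate and mark the top fraction. The sketch below illustrates that idea only. The source does not state how the 10% was selected, so the ranking rule, the function name `advantage_indicators`, and the example advantage values are assumptions.

```python
def advantage_indicators(advantages, positive_fraction=0.10):
    """Return a list of booleans, True for the top `positive_fraction` by advantage."""
    n = len(advantages)
    num_positive = round(n * positive_fraction)
    # Indices sorted by advantage, highest first.
    ranked = sorted(range(n), key=lambda i: advantages[i], reverse=True)
    positive = set(ranked[:num_positive])
    return [i in positive for i in range(n)]


# Hypothetical advantage estimates for 20 datapoints (not real training data).
advantages = [0.05 * i - 0.5 for i in range(20)]
indicators = advantage_indicators(advantages)
print(sum(indicators), "of", len(indicators), "datapoints marked True")
```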
Dataset
This model was finetuned on the Moka Pot task from the LIBERO-10 benchmark dataset, together with autonomous rollouts. The training data consists of around 29 expert teleoperated episodes, 212 autonomous rollouts collected under the moka_pot_libero_sft policy, and 320 autonomous rollouts collected under the moka_pot_RECAP_R0 policy.
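As a quick check on the data mix, the episode counts above can be tallied to see how much of the training set comes from each source. The dictionary keys are illustrative labels, not dataset identifiers, and the expert count is approximate ("around 29"), so the total and shares are approximate as well.

```python
# Episode counts as stated above; keys are illustrative labels only.
episode_counts = {
    "expert_teleoperation": 29,            # approximate, per the description
    "moka_pot_libero_sft_rollouts": 212,
    "moka_pot_RECAP_R0_rollouts": 320,
}

total = sum(episode_counts.values())
shares = {name: count / total for name, count in episode_counts.items()}

print(f"total episodes: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Autonomous rollouts make up the large majority of the episodes, with expert demonstrations contributing only a small share.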
Results
The model achieves a ~90% success rate, measured over 320 episodes. For detailed usage instructions, success rates, baseline comparisons, and evaluation protocols, please refer to the OpenTau GitHub repository.