---
title: VLA-Arena
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---
## About VLA-Arena
VLA-Arena is an open-source benchmark designed for the systematic evaluation of Vision-Language-Action (VLA) models. It provides a complete and unified toolchain covering scene modeling, demonstration collection, model training, and evaluation.
Featuring 150+ tasks across 11 specialized suites, VLA-Arena assesses models at three hierarchical difficulty levels (L0–L2), yielding comprehensive metrics for safety, generalization, and efficiency.
## Key Evaluation Domains
VLA-Arena focuses on four critical dimensions to ensure robotic agents can operate effectively in the real world:
- 🛡️ **Safety**: Evaluates the ability to operate reliably in the physical world while avoiding static and dynamic obstacles and hazards.
- **Distractor**: Assesses performance stability under environmental unpredictability and visual clutter.
- 🎯 **Extrapolation**: Tests the ability to generalize learned knowledge to novel situations, unseen objects, and new workflows.
- **Long Horizon**: Challenges agents to compose long sequences of actions to achieve complex, multi-step goals.
## 🔥 Highlights
- End-to-End Toolchain: From scene construction to final evaluation metrics.
- Systematic Difficulty Scaling: Tasks range from basic object manipulation (L0) to complex, constraint-heavy scenarios (L2).
- Flexible Customization: Powered by CBDDL (Constrained Behavior Domain Definition Language) for easy task definition.
## Resources
- **GitHub Repository**: [PKU-Alignment/VLA-Arena](https://github.com/PKU-Alignment/VLA-Arena)
- **Documentation**: Read the Docs
- **License**: Apache 2.0
Built with ❤️ by the VLA-Arena Team