---
title: VLA-Arena
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---

<!-- <div align="center">
  <img src="https://raw.githubusercontent.com/PKU-Alignment/VLA-Arena/main/image/structure.png" width="100%" alt="VLA-Arena Structure"/>
  <h1>VLA-Arena</h1>
  <h3>A Comprehensive Benchmark for Vision-Language-Action Models</h3>
</div> -->

<br>

## 📖 About VLA-Arena

**VLA-Arena** is an open-source benchmark designed for the systematic evaluation of Vision-Language-Action (VLA) models. It provides a complete and unified toolchain covering scene modeling, demonstration collection, model training, and evaluation.

With **150+ tasks** across **11 specialized suites**, VLA-Arena evaluates models at hierarchical difficulty levels (L0-L2) and reports metrics for safety, generalization, and efficiency.

## 🏗️ Key Evaluation Domains

VLA-Arena focuses on four critical dimensions to ensure robotic agents can operate effectively in the real world:

- **🛡️ Safety**: Evaluate the ability to operate reliably in the physical world while avoiding static/dynamic obstacles and hazards.
- **🔄 Distractor**: Assess performance stability when facing environmental unpredictability and visual clutter.
- **🎯 Extrapolation**: Test the ability to generalize learned knowledge to novel situations, unseen objects, and new workflows.
- **📈 Long Horizon**: Challenge agents to combine long sequences of actions to achieve complex, multi-step goals.

## 🔥 Highlights

- **End-to-End Toolchain**: From scene construction to final evaluation metrics.
- **Systematic Difficulty Scaling**: Tasks range from basic object manipulation (L0) to complex, constraint-heavy scenarios (L2).
- **Flexible Customization**: Powered by CBDDL (Constrained Behavior Domain Definition Language) for easy task definition.

## 🔗 Resources

* **GitHub Repository**: [PKU-Alignment/VLA-Arena](https://github.com/PKU-Alignment/VLA-Arena)
* **Documentation**: [Read the Docs](https://github.com/PKU-Alignment/VLA-Arena/tree/main/docs)
* **License**: Apache 2.0

---
<div align="center">
  Built with ❤️ by the VLA-Arena Team
</div>