A unified evaluation framework for embodied AI that simplifies benchmarking through clean interfaces and supports 25+ benchmarks and diverse model backends.