Overview

ACE-Brain-0 is a generalist multimodal foundation model designed to unify perception, reasoning, and decision-making across diverse embodied domains, including spatial cognition, autonomous driving, low-altitude sensing, and embodied interaction. Built upon a unified multimodal large language model (MLLM) architecture, ACE-Brain-0 learns a shared spatial reasoning substrate that enables generalization across heterogeneous physical environments and agent embodiments.

Extensive evaluation across 24 benchmarks demonstrates that ACE-Brain achieves state-of-the-art or competitive performance across multiple domains, validating its effectiveness as a unified embodied intelligence model.

Key Features

  • Unified multimodal foundation model for embodied intelligence
  • Strong spatial reasoning as a universal intelligence scaffold
  • Supports diverse embodiment platforms:
    • Spatial Cognition
    • Autonomous Driving
    • Low-Altitude Sensing
    • Embodied Interaction
  • Cross-domain generalization across perception, reasoning, and planning

Performance Highlights

ACE-Brain achieves strong performance across 24 benchmarks covering Spatial Cognition, Autonomous Driving, Low-Altitude Sensing, and Embodied Interaction, consistently outperforming existing open-source embodied VLMs and remaining competitive with closed-source models.

The model shows robust capability in spatial reasoning, physical interaction understanding, task-oriented decision-making, and dynamic scene interpretation, enabling reliable performance across diverse real-world embodiment scenarios.

In driving and aerial domains, ACE-Brain demonstrates excellent performance in environment understanding, motion reasoning, and planning-aware prediction, highlighting its effectiveness in complex, large-scale, and safety-critical environments.

Despite this embodied-domain specialization, ACE-Brain maintains strong general multimodal reasoning ability, confirming that spatial-intelligence-centered training enhances overall visual-language capability rather than limiting generalization.

Spatial Benchmarks

Autonomous Driving Benchmarks

Low-Altitude Benchmarks

Embodied Benchmarks

Bold numbers indicate the best results, underlined numbers indicate the second-best results, and results marked with * are obtained using our evaluation framework.

Inference Example

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

# default: Load the model on the available device(s)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "ACE-Brain/ACE-Brain-0-8B", dtype="auto", device_map="auto"
)

processor = AutoProcessor.from_pretrained("ACE-Brain/ACE-Brain-0-8B")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
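The list comprehension that builds generated_ids_trimmed above removes the prompt tokens from the front of each generated sequence, so that only newly generated tokens are decoded. A minimal, self-contained sketch of that trimming step, using illustrative dummy token ids in place of real tensors:

```python
# Each output sequence from generate() begins with the prompt tokens;
# slicing off len(prompt) leaves only the newly generated tokens.
prompt_ids = [[1, 2, 3], [4, 5]]                 # dummy per-example input ids
full_outputs = [[1, 2, 3, 10, 11], [4, 5, 20]]   # prompt + generated ids

trimmed = [out[len(inp):] for inp, out in zip(prompt_ids, full_outputs)]
print(trimmed)  # → [[10, 11], [20]]
```

With real tensors the same slice works row by row, which is why the snippet zips inputs.input_ids with generated_ids before calling batch_decode.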

Citation

@misc{gong2026acebrain0spatialintelligenceshared,
      title={ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments}, 
      author={Ziyang Gong and Zehang Luo and Anke Tang and Zhe Liu and Shi Fu and Zhi Hou and Ganlin Yang and Weiyun Wang and Xiaofeng Wang and Jianbo Liu and Gen Luo and Haolan Kang and Shuang Luo and Yue Zhou and Yong Luo and Li Shen and Xiaosong Jia and Yao Mu and Xue Yang and Chunxiao Liu and Junchi Yan and Hengshuang Zhao and Dacheng Tao and Xiaogang Wang},
      year={2026},
      eprint={2603.03198},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.03198}, 
}
Model size: 9B parameters (Safetensors, BF16)