## Key Features

- Unified multimodal foundation model for embodied intelligence
- Strong spatial reasoning as a universal intelligence scaffold
- Supports diverse embodiment platforms:
- Cross-domain generalization across perception, reasoning, and planning
- Evaluated on 24 real-world embodied intelligence benchmarks

## Performance Highlights

ACE-Brain achieves strong performance across **24 benchmarks covering Spatial Intelligence, Embodied Interaction, Autonomous Driving, and Low-Altitude Sensing**, consistently outperforming existing open-source embodied VLMs and remaining competitive with closed-source models.

The model shows robust capability in **spatial reasoning, physical interaction understanding, task-oriented decision-making, and dynamic scene interpretation**, enabling reliable performance across diverse real-world embodiment scenarios.

Despite its domain specialization, ACE-Brain maintains strong general multimodal capabilities.

<div align="center">
<img src="./assets/table4.png" width=800>
</div>

> **Bold** numbers indicate the best results, <u>underlined</u> numbers indicate the second-best results, and results marked with \* are obtained using our evaluation framework.
|
|
|
|
| 38 |
|
| 39 |
## Key Features
|
| 40 |
|
|
|
|
| 41 |
- Unified multimodal foundation model for embodied intelligence
|
| 42 |
- Strong spatial reasoning as a universal intelligence scaffold
|
| 43 |
- Supports diverse embodiment platforms:
|
|
|
|
| 48 |
- Cross-domain generalization across perception, reasoning, and planning
|
| 49 |
- Evaluated on 24 real-world embodied intelligence benchmarks
|
| 50 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 51 |
## Performance Highlights
|
| 52 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 53 |
ACE-Brain achieves strong performance across **24 benchmarks covering Spatial Intelligence, Embodied Interaction, Autonomous Driving, and Low-Altitude Sensing**, consistently outperforming existing open-source embodied VLMs and remaining competitive with closed-source models.
|
| 54 |
|
| 55 |
The model shows robust capability in **spatial reasoning, physical interaction understanding, task-oriented decision-making, and dynamic scene interpretation**, enabling reliable performance across diverse real-world embodiment scenarios.
|
|
|
|
| 85 |
<img src="./assets/table4.png" width=800>
|
| 86 |
</div>
|
| 87 |
|
| 88 |
+
> **Bold** numbers indicate the best results, <u>underlined</u> numbers indicate the second-best results, and results marked with \* are obtained using our evaluation framework.
|
| 89 |
|
| 90 |
+
## Inference Example

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

# Default: load the model on the available device(s)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "ACE-Brain/ACE-Brain-8B", dtype="auto", device_map="auto"
)

processor = AutoProcessor.from_pretrained("ACE-Brain/ACE-Brain-8B")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: generate the output, then strip the echoed prompt tokens
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
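
The prompt-trimming idiom in the inference example (`out_ids[len(in_ids):]`) works because `generate` returns each prompt followed by its newly generated tokens. A minimal, model-free sketch with made-up token ids (the id values below are hypothetical, chosen only for illustration):

```python
# Hypothetical token-id sequences standing in for real tokenizer/model output.
input_ids = [[101, 7592, 102], [101, 2088, 3899, 102]]                        # two prompts
generated_ids = [[101, 7592, 102, 9001, 9002], [101, 2088, 3899, 102, 9003]]  # prompt + new tokens

# Same trimming idiom as above: slice off the echoed prompt, keep only new tokens.
trimmed = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
print(trimmed)  # [[9001, 9002], [9003]]
```

Decoding only the trimmed ids is what keeps the user's prompt out of `output_text`.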

## Citation