ACE-Brain committed on
Commit e9036d3 · verified · 1 Parent(s): f756d24

Upload README.md

Files changed (1):
  1. README.md +46 -12

README.md CHANGED
@@ -38,7 +38,6 @@ Extensive evaluation across **24** benchmarks demonstrates that ACE-Brain achiev
 
 ## Key Features
 
-
 - Unified multimodal foundation model for embodied intelligence
 - Strong spatial reasoning as a universal intelligence scaffold
 - Supports diverse embodiment platforms:
@@ -49,18 +48,8 @@ Extensive evaluation across **24** benchmarks demonstrates that ACE-Brain achiev
 - Cross-domain generalization across perception, reasoning, and planning
 - Evaluated on 24 real-world embodied intelligence benchmarks
 
-## Core Capabilities
-
-<div align="center">
-<img src="./assets/fig2.png" width=800>
-</div>
-
 ## Performance Highlights
 
-<div align="center">
-<img src="./assets/radarchart.png" width=800>
-</div>
-
 ACE-Brain achieves strong performance across **24 benchmarks covering Spatial Intelligence, Embodied Interaction, Autonomous Driving, and Low-Altitude Sensing**, consistently outperforming existing open-source embodied VLMs and remaining competitive with closed-source models.
 
 The model shows robust capability in **spatial reasoning, physical interaction understanding, task-oriented decision-making, and dynamic scene interpretation**, enabling reliable performance across diverse real-world embodiment scenarios.
@@ -96,9 +85,54 @@ Despite its domain specialization, ACE-Brain maintains strong general multimodal
 <img src="./assets/table4.png" width=800>
 </div>
 
-> **Bold** numbers indicate the best results, <u>underlined</u> numbers indicate the second-best results, and results marked with \* are obtained using our evaluation framework.
+> **Bold** numbers indicate the best results, <u>underlined</u> numbers indicate the second-best results, and results marked with \* are obtained using our evaluation framework.
+
+## Inference Example
+
+```python
+from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
+
+# Default: load the model on the available device(s)
+model = Qwen3VLForConditionalGeneration.from_pretrained(
+    "ACE-Brain/ACE-Brain-8B", dtype="auto", device_map="auto"
+)
+
+processor = AutoProcessor.from_pretrained("ACE-Brain/ACE-Brain-8B")
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
+            },
+            {"type": "text", "text": "Describe this image."},
+        ],
+    }
+]
+
+# Preparation for inference
+inputs = processor.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_dict=True,
+    return_tensors="pt",
+)
+inputs = inputs.to(model.device)
+
+# Inference: generate the output
+generated_ids = model.generate(**inputs, max_new_tokens=128)
+generated_ids_trimmed = [
+    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+]
+output_text = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)
+print(output_text)
+```
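The `generated_ids_trimmed` step in the example strips each prompt from the front of its generated sequence, since `model.generate` returns the prompt tokens followed by the newly generated ones. The slicing pattern can be illustrated standalone with plain lists (toy token ids, not real model output):

```python
# Prompt-stripping pattern used after model.generate(): each generated
# sequence begins with its prompt tokens, so slicing past len(prompt)
# keeps only the newly generated tokens.
def trim_generated(input_ids, generated_ids):
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]

# Toy example: plain lists stand in for token-id tensors
prompts = [[101, 7592, 102], [101, 2129, 2024, 102]]
outputs = [[101, 7592, 102, 2023, 2003], [101, 2129, 2024, 102, 2204]]
print(trim_generated(prompts, outputs))  # [[2023, 2003], [2204]]
```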
 
 
 ## Citation