Add model card for Embodied-R1.5
#1
by nielsr HF Staff - opened
README.md
ADDED
|
@@ -0,0 +1,61 @@
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
library_name: transformers
|
| 4 |
+
pipeline_tag: robotics
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
# Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models
|
| 8 |
+
|
| 9 |
+
Embodied-R1.5 is a unified **Embodied Foundation Model (EFM)**, built on **Qwen3-VL-8B-Instruct**, that integrates comprehensive embodied reasoning within a single architecture. It unifies spatial cognition, task planning, and embodied pointing to enable general physical intelligence.
|
| 10 |
+
|
| 11 |
+
- **Paper:** [Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models](https://huggingface.co/papers/2606.11324)
|
| 12 |
+
- **Project Page:** [https://embodied-r.github.io/](https://embodied-r.github.io/)
|
| 13 |
+
- **Repository:** [https://github.com/pickxiguapi/Embodied-R1.5](https://github.com/pickxiguapi/Embodied-R1.5)
|
| 14 |
+
|
| 15 |
+
## Core Capabilities
|
| 16 |
+
- **Spatial cognition & reasoning:** Comprehend the semantic and spatial structure of the physical world.
|
| 17 |
+
- **Task planning & correction:** Organize execution logic while monitoring progress and correcting errors.
|
| 18 |
+
- **Embodied pointing & location:** Ground high-level reasoning in coordinates and trajectories.
|
| 19 |
+
- **Planner-Grounder-Corrector (PGC) Framework:** Enables the model to autonomously execute and self-correct over long-horizon tasks.
|
| 20 |
+
|
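The Planner-Grounder-Corrector idea listed above can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `run_pgc`, `StepResult`, and the four callables (`plan`, `ground`, `execute`, `succeeded`) are hypothetical names, and the `(x, y)` point format is assumed. In practice the planner, grounder, and corrector roles would each be filled by a call to the model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    step: str                 # sub-task text produced by the planner
    target: Tuple[int, int]   # point returned by the grounder (assumed format)
    attempts: int             # how many tries the step needed


def run_pgc(
    plan: Callable[[str], List[str]],
    ground: Callable[[str], Tuple[int, int]],
    execute: Callable[[str, Tuple[int, int]], None],
    succeeded: Callable[[str], bool],
    task: str,
    max_retries: int = 2,
) -> List[StepResult]:
    """Plan once, then ground, execute, and check each step, retrying on failure."""
    results = []
    for step in plan(task):
        for attempt in range(1, max_retries + 2):
            target = ground(step)    # grounder: step text -> coordinates
            execute(step, target)    # hand the target to the robot or simulator
            if succeeded(step):      # corrector: did the step achieve its goal?
                results.append(StepResult(step, target, attempt))
                break
        else:
            raise RuntimeError(f"step failed after {max_retries + 1} attempts: {step}")
    return results
```

Passing the four roles in as plain callables keeps the loop testable with stubs before any model or robot is attached.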
| 21 |
+
## Sample Usage
|
| 22 |
+
|
| 23 |
+
You can use the model for local inference. The following snippet requires the inference utility scripts provided in the [official repository](https://github.com/pickxiguapi/Embodied-R1.5).
|
| 24 |
+
|
| 25 |
+
```python
|
| 26 |
+
from inference.hf_example import HuggingFaceClient
|
| 27 |
+
|
| 28 |
+
client = HuggingFaceClient(model_path="IffYuan/Embodied-R1.5", device_map="auto", dtype="auto")
|
| 29 |
+
|
| 30 |
+
case = {
|
| 31 |
+
    "prompt": "How many table lamps are in the image? Select from the following choices.\n"
|
| 32 |
+
    "(A) 0\n"
|
| 33 |
+
    "(B) 2\n"
|
| 34 |
+
    "(C) 1\n"
|
| 35 |
+
    "(D) 3",
|
| 36 |
+
    "image": "test_assets/sample_2_image.png",
|
| 37 |
+
    "type": "single_image",
|
| 38 |
+
}
|
| 39 |
+
result = client.inference(case, max_new_tokens=512)
|
| 40 |
+
print(result["generated_text"])
|
| 41 |
+
```
|
| 42 |
+
|
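For multiple-choice prompts like the one above, the selected letter usually has to be pulled out of the generated text before scoring. The helper below is a minimal sketch, assuming the answer appears either as a parenthesized letter such as `(C)` or as a bare letter on its own. `extract_choice` is a hypothetical name and is not part of the official repository, and the actual output format of the model may differ.

```python
import re
from typing import Optional


def extract_choice(generated_text: str, choices: str = "ABCD") -> Optional[str]:
    """Return the chosen letter, or None if no choice can be identified."""
    # Prefer the parenthesized form used in the prompt, e.g. "(C)".
    # The last match wins, so earlier options the model rules out are skipped.
    paren = re.findall(rf"\(([{choices}])\)", generated_text)
    if paren:
        return paren[-1]
    # Otherwise accept a response that is only a bare letter, e.g. "B" or "B.".
    bare = re.fullmatch(rf"\s*([{choices}])[.)]?\s*", generated_text)
    return bare.group(1) if bare else None
```

With the snippet above, this could be applied as `extract_choice(result["generated_text"])`. Free-form answers that never name a letter return `None` and would need separate handling.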
| 43 |
+
## Citation
|
| 44 |
+
|
| 45 |
+
If you find Embodied-R1.5 useful in your research, please cite:
|
| 46 |
+
|
| 47 |
+
```bibtex
|
| 48 |
+
@article{yuan2026embodiedr15,
|
| 49 |
+
title={Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models},
|
| 50 |
+
author={Yuan, Yifu and Huang, Yaoting and Yao, Xianze and Li, Yutong and Zhang, Shuoheng and Han, Linqi and Li, Pengyi and Sun, Jiangeng and Jia, Wenting and Zhang, Zhao and Liu, Yuhao and Liao, Ruihao and Hu, Yucheng and Wu, Qiyu and Li, Yuxiao and Dong, Zibin and Ni, Fei and Zheng, Yan and Gu, Shuyang and Ma, Yi and Tang, Hongyao and Hu, Han and Hao, Jianye},
|
| 51 |
+
journal={arXiv preprint},
|
| 52 |
+
year={2026}
|
| 53 |
+
}
|
| 54 |
+
|
| 55 |
+
@article{yuan2025embodied,
|
| 56 |
+
title={Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation},
|
| 57 |
+
author={Yuan, Yifu and Cui, Haiqin and Huang, Yaoting and Chen, Yibin and Ni, Fei and Dong, Zibin and Li, Pengyi and Zheng, Yan and Hao, Jianye},
|
| 58 |
+
journal={ICLR 2026},
|
| 59 |
+
year={2025}
|
| 60 |
+
}
|
| 61 |
+
```
|