---
license: apache-2.0
---
<p align="center">
    <img src="https://github.com/alibaba-damo-academy/RynnEC/blob/main/assets/logo.jpg?raw=true" width="150" style="margin-bottom: 0.2;"/>
</p>

<h3 align="center"><a href="" style="color:#9C276A">
RynnEC: Bringing MLLMs into Embodied World</a></h3>
<h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/alibaba-damo-academy/RynnEC">GitHub</a> to support us. 🙏🙏 </h5>


## 📰 News
* **[2025.08.17]**  🤗 The RynnEC-7B model checkpoint has been released on Hugging Face.
* **[2025.08.08]**  🔥🔥 Released our RynnEC-2B model, RynnEC-Bench, and the training code.



## 🌟 Introduction
**RynnEC** is a video multi-modal large language model (MLLM) specifically designed for embodied cognition tasks.

<p align="center">
    <img src="https://github.com/alibaba-damo-academy/RynnEC/blob/main/assets/radar.png?raw=true" width="100%" style="margin-bottom: 0.2;"/>
</p>

## ๐Ÿ“Architecture
**RynnEC** accepts a variety of input types, including images, videos, visual prompts, and task instructions. Visual inputs are processed by a Vision Encoder with an any-resolution strategy, while visual prompts are handled by a region encoder that extracts fine-grained object features. Textual inputs are converted into the same unified token stream through tokenization. For video segmentation tasks, a mask decoder transforms the output segmentation embeddings into binary masks.

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67fcc97cede5c434e0cc37e3/FEdKco-A0nitu4drJZTDk.png" width="100%" style="margin-bottom: 0.2;"/>
</p>
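
To make the flow concrete, here is a toy, runnable sketch of how these four components hand data to one another. Every name in it (`ToyRynnEC`, `vision_encoder`, `region_encoder`, `mask_decoder`, ...) is an illustrative placeholder, not the actual RynnEC API; the official repository is the authoritative reference.

```python
# Toy sketch of the pipeline described above. All names are hypothetical
# placeholders, not the real RynnEC interfaces.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Output:
    text: str
    segmentation_embeddings: List[float] = field(default_factory=list)


class ToyRynnEC:
    def vision_encoder(self, frames: List[str]) -> List[str]:
        # Any-resolution strategy: each frame yields visual tokens.
        return [f"<vis:{f}>" for f in frames]

    def region_encoder(self, prompts: List[str]) -> List[str]:
        # Visual prompts (e.g. boxes/masks) yield fine-grained region tokens.
        return [f"<region:{p}>" for p in prompts]

    def tokenize(self, instruction: str) -> List[str]:
        # Text joins the same unified token stream.
        return instruction.split()

    def llm(self, tokens: List[str]) -> Output:
        # Stand-in LLM: answers in text and, for segmentation requests,
        # also emits segmentation embeddings for the mask decoder.
        needs_mask = "segment" in tokens
        return Output(text="<answer>",
                      segmentation_embeddings=[0.1] if needs_mask else [])

    def mask_decoder(self, embeddings: List[float]) -> List[List[int]]:
        # Segmentation embeddings -> binary masks (a tiny 2x2 mask here).
        return [[1, 0], [0, 1]]


model = ToyRynnEC()
tokens = (
    model.vision_encoder(["frame0", "frame1"])
    + model.region_encoder(["box(12,40,88,120)"])
    + model.tokenize("segment the mug on the table")
)
out = model.llm(tokens)
masks = (model.mask_decoder(out.segmentation_embeddings)
         if out.segmentation_embeddings else None)
print(out.text, masks)
```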
  
## 🌎 Model Zoo

| Model                | Base Model   | HF Link                                                      |
| -------------------- | ------------ | ------------------------------------------------------------ |
| RynnEC-2B       | Qwen2.5-1.5B-Instruct   | [Alibaba-DAMO-Academy/RynnEC-2B](https://huggingface.co/Alibaba-DAMO-Academy/RynnEC-2B) |
| RynnEC-7B       | Qwen2.5-7B-Instruct   | [Alibaba-DAMO-Academy/RynnEC-7B](https://huggingface.co/Alibaba-DAMO-Academy/RynnEC-7B) |
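
The snippet below is a minimal loading sketch, assuming the checkpoints ship custom modeling code usable through transformers' `trust_remote_code` path; this is not the documented RynnEC entry point, so consult the GitHub repository for the officially supported usage.

```python
# Hypothetical loading sketch (assumes the checkpoint works with
# transformers' trust_remote_code path; see the GitHub repo for the
# officially supported entry point).
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "Alibaba-DAMO-Academy/RynnEC-2B"  # or "Alibaba-DAMO-Academy/RynnEC-7B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",  # requires accelerate; drop for CPU-only loading
)
```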



## 📊 Main Results

Benchmark comparison across object cognition and spatial cognition. With a highly efficient **2B**-parameter architecture, **RynnEC-2B** achieves state-of-the-art (SOTA) performance on complex spatial cognition tasks.

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67fcc97cede5c434e0cc37e3/XXmvypGmuiY9MJ6eYh9LL.png" width="100%" style="margin-bottom: 0.2;"/>
</p>
  

## 📑 Citation

If you find RynnEC useful for your research and applications, please cite using this BibTeX:
```bibtex
@misc{dang2025rynnecbringingmllmsembodied,
      title={RynnEC: Bringing MLLMs into Embodied World}, 
      author={Ronghao Dang and Yuqian Yuan and Yunxuan Mao and Kehan Li and Jiangpin Liu and Zhikai Wang and Xin Li and Fan Wang and Deli Zhao},
      year={2025},
      eprint={2508.14160},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.14160}, 
}
```