Improve model card: Add pipeline tag, library name, paper link, and usage example
#1 · opened by nielsr (HF Staff)
README.md CHANGED

````diff
@@ -1,17 +1,22 @@
 ---
 license: apache-2.0
+pipeline_tag: video-text-to-text
+library_name: transformers
 ---
+
 <p align="center">
 <img src="https://github.com/alibaba-damo-academy/RynnEC/blob/main/assets/logo.jpg?raw=true" width="150" style="margin-bottom: 0.2;"/>
 <p>
 
-<h3 align="center"><a href="" style="color:#9C276A">
+<h3 align="center"><a href="https://huggingface.co/papers/2508.14160" style="color:#9C276A">
 RynnEC: Bringing MLLMs into Embodied World</a></h3>
 <h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/alibaba-damo-academy/RynnEC">Github</a> to support us. 🙏🙏 </h2>
 
+This repository contains the RynnEC model presented in the paper [RynnEC: Bringing MLLMs into Embodied World](https://huggingface.co/papers/2508.14160).
+For more details, please visit the [project page](https://huggingface.co/spaces/Alibaba-DAMO-Academy/RynnEC) and the [GitHub repository](https://github.com/alibaba-damo-academy/RynnEC).
 
 ## 📰 News
-* **[2025.08.08]**
+* **[2025.08.08]** 🔥🔥 Release our RynnEC-2B model, RynnEC-Bench and training code.
 
 
 
@@ -51,4 +56,67 @@ Benchmark comparison across object cognition and spatial cognition. With a highl
 
 If you find RynnEC useful for your research and applications, please cite using this BibTeX:
 
-
+```bibtex
+@article{wu2025rynnec,
+  title={RynnEC: Bringing MLLMs into Embodied World},
+  author={Wu, Zhiyong and Wu, Zhenyu and Ma, Weichen and Zhou, Bo and Shen, Junnan and Wu, Lemeng and Huang, Qichen and Yu, Runhui and Liu, Qiming and Jiang, Zibo and Zhang, Hongyang},
+  journal={arXiv preprint arXiv:2508.14160},
+  year={2025}
+}
+```
+
+## Usage
+
+We provide a simple generation process for using our model. For more details, you could refer to the [Github repository](https://github.com/alibaba-damo-academy/RynnEC).
+
+```python
+from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+from qwen_vl_utils import process_vision_info
+
+# Default: Load the model on the available device(s)
+model = Qwen2VLForConditionalGeneration.from_pretrained(
+    "Alibaba-DAMO-Academy/RynnEC-2B", torch_dtype="auto", device_map="auto"
+)
+processor = AutoProcessor.from_pretrained("Alibaba-DAMO-Academy/RynnEC-2B")
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": "./examples/images/web_6f93090a-81f6-489e-bb35-1a2838b18c01.png",
+            },
+            {"type": "text", "text": "In this UI screenshot, what is the position of the element corresponding to the command \"switch language of current page\" (with bbox)?"},
+        ],
+    }
+]
+
+
+# Preparation for inference
+text = processor.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+image_inputs, video_inputs = process_vision_info(messages)
+inputs = processor(
+    text=[text],
+    images=image_inputs,
+    videos=video_inputs,
+    padding=True,
+    return_tensors="pt",
+)
+inputs = inputs.to("cuda")
+
+# Inference: Generation of the output
+generated_ids = model.generate(**inputs, max_new_tokens=128)
+
+generated_ids_trimmed = [
+    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+]
+
+output_text = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=False, clean_up_tokenization_spaces=False
+)
+print(output_text)
+# <|object_ref_start|>language switch<|object_ref_end|><|box_start|>(576,12),(592,42)<|box_end|><|im_end|>
```
````
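One note on the proposed usage section: the new metadata tags the model as `video-text-to-text`, but the snippet prompts with a single image. A video-input variant could make the card match its pipeline tag. The sketch below is not part of this PR; it assumes the checkpoint keeps the Qwen2-VL-style interface used in the snippet above and that `qwen_vl_utils.process_vision_info` resolves `{"type": "video"}` entries as it does for Qwen2-VL checkpoints. The video path, `fps` value, and question are hypothetical placeholders.

```python
# Hypothetical video-input variant of the PR's snippet (not part of this PR).
# Assumes the same Qwen2-VL-style model/processor interface as above; the
# video path, fps, and question are placeholders.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Alibaba-DAMO-Academy/RynnEC-2B", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Alibaba-DAMO-Academy/RynnEC-2B")

messages = [
    {
        "role": "user",
        "content": [
            # qwen_vl_utils samples frames from the video at the given fps
            {"type": "video", "video": "./examples/videos/room_tour.mp4", "fps": 1.0},
            {"type": "text", "text": "Which objects in this room can be used for sitting?"},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens so only the newly generated answer is decoded
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```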