Improve model card: Add pipeline tag, library name, paper link, and usage example
This PR enhances the model card for `RynnEC-2B` by:
- Adding the `pipeline_tag: video-text-to-text` to improve discoverability on the Hub (e.g., via https://huggingface.co/models?pipeline_tag=video-text-to-text).
- Specifying `library_name: transformers` to correctly identify the library used for inference, enabling the "Use in Transformers" widget and clearer guidance.
- Updating the main title's link to point to the official Hugging Face paper page: [RynnEC: Bringing MLLMs into Embodied World](https://huggingface.co/papers/2508.14160).
- Adding explicit links to the paper, GitHub repository, and project page at the beginning of the model card.
- Completing the `Citation` section with the BibTeX entry from the official GitHub repository, correcting the year to 2025 based on the paper's arXiv ID.
- Including a `Usage` section with a practical Python code snippet to demonstrate how to load and use the model.
These improvements will make the model more discoverable, easier to understand, and more user-friendly on the Hugging Face Hub.
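The `pipeline_tag` and `library_name` keys live in the card's YAML front matter, which the Hub parses for discoverability. A minimal stdlib-only sketch of how the added metadata can be checked (the `CARD` string and `front_matter` helper are illustrative stand-ins, not Hub APIs):

```python
# Stand-in for the model card's contents after this PR (hypothetical excerpt).
CARD = """---
license: apache-2.0
pipeline_tag: video-text-to-text
library_name: transformers
---
# RynnEC-2B
"""

def front_matter(text):
    """Parse the YAML front-matter block into a dict of simple key: value pairs."""
    lines = text.splitlines()
    assert lines[0] == "---", "card must open with a front-matter fence"
    end = lines.index("---", 1)  # closing fence of the front matter
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

meta = front_matter(CARD)
print(meta["pipeline_tag"])   # video-text-to-text
print(meta["library_name"])   # transformers
```

For nested or multi-line metadata a real YAML parser is needed; for flat `key: value` pairs like these, simple line splitting is enough to sanity-check a card before pushing it.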
````diff
@@ -1,17 +1,22 @@
 ---
 license: apache-2.0
+pipeline_tag: video-text-to-text
+library_name: transformers
 ---
+
 <p align="center">
     <img src="https://github.com/alibaba-damo-academy/RynnEC/blob/main/assets/logo.jpg?raw=true" width="150" style="margin-bottom: 0.2;"/>
 <p>
 
-<h3 align="center"><a href="" style="color:#9C276A">
+<h3 align="center"><a href="https://huggingface.co/papers/2508.14160" style="color:#9C276A">
 RynnEC: Bringing MLLMs into Embodied World</a></h3>
 <h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/alibaba-damo-academy/RynnEC">Github</a> to support us. 🙏🙏 </h2>
 
+This repository contains the RynnEC model presented in the paper [RynnEC: Bringing MLLMs into Embodied World](https://huggingface.co/papers/2508.14160).
+For more details, please visit the [project page](https://huggingface.co/spaces/Alibaba-DAMO-Academy/RynnEC) and the [GitHub repository](https://github.com/alibaba-damo-academy/RynnEC).
 
 ## 📰 News
-* **[2025.08.08]**
+* **[2025.08.08]** 🔥🔥 Release of our RynnEC-2B model, RynnEC-Bench, and the training code.
 
 
 
@@ -51,4 +56,67 @@ Benchmark comparison across object cognition and spatial cognition. With a highl
 
 If you find RynnEC useful for your research and applications, please cite using this BibTeX:
 
-
+```bibtex
+@article{wu2025rynnec,
+  title={RynnEC: Bringing MLLMs into Embodied World},
+  author={Wu, Zhiyong and Wu, Zhenyu and Ma, Weichen and Zhou, Bo and Shen, Junnan and Wu, Lemeng and Huang, Qichen and Yu, Runhui and Liu, Qiming and Jiang, Zibo and Zhang, Hongyang},
+  journal={arXiv preprint arXiv:2508.14160},
+  year={2025}
+}
+```
+
+## Usage
+
+We provide a simple generation example below. For more details, please refer to the [GitHub repository](https://github.com/alibaba-damo-academy/RynnEC).
+
+```python
+from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+from qwen_vl_utils import process_vision_info
+
+# Default: load the model on the available device(s)
+model = Qwen2VLForConditionalGeneration.from_pretrained(
+    "Alibaba-DAMO-Academy/RynnEC-2B", torch_dtype="auto", device_map="auto"
+)
+processor = AutoProcessor.from_pretrained("Alibaba-DAMO-Academy/RynnEC-2B")
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": "./examples/images/web_6f93090a-81f6-489e-bb35-1a2838b18c01.png",
+            },
+            {"type": "text", "text": "In this UI screenshot, what is the position of the element corresponding to the command \"switch language of current page\" (with bbox)?"},
+        ],
+    }
+]
+
+
+# Preparation for inference
+text = processor.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+image_inputs, video_inputs = process_vision_info(messages)
+inputs = processor(
+    text=[text],
+    images=image_inputs,
+    videos=video_inputs,
+    padding=True,
+    return_tensors="pt",
+)
+inputs = inputs.to("cuda")
+
+# Inference: generation of the output
+generated_ids = model.generate(**inputs, max_new_tokens=128)
+
+generated_ids_trimmed = [
+    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+]
+
+output_text = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=False, clean_up_tokenization_spaces=False
+)
+print(output_text)
+# <|object_ref_start|>language switch<|object_ref_end|><|box_start|>(576,12),(592,42)<|box_end|><|im_end|>
+```
````