nielsr (HF Staff) committed · verified
Commit 9159ffb · 1 Parent(s): a9eeb69

Improve model card: Add pipeline tag, library name, paper link, and usage example


This PR significantly enhances the model card for `RynnEC-2B` by:

- Adding the `pipeline_tag: video-text-to-text` to improve discoverability on the Hub (e.g., via https://huggingface.co/models?pipeline_tag=video-text-to-text).
- Specifying `library_name: transformers` to correctly identify the library used for inference, enabling the "Use in Transformers" widget and clearer guidance.
- Updating the main title's link to point to the official Hugging Face paper page: [RynnEC: Bringing MLLMs into Embodied World](https://huggingface.co/papers/2508.14160).
- Adding explicit links to the paper, GitHub repository, and project page at the beginning of the model card.
- Completing the `Citation` section with the BibTeX entry from the official GitHub repository, correcting the year to 2025 based on the paper's arXiv ID.
- Including a `Usage` section with a practical Python code snippet to demonstrate how to load and use the model.

These improvements will make the model more discoverable, easier to understand, and more user-friendly on the Hugging Face Hub.
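Concretely, the first two bullet points amount to the following change to the YAML front matter at the top of `README.md` (values taken from this commit's diff; the inline comments are editorial):

```yaml
---
license: apache-2.0
pipeline_tag: video-text-to-text   # enables filtering on the Hub by pipeline
library_name: transformers         # enables the "Use in Transformers" widget
---
```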

Files changed (1)
  1. README.md +71 -3
README.md CHANGED
````diff
@@ -1,17 +1,22 @@
 ---
 license: apache-2.0
+pipeline_tag: video-text-to-text
+library_name: transformers
 ---
+
 <p align="center">
     <img src="https://github.com/alibaba-damo-academy/RynnEC/blob/main/assets/logo.jpg?raw=true" width="150" style="margin-bottom: 0.2;"/>
 <p>
 
-<h3 align="center"><a href="" style="color:#9C276A">
+<h3 align="center"><a href="https://huggingface.co/papers/2508.14160" style="color:#9C276A">
 RynnEC: Bringing MLLMs into Embodied World</a></h3>
 <h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/alibaba-damo-academy/RynnEC">Github</a> to support us. 🙏🙏 </h2>
 
+This repository contains the RynnEC model presented in the paper [RynnEC: Bringing MLLMs into Embodied World](https://huggingface.co/papers/2508.14160).
+For more details, please visit the [project page](https://huggingface.co/spaces/Alibaba-DAMO-Academy/RynnEC) and the [GitHub repository](https://github.com/alibaba-damo-academy/RynnEC).
 
 ## 📰 News
-* **[2025.08.08]** 🔥🔥 Release our RynnEC-2B model, RynnEC-Bench and training code.
+* **[2025.08.08]** 🔥🔥 Release our RynnEC-2B model, RynnEC-Bench and training code.
 
@@ -51,4 +56,67 @@ Benchmark comparison across object cognition and spatial cognition. With a highl
 
 If you find RynnEC useful for your research and applications, please cite using this BibTeX:
 
-
+```bibtex
+@article{wu2025rynnec,
+  title={RynnEC: Bringing MLLMs into Embodied World},
+  author={Wu, Zhiyong and Wu, Zhenyu and Ma, Weichen and Zhou, Bo and Shen, Junnan and Wu, Lemeng and Huang, Qichen and Yu, Runhui and Liu, Qiming and Jiang, Zibo and Zhang, Hongyang},
+  journal={arXiv preprint arXiv:2508.14160},
+  year={2025}
+}
+```
+
+## Usage
+
+We provide a simple generation process for using our model. For more details, you could refer to the [Github repository](https://github.com/alibaba-damo-academy/RynnEC).
+
+```python
+from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+from qwen_vl_utils import process_vision_info
+
+# Default: Load the model on the available device(s)
+model = Qwen2VLForConditionalGeneration.from_pretrained(
+    "Alibaba-DAMO-Academy/RynnEC-2B", torch_dtype="auto", device_map="auto"
+)
+processor = AutoProcessor.from_pretrained("Alibaba-DAMO-Academy/RynnEC-2B")
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": "./examples/images/web_6f93090a-81f6-489e-bb35-1a2838b18c01.png",
+            },
+            {"type": "text", "text": "In this UI screenshot, what is the position of the element corresponding to the command \"switch language of current page\" (with bbox)?"},
+        ],
+    }
+]
+
+
+# Preparation for inference
+text = processor.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+image_inputs, video_inputs = process_vision_info(messages)
+inputs = processor(
+    text=[text],
+    images=image_inputs,
+    videos=video_inputs,
+    padding=True,
+    return_tensors="pt",
+)
+inputs = inputs.to("cuda")
+
+# Inference: Generation of the output
+generated_ids = model.generate(**inputs, max_new_tokens=128)
+
+generated_ids_trimmed = [
+    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+]
+
+output_text = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=False, clean_up_tokenization_spaces=False
+)
+print(output_text)
+# <|object_ref_start|>language switch<|object_ref_end|><|box_start|>(576,12),(592,42)<|box_end|><|im_end|>
+```
````