Improve model card: Add pipeline tag, library name, paper link, and usage example

#1 by nielsr HF Staff - opened
Files changed (1)
  1. README.md +71 -3
README.md CHANGED
@@ -1,17 +1,22 @@
  ---
  license: apache-2.0
+ pipeline_tag: video-text-to-text
+ library_name: transformers
  ---
+
  <p align="center">
  <img src="https://github.com/alibaba-damo-academy/RynnEC/blob/main/assets/logo.jpg?raw=true" width="150" style="margin-bottom: 0.2;"/>
  <p>

- <h3 align="center"><a href="" style="color:#9C276A">
+ <h3 align="center"><a href="https://huggingface.co/papers/2508.14160" style="color:#9C276A">
  RynnEC: Bringing MLLMs into Embodied World</a></h3>
  <h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/alibaba-damo-academy/RynnEC">Github</a> to support us. 🙏🙏 </h2>

+ This repository contains the RynnEC model presented in the paper [RynnEC: Bringing MLLMs into Embodied World](https://huggingface.co/papers/2508.14160).
+ For more details, please visit the [project page](https://huggingface.co/spaces/Alibaba-DAMO-Academy/RynnEC) and the [GitHub repository](https://github.com/alibaba-damo-academy/RynnEC).

  ## 📰 News
- * **[2025.08.08]** 🔥🔥 Release our RynnEC-2B model, RynnEC-Bench and training code.
+ * **[2025.08.08]** 🔥🔥 Release our RynnEC-2B model, RynnEC-Bench and training code.

@@ -51,4 +56,67 @@ Benchmark comparison across object cognition and spatial cognition. With a highl

  If you find RynnEC useful for your research and applications, please cite using this BibTeX:

-
+ ```bibtex
+ @article{wu2025rynnec,
+   title={RynnEC: Bringing MLLMs into Embodied World},
+   author={Wu, Zhiyong and Wu, Zhenyu and Ma, Weichen and Zhou, Bo and Shen, Junnan and Wu, Lemeng and Huang, Qichen and Yu, Runhui and Liu, Qiming and Jiang, Zibo and Zhang, Hongyang},
+   journal={arXiv preprint arXiv:2508.14160},
+   year={2025}
+ }
+ ```
+
+ ## Usage
+
+ We provide a simple generation process for using our model. For more details, you could refer to the [Github repository](https://github.com/alibaba-damo-academy/RynnEC).
+
+ ```python
+ from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+ from qwen_vl_utils import process_vision_info
+
+ # Default: Load the model on the available device(s)
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
+     "Alibaba-DAMO-Academy/RynnEC-2B", torch_dtype="auto", device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained("Alibaba-DAMO-Academy/RynnEC-2B")
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "image",
+                 "image": "./examples/images/web_6f93090a-81f6-489e-bb35-1a2838b18c01.png",
+             },
+             {"type": "text", "text": "In this UI screenshot, what is the position of the element corresponding to the command \"switch language of current page\" (with bbox)?"},
+         ],
+     }
+ ]
+
+ # Preparation for inference
+ text = processor.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ image_inputs, video_inputs = process_vision_info(messages)
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     videos=video_inputs,
+     padding=True,
+     return_tensors="pt",
+ )
+ inputs = inputs.to("cuda")
+
+ # Inference: Generation of the output
+ generated_ids = model.generate(**inputs, max_new_tokens=128)
+
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=False, clean_up_tokenization_spaces=False
+ )
+ print(output_text)
+ # <|object_ref_start|>language switch<|object_ref_end|><|box_start|>(576,12),(592,42)<|box_end|><|im_end|>
+ ```
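The PR sets `pipeline_tag: video-text-to-text`, while the usage example it adds sends a single image. Under the Qwen2-VL-style message schema the example already uses, switching to a video query only changes the message payload; everything downstream (`apply_chat_template`, `process_vision_info`, generation) stays the same. The sketch below is illustrative, not part of the patch: the file path, `fps` value, and prompt text are assumptions, and the schema is assumed to match what `qwen_vl_utils.process_vision_info` accepts.

```python
# Hypothetical video-input variant of the PR's usage example.
# Assumption: the processor follows the Qwen2-VL message schema;
# "./examples/videos/demo.mp4" is a placeholder path, not a shipped asset.
video_messages = [
    {
        "role": "user",
        "content": [
            # A "video" entry replaces the "image" entry; "fps" controls
            # how densely frames are sampled from the clip.
            {"type": "video", "video": "./examples/videos/demo.mp4", "fps": 1.0},
            {"type": "text", "text": "Describe the spatial layout of the objects in this video."},
        ],
    }
]

# The rest of the pipeline is unchanged from the image example:
# text = processor.apply_chat_template(video_messages, tokenize=False, add_generation_prompt=True)
# image_inputs, video_inputs = process_vision_info(video_messages)
```

Here `process_vision_info` would return the sampled frames in `video_inputs` (with `image_inputs` empty), and the `processor(...)` call forwards them via its `videos=` argument exactly as in the example above.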