Image-Text-to-Text
Transformers
Safetensors
English
qwen2_5_vl
conversational
text-generation-inference
Instructions to use hmhm1229/MoRE with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use hmhm1229/MoRE with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="hmhm1229/MoRE")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```
```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("hmhm1229/MoRE")
model = AutoModelForImageTextToText.from_pretrained("hmhm1229/MoRE")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use hmhm1229/MoRE with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "hmhm1229/MoRE"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "hmhm1229/MoRE",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/hmhm1229/MoRE
```
- SGLang
How to use hmhm1229/MoRE with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "hmhm1229/MoRE" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "hmhm1229/MoRE",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "hmhm1229/MoRE" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "hmhm1229/MoRE",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use hmhm1229/MoRE with Docker Model Runner:
```bash
docker model run hf.co/hmhm1229/MoRE
```
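The curl calls in the vLLM and SGLang sections post an OpenAI-style chat payload to `/v1/chat/completions`. As a rough sketch, the same request could be sent from Python with only the standard library. This assumes a server started with one of the commands above is already listening locally; the helper names `build_chat_payload` and `send_chat_request` are hypothetical and not part of any library.

```python
import json
import urllib.request


def build_chat_payload(model, prompt, image_url):
    """Build the same JSON body the curl examples send (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to the chat completions route and return the parsed JSON.

    Assumes a server is reachable at base_url (port 8000 for the vLLM example,
    30000 for the SGLang example).
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_chat_payload(
        "hmhm1229/MoRE",
        "Describe this image in one sentence.",
        "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    )
    # Print the body only; uncomment the lines below once a server is running.
    print(json.dumps(payload, indent=2))
    # reply = send_chat_request(payload)
    # print(reply["choices"][0]["message"]["content"])
```

For the SGLang server, pass `base_url="http://localhost:30000"` to match the port used in that section.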
Add pipeline tag, library name, and link to paper/code
#1
by nielsr HF Staff - opened
README.md
CHANGED
@@ -1,42 +1,49 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
+language:
+- en
+license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
-
 <div align="center">
 
-<h1>
-
+<h1> MoRE: Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation (R1-Router) </h1>
 
 <h5 align="center">
 
-<a href='https://
-<a href='https://
-<a href='https://huggingface.co/hmhm1229/R1-Router
+<a href='https://huggingface.co/papers/2505.22095'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
+<a href='https://github.com/OpenBMB/R1-Router'><img src='https://img.shields.io/badge/Code-GitHub-black'></a>
+<a href='https://huggingface.co/hmhm1229/R1-Router'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue'></a>
+<a href='https://huggingface.co/hmhm1229/R1-Router-3B'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue'></a>
 
 Chunyi Peng<sup>1,3</sup>,
 Zhipeng Xu<sup>1</sup>,
 Zhenghao Liu<sup>1</sup>,
 Yishan Li<sup>3</sup>,
 Yukun Yan<sup>2</sup>,
+Shuo Wang<sup>2</sup>,
 Zhiyuan Liu<sup>2</sup>,
-Yu Gu<sup>1</sup>
-Minghe Yu<sup>1</sup>
-Ge Yu<sup>1</sup>
+Yu Gu<sup>1</sup>,
+Minghe Yu<sup>1</sup>,
+Ge Yu<sup>1</sup>,
 Maosong Sun<sup>2</sup>
 
-<sup>1</sup>Northeastern University, <sup>2</sup>Tsinghua University, <sup>3</sup>
+<sup>1</sup>Northeastern University, <sup>2</sup>Tsinghua University, <sup>3</sup>ModelBest Inc.
 
 <h5 align="center"> If you find this project useful, please give us a star🌟.
 </h5>
 </div>
 
+**MoRE (Mixture-of-Retrieval Experts)** is a novel framework that enables Multimodal Large Language Models (MLLMs) to collaboratively interact with diverse retrieval experts for more effective knowledge exploitation. It dynamically determines which expert to engage with based on the evolving reasoning state.
+
+This repository contains the **R1-Router** (7B version), which serves as the controller for routing queries across knowledge bases using **Stepwise Group Relative Policy Optimization (Step-GRPO)**.
+
 ## News
-
+- **2026.04.03**: Our work is accepted by SIGIR2026 🎉🎉🎉!
+- **2025.08.22**: We upload [MoRE-3B](https://huggingface.co/hmhm1229/R1-Router-3B).
 
 ## Environment
 For training, answer generation, and evaluation processes:
@@ -60,12 +67,12 @@ conda activate retriever
 wikiextractor enwiki-20241020-pages-articles-multistream.xml.bz2 -o wiki_extracted
 python wiki_preprocess.py
 ```
-For the image corpus, you can directly download [M-BEIR](https://huggingface.co/datasets/TIGER-Lab/M-BEIR). To embed and index it you can follow the [repository](https://github.com/TIGER-AI-Lab/UniIR)
+For the image corpus, you can directly download [M-BEIR](https://huggingface.co/datasets/TIGER-Lab/M-BEIR). To embed and index it you can follow the [repository](https://github.com/TIGER-AI-Lab/UniIR).
 
 For the table corpus, you can download, embed and index Open-WikiTable following the [repository](https://github.com/sean0042/Open_WikiTable), or you can download directly the one we have already preprocessed from [here](https://huggingface.co/hmhm1229/table-retriever).
 
 ## Retrievers Preparation
-For the Text-Image Retriever, you can directly download [UniIR](https://huggingface.co/TIGER-Lab/UniIR)
+For the Text-Image Retriever, you can directly download [UniIR](https://huggingface.co/TIGER-Lab/UniIR).
 
 For the Table Retriever, you can train it with the help of [repository](https://github.com/sean0042/Open_WikiTable), or you can download it directly from [here](https://huggingface.co/hmhm1229/table-retriever).
 
@@ -76,9 +83,10 @@ We have prepared all the text datasets in `./datasets`, for images you need to d
 - `WebQA:` WebQA images can be downloaded from [Google Drive](https://drive.google.com/drive/folders/19ApkbD5w0I5sV1IeQ9EofJRyAjKnA7tb)
 
 ## Training
-If you do not want to train the model, you can download [R1-Router](https://huggingface.co/hmhm1229/R1-Router) and skip this section to [Evaluation](#evaluation)
+If you do not want to train the model, you can download [R1-Router](https://huggingface.co/hmhm1229/R1-Router) and skip this section to [Evaluation](#evaluation).
+
 ### Data Synthesis
-If you want to use the ready-to-use synthetic data directly, you can skip this section to [Step-GRPO Training](#step-grpo-training)
+If you want to use the ready-to-use synthetic data directly, you can skip this section to [Step-GRPO Training](#step-grpo-training).
 
 First, we need to synthesis the data step by step:
 ```bash
@@ -112,11 +120,11 @@ Our work is built on the following codebases, and we are deeply grateful for the
 
 ## Citation
 We appreciate your citations if you find our paper related and useful to your research!
-```
-@article{
-title={
-author={Peng, Chunyi and Xu, Zhipeng and Liu, Zhenghao and Li, Yishan and Yan, Yukun and Wang, Shuo and Liu, Zhiyuan and Gu, Yu and
-year={2025}
+```bibtex
+@article{peng2025mixture,
+title={Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation},
+author={Peng, Chunyi and Xu, Zhipeng and Liu, Zhenghao and Li, Yishan and Yan, Yukun and Wang, Shuo and Liu, Zhiyuan and Gu, Yu and Minghe Yu and Ge Yu and Maosong Sun},
+year={2025},
 url={https://arxiv.org/abs/2505.22095},
 }
 ```
@@ -125,4 +133,4 @@ We appreciate your citations if you find our paper related and useful to your re
 If you have questions, suggestions, and bug reports, please email us, we will try our best to help you.
 ```
 hm.cypeng@gmail.com
-```
+```
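The first hunk of this diff adds `pipeline_tag` and `library_name` to the YAML front matter of the README. A minimal sketch of how one might check that a card's front matter carries those keys is shown below. It assumes the README starts with a `---` delimited block; `parse_front_matter` is a hypothetical helper that only understands flat `key: value` pairs and simple `- item` lists, not full YAML.

```python
def parse_front_matter(text):
    """Parse a simple '---' delimited front matter block into a dict.

    Hypothetical helper: handles 'key: value' scalars and '- item' lists only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}

    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the front matter block
        if not stripped:
            continue
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # List item belonging to the most recent key that had no inline value.
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # A key with no inline value starts a list.
            meta[current_key] = value if value else []
    return meta


if __name__ == "__main__":
    # Front matter as it appears on the new side of the diff.
    card = """---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

<div align="center">
"""
    meta = parse_front_matter(card)
    for key in ("pipeline_tag", "library_name"):
        print(key, "->", meta.get(key, "MISSING"))
```

For real model cards a full YAML parser is the safer choice; this sketch only covers the flat structure seen in this diff.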