Improve model card: add metadata, paper, and code links
#1
by nielsr (HF Staff) · opened
README.md CHANGED
@@ -1,3 +1,27 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
---

# MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning

This repository contains the **MIRG-RL** model, presented in the paper [MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning](https://huggingface.co/papers/2509.21788).

MIRG-RL addresses the challenges of multi-image reasoning and grounding by modeling complex cross-image relationships at both the object and image levels. It uses a unified framework that combines supervised fine-tuning on annotated trajectories with image-aware reinforcement learning optimization, progressively developing multi-image reasoning capabilities. Experiments show that MIRG-RL achieves state-of-the-art (SOTA) performance on multi-image grounding benchmarks, exceeding the previous best method by 1%.

For more details, including the code and datasets, please refer to the [GitHub repository](https://github.com/ZEUS2035/MIRG-RL).

## Datasets
1. Link to the source grounding dataset:
   - MGrounding-630k: https://huggingface.co/datasets/Michael4933/MGrounding-630k
2. The MIRG-RL data used in the paper are stored in [/MIRG-RL/data](https://github.com/ZEUS2035/MIRG-RL/tree/main/data). The images used by this data should be downloaded from the provided link and stored at the corresponding paths (see the download sketch below).

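The dataset files can be fetched from the Hub with `huggingface_hub`. The snippet below is a minimal sketch; the local target directory is an assumption, so adjust it to wherever the annotations under `/MIRG-RL/data` expect the images.

```python
# Minimal sketch: download the MGrounding-630k dataset repository from the Hub.
# "data/MGrounding-630k" is an assumed local path, not one mandated by MIRG-RL.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Michael4933/MGrounding-630k",
    repo_type="dataset",
    local_dir="data/MGrounding-630k",
)
```
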
## Model
The model weights are available on [Hugging Face](https://huggingface.co/2U35/MIRG-RL-7B/tree/main).

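Since the card declares `library_name: transformers` and `pipeline_tag: image-text-to-text`, the weights should load through the generic Transformers pipeline. The following is a minimal sketch, assuming the checkpoint is compatible with `pipeline("image-text-to-text", ...)`; the image URLs and the prompt are placeholders.

```python
# Minimal sketch: a multi-image grounding query via the generic
# image-text-to-text pipeline (requires a recent transformers release).
# The image URLs and the question below are placeholders.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="2U35/MIRG-RL-7B",
    device_map="auto",
    torch_dtype="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/image_1.jpg"},
            {"type": "image", "url": "https://example.com/image_2.jpg"},
            {
                "type": "text",
                "text": "Locate the object in the second image that matches "
                        "the object highlighted in the first image.",
            },
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=128, return_full_text=False)
print(outputs[0]["generated_text"])
```
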
## Train
Our two-stage training is conducted mainly with [ms-swift](https://github.com/modelscope/ms-swift), fine-tuning the full LLM backbone parameters. We provide our training scripts for both stages.

## Evaluation
We follow the [MIG-Bench](https://github.com/thunlp/Migician?tab=readme-ov-file#Inference) evaluation protocol and provide our evaluation script.
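
As a rough illustration of the grounding metric (not the official MIG-Bench script), the sketch below computes accuracy at an IoU threshold, assuming predicted and ground-truth boxes in `[x1, y1, x2, y2]` format; the 0.5 threshold is an assumption, so rely on the official script for reported numbers.

```python
# Rough illustration only; use the official MIG-Bench evaluation for reported numbers.
# Boxes are assumed to be [x1, y1, x2, y2]; the 0.5 IoU threshold is an assumption.
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of predictions whose IoU with the ground truth reaches the threshold."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)
```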