Improve model card: add metadata, paper, and code links
#1
by nielsr (HF Staff) · opened
README.md CHANGED
@@ -1,3 +1,27 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
---

# MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning

This repository contains the **MIRG-RL** model, presented in the paper [MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning](https://huggingface.co/papers/2509.21788).

MIRG-RL addresses the challenges of multi-image reasoning and grounding by modeling complex cross-image relationships at both the object and image levels. It uses a unified framework that combines supervised fine-tuning on annotated trajectories with image-aware reinforcement learning optimization, progressively developing multi-image reasoning capabilities. Experiments show that MIRG-RL achieves state-of-the-art (SOTA) performance on multi-image grounding benchmarks, exceeding the previous best method by 1%.

For more details, including the code and datasets, please refer to the [GitHub repository](https://github.com/ZEUS2035/MIRG-RL).

## Datasets
1. Link to the source grounding dataset:
   - MGrounding-630k: https://huggingface.co/datasets/Michael4933/MGrounding-630k
2. The MIRG-RL data used in the paper are stored in [/MIRG-RL/data](https://github.com/ZEUS2035/MIRG-RL/tree/main/data). The images used by this data should be downloaded from the provided link and stored at the corresponding paths (see the download sketch below).

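The dataset files can be fetched from the Hub with `huggingface_hub`. The snippet below is a minimal sketch; the local target directory is an assumption, so adjust it to wherever the annotations under `/MIRG-RL/data` expect the images.

```python
# Minimal sketch: download the MGrounding-630k dataset repository from the Hub.
# "data/MGrounding-630k" is an assumed local path, not one mandated by MIRG-RL.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Michael4933/MGrounding-630k",
    repo_type="dataset",
    local_dir="data/MGrounding-630k",
)
```
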
## Model
The model weights are available on [Hugging Face](https://huggingface.co/2U35/MIRG-RL-7B/tree/main).

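Since the card declares `library_name: transformers` and `pipeline_tag: image-text-to-text`, the weights should load through the generic Transformers pipeline. The following is a minimal sketch, assuming the checkpoint is compatible with `pipeline("image-text-to-text", ...)`; the image URLs and the prompt are placeholders.

```python
# Minimal sketch: a multi-image grounding query via the generic
# image-text-to-text pipeline (requires a recent transformers release).
# The image URLs and the question below are placeholders.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="2U35/MIRG-RL-7B",
    device_map="auto",
    torch_dtype="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/image_1.jpg"},
            {"type": "image", "url": "https://example.com/image_2.jpg"},
            {
                "type": "text",
                "text": "Locate the object in the second image that matches "
                        "the object highlighted in the first image.",
            },
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=128, return_full_text=False)
print(outputs[0]["generated_text"])
```
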
## Train
Our two-stage training is conducted mainly with [ms-swift](https://github.com/modelscope/ms-swift), fine-tuning the full LLM backbone parameters. We provide our training scripts for both stages.

## Evaluation
We follow the [MIG-Bench](https://github.com/thunlp/Migician?tab=readme-ov-file#Inference) evaluation protocol and provide our evaluation script.
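
As a rough illustration of the grounding metric (not the official MIG-Bench script), the sketch below computes accuracy at an IoU threshold, assuming predicted and ground-truth boxes in `[x1, y1, x2, y2]` format; the 0.5 threshold is an assumption, so rely on the official script for reported numbers.

```python
# Rough illustration only; use the official MIG-Bench evaluation for reported numbers.
# Boxes are assumed to be [x1, y1, x2, y2]; the 0.5 IoU threshold is an assumption.
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of predictions whose IoU with the ground truth reaches the threshold."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)
```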