Instructions to use IQuestLab/UniReason-Med with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use IQuestLab/UniReason-Med with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="IQuestLab/UniReason-Med")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("IQuestLab/UniReason-Med")
model = AutoModelForMultimodalLM.from_pretrained("IQuestLab/UniReason-Med")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use IQuestLab/UniReason-Med with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "IQuestLab/UniReason-Med"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "IQuestLab/UniReason-Med",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker
docker model run hf.co/IQuestLab/UniReason-Med
- SGLang
How to use IQuestLab/UniReason-Med with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "IQuestLab/UniReason-Med" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "IQuestLab/UniReason-Med",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "IQuestLab/UniReason-Med" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "IQuestLab/UniReason-Med",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use IQuestLab/UniReason-Med with Docker Model Runner:
docker model run hf.co/IQuestLab/UniReason-Med
---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- medical
- multimodal
- vqa
- visual-grounding
- chain-of-thought
- reinforcement-learning
- grpo
- qwen2_5_vl
language:
- en
datasets:
- IQuestLab/UniReason-Med-Data
---
# UniReason-Med

UniReason-Med is a medical multimodal model that accompanies the paper
**"UniReason-Med: A Shared Grounded Reasoning Interface for 2D-to-3D Transfer in Medical VQA"**.
It studies whether grounded reasoning supervision from abundant 2D medical images can improve
3D medical VQA when both modalities share a common reasoning interface. A single checkpoint
processes either a 2D image or a slice-serialized 3D volume, generating interleaved textual
reasoning and localized visual evidence through shared bounding-box syntax and region-token
injection under a common grounded reasoning policy.

- **Base model:** [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
- **Training data:** [IQuestLab/UniReason-Med-Data](https://huggingface.co/datasets/IQuestLab/UniReason-Med-Data)
- **Code:** [github.com/IQuestLab/unireason-med](https://github.com/IQuestLab/unireason-med)
- **Modalities:** image + text → text
- **License:** Apache-2.0
## Model Description

UniReason-Med is trained to interleave free-form reasoning with localized visual evidence.
During reasoning, the model emits bounding boxes over the input image; the referenced region is
cropped and re-injected as additional visual context for the next reasoning step (a
grounded chain-of-thought, GCoT, interface). The same shared interface is applied to 2D images
and to 3D volumes serialized as ordered slice sequences, which allows grounded supervision
collected on plentiful 2D data to transfer to 3D reasoning.
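The crop-and-re-inject step described above can be illustrated with a minimal sketch. The `<box>[x1, y1, x2, y2]</box>` syntax, the pixel-coordinate convention, and the helper names `parse_last_box`, `clamp_box`, and `crop` are all assumptions made for illustration; the model's actual box format and region-token injection are defined in the released code, not here.

```python
import re
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates

# Hypothetical box syntax for illustration; the model's real format may differ.
BOX_PATTERN = re.compile(r"<box>\s*\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\s*</box>")


def parse_last_box(text: str) -> Optional[Box]:
    """Return the most recent box emitted in the reasoning text, if any."""
    matches = BOX_PATTERN.findall(text)
    if not matches:
        return None
    x1, y1, x2, y2 = (int(v) for v in matches[-1])
    return (x1, y1, x2, y2)


def clamp_box(box: Box, width: int, height: int) -> Optional[Box]:
    """Clip a box to the image bounds; return None if nothing remains."""
    x1, y1, x2, y2 = box
    x1, x2 = max(0, min(x1, width)), max(0, min(x2, width))
    y1, y2 = max(0, min(y1, height)), max(0, min(y2, height))
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Crop a row-major 2D image (a list of rows) to the box."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


# Toy 4x4 "image" whose pixel value encodes its (row, column) position.
image = [[10 * r + c for c in range(4)] for r in range(4)]
step_text = "The lesion appears in the upper right <box>[2, 0, 9, 2]</box>."

box = parse_last_box(step_text)
clamped = clamp_box(box, width=4, height=4) if box else None
region = crop(image, clamped) if clamped else None
# In a real loop, `region` would be encoded as extra visual context
# and appended to the prompt before the next reasoning step.
```

Clamping before cropping keeps an out-of-range prediction from producing an empty or misplaced crop, which matters when the box comes from free-form generated text.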

A central result of the paper is that joint 2D+3D grounded supervision improves 3D reasoning
compared with 3D-only training under matched schedules, while the shared grounding interface
also benefits 2D tasks.
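As a rough illustration of what serializing a 3D volume into an ordered slice sequence can look like, the sketch below picks evenly spaced slices along the depth axis. Uniform sampling, the slice budget, and the function names `sample_slice_indices` and `serialize_volume` are assumptions; the card does not state how slices are actually selected.

```python
from typing import List


def sample_slice_indices(depth: int, num_slices: int) -> List[int]:
    """Pick up to `num_slices` evenly spaced slice indices, in ascending order."""
    if depth <= 0 or num_slices <= 0:
        return []
    if num_slices >= depth:
        return list(range(depth))  # small volume: keep every slice
    if num_slices == 1:
        return [depth // 2]  # single slice: take the middle one
    step = (depth - 1) / (num_slices - 1)
    return [round(i * step) for i in range(num_slices)]


def serialize_volume(volume: list, num_slices: int) -> list:
    """Turn a depth-first volume (a list of 2D slices) into an ordered slice list."""
    return [volume[i] for i in sample_slice_indices(len(volume), num_slices)]


# Toy volume with five placeholder slices; assumed budget of three slices.
volume = ["s0", "s1", "s2", "s3", "s4"]
slices = serialize_volume(volume, num_slices=3)
```

Preserving ascending order keeps the sequence consistent with the anatomy along the scan axis, so each slice can be treated like one image in a multi-image prompt.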
## Training

The model is built with a two-stage recipe:

1. **Supervised fine-tuning (SFT)** on the UniMed-CoT dataset: 220K grounded chain-of-thought
   samples (170K 2D + 50K 3D) with interleaved textual reasoning and grounded visual evidence.
   The vision tower and the multimodal projector are frozen; the language model is fully fine-tuned.
2. **Reinforcement learning (GRPO)** with outcome-level rewards. RL uses answer-correctness and
   format rewards rather than ground-truth localization-overlap rewards such as IoU or Dice.
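To make the outcome-level reward in stage 2 concrete, here is a minimal sketch that combines an answer-correctness term with a format term and then standardizes rewards within a sampled group, as in the commonly described GRPO formulation. The `<think>`/`<answer>` template, the exact-match comparison, and the `format_weight` value are assumptions for illustration, not the settings used to train this checkpoint.

```python
import re
from statistics import mean, pstdev
from typing import List

# Hypothetical output template; the released configs define the real one.
FORMAT_PATTERN = re.compile(r"^<think>.*</think>\s*<answer>(.*)</answer>$", re.DOTALL)


def outcome_reward(response: str, gold: str, format_weight: float = 0.1) -> float:
    """Answer-correctness reward plus a small format bonus (weights are assumed)."""
    match = FORMAT_PATTERN.match(response.strip())
    if match is None:
        return 0.0  # malformed output earns nothing in this sketch
    correct = match.group(1).strip().lower() == gold.strip().lower()
    return format_weight + (1.0 if correct else 0.0)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of samples drawn for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


group = [
    "<think>Opacity in the left lower lobe.</think><answer>B</answer>",
    "<think>No clear finding.</think><answer>A</answer>",
    "B",  # right letter, but it ignores the assumed template
]
rewards = [outcome_reward(r, gold="B") for r in group]
advantages = group_relative_advantages(rewards)
```

Note that no box coordinates enter the reward: localization is shaped only indirectly, through whether the grounded reasoning leads to a correct, well-formed answer.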

This checkpoint is the merged Hugging Face model exported from the GRPO stage.
Training code (LLaMA-Factory for SFT, verl for GRPO) and configs are released at:
<https://github.com/IQuestLab/unireason-med>.
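For readers reproducing the SFT freezing scheme outside the released configs, the following sketch marks parameters as trainable or frozen by name. The `"visual."` name fragment is an assumption about how Qwen2.5-VL-style checkpoints name the vision tower and its projector; check it against `model.named_parameters()` before relying on it. `_FakeParam` is a stand-in so the example runs without a model download.

```python
from typing import Iterable, Tuple

# Assumed name fragment covering the vision tower and its projector (merger);
# verify against the actual parameter names of the loaded checkpoint.
FROZEN_FRAGMENTS: Tuple[str, ...] = ("visual.",)


def is_trainable(param_name: str, frozen: Iterable[str] = FROZEN_FRAGMENTS) -> bool:
    """A parameter trains unless its name contains a frozen fragment."""
    return not any(fragment in param_name for fragment in frozen)


def apply_freeze(named_parameters) -> int:
    """Set requires_grad on each parameter; return how many stay trainable."""
    trainable = 0
    for name, param in named_parameters:
        param.requires_grad = is_trainable(name)
        trainable += int(param.requires_grad)
    return trainable


class _FakeParam:
    """Stand-in for a framework parameter object (illustration only)."""
    requires_grad = True


params = [
    ("visual.blocks.0.attn.qkv.weight", _FakeParam()),
    ("visual.merger.mlp.0.weight", _FakeParam()),
    ("model.layers.0.self_attn.q_proj.weight", _FakeParam()),
    ("lm_head.weight", _FakeParam()),
]
trainable = apply_freeze(params)
```

With a loaded PyTorch model, the same helper could be called as `apply_freeze(model.named_parameters())`, since it only sets the `requires_grad` attribute on each parameter.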
## Intended Use and Limitations

- **Intended use:** research on medical multimodal reasoning, visual grounding, and 2D-to-3D
  transfer. Suitable for academic benchmarking and method development.
- **Out of scope:** UniReason-Med is a research artifact and is **not** a medical device. It must
  **not** be used for clinical diagnosis, treatment decisions, or any real patient care.
- **Limitations:** outputs may be incorrect, incomplete, or biased; performance depends on
  imaging modality, anatomy, and distribution shift from the training data. Predicted bounding
  boxes are reasoning aids, not validated localization. Always involve qualified medical
  professionals for any health-related decision.
## License

Released under the [Apache License 2.0](./LICENSE), consistent with the base model
Qwen2.5-VL-7B-Instruct. Note the research-only intended use and the medical-use limitations above.
## Citation

If you use this model, please cite the UniReason-Med paper:

```bibtex
@article{unireasonmed,
  title  = {UniReason-Med: A Shared Grounded Reasoning Interface for 2D-to-3D Transfer in Medical VQA},
  author = {UniReason-Med Team},
  year   = {2025}
}
```