Qwen2.5-3B-R1-MedicalReasoner
Qwen2.5-3B-R1-MedicalReasoner is a clinical reasoning language model fine-tuned for advanced diagnostic and case-based problem solving. It has been developed for applications in medical education, clinical decision support, and research, with the capability to generate detailed chain-of-thought responses that include both the reasoning process and the final answer.
Overview
- Model Name: Qwen2.5-3B-R1-MedicalReasoner
- Base Architecture: Qwen2.5 (3B)
- Primary Application: Clinical reasoning and medical problem solving
- Key Features:
  - Chain-of-Thought Outputs: Responds with structured reasoning (<reasoning> ... </reasoning>) followed by a concise answer (<answer> ... </answer>).
  - Multi-Specialty Coverage: Well-suited for scenarios in internal medicine, surgery, pediatrics, OB/GYN, emergency medicine, and more.
  - Explainable AI: Generates detailed, educational explanations that support clinical decision-making.
Model Capabilities
- Expert-Level Clinical Reasoning: Equipped to analyze complex clinical scenarios and provide in-depth diagnostic reasoning.
- Structured Outputs: Enforces a response format that separates the thought process from the final answer, aiding transparency and interpretability.
- Optimized for Speed: Uses Unsloth and vLLM for fast, efficient inference on GPU systems.
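The <reasoning> / <answer> response format described above lends itself to simple post-processing. The following is a minimal sketch, not part of the model's own tooling: `parse_response` and `ParsedResponse` are hypothetical names, and the sample string is hand-written for illustration rather than real model output.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    """Hypothetical container for the two parts of a completion."""
    reasoning: Optional[str]
    answer: Optional[str]


# Non-greedy matches so each pattern stops at the first closing tag.
_REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_response(text: str) -> ParsedResponse:
    """Split a completion into its reasoning and answer parts.

    A part is None when its tag pair is missing, for example when
    generation was cut off before the closing tag.
    """
    reasoning = _REASONING_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return ParsedResponse(
        reasoning=reasoning.group(1).strip() if reasoning else None,
        answer=answer.group(1).strip() if answer else None,
    )


# Hand-written sample in the expected format (not real model output).
sample = (
    "<reasoning>\nFibrin breakdown releases degradation products.\n</reasoning>\n"
    "<answer>\nD-dimer\n</answer>"
)
parsed = parse_response(sample)
print(parsed.answer)
```

Returning None for a missing part, instead of raising, lets calling code decide how to treat truncated or malformed completions.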
Inference and Usage
Below is an example of how to run inference with the model. Alternatively, refer to inference.py in the Files section:
from unsloth import FastLanguageModel
from vllm import SamplingParams
from huggingface_hub import snapshot_download

# Load the model in 4-bit with vLLM-backed fast inference enabled.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="iimran/Qwen2.5-3B-R1-MedicalReasoner",
    load_in_4bit=True,
    fast_inference=True,
    gpu_memory_utilization=0.5,
)

# Attach LoRA modules with the rank and target modules used by the adapter.
lora_rank = 64
model = FastLanguageModel.get_peft_model(
    model,
    r=lora_rank,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=lora_rank,
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Download the LoRA adapter repository and load it.
lora_path = snapshot_download("iimran/Qwen2.5-3B-R1-MedicalReasoner-lora-adapter")
print("LoRA adapter downloaded to:", lora_path)
model.load_lora(lora_path)

# The system prompt enforces the <reasoning> / <answer> output format.
SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n"
    "...\n"
    "</reasoning>\n"
    "<answer>\n"
    "...\n"
    "</answer>"
)
USER_PROMPT = (
    "In the context of disseminated intravascular coagulation (DIC), "
    "which blood component is expected to show an increase due to the excessive breakdown of fibrin?"
)

# Render the chat messages into a single prompt string.
text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

# Low temperature keeps the output close to deterministic.
sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.95,
    max_tokens=4096,
)

# No separate LoRA request is passed at generation time.
outputs = model.fast_generate(
    text,
    sampling_params=sampling_params,
    lora_request=None,
)
print(outputs[0].outputs[0].text)
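To ask several questions with the same system prompt, the message lists can be built up front. The sketch below is an illustration only: `build_conversations` is a hypothetical helper, not part of Unsloth or vLLM, and it repeats the system prompt from the example above so that it runs on its own.

```python
from typing import Dict, List

# Same format instruction as in the inference example above.
SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n...\n</reasoning>\n"
    "<answer>\n...\n</answer>"
)


def build_conversations(questions: List[str]) -> List[List[Dict[str, str]]]:
    """Return one system+user message list per non-empty question."""
    return [
        [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question.strip()},
        ]
        for question in questions
        if question.strip()  # skip blank entries
    ]


conversations = build_conversations([
    "Which blood component increases in DIC due to fibrin breakdown?",
    "   ",  # blank input, ignored
])
print(len(conversations))
```

Each message list can then be rendered with tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=True) as in the example above. Passing the resulting list of prompt strings to fast_generate in one call is an assumption here, based on vLLM's generate accepting a list of prompts.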
Adapter Integration
For further fine-tuning or experiments with LoRA adapters, the LoRA adapter for this model is available in a separate repository.
- LoRA Adapter Repo: iimran/Qwen2.5-3B-R1-MedicalReasoner-lora-adapter
To download and integrate the LoRA adapter:
from huggingface_hub import snapshot_download
# Download the LoRA adapter repository:
lora_path = snapshot_download("iimran/Qwen2.5-3B-R1-MedicalReasoner-lora-adapter")
print("LoRA adapter downloaded to:", lora_path)
# Load the adapter into the model:
model.load_lora(lora_path)
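Before loading the adapter, it can help to confirm that its rank and target modules match the values passed to get_peft_model. The sketch below assumes the adapter repository follows the usual PEFT layout with an adapter_config.json file containing "r", "lora_alpha", and "target_modules"; this has not been verified against the repository, and `read_adapter_config` and `summarize_adapter` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict


def read_adapter_config(lora_path: str) -> Dict[str, Any]:
    """Read adapter_config.json from a downloaded adapter directory.

    Assumes the PEFT file layout; raises FileNotFoundError otherwise.
    """
    config_file = Path(lora_path) / "adapter_config.json"
    if not config_file.is_file():
        raise FileNotFoundError(f"No adapter_config.json found in {lora_path}")
    with config_file.open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize_adapter(config: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the fields that must agree with the get_peft_model call."""
    targets = config.get("target_modules") or []
    if isinstance(targets, str):  # PEFT also allows a single string here
        targets = [targets]
    return {
        "r": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": sorted(targets),
    }
```

With the lora_path returned by snapshot_download, summarize_adapter(read_adapter_config(lora_path)) can be compared against the rank (64) and module list used in the inference example.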
Installation
To use this model, install the required packages:
pip install unsloth vllm trl datasets huggingface-hub
A compatible GPU is recommended for optimal performance.
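After installation, a quick check can confirm that the packages are visible to the current Python environment. This is a minimal sketch using only the standard library; `check_modules` is a hypothetical helper, and the module names are the import names assumed to correspond to the pip packages listed above (huggingface-hub is imported as huggingface_hub).

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Import names assumed to match the pip packages listed above.
REQUIRED_MODULES = ("unsloth", "vllm", "trl", "datasets", "huggingface_hub")


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found.

    find_spec only locates the module; it does not import it, so this
    check is fast and does not initialize any GPU libraries.
    """
    return {name: find_spec(name) is not None for name in names}


missing = [name for name, found in check_modules().items() if not found]
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```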
Citation
If you use Qwen2.5-3B-R1-MedicalReasoner in your research, please cite:
@misc{sarwar2025reinforcement,
  author = {Imran Sarwar and Muhammad Rouf Mustafa},
  title = {Reinforcement Learning Elevates Qwen2.5-3B Medical Reasoning Performance},
  year = {2025},
  month = {Apr},
  day = {10},
  publisher = {Imran Sarwar's Blog},
  howpublished = {\url{https://www.imransarwar.com/blog-posts/Reinforcement-Learning-Elevates-Qwen2.5-Medical-Reasoning-Performance.html}},
  note = {Accessed: 2025-04-09}
}
@misc{Qwen2.5-3B-R1-MedicalReasoner,
  author = {Imran Sarwar and Muhammad Rouf Mustafa},
  title = {Qwen 2.5-3B Meets Deepseek R1: A Fine-Tuned Medical Reasoning Model for Enhanced Diagnostics},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/iimran/Qwen2.5-3B-R1-MedicalReasoner}
}
Disclaimer
This model is intended for research and educational purposes only. It should not be used as the sole basis for clinical decision-making. All outputs should be validated by qualified healthcare professionals.
Model tree for iimran/Qwen2.5-3B-R1-MedicalReasoner
Base model: Qwen/Qwen2.5-3B