Instructions to use iimran/Qwen2.5-3B-R1-MedicalReasoner with libraries, inference providers, notebooks, and local apps.

Libraries

How to use iimran/Qwen2.5-3B-R1-MedicalReasoner with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="iimran/Qwen2.5-3B-R1-MedicalReasoner")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("iimran/Qwen2.5-3B-R1-MedicalReasoner")
model = AutoModelForCausalLM.from_pretrained("iimran/Qwen2.5-3B-R1-MedicalReasoner")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use iimran/Qwen2.5-3B-R1-MedicalReasoner with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "iimran/Qwen2.5-3B-R1-MedicalReasoner"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "iimran/Qwen2.5-3B-R1-MedicalReasoner",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
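
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000. The helper names `build_chat_payload` and `chat`, and the `max_tokens` default, are illustrative choices and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "iimran/Qwen2.5-3B-R1-MedicalReasoner"


def build_chat_payload(user_content, model=MODEL_ID, max_tokens=256):
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "max_tokens": max_tokens,  # illustrative default, adjust as needed
    }


def chat(user_content, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the reply text."""
    body = json.dumps(build_chat_payload(user_content)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return result["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("What is the capital of France?"))
```

For the SGLang server described further down, the same helper should work with `base_url="http://localhost:30000"`, since both servers expose the OpenAI-compatible route.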

Use Docker

docker model run hf.co/iimran/Qwen2.5-3B-R1-MedicalReasoner

SGLang

How to use iimran/Qwen2.5-3B-R1-MedicalReasoner with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "iimran/Qwen2.5-3B-R1-MedicalReasoner" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "iimran/Qwen2.5-3B-R1-MedicalReasoner",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "iimran/Qwen2.5-3B-R1-MedicalReasoner" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "iimran/Qwen2.5-3B-R1-MedicalReasoner",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use iimran/Qwen2.5-3B-R1-MedicalReasoner with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for iimran/Qwen2.5-3B-R1-MedicalReasoner to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for iimran/Qwen2.5-3B-R1-MedicalReasoner to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for iimran/Qwen2.5-3B-R1-MedicalReasoner to start chatting

Load model with FastModel

# Install Unsloth first (shell command): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="iimran/Qwen2.5-3B-R1-MedicalReasoner",
    max_seq_length=2048,
)

Docker Model Runner
How to use iimran/Qwen2.5-3B-R1-MedicalReasoner with Docker Model Runner:
```
docker model run hf.co/iimran/Qwen2.5-3B-R1-MedicalReasoner
```

Qwen2.5-3B-R1-MedicalReasoner

Qwen2.5-3B-R1-MedicalReasoner is a clinical reasoning language model fine-tuned for advanced diagnostic and case-based problem solving. It has been developed for applications in medical education, clinical decision support, and research, with the capability to generate detailed chain-of-thought responses that include both the reasoning process and the final answer.

Overview

Model Name: Qwen2.5-3B-R1-MedicalReasoner
Base Architecture: Qwen2.5 (3B)
Primary Application: Clinical reasoning and medical problem solving
Key Features:
- Chain-of-Thought Outputs: Responds with structured reasoning (<reasoning> ... </reasoning>) followed by a concise answer (<answer> ... </answer>).
- Multi-Specialty Coverage: Well-suited for scenarios in internal medicine, surgery, pediatrics, OB/GYN, emergency medicine, and more.
- Explainable AI: Generates detailed, educational explanations that support clinical decision-making.
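
Because the response format uses fixed tags, the two parts can be separated with a small parser. The following is a minimal sketch, assuming the model emits at most one `<reasoning>` block and one `<answer>` block; `parse_response` is an illustrative helper name, and the sample string is a hand-written placeholder, not actual model output.

```python
import re

# Each tag pair is matched non-greedily; DOTALL lets "." span newlines.
_REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_response(text):
    """Split a model response into its reasoning and answer parts.

    Returns a dict with keys "reasoning" and "answer"; a value is None
    when the corresponding tag pair is missing from the text.
    """
    reasoning = _REASONING_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return {
        "reasoning": reasoning.group(1).strip() if reasoning else None,
        "answer": answer.group(1).strip() if answer else None,
    }


# Placeholder text in the expected format (not real model output).
sample = (
    "<reasoning>\nStep-by-step explanation goes here.\n</reasoning>\n"
    "<answer>\nFinal answer goes here.\n</answer>"
)
parsed = parse_response(sample)
print(parsed["answer"])  # Final answer goes here.
```

Returning `None` for a missing tag pair, rather than raising, lets callers decide how to handle responses that do not follow the format.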

Model Capabilities

- Expert-Level Clinical Reasoning: Equipped to analyze complex clinical scenarios and provide in-depth diagnostic reasoning.
- Structured Outputs: Enforces a response format that separates the thought process from the final answer, aiding transparency and interpretability.
- Optimized for Speed: Uses Unsloth and vLLM for fast, efficient inference on GPU systems.

Inference and Usage

Below is an example of how to use the model for inference. Alternatively, refer to inference.py in the Files section:

from unsloth import FastLanguageModel
from vllm import SamplingParams
from huggingface_hub import snapshot_download

# Load the model in 4-bit with Unsloth's fast inference enabled.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="iimran/Qwen2.5-3B-R1-MedicalReasoner",
    load_in_4bit=True,
    fast_inference=True,
    gpu_memory_utilization=0.5
)

# Wrap the model with LoRA layers on the attention and MLP projections.
lora_rank = 64
model = FastLanguageModel.get_peft_model(
    model,
    r=lora_rank,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=lora_rank,
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Download the LoRA adapter from its separate repository and load it.
lora_path = snapshot_download("iimran/Qwen2.5-3B-R1-MedicalReasoner-lora-adapter")
print("LoRA adapter downloaded to:", lora_path)
model.load_lora(lora_path)

# System prompt that requests the <reasoning>/<answer> response format.
SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n"
    "...\n"
    "</reasoning>\n"
    "<answer>\n"
    "...\n"
    "</answer>"
)
USER_PROMPT = (
    "In the context of disseminated intravascular coagulation (DIC), "
    "which blood component is expected to show an increase due to the excessive breakdown of fibrin?"
)

# Render the chat messages into a single prompt string.
text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
    tokenize=False,
    add_generation_prompt=True
)

# Low temperature for near-deterministic output; a large token budget leaves room for long reasoning.
sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.95,
    max_tokens=4096,
)

# lora_request=None: no per-request adapter is passed to the generation call.
outputs = model.fast_generate(
    text,
    sampling_params=sampling_params,
    lora_request=None
)
print(outputs[0].outputs[0].text)
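
Where Unsloth or vLLM is not available, the same system prompt can be used with plain Transformers. The following is a minimal sketch, assuming the repository's weights load with `AutoModelForCausalLM`; `build_messages`, `generate_answer`, and the `max_new_tokens` default are illustrative and do not come from the model card.

```python
SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n...\n</reasoning>\n"
    "<answer>\n...\n</answer>"
)


def build_messages(question):
    """Return the chat messages: format instructions, then the question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]


def generate_answer(question, max_new_tokens=1024):
    """Generate a response with plain Transformers (downloads the model)."""
    # Imported lazily so build_messages can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "iimran/Qwen2.5-3B-R1-MedicalReasoner"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example (downloads several GB of weights on first use):
# print(generate_answer("Which blood component increases in DIC?"))
```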

Adapter Integration

For further fine-tuning or experiments with LoRA adapters, the LoRA adapter for this model is available in a separate repository.

LoRA Adapter Repo: iimran/Qwen2.5-3B-R1-MedicalReasoner-lora-adapter

To download and integrate the LoRA adapter:

from huggingface_hub import snapshot_download

# Download the LoRA adapter repository:
lora_path = snapshot_download("iimran/Qwen2.5-3B-R1-MedicalReasoner-lora-adapter")
print("LoRA adapter downloaded to:", lora_path)

# Load the adapter into the model
# (`model` is the Unsloth model created in the inference example above):
model.load_lora(lora_path)
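
Before integrating the adapter, it can help to check that its hyperparameters match the LoRA settings used when wrapping the model. The following is a minimal sketch, assuming the adapter repository follows the usual PEFT layout with an `adapter_config.json` file; this layout has not been verified for this repository, and `read_adapter_config` and `summarize_adapter` are illustrative helper names.

```python
import json
from pathlib import Path


def read_adapter_config(lora_path):
    """Load adapter_config.json from a downloaded PEFT-style adapter folder."""
    config_file = Path(lora_path) / "adapter_config.json"
    with config_file.open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize_adapter(config):
    """Pick out the LoRA hyperparameters most relevant for compatibility."""
    modules = config.get("target_modules") or []
    if isinstance(modules, str):
        # PEFT also allows a single string here; normalize to a list.
        modules = [modules]
    return {
        "r": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": sorted(modules),
    }


# Example, using the lora_path returned by snapshot_download:
# print(summarize_adapter(read_adapter_config(lora_path)))
```

The reported rank and target modules can then be compared with the `lora_rank = 64` and projection list passed to `get_peft_model` in the inference example.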

Installation

To use this model, install the required packages:

pip install unsloth vllm trl datasets huggingface-hub

A compatible GPU is recommended for optimal performance.
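
To confirm that the packages listed above are present in the current environment, a short check such as the following can be run. This is a sketch using only the standard library; `installed_versions` is an illustrative helper name, and it reports versions without checking whether they are mutually compatible.

```python
from importlib import metadata

REQUIRED_PACKAGES = ["unsloth", "vllm", "trl", "datasets", "huggingface-hub"]


def installed_versions(packages=REQUIRED_PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```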

Citation

If you use Qwen2.5-3B-R1-MedicalReasoner in your research, please cite:

@misc{sarwar2025reinforcement,
  author = {Imran Sarwar and Muhammad Rouf Mustafa},
  title = {Reinforcement Learning Elevates Qwen2.5-3B Medical Reasoning Performance},
  year = {2025},
  month = {Apr},
  day = {10},
  publisher = {Imran Sarwar's Blog},
  howpublished = {\url{https://www.imransarwar.com/blog-posts/Reinforcement-Learning-Elevates-Qwen2.5-Medical-Reasoning-Performance.html}},
  note = {Accessed: 2025-04-09}
}

@misc{Qwen2.5-3B-R1-MedicalReasoner,
  author = {Imran Sarwar and Muhammad Rouf Mustafa},
  title = {Qwen 2.5-3B Meets Deepseek R1: A Fine-Tuned Medical Reasoning Model for Enhanced Diagnostics},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/iimran/Qwen2.5-3B-R1-MedicalReasoner}
}

Disclaimer

This model is intended for research and educational purposes only. It should not be used as the sole basis for clinical decision-making. All outputs should be validated by qualified healthcare professionals.

Model tree for iimran/Qwen2.5-3B-R1-MedicalReasoner

Base model: Qwen/Qwen2.5-3B
Finetuned: this model