# Qwen2.5-0.5B-MedReason-SFT
A 494M-parameter language model fine-tuned for structured clinical reasoning and medical question answering. Built on Qwen2.5-0.5B-Instruct with QLoRA and Chain-of-Thought (CoT) supervision, this model demonstrates that compact models can produce structured, step-by-step medical diagnostic reasoning when trained on high-quality clinical data.
## Demo
*The Gradio interface for submitting clinical queries and reviewing AI-generated diagnostic reasoning output.*
## Model Details

### Summary
| Property | Value |
|---|---|
| Model Name | Qwen2.5-0.5B-MedReason-SFT |
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 494 Million |
| Architecture | Transformer-based Causal Decoder |
| Training Method | Supervised Fine-Tuning (SFT) with QLoRA |
| Quantization (Training) | 4-bit NormalFloat (NF4) |
| Merged Precision | 16-bit (FP16) |
| Context Window | 2048 Tokens |
| License | Apache 2.0 |
### Model Description
This model was fine-tuned to internalize structured medical reasoning through Chain-of-Thought (CoT) data derived from the Baichuan-M3-235B model. The training data includes a reasoning_content field that teaches the model to think through differential diagnoses, symptom analysis, and clinical decision pathways before producing a final answer.
The project demonstrates that sub-1B parameter models can be meaningfully specialized for clinical domains without large compute budgets, using efficient fine-tuning techniques (LoRA, 4-bit quantization, gradient accumulation) on a single consumer-grade GPU.
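The LoRA technique mentioned above keeps the base weights frozen and trains only a low-rank update: for a frozen weight matrix `W`, two small matrices `A` (r × d_in) and `B` (d_out × r) are learned, and the adapted layer computes `W·x + (α/r)·B·A·x`. A toy pure-Python sketch with hypothetical tiny dimensions (not the actual Qwen2.5 weights):

```python
# Toy illustration of a LoRA-adapted linear layer. Dimensions are
# hypothetical; a real adapter would sit on the model's attention/MLP
# projections. Only A and B would be trainable.
d_in, d_out, r, alpha = 4, 4, 2, 16  # alpha/r matches the card's 16/16

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen
A = [[0.1] * d_in for _ in range(r)]   # trainable, r x d_in
B = [[0.1] * r for _ in range(d_out)]  # trainable, d_out x r

x = [1.0, 2.0, 3.0, 4.0]
delta = matmul(B, A)   # low-rank update B @ A
scale = alpha / r      # LoRA scaling factor
y = [wx + scale * dx for wx, dx in zip(matvec(W, x), matvec(delta, x))]
print([round(v, 6) for v in y])
```

The payoff is parameter count: each adapted layer trains `r * (d_in + d_out)` values instead of `d_in * d_out`, which is what makes fine-tuning feasible on a single T4.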
## Training Details

### Dataset
| Property | Value |
|---|---|
| Dataset | OpenMed Medical-Reasoning-SFT |
| Source | Derived from Baichuan-M3-235B |
| Training Subset Used | 124,520 samples |
| Full Dataset Size | ~1,790,000 samples |
| Format | Conversational JSONL with reasoning_content CoT field |
| HuggingFace Link | openmed/Medical-Reasoning-SFT |
The dataset contains medical question-answer pairs enriched with reasoning chains that simulate expert-level clinical thinking. Each sample follows an instruction-response structure, with the reasoning trace embedded to guide the model toward interpretable, step-by-step outputs.
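A single sample in this conversational JSONL style might look like the following. Only the `reasoning_content` field name comes from the model card; the surrounding structure and field names are illustrative, not the actual dataset schema:

```python
import json

# Hypothetical sample in the conversational JSONL style described above.
line = json.dumps({
    "messages": [
        {"role": "user",
         "content": "A patient presents with polyuria and polydipsia. Next step?"},
        {"role": "assistant",
         # Reasoning trace kept separate from the final answer, so it can
         # be concatenated into the target (or masked) during SFT.
         "reasoning_content": "Step 1: Consider diabetes mellitus vs. diabetes insipidus...",
         "content": "Check fasting plasma glucose and HbA1c first."},
    ]
})

sample = json.loads(line)
assistant = sample["messages"][-1]
print(assistant["reasoning_content"][:7])  # "Step 1:"
print(assistant["content"])
```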
### Hardware
- GPU: NVIDIA Tesla T4 (single GPU)
- Training Library: Unsloth
- Inference Speed Gain: ~2x via Unsloth Fast Inference kernels
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning Rate | 2e-4 |
| Optimizer | AdamW (8-bit) |
| Batch Size (per device) | 1 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 4 |
| LoRA Rank (R) | 16 |
| LoRA Alpha | 16 |
| Weight Decay | 0.01 |
| Quantization | NF4 (4-bit) |
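The effective batch size of 4 in the table comes from accumulating gradients over 4 micro-batches of size 1 before each optimizer step. A minimal sketch of that loop, with plain numbers standing in for a real autograd framework:

```python
# Gradient accumulation: average per-micro-batch gradients over
# `accum_steps` forward/backward passes, then apply one optimizer step.
per_device_batch = 1
accum_steps = 4
effective_batch = per_device_batch * accum_steps  # 4, matching the table

micro_grads = [0.8, 1.2, 1.0, 1.0]  # pretend gradients from 4 micro-batches

accum = 0.0
update = None
for step, g in enumerate(micro_grads, start=1):
    accum += g / accum_steps          # scale so the running sum is a mean
    if step % accum_steps == 0:
        update = accum                # one optimizer update with the mean gradient
        accum = 0.0

print(effective_batch, update)  # 4 1.0
```

This is how a per-device batch of 1 fits in T4 memory while still training with batch-size-4 gradient statistics.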
### Training Metrics
| Metric | Value |
|---|---|
| Initial Training Loss | 2.3664 |
| Final Training Loss (1,000 steps) | 1.6457 |
| Loss Reduction | ~30.5% |
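The reported ~30.5% reduction follows directly from the two loss values in the table:

```python
initial_loss = 2.3664
final_loss = 1.6457

# Relative reduction over the 1,000 training steps
reduction = (initial_loss - final_loss) / initial_loss
print(f"{reduction:.1%}")  # 30.5%
```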
## How to Use

### Installation
```bash
pip install torch torchvision torchaudio
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install transformers
```
### Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Rumiii/Qwen2.5-0.5B-MedReason-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "A 58-year-old male presents with crushing chest pain radiating to the left arm, diaphoresis, and shortness of breath. What is the most likely diagnosis and immediate management?",
    }
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
### Gradio Interface
A dedicated Gradio agent is included in the source repository for interactive clinical review:
```bash
git clone https://github.com/sufirumii/Medical-Reasoning-AI-Agent-Fine-Tuning-Qwen-2.5-0.5B
cd Medical-Reasoning-AI-Agent-Fine-Tuning-Qwen-2.5-0.5B
python gradio_agent.py
```
## Intended Use

### Appropriate Use Cases
- Medical education and student training aids
- Clinical reasoning research and benchmarking
- Exploring the capabilities of compact fine-tuned models in healthcare NLP
- Prototyping AI-assisted diagnostic tools for research environments
### Out-of-Scope Use Cases
- Direct clinical decision-making without physician oversight
- Replacing licensed medical professionals
- Deployment in any production healthcare system without rigorous validation
- Use in emergency or life-critical medical situations
## Limitations
- Scale: Trained on an initial subset (124,520 samples) of the full 1.79M sample dataset. Performance may improve significantly with full-dataset training.
- Parameter Count: At 494M parameters, the model may lack the depth required for nuanced or rare clinical presentations that larger models handle more reliably.
- Hallucination Risk: Like all language models, this model can produce confident but incorrect medical statements. All outputs must be validated by a qualified clinician.
- Language: Trained on English-language data only. Performance in other languages is not guaranteed.
- Recency: Medical knowledge has a training cutoff and does not reflect the latest clinical guidelines or drug approvals.
## Bias and Ethical Considerations
The training data is derived from a large-scale model (Baichuan-M3-235B) and may reflect biases present in that model or its underlying sources. Medical AI systems are known to exhibit demographic bias — including disparities across age, sex, ethnicity, and socioeconomic status — which may affect the quality of reasoning for underrepresented patient populations. Users should treat all outputs critically and not apply them uniformly across diverse patient groups without independent clinical assessment.
## Clinical Disclaimer
This model is intended strictly for research and educational purposes. It is not approved for clinical use and must not be used as a substitute for professional medical advice, diagnosis, or treatment. All AI-generated medical reasoning must be reviewed and verified by a licensed and qualified healthcare professional before any clinical consideration.
## Citation
If you use this model in your research, please cite it as:
```bibtex
@misc{qwen25_medreason_sft,
  title        = {Qwen2.5-0.5B-MedReason-SFT: A Compact Model for Clinical Chain-of-Thought Reasoning},
  author       = {Rumi Sufi},
  year         = {2026},
  howpublished = {HuggingFace},
  url          = {https://huggingface.co/Rumiii/Qwen2.5-0.5B-MedReason-SFT}
}
```
## Acknowledgements
- Unsloth for memory-efficient fine-tuning infrastructure
- OpenMed for the Medical-Reasoning-SFT dataset
- Qwen Team / Alibaba Cloud for the Qwen-2.5 base model family