# Qwen2.5-0.5B-MedReason-SFT
A 494M-parameter language model fine-tuned for structured clinical reasoning and medical question answering. Built on Qwen2.5-0.5B-Instruct with QLoRA and Chain-of-Thought (CoT) supervision, this model demonstrates that compact models can produce structured, step-by-step medical diagnostic reasoning when trained on high-quality clinical data.
## Demo
*The Gradio interface for submitting clinical queries and reviewing AI-generated diagnostic reasoning output.*
## Model Details

### Summary
| Property | Value |
|---|---|
| Model Name | Qwen2.5-0.5B-MedReason-SFT |
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 494 Million |
| Architecture | Transformer-based Causal Decoder |
| Training Method | Supervised Fine-Tuning (SFT) with QLoRA |
| Quantization (Training) | 4-bit NormalFloat (NF4) |
| Merged Precision | 16-bit (FP16) |
| Context Window | 2048 Tokens |
| License | Apache 2.0 |
### Model Description
This model was fine-tuned to internalize structured medical reasoning through Chain-of-Thought (CoT) data derived from the Baichuan-M3-235B model. The training data includes a reasoning_content field that teaches the model to think through differential diagnoses, symptom analysis, and clinical decision pathways before producing a final answer.
The project demonstrates that sub-1B parameter models can be meaningfully specialized for clinical domains without large compute budgets, using efficient fine-tuning techniques (LoRA, 4-bit quantization, gradient accumulation) on a single consumer-grade GPU.
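The LoRA technique mentioned above keeps the base weights frozen and trains only a low-rank update: for a frozen weight matrix `W`, two small matrices `A` (r × d_in) and `B` (d_out × r) are learned, and the adapted layer computes `W·x + (α/r)·B·A·x`. A toy pure-Python sketch with hypothetical tiny dimensions (not the actual Qwen2.5 weights):

```python
# Toy illustration of a LoRA-adapted linear layer. Dimensions are
# hypothetical; a real adapter would sit on the model's attention/MLP
# projections. Only A and B would be trainable.
d_in, d_out, r, alpha = 4, 4, 2, 16  # alpha/r matches the card's 16/16

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen
A = [[0.1] * d_in for _ in range(r)]   # trainable, r x d_in
B = [[0.1] * r for _ in range(d_out)]  # trainable, d_out x r

x = [1.0, 2.0, 3.0, 4.0]
delta = matmul(B, A)   # low-rank update B @ A
scale = alpha / r      # LoRA scaling factor
y = [wx + scale * dx for wx, dx in zip(matvec(W, x), matvec(delta, x))]
print([round(v, 6) for v in y])
```

The payoff is parameter count: each adapted layer trains `r * (d_in + d_out)` values instead of `d_in * d_out`, which is what makes fine-tuning feasible on a single T4.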
## Training Details

### Dataset
| Property | Value |
|---|---|
| Dataset | OpenMed Medical-Reasoning-SFT |
| Source | Derived from Baichuan-M3-235B |
| Training Subset Used | 124,520 samples |
| Full Dataset Size | ~1,790,000 samples |
| Format | Conversational JSONL with reasoning_content CoT field |
| HuggingFace Link | openmed/Medical-Reasoning-SFT |
The dataset contains medical question-answer pairs enriched with reasoning chains that simulate expert-level clinical thinking. Each sample follows an instruction-response structure, with the reasoning trace embedded to guide the model toward interpretable, step-by-step outputs.
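A single sample in this conversational JSONL style might look like the following. Only the `reasoning_content` field name comes from the model card; the surrounding structure and field names are illustrative, not the actual dataset schema:

```python
import json

# Hypothetical sample in the conversational JSONL style described above.
line = json.dumps({
    "messages": [
        {"role": "user",
         "content": "A patient presents with polyuria and polydipsia. Next step?"},
        {"role": "assistant",
         # Reasoning trace kept separate from the final answer, so it can
         # be concatenated into the target (or masked) during SFT.
         "reasoning_content": "Step 1: Consider diabetes mellitus vs. diabetes insipidus...",
         "content": "Check fasting plasma glucose and HbA1c first."},
    ]
})

sample = json.loads(line)
assistant = sample["messages"][-1]
print(assistant["reasoning_content"][:7])  # "Step 1:"
print(assistant["content"])
```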
### Hardware
- GPU: NVIDIA Tesla T4 (single GPU)
- Training Library: Unsloth
- Inference Speed Gain: ~2x via Unsloth Fast Inference kernels
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning Rate | 2e-4 |
| Optimizer | AdamW (8-bit) |
| Batch Size (per device) | 1 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 4 |
| LoRA Rank (R) | 16 |
| LoRA Alpha | 16 |
| Weight Decay | 0.01 |
| Quantization | NF4 (4-bit) |
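The effective batch size of 4 in the table comes from accumulating gradients over 4 micro-batches of size 1 before each optimizer step. A minimal sketch of that loop, with plain numbers standing in for a real autograd framework:

```python
# Gradient accumulation: average per-micro-batch gradients over
# `accum_steps` forward/backward passes, then apply one optimizer step.
per_device_batch = 1
accum_steps = 4
effective_batch = per_device_batch * accum_steps  # 4, matching the table

micro_grads = [0.8, 1.2, 1.0, 1.0]  # pretend gradients from 4 micro-batches

accum = 0.0
update = None
for step, g in enumerate(micro_grads, start=1):
    accum += g / accum_steps          # scale so the running sum is a mean
    if step % accum_steps == 0:
        update = accum                # one optimizer update with the mean gradient
        accum = 0.0

print(effective_batch, update)  # 4 1.0
```

This is how a per-device batch of 1 fits in T4 memory while still training with batch-size-4 gradient statistics.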
### Training Metrics
| Metric | Value |
|---|---|
| Initial Training Loss | 2.3664 |
| Final Training Loss (1,000 steps) | 1.6457 |
| Loss Reduction | ~30.5% |
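The reported ~30.5% reduction follows directly from the two loss values in the table:

```python
initial_loss = 2.3664
final_loss = 1.6457

# Relative reduction over the 1,000 training steps
reduction = (initial_loss - final_loss) / initial_loss
print(f"{reduction:.1%}")  # 30.5%
```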
## How to Use

### Installation
```bash
pip install torch torchvision torchaudio
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install transformers
```
### Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Rumiii/Qwen2.5-0.5B-MedReason-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "A 58-year-old male presents with crushing chest pain radiating to the left arm, diaphoresis, and shortness of breath. What is the most likely diagnosis and immediate management?",
    }
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
### Gradio Interface
A dedicated Gradio agent is included in the source repository for interactive clinical review:
```bash
git clone https://github.com/sufirumii/Medical-Reasoning-AI-Agent-Fine-Tuning-Qwen-2.5-0.5B
cd Medical-Reasoning-AI-Agent-Fine-Tuning-Qwen-2.5-0.5B
python gradio_agent.py
```
## Intended Use

### Appropriate Use Cases
- Medical education and student training aids
- Clinical reasoning research and benchmarking
- Exploring the capabilities of compact fine-tuned models in healthcare NLP
- Prototyping AI-assisted diagnostic tools for research environments
### Out-of-Scope Use Cases
- Direct clinical decision-making without physician oversight
- Replacing licensed medical professionals
- Deployment in any production healthcare system without rigorous validation
- Use in emergency or life-critical medical situations
## Limitations
- Scale: Trained on an initial subset (124,520 samples) of the full 1.79M sample dataset. Performance may improve significantly with full-dataset training.
- Parameter Count: At 494M parameters, the model may lack the depth required for nuanced or rare clinical presentations that larger models handle more reliably.
- Hallucination Risk: Like all language models, this model can produce confident but incorrect medical statements. All outputs must be validated by a qualified clinician.
- Language: Trained on English-language data only. Performance in other languages is not guaranteed.
- Recency: Medical knowledge has a training cutoff and does not reflect the latest clinical guidelines or drug approvals.
## Bias and Ethical Considerations
The training data is derived from a large-scale model (Baichuan-M3-235B) and may reflect biases present in that model or its underlying sources. Medical AI systems are known to exhibit demographic bias — including disparities across age, sex, ethnicity, and socioeconomic status — which may affect the quality of reasoning for underrepresented patient populations. Users should treat all outputs critically and not apply them uniformly across diverse patient groups without independent clinical assessment.
## Clinical Disclaimer
This model is intended strictly for research and educational purposes. It is not approved for clinical use and must not be used as a substitute for professional medical advice, diagnosis, or treatment. All AI-generated medical reasoning must be reviewed and verified by a licensed and qualified healthcare professional before any clinical consideration.
## Citation
If you use this model in your research, please cite it as:
```bibtex
@misc{qwen25_medreason_sft,
  title        = {Qwen2.5-0.5B-MedReason-SFT: A Compact Model for Clinical Chain-of-Thought Reasoning},
  author       = {Rumi Sufi},
  year         = {2026},
  howpublished = {HuggingFace},
  url          = {https://huggingface.co/Rumiii/Qwen2.5-0.5B-MedReason-SFT}
}
```
## Acknowledgements
- Unsloth for memory-efficient fine-tuning infrastructure
- OpenMed for the Medical-Reasoning-SFT dataset
- Qwen Team / Alibaba Cloud for the Qwen-2.5 base model family