
Qwen2.5-0.5B-MedReason-SFT

A 494M-parameter language model fine-tuned for structured clinical reasoning and medical question answering. Built on Qwen2.5-0.5B-Instruct using QLoRA and Chain-of-Thought (CoT) supervision, this model demonstrates that compact models can produce structured, step-by-step medical diagnostic reasoning when trained on high-quality clinical data.


Demo

Gradio Clinical Reasoning Interface: the Gradio interface for submitting clinical queries and reviewing AI-generated diagnostic reasoning output.


Model Details

Summary

  • Model Name: Qwen2.5-0.5B-MedReason-SFT
  • Base Model: Qwen/Qwen2.5-0.5B-Instruct
  • Parameters: 494 million
  • Architecture: Transformer-based causal decoder
  • Training Method: Supervised Fine-Tuning (SFT) with QLoRA
  • Quantization (Training): 4-bit NormalFloat (NF4)
  • Merged Precision: 16-bit (FP16)
  • Context Window: 2048 tokens
  • License: Apache 2.0

Model Description

This model was fine-tuned to internalize structured medical reasoning through Chain-of-Thought (CoT) data derived from the Baichuan-M3-235B model. The training data includes a reasoning_content field that teaches the model to think through differential diagnoses, symptom analysis, and clinical decision pathways before producing a final answer.

The project demonstrates that sub-1B parameter models can be meaningfully specialized for clinical domains without large compute budgets, using efficient fine-tuning techniques (LoRA, 4-bit quantization, gradient accumulation) on a single consumer-grade GPU.
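As a rough illustration of why LoRA keeps the trainable footprint small, the sketch below compares a rank-16 adapter pair against full fine-tuning of a single square projection matrix. The 896 hidden dimension matches Qwen2.5-0.5B's config, but the arithmetic is illustrative, not a measurement of this training run:

```python
# Illustrative arithmetic only: trainable parameters of a LoRA adapter
# vs. full fine-tuning for one d x d weight matrix.
def lora_trainable_params(d: int, r: int) -> int:
    # LoRA adds two low-rank factors, A (r x d) and B (d x r);
    # only these 2*d*r values are trained.
    return 2 * d * r

def full_trainable_params(d: int) -> int:
    # Full fine-tuning updates every entry of the d x d matrix.
    return d * d

d, r = 896, 16  # hidden size of Qwen2.5-0.5B; rank 16 as in this run
lora = lora_trainable_params(d, r)
full = full_trainable_params(d)
print(f"LoRA: {lora:,} vs full: {full:,} ({100 * lora / full:.1f}% of the matrix)")
```

Combined with NF4 quantization of the frozen base weights, this is what lets the run fit on a single T4.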


Training Details

Dataset

  • Dataset: OpenMed Medical-Reasoning-SFT
  • Source: Derived from Baichuan-M3-235B
  • Training Subset Used: 124,520 samples
  • Full Dataset Size: ~1,790,000 samples
  • Format: Conversational JSONL with a reasoning_content CoT field
  • Hugging Face Link: openmed/Medical-Reasoning-SFT

The dataset contains medical question-answer pairs enriched with reasoning chains that simulate expert-level clinical thinking. Each sample follows an instruction-response structure, with the reasoning trace embedded to guide the model toward interpretable, step-by-step outputs.
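The exact schema is best checked on the dataset card, but a record in this style of conversational JSONL might look like the following; field names other than reasoning_content, and all text content, are illustrative:

```python
import json

# A hypothetical single JSONL record in the conversational CoT style
# described above; the real dataset's field names and layout may differ.
record_line = json.dumps({
    "messages": [
        {"role": "user", "content": "A patient presents with ..."},
        {
            "role": "assistant",
            "reasoning_content": "Step 1: consider the differential ...",
            "content": "The most likely diagnosis is ...",
        },
    ]
})

# Each line of a .jsonl file parses independently as one JSON object.
record = json.loads(record_line)
assistant = record["messages"][-1]
print(assistant["reasoning_content"])
print(assistant["content"])
```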

Hardware

  • GPU: NVIDIA Tesla T4 (single GPU)
  • Training Library: Unsloth
  • Inference Speed Gain: ~2x via Unsloth Fast Inference kernels

Hyperparameters

  • Learning Rate: 2e-4
  • Optimizer: AdamW (8-bit)
  • Batch Size (per device): 1
  • Gradient Accumulation Steps: 4
  • Effective Batch Size: 4
  • LoRA Rank (r): 16
  • LoRA Alpha: 16
  • Weight Decay: 0.01
  • Quantization: NF4 (4-bit)
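The effective batch size of 4 comes from accumulating gradients over 4 micro-batches of size 1 before each optimizer step. For equal-sized micro-batches, averaging the per-micro-batch mean gradients reproduces the full-batch mean gradient, which this toy check (plain Python, made-up numbers) illustrates:

```python
# Toy check: when the loss is the mean of per-sample losses, averaging the
# mean gradients of equal-sized micro-batches equals the full-batch mean
# gradient. Per-sample "gradients" are just scalars here, for illustration.
per_sample_grads = [0.2, -0.4, 0.1, 0.5]  # one gradient per sample

full_batch_grad = sum(per_sample_grads) / len(per_sample_grads)

# Split into 4 micro-batches of size 1, as in this training configuration.
micro_batches = [per_sample_grads[i:i + 1] for i in range(0, 4, 1)]
accumulated = sum(sum(mb) / len(mb) for mb in micro_batches) / len(micro_batches)

print(full_batch_grad, accumulated)
assert abs(full_batch_grad - accumulated) < 1e-12
```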

Training Metrics

  • Initial Training Loss: 2.3664
  • Final Training Loss (1,000 steps): 1.6457
  • Loss Reduction: ~30.5%
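The reduction figure follows directly from the two loss values:

```python
# Relative loss reduction over the 1,000 training steps reported above.
initial_loss, final_loss = 2.3664, 1.6457
reduction = (initial_loss - final_loss) / initial_loss
print(f"Loss reduction: {reduction:.1%}")  # ~30.5%
```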

How to Use

Installation

pip install torch torchvision torchaudio
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install transformers

Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Rumiii/Qwen2.5-0.5B-MedReason-SFT"

# Load the merged FP16 checkpoint; device_map="auto" places it on GPU if available.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "A 58-year-old male presents with crushing chest pain radiating to the left arm, diaphoresis, and shortness of breath. What is the most likely diagnosis and immediate management?"
    }
]

# Apply the Qwen chat template and append the generation prompt.
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
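Depending on how much of the CoT trace the model emits, it can be convenient to separate the reasoning from the final answer. The delimiter the model actually uses (if any) is not documented here, so this helper assumes a hypothetical "Final answer:" marker purely for illustration:

```python
# Hypothetical post-processing: split a generated response into a reasoning
# trace and a final answer. The "Final answer:" marker is an assumption, not
# a documented feature of this model's output format.
def split_reasoning(text: str, marker: str = "Final answer:"):
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.strip(), answer.strip()
    # No marker found: treat the whole response as the answer.
    return "", text.strip()

reasoning, answer = split_reasoning(
    "The presentation suggests ACS ... Final answer: Acute myocardial infarction."
)
print(answer)  # -> "Acute myocardial infarction."
```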

Gradio Interface

A dedicated Gradio agent is included in the source repository for interactive clinical review:

git clone https://github.com/sufirumii/Medical-Reasoning-AI-Agent-Fine-Tuning-Qwen-2.5-0.5B
cd Medical-Reasoning-AI-Agent-Fine-Tuning-Qwen-2.5-0.5B
python gradio_agent.py

Intended Use

Appropriate Use Cases

  • Medical education and student training aids
  • Clinical reasoning research and benchmarking
  • Exploring the capabilities of compact fine-tuned models in healthcare NLP
  • Prototyping AI-assisted diagnostic tools for research environments

Out-of-Scope Use Cases

  • Direct clinical decision-making without physician oversight
  • Replacing licensed medical professionals
  • Deployment in any production healthcare system without rigorous validation
  • Use in emergency or life-critical medical situations

Limitations

  • Scale: Trained on an initial subset (124,520 samples) of the full 1.79M sample dataset. Performance may improve significantly with full-dataset training.
  • Parameter Count: At 494M parameters, the model may lack the depth required for nuanced or rare clinical presentations that larger models handle more reliably.
  • Hallucination Risk: Like all language models, this model can produce confident but incorrect medical statements. All outputs must be validated by a qualified clinician.
  • Language: Trained on English-language data only. Performance in other languages is not guaranteed.
  • Recency: Medical knowledge has a training cutoff and does not reflect the latest clinical guidelines or drug approvals.

Bias and Ethical Considerations

The training data is derived from a large-scale model (Baichuan-M3-235B) and may reflect biases present in that model or its underlying sources. Medical AI systems are known to exhibit demographic bias — including disparities across age, sex, ethnicity, and socioeconomic status — which may affect the quality of reasoning for underrepresented patient populations. Users should treat all outputs critically and not apply them uniformly across diverse patient groups without independent clinical assessment.


Clinical Disclaimer

This model is intended strictly for research and educational purposes. It is not approved for clinical use and must not be used as a substitute for professional medical advice, diagnosis, or treatment. All AI-generated medical reasoning must be reviewed and verified by a licensed and qualified healthcare professional before any clinical consideration.


Citation

If you use this model in your research, please cite it as:

@misc{qwen25_medreason_sft,
  title        = {Qwen2.5-0.5B-MedReason-SFT: A Compact Model for Clinical Chain-of-Thought Reasoning},
  author       = {Rumi Sufi},
  year         = {2026},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/Rumiii/Qwen2.5-0.5B-MedReason-SFT}
}

