Qwen2.5-0.5B-Medical-ReasonMed370K

A 0.5-billion-parameter medical reasoning model fine-tuned on the complete ReasonMed 370K dataset. It is built on Qwen2.5-0.5B-Instruct and trained to perform structured clinical reasoning, differential diagnosis, and evidence-based medical question answering.

Model Details

Base Model: unsloth/Qwen2.5-0.5B-Instruct
Model Size: 0.5B parameters
Fine-tuning Method: LoRA via Unsloth
Training Dataset: ReasonMed 370K (full dataset)
Training Hardware: NVIDIA Tesla T4 (Kaggle free tier)
License: Apache 2.0

Training Details

The model was fine-tuned in two stages, each covering roughly half of the ReasonMed dataset:

Stage 1: Fine-tuned on the first 185,000 samples of ReasonMed using LoRA with the following configuration:

LoRA rank: 8
LoRA alpha: 16
Learning rate: 5e-5
Batch size: 2 with 16 gradient accumulation steps
Max sequence length: 4096
Epochs: 1
Optimizer: AdamW 8-bit

Stage 2: Continued fine-tuning on the remaining 184,983 samples with identical configuration, completing one full pass over the entire 370K dataset.
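
The arithmetic implied by this configuration can be sketched as follows. It is an illustration only, not the training script: it assumes a single GPU, standard LoRA scaling (alpha divided by rank, not rsLoRA), and that a final partial batch still triggers an optimizer update. The helper names are hypothetical.

```python
import math

# Values taken from the configuration listed above.
LORA_RANK = 8
LORA_ALPHA = 16
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 16
STAGE_SAMPLES = {"stage_1": 185_000, "stage_2": 184_983}

# Assumption: one GPU, so there is no data-parallel multiplier.
NUM_DEVICES = 1


def effective_batch_size(per_device, accum_steps, devices=1):
    """Number of samples that contribute to one optimizer update."""
    return per_device * accum_steps * devices


def optimizer_steps(num_samples, batch):
    """Updates in one epoch, assuming the last partial batch still counts."""
    return math.ceil(num_samples / batch)


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES)
lora_scaling = LORA_ALPHA / LORA_RANK  # assumption: standard LoRA scaling
total_samples = sum(STAGE_SAMPLES.values())

print("effective batch size:", batch)
print("LoRA scaling factor:", lora_scaling)
print("total samples:", total_samples)
for stage, n in STAGE_SAMPLES.items():
    print(stage, "optimizer steps:", optimizer_steps(n, batch))
```

Under these assumptions each update sees 32 samples, and the two stages together cover 369,983 samples.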

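Both stages used packing=False, so each sample was trained as its own sequence rather than being concatenated with other samples. Samples longer than the 4096-token maximum sequence length would still be truncated.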
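
A toy illustration of what the packing flag changes, using plain Python lists in place of token ids. This is not the trainer's actual implementation; `unpacked_batches`, `packed_batches`, and the separator id are hypothetical names chosen for the sketch. With packing off, every sample stays in its own sequence, and the maximum sequence length still caps each one.

```python
def unpacked_batches(samples, max_len):
    """One sequence per sample; tokens beyond max_len are cut off."""
    return [tokens[:max_len] for tokens in samples]


def packed_batches(samples, max_len, sep=-1):
    """Concatenate all samples (with a separator id) and slice into fixed-size blocks."""
    stream = []
    for tokens in samples:
        stream.extend(tokens)
        stream.append(sep)
    return [stream[i:i + max_len] for i in range(0, len(stream), max_len)]


# Three toy "samples" and a tiny maximum length to make the effect visible.
samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]
max_len = 4

unpacked = unpacked_batches(samples, max_len)
packed = packed_batches(samples, max_len)

print(unpacked)  # each sample keeps its own sequence
print(packed)    # samples share blocks and can be split across them
```

In the unpacked case the sample boundaries are preserved, while in the packed case the second block mixes the end of one sample with the start of the next.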

Dataset

This model was trained on ReasonMed, which its authors describe as the largest open-source medical reasoning dataset. It comprises 370,000 high-quality examples distilled from 1.75 million initial reasoning paths generated by multiple large language models.

ReasonMed is built through a multi-agent verification and refinement pipeline that includes an Error Refiner to correct error-prone reasoning steps. Each example combines detailed chain-of-thought reasoning with a concise answer summary, covering a wide range of medical topics including clinical reasoning, differential diagnosis, pharmacology, and medical question answering.

For more details on the dataset, refer to the official repository: https://github.com/alibaba-damo-academy/ReasonMed

What the Model Can Do

After training on the full ReasonMed dataset, the model demonstrates the ability to:

Work through clinical presentations step by step
Generate differential diagnoses with reasoning for each option
Rule out unlikely diagnoses with justification
Provide structured final answers with clinical pearls
Reason through medical multiple choice questions with explanation
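
For the multiple choice case, a question can be turned into a chat message along these lines. This is a minimal sketch: `build_mcq_messages` is a hypothetical helper, and the prompt wording is an assumption, not the format used during training.

```python
def build_mcq_messages(question, options):
    """Format a multiple-choice question as a single-turn chat message list."""
    lines = [question.strip(), ""]
    for label, text in options.items():
        lines.append(f"{label}. {text}")
    lines.append("")
    # Assumed instruction wording; adjust to taste.
    lines.append("Reason step by step, then state the final answer.")
    return [{"role": "user", "content": "\n".join(lines)}]


messages = build_mcq_messages(
    "A 45-year-old woman has fatigue, weight gain, and an elevated TSH. "
    "Which drug is first-line?",
    {"A": "Levothyroxine", "B": "Methimazole", "C": "Propranolol"},
)
print(messages[0]["content"])
```

The resulting list has the same shape as the `messages` variable in the Usage section and can be passed to the tokenizer's chat template in the same way.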

Demo

The screenshot above shows the model running through a clinical scenario involving hypothyroidism, demonstrating its ability to identify key symptoms, interpret lab values, and produce a structured response with management guidance.

Limitations

This is a 0.5B parameter model and has a hard ceiling on reasoning depth and factual recall
Small models are prone to inconsistency across similar questions
The model may occasionally hallucinate clinical details
This model is intended for research and educational purposes only
It should not be used for real clinical decision making or as a substitute for a qualified medical professional
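
One rough way to observe the inconsistency mentioned above is to ask the same question in several phrasings and measure how often the answers agree. The sketch below is illustrative only: `agreement_rate` is a hypothetical helper, and the `canned` dictionary is a stub standing in for a real model call plus answer parsing.

```python
from collections import Counter


def agreement_rate(ask, paraphrases):
    """Return the most common answer and the fraction of paraphrases that produced it."""
    answers = [ask(p) for p in paraphrases]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / len(answers)


# Stub answers (made-up labels, not real model output).
canned = {
    "First-line treatment for hypothyroidism?": "B",
    "Which drug is first-line for hypothyroidism?": "B",
    "Hypothyroidism: preferred initial therapy?": "C",
}

top_answer, rate = agreement_rate(canned.get, list(canned))
print(top_answer, round(rate, 2))
```

A low agreement rate on paraphrases of one question is a sign that an individual answer should not be relied on.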

Usage

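from unsloth import FastLanguageModel
import torch

# Load the fine-tuned model and tokenizer in 4-bit to reduce GPU memory use.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "Rumiii/Qwen2.5-0.5B-Medical-ReasonMed370K",
    max_seq_length = 4096,
    load_in_4bit   = True,
)
# Switch Unsloth to its inference mode.
FastLanguageModel.for_inference(model)

# A single-turn conversation in chat format.
messages = [
    {"role": "user", "content": "Your medical question here"}
]

# Apply the model's chat template and move the token ids to the GPU.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize              = True,
    add_generation_prompt = True,
    return_tensors        = "pt"
).to("cuda")

# Sample a response; repetition_penalty and no_repeat_ngram_size
# discourage the model from repeating text.
outputs = model.generate(
    input_ids            = inputs,
    max_new_tokens       = 1024,
    temperature          = 0.7,
    do_sample            = True,
    repetition_penalty   = 1.3,
    no_repeat_ngram_size = 3,
    top_p                = 0.9,
    top_k                = 50,
)

# The decoded text includes the prompt as well as the model's reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))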

Citation

If you use this model, please cite the ReasonMed dataset:

@misc{sun2025reasonmed370kmultiagentgenerated,
      title={ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning}, 
      author={Yu Sun and Xingyu Qian and Weiwen Xu and Hao Zhang and Chenghao Xiao and Long Li and Yu Rong and Wenbing Huang and Qifeng Bai and Tingyang Xu},
      year={2025},
      eprint={2506.09513},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.09513}, 
}

Acknowledgements

Training was conducted on Kaggle free tier infrastructure using Unsloth for efficient fine-tuning. The ReasonMed dataset was created by the team at Alibaba DAMO Academy and Tencent AI Lab.