Qwen2.5-7B-ReasonMed-cot-123k

A full fine-tune of Qwen/Qwen2.5-7B on the CoTMed variant of ReasonMed: outputs are chain-of-thought reasoning followed by the final answer, without <think> tags. Trained on 123K samples (one third of the full dataset) for 3 epochs.

Training code: https://github.com/Chen-Jie7/NLP_project

Training data

lingshu-medical-mllm/ReasonMed — CoTMed.json variant. Outputs are free-form CoT reasoning concluding with the answer.

Three-way format comparison

All three models were trained with identical hyperparameters on the same 123K samples and evaluated via loglikelihood multiple-choice (MCQ) scoring.

Variant            Output format                 Total acc
reason             <think>CoT</think>Response    65.8
cot (this model)   CoT without tags              65.0
response           Direct answer only            63.8
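Loglikelihood MCQ scoring selects the answer option whose completion tokens receive the highest total log-probability under the model, rather than sampling a generation. A minimal sketch of the selection step, with made-up per-option token log-probs standing in for values that would come from a forward pass over each "question + option" continuation:

```python
def pick_option(option_logprobs):
    """Given {option: [per-token log-probs]}, return the option whose
    completion has the highest summed log-likelihood."""
    totals = {opt: sum(lps) for opt, lps in option_logprobs.items()}
    return max(totals, key=totals.get)

# Hypothetical log-probs for a 4-way question; real values would come
# from the model's logits over each candidate continuation.
scores = {
    "A": [-0.2, -1.1],
    "B": [-2.3, -0.5],
    "C": [-0.1, -0.4],   # highest total: -0.5
    "D": [-3.0, -0.9],
}
print(pick_option(scores))  # -> C
```

Some harnesses also length-normalize the summed log-likelihood (dividing by token or byte count) so longer options are not penalized; the table above does not specify which variant was used.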

Evaluation

Benchmark                    Ours (123K)   Paper (370K)
MedQA                        61.0          66.9
MedMCQA (val)                60.4          65.1
PubMedQA                     75.9          82.0
MMLU-Anatomy                 74.8          75.6
MMLU-Clinical-Knowledge      78.1          79.3
MMLU-College-Biology         81.9          79.2
MMLU-College-Medicine        68.8          73.4
MMLU-Medical-Genetics        84.0          85.0
MMLU-Professional-Medicine   79.0          80.9
Total                        65.0          69.6

Training hyperparameters

  • learning_rate: 1e-05
  • effective batch size: 128 (8 GPUs × 4 per-device × 4 gradient-accumulation steps)
  • num_epochs: 3.0
  • lr_scheduler: cosine, warmup_ratio 0.1
  • precision: bf16
  • deepspeed: ZeRO stage 2
  • hardware: 8× H200, ~6.5h
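For reference, the effective batch size and approximate schedule lengths implied by these settings. The 123K sample count is rounded, so the step counts are estimates, not exact trainer values:

```python
import math

num_gpus = 8
per_device_batch = 4
grad_accum_steps = 4
effective_batch = num_gpus * per_device_batch * grad_accum_steps  # 128

num_samples = 123_000          # approximate; exact dataset size not stated
num_epochs = 3
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(0.1 * total_steps)  # warmup_ratio 0.1

print(effective_batch, total_steps, warmup_steps)  # -> 128 2883 288
```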

Framework versions

  • Transformers 4.57.6
  • PyTorch 2.10.0+cu128
  • LLaMA-Factory 0.9.5