# SwarmMedQA-7B-v1

Medical QA model fine-tuned on clinical Chain-of-Thought reasoning data.

Built by Swarm & Bee (S&B) on the SwarmOS platform.

## Model Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | LoRA (r=64, alpha=128, all attn+MLP) |
| Training data | SwarmOS/SwarmMedQA v0.1.0 (124 gold examples) |
| Epochs | 4 |
| Training time | 2 min 22 sec (2x RTX 3090 Ti) |
| Final loss | 0.66 |
| Token accuracy | 82.4% |
| Trainable params | 161M / 7.8B (2.1%) |
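
The trainable-parameter figure can be sanity-checked from Qwen2.5-7B's published architecture (hidden size 3584, k/v projection width 512, MLP width 18944, 28 layers); a rank-r LoRA pair on a d_in×d_out projection adds r·(d_in + d_out) parameters. This is an illustrative back-of-envelope check, not the training code:

```python
# Sanity check of the 161M trainable-parameter figure, assuming
# Qwen2.5-7B's published dimensions (from its config.json): hidden
# size 3584, k/v width 512 (4 KV heads x head dim 128), MLP width
# 18944, 28 decoder layers.
R = 64  # LoRA rank from the table above

shapes = {
    "q_proj": (3584, 3584), "k_proj": (3584, 512), "v_proj": (3584, 512),
    "o_proj": (3584, 3584), "gate_proj": (3584, 18944),
    "up_proj": (3584, 18944), "down_proj": (18944, 3584),
}

# A rank-r pair (A: d_in x r, B: r x d_out) adds r * (d_in + d_out) params.
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total = 28 * per_layer
print(f"{total:,}")  # 161,480,704 -> reported as 161M
```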

## Benchmark Results

| Benchmark | Base Qwen2.5-7B | SwarmMedQA-7B-v1 | Delta |
|---|---|---|---|
| MedQA (USMLE 4-option) | 62.0% | 66.0% | +4.0 pts |
| PubMedQA (abstract grounding) | 0.0%* | 53.0% | +53.0 pts |
| Internal benchmark (hard/expert) | 88.9% | 77.8% | -11.1 pts** |

\* The base model gives verbose answers that fail the yes/no/maybe parser; the fine-tuned model learned format adherence.

\** Small sample (9 examples). The internal dip likely reflects mild overfitting from 4 epochs on 124 training examples and is expected to shrink as the dataset grows.
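
The format-adherence effect behind the PubMedQA numbers is easy to reproduce. A strict extractor that only accepts a yes/no/maybe verdict at the start of the response (a hypothetical sketch; the actual evaluation harness isn't published) scores verbose answers as failures:

```python
# Hypothetical sketch of a strict PubMedQA answer parser. A verbose
# response that buries its verdict mid-sentence scores 0, which is how
# a capable base model can post 0.0% on this benchmark.
def parse_pubmedqa_answer(text):
    """Return 'yes', 'no', or 'maybe' if the response leads with a
    verdict, else None (counted as incorrect)."""
    tokens = text.strip().lower().split()
    if tokens:
        first = tokens[0].rstrip(".,:")
        if first in {"yes", "no", "maybe"}:
            return first
    return None

print(parse_pubmedqa_answer("Yes, the evidence supports this."))       # yes
print(parse_pubmedqa_answer("Based on the abstract, likely yes."))     # None
```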

## Training Configuration

```yaml
# LoRA
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]

# Training
learning_rate: 2e-5
lr_scheduler: cosine
warmup_ratio: 0.1
weight_decay: 0.01
epochs: 4
batch_size: 2
gradient_accumulation: 8
effective_batch_size: 16
fp16: true
gradient_checkpointing: true
```
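
The learning-rate schedule above (cosine decay with 10% linear warmup) can be sketched as a pure function of the step. This mirrors the standard Hugging Face `cosine` scheduler's shape, not the project's exact training code:

```python
import math

def lr_at(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Cosine schedule with linear warmup, as configured above (a sketch).

    Ramps linearly from 0 to base_lr over the first warmup_ratio of
    training, then decays to 0 along a half-cosine.
    """
    warmup = int(total_steps * warmup_ratio)
    if step < warmup:
        return base_lr * step / max(warmup, 1)
    progress = (step - warmup) / max(total_steps - warmup, 1)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(lr_at(10, 100))   # peak LR (2e-5) right after warmup
print(lr_at(100, 100))  # decays to 0.0 at the final step
```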

## Data Pipeline

Every training example passes through a 3-stage automated quality gate:

1. **Verification** - fact-checked against medical literature (factuality score 1-10)
2. **Scoring** - evaluated for clinical relevance, reasoning depth, and educational value
3. **Safety check** - screened for patient-harm potential

Gold criteria: `factuality >= 9 AND reasoning_depth >= 8 AND not rejected AND risk != critical`
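
A minimal sketch of the gold filter as a predicate over a scored example; the field names are illustrative, since the pipeline's actual schema isn't published:

```python
# Hypothetical gold-filter predicate matching the criteria above.
# Field names (factuality, reasoning_depth, rejected, risk) are
# illustrative assumptions, not the released pipeline schema.
def is_gold(example):
    """Keep an example only if it clears every quality gate."""
    return (
        example.get("factuality", 0) >= 9
        and example.get("reasoning_depth", 0) >= 8
        and not example.get("rejected", False)
        and example.get("risk") != "critical"
    )

print(is_gold({"factuality": 9, "reasoning_depth": 8, "risk": "low"}))       # True
print(is_gold({"factuality": 10, "reasoning_depth": 9, "risk": "critical"})) # False
```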

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "SwarmOS/SwarmMedQA-7B-v1",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("SwarmOS/SwarmMedQA-7B-v1")

prompt = """### Instruction:
You are a board-certified physician. Think step by step and explain your clinical reasoning.

### Input:
A 65-year-old male presents with sudden onset crushing chest pain radiating to the left arm, diaphoresis, and shortness of breath. ECG shows ST-elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
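
Because the model was trained on this Alpaca-style template, the `### Instruction:` / `### Input:` / `### Response:` markers should match it exactly at inference time. A small helper (hypothetical, not part of the released code) keeps the format consistent across prompts:

```python
# Hypothetical helper that assembles the Alpaca-style template used
# above. Drifting from the trained format tends to degrade output
# quality, so centralizing it in one function avoids typos.
INSTRUCTION = ("You are a board-certified physician. "
               "Think step by step and explain your clinical reasoning.")

def build_prompt(case, instruction=INSTRUCTION):
    """Return the full prompt for one clinical vignette."""
    return (f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{case}\n\n"
            f"### Response:\n")

print(build_prompt("A 65-year-old male presents with crushing chest pain."))
```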

## Specialties Covered

Cardiology, Clinical Reasoning, Emergency Medicine, Endocrinology, General Surgery, Gynecology, Neurology, Obstetrics, Oncology, Pediatrics, Pharmacology, Psychiatry

## Limitations

- Trained on 124 gold examples; this is an early-stage model, not for clinical use
- English only
- Mild overfitting on the hardest examples (4 epochs on a small dataset)
- The PubMedQA improvement is partially format adherence, not just knowledge gain
- Not a substitute for professional medical advice

## Citation

```bibtex
@misc{swarmos_swarmmedqa_7b_v1,
  title={SwarmMedQA-7B-v1: Clinical-Grade Medical QA with Chain-of-Thought},
  author={Swarm and Bee},
  year={2026},
  note={Base model: Qwen/Qwen2.5-7B-Instruct; dataset: SwarmOS/SwarmMedQA},
  url={https://huggingface.co/SwarmOS/SwarmMedQA-7B-v1}
}
```

## License

Apache 2.0


Built with the Dark Box Engine. We compute intelligence.
