# SwarmMedQA-7B-v1

Medical QA model fine-tuned on clinical Chain-of-Thought reasoning data.

Built by Swarm & Bee (S&B) on the SwarmOS platform.

## Model Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Method | LoRA (r=64, alpha=128, all attn+MLP) |
| Training data | SwarmOS/SwarmMedQA v0.1.0 (124 gold examples) |
| Epochs | 4 |
| Training time | 2 min 22 sec (2x RTX 3090 Ti) |
| Final loss | 0.66 |
| Token accuracy | 82.4% |
| Trainable params | 161M / 7.8B (2.1%) |
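
The trainable-parameter figure can be sanity-checked from Qwen2.5-7B's published architecture (hidden size 3584, k/v projection width 512, MLP width 18944, 28 layers); a rank-r LoRA pair on a d_in×d_out projection adds r·(d_in + d_out) parameters. This is an illustrative back-of-envelope check, not the training code:

```python
# Sanity check of the 161M trainable-parameter figure, assuming
# Qwen2.5-7B's published dimensions (from its config.json): hidden
# size 3584, k/v width 512 (4 KV heads x head dim 128), MLP width
# 18944, 28 decoder layers.
R = 64  # LoRA rank from the table above

shapes = {
    "q_proj": (3584, 3584), "k_proj": (3584, 512), "v_proj": (3584, 512),
    "o_proj": (3584, 3584), "gate_proj": (3584, 18944),
    "up_proj": (3584, 18944), "down_proj": (18944, 3584),
}

# A rank-r pair (A: d_in x r, B: r x d_out) adds r * (d_in + d_out) params.
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total = 28 * per_layer
print(f"{total:,}")  # 161,480,704 -> reported as 161M
```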

## Benchmark Results

| Benchmark | Base Qwen2.5-7B | SwarmMedQA-7B-v1 | Delta |
|---|---|---|---|
| MedQA (USMLE 4-option) | 62.0% | 66.0% | +4.0 pts |
| PubMedQA (abstract grounding) | 0.0%* | 53.0% | +53.0 pts |
| Internal benchmark (hard/expert) | 88.9% | 77.8% | -11.1 pts** |

\* The base model gives verbose answers that fail the yes/no/maybe parser; the fine-tuned model learned format adherence.

\** Small sample (9 examples). The internal dip likely reflects mild overfitting from 4 epochs on 124 training examples and is expected to shrink as the dataset grows.
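
The format-adherence effect behind the PubMedQA numbers is easy to reproduce. A strict extractor that only accepts a yes/no/maybe verdict at the start of the response (a hypothetical sketch; the actual evaluation harness isn't published) scores verbose answers as failures:

```python
# Hypothetical sketch of a strict PubMedQA answer parser. A verbose
# response that buries its verdict mid-sentence scores 0, which is how
# a capable base model can post 0.0% on this benchmark.
def parse_pubmedqa_answer(text):
    """Return 'yes', 'no', or 'maybe' if the response leads with a
    verdict, else None (counted as incorrect)."""
    tokens = text.strip().lower().split()
    if tokens:
        first = tokens[0].rstrip(".,:")
        if first in {"yes", "no", "maybe"}:
            return first
    return None

print(parse_pubmedqa_answer("Yes, the evidence supports this."))       # yes
print(parse_pubmedqa_answer("Based on the abstract, likely yes."))     # None
```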

## Training Configuration

```yaml
# LoRA
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]

# Training
learning_rate: 2e-5
lr_scheduler: cosine
warmup_ratio: 0.1
weight_decay: 0.01
epochs: 4
batch_size: 2
gradient_accumulation: 8
effective_batch_size: 16
fp16: true
gradient_checkpointing: true
```
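
The learning-rate schedule above (cosine decay with 10% linear warmup) can be sketched as a pure function of the step. This mirrors the standard Hugging Face `cosine` scheduler's shape, not the project's exact training code:

```python
import math

def lr_at(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Cosine schedule with linear warmup, as configured above (a sketch).

    Ramps linearly from 0 to base_lr over the first warmup_ratio of
    training, then decays to 0 along a half-cosine.
    """
    warmup = int(total_steps * warmup_ratio)
    if step < warmup:
        return base_lr * step / max(warmup, 1)
    progress = (step - warmup) / max(total_steps - warmup, 1)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(lr_at(10, 100))   # peak LR (2e-5) right after warmup
print(lr_at(100, 100))  # decays to 0.0 at the final step
```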

## Data Pipeline

Every training example passes through a 3-stage automated quality gate:

1. **Verification** - fact-checked against medical literature (factuality score 1-10)
2. **Scoring** - evaluated for clinical relevance, reasoning depth, and educational value
3. **Safety check** - screened for patient-harm potential

Gold criteria: `factuality >= 9 AND reasoning_depth >= 8 AND not rejected AND risk != critical`
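
A minimal sketch of the gold filter as a predicate over a scored example; the field names are illustrative, since the pipeline's actual schema isn't published:

```python
# Hypothetical gold-filter predicate matching the criteria above.
# Field names (factuality, reasoning_depth, rejected, risk) are
# illustrative assumptions, not the released pipeline schema.
def is_gold(example):
    """Keep an example only if it clears every quality gate."""
    return (
        example.get("factuality", 0) >= 9
        and example.get("reasoning_depth", 0) >= 8
        and not example.get("rejected", False)
        and example.get("risk") != "critical"
    )

print(is_gold({"factuality": 9, "reasoning_depth": 8, "risk": "low"}))       # True
print(is_gold({"factuality": 10, "reasoning_depth": 9, "risk": "critical"})) # False
```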

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "SwarmOS/SwarmMedQA-7B-v1",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("SwarmOS/SwarmMedQA-7B-v1")

prompt = """### Instruction:
You are a board-certified physician. Think step by step and explain your clinical reasoning.

### Input:
A 65-year-old male presents with sudden onset crushing chest pain radiating to the left arm, diaphoresis, and shortness of breath. ECG shows ST-elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
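
Because the model was trained on this Alpaca-style template, the `### Instruction:` / `### Input:` / `### Response:` markers should match it exactly at inference time. A small helper (hypothetical, not part of the released code) keeps the format consistent across prompts:

```python
# Hypothetical helper that assembles the Alpaca-style template used
# above. Drifting from the trained format tends to degrade output
# quality, so centralizing it in one function avoids typos.
INSTRUCTION = ("You are a board-certified physician. "
               "Think step by step and explain your clinical reasoning.")

def build_prompt(case, instruction=INSTRUCTION):
    """Return the full prompt for one clinical vignette."""
    return (f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{case}\n\n"
            f"### Response:\n")

print(build_prompt("A 65-year-old male presents with crushing chest pain."))
```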

## Specialties Covered

Cardiology, Clinical Reasoning, Emergency Medicine, Endocrinology, General Surgery, Gynecology, Neurology, Obstetrics, Oncology, Pediatrics, Pharmacology, Psychiatry

## Limitations

- Trained on 124 gold examples; this is an early-stage model, not for clinical use
- English only
- Mild overfitting on the hardest examples (4 epochs on a small dataset)
- The PubMedQA improvement is partially format adherence, not just knowledge gain
- Not a substitute for professional medical advice

## Citation

```bibtex
@misc{swarmos_swarmmedqa_7b_v1,
  title={SwarmMedQA-7B-v1: Clinical-Grade Medical QA with Chain-of-Thought},
  author={Swarm and Bee},
  year={2026},
  note={Base model: Qwen/Qwen2.5-7B-Instruct; dataset: SwarmOS/SwarmMedQA},
  url={https://huggingface.co/SwarmOS/SwarmMedQA-7B-v1}
}
```

## License

Apache 2.0


Built with the Dark Box Engine. We compute intelligence.
