yakusokulabs/dr_qwen_v2

This model yakusokulabs/dr_qwen_v2 was converted to MLX format from mlx-community/Qwen3-4B-4bit-DWQ-053125 using mlx-lm version 0.25.0.


Yakusoku Labs — Dr Qwen v2

A 4-bit, Apple MLX-ready medical Qwen3-4B that you can fine-tune and run on a single modern iPhone.

🧩 Model Summary


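| Field | Value |
|---|---|
| Base | Qwen3-4B (`mlx-community/Qwen3-4B-4bit-DWQ-053125`) |
| Precision | 4-bit NF4 (DWQ) |
| Framework | Apple MLX (`mlx-lm 0.25.0`) |
| Hardware used | 1 × Mac mini M4 Pro (16-core GPU) |
| Energy / time | ≈ 14 GPU-hours at ~60 W average (≈ 0.84 kWh), about 4× less energy than an equivalent PyTorch run |
| License | Apache 2.0 (weights & code) |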
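The energy figure in the table above can be turned into a total with a quick calculation. This sketch assumes the 14 GPU-hours equal 14 hours of wall-clock training at the ~60 W average. The PyTorch value is only what the 4× claim implies, not a measurement.

```python
# Back-of-the-envelope energy estimate from the Model Summary figures.
# Assumption: "14 GPU-hours" on one Mac mini = 14 h wall-clock at ~60 W average.
TRAIN_HOURS = 14.0
AVG_POWER_W = 60.0
REPORTED_SAVING = 4.0  # "4x less" relative to an equivalent PyTorch run


def energy_kwh(hours: float, watts: float) -> float:
    """Energy in kWh = hours * watts / 1000."""
    return hours * watts / 1000.0


mlx_kwh = energy_kwh(TRAIN_HOURS, AVG_POWER_W)
# Energy the PyTorch baseline would have used if the 4x figure holds
baseline_kwh = mlx_kwh * REPORTED_SAVING

print(f"MLX run:          {mlx_kwh:.2f} kWh")
print(f"Implied baseline: {baseline_kwh:.2f} kWh")
```

This gives about 0.84 kWh for the MLX run and an implied 3.36 kWh for the PyTorch comparison.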

Dr Qwen v2 is purpose-tuned for clinical Q&A, triage and medication counseling while staying light enough for edge devices.
The current checkpoint is fine-tuned only on public medical datasets. De-identified Indian telemedicine dialogues will be merged once legal review approves their use.

🎯 Intended Use & Limitations

Intended

- Medical quiz and exam-style Q&A (MedMCQA, USMLE-style)
- Low-risk symptom triage with human oversight
- Research baseline for Apple-silicon ML pipelines

Out of scope / MUST-NOT

- Autonomous diagnosis or prescription
- High-acuity decision support without a licensed clinician in the loop
- Any use that generates or stores protected health information (PHI)

📚 Training Data

| Corpus | Size | License |
|---|---|---|
| MedMCQA | 354k QA pairs | CC-BY-NC-SA-4.0 |
| MedQA-USMLE | 13k QA pairs | MIT |
| PubMedQA | 1k QA pairs | CC0 |
| MMLU-Medical | 1.2k questions | MIT |
| ChatDoctor dialogues | 100k turns | Apache 2.0 |

Planned: +35k doctor-annotated Indian telehealth Q&A pairs (de-identified and compliant with India's Digital Personal Data Protection (DPDP) Act).
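Public QA corpora like those above must be converted to a common training format before fine-tuning. This minimal sketch assumes the chat-style `{"messages": [...]}` JSONL layout that `mlx-lm`'s LoRA trainer accepts, and shows how one multiple-choice item could be rendered. The input field names (`question`, `options`, `answer_idx`) are hypothetical. Real corpora such as MedMCQA use their own schemas and each needs its own small adapter.

```python
import json


def mcq_to_chat_record(question: str, options: list[str], answer_idx: int) -> dict:
    """Turn one multiple-choice item into a chat-style SFT record.

    The argument names are hypothetical stand-ins for a corpus schema.
    """
    letters = "ABCDEFGHIJ"
    # Render options as "A. ...", "B. ..." lines under the question
    option_lines = [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    user_msg = question + "\n" + "\n".join(option_lines)
    # Training target: the correct letter followed by the option text
    answer = f"{letters[answer_idx]}. {options[answer_idx]}"
    return {
        "messages": [
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": answer},
        ]
    }


def write_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    rec = mcq_to_chat_record(
        "Which vitamin deficiency causes scurvy?",
        ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
        2,
    )
    print(json.dumps(rec, indent=2))
```

Keeping the option letter in the target makes it easy to score the same format later by parsing the letter.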

⚙️ Training Procedure

- 3 epochs, batch size 128, learning rate 6e-5 with cosine decay, seed 42
- LoRA rank 64 on the query, key, value and output projection matrices
- Gradient checkpointing during training; mixed-precision NF4 quantization after SFT
- Direct Preference Optimisation (DPO) on synthetic doctor ratings (3B tokens)
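The schedule, adapter size and preference objective above follow standard formulas, sketched here in plain Python. This is not the Dr Qwen v2 training code. The warm-up-free cosine schedule, the zero learning-rate floor and β = 0.1 for DPO are assumptions. The DPO loss is the standard one: −log σ(β[(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]).

```python
import math

# Values from the training list; everything else here is an assumption.
PEAK_LR = 6e-5
LORA_RANK = 64


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
              min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps (> 0)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def lora_extra_params(d_in: int, d_out: int, rank: int = LORA_RANK) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    The frozen weight W (d_out x d_in) is augmented by B @ A, with
    A of shape (rank x d_in) and B of shape (d_out x rank).
    """
    return rank * (d_in + d_out)


def _softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from summed sequence log-probs.

    loss = -log sigmoid(margin) = softplus(-margin)
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return _softplus(-margin)
```

For a hypothetical 4096 → 4096 projection, rank 64 adds 64 × (4096 + 4096) = 524,288 trainable parameters. The real per-layer count depends on Qwen3-4B's projection shapes. The DPO loss equals log 2 when policy and reference agree, and it falls as the policy favours the chosen answer more than the reference does.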

📊 Evaluation

| Benchmark (zero-shot) | Qwen3-4B (base) | Dr Qwen v2 | Llama-3.3-8B |
|---|---|---|---|
| MedMCQA | 57.8 % | 63.5 % | 64.1 % |
| PubMedQA | 48.6 % | 55.2 % | 56.0 % |
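Zero-shot multiple-choice scores like those above are usually computed in three steps: prompt the model without exemplars, parse the option letter it picks, and compare that letter with the gold answer. The harness below is a hypothetical sketch, not the evaluation code behind the table. The prompt wording and the naive letter-parsing regex are assumptions.

```python
import re
from typing import Callable, Optional


def extract_choice(text: str, letters: str = "ABCD") -> Optional[str]:
    """Return the first standalone option letter in a reply, if any.

    Naive heuristic: a reply starting with the article "A" will be misread.
    """
    match = re.search(rf"\b([{letters}])\b", text)
    return match.group(1) if match else None


def zero_shot_accuracy(items, ask: Callable[[str], str]) -> float:
    """Percentage of items whose parsed letter matches the gold letter.

    `items` is a list of (question, options, gold_letter) tuples; `ask` is
    any function that sends a prompt to the model and returns its reply.
    """
    correct = 0
    for question, options, gold in items:
        letters = "ABCDEFGHIJ"[: len(options)]
        prompt = (
            question
            + "\n"
            + "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
            + "\nAnswer with a single letter."
        )
        pred = extract_choice(ask(prompt), letters)
        correct += pred == gold
    return 100.0 * correct / len(items)
```

With the Quick Start objects, `ask` could be `lambda p: generate(model, tokenizer, prompt=p)`. Apply the chat template first if you want chat-formatted prompts.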

Clinician panel (500 simulated consultations in Yakusoku’s multi-agent sandbox): 94 % of answers (470 of 500) were tagged “clinically acceptable”, 3 pp below the human baseline and 9–12 pp above the baseline models.
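The panel figures unpack with simple arithmetic. 94 % of 500 is 470 consultations, and the stated gaps imply a human baseline of about 97 % and baseline-model scores of about 82–85 %. The sketch also computes a Wilson score interval, a standard choice for a binomial proportion, to show how much a 500-case sample can vary. That interval is an addition here; it is not reported in the evaluation.

```python
import math

N_CONSULTS = 500
ACCEPTABLE_RATE = 0.94


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 -> ~95 %)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


acceptable = round(ACCEPTABLE_RATE * N_CONSULTS)       # consultations tagged acceptable
human_rate = ACCEPTABLE_RATE + 0.03                    # "3 pp below the human baseline"
baseline_rates = (ACCEPTABLE_RATE - 0.12, ACCEPTABLE_RATE - 0.09)  # "9-12 pp above"
low, high = wilson_interval(acceptable, N_CONSULTS)

print(f"{acceptable}/{N_CONSULTS} acceptable; 95% CI {low:.1%} to {high:.1%}")
print(f"implied human baseline {human_rate:.0%}, "
      f"baselines {baseline_rates[0]:.0%} to {baseline_rates[1]:.0%}")
```

At this sample size the interval runs from roughly 91.6 % to 95.8 %, a half-width of about 2 pp.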

🛡️ Safety & Responsible AI

- All datasets are public or de-identified; no raw PHI was ingested.
- The ClinGuard-Lite rule-based filter blocks guideline-violating outputs (e.g., antibiotic over-prescription).
- Blinded trials under IRB oversight are planned for Q3 2025.

Please keep a licensed clinician in the loop.
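A rule-based output filter of the kind described can be sketched as a list of patterns checked against each generated answer before it is shown. Everything below is hypothetical. The rule names, regexes and fallback message are illustrative assumptions, not ClinGuard-Lite's actual rules or interface.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: re.Pattern


# Hypothetical rules for illustration only (NOT ClinGuard-Lite's real rules).
RULES = [
    Rule("antibiotic_mention",
         re.compile(r"\b(amoxicillin|azithromycin|antibiotics?)\b", re.I)),
    Rule("dosage_instruction",
         re.compile(r"\b\d+\s?(mg|ml)\b", re.I)),
]

FALLBACK = "I can't advise on that safely. Please consult a licensed clinician."


def check_output(text: str) -> list[str]:
    """Return the names of all rules the model output triggers."""
    return [rule.name for rule in RULES if rule.pattern.search(text)]


def guard(text: str) -> str:
    """Pass the output through unchanged, or replace it if any rule fires."""
    return FALLBACK if check_output(text) else text
```

Keyword matching cannot tell whether, for example, a bacterial infection has been established. For that reason the sketch routes flagged answers to a clinician instead of trying to rewrite them.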

🚀 Quick Start (Apple MLX)

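```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Download (on first use) and load the 4-bit MLX weights and tokenizer
model, tokenizer = load("yakusokulabs/dr_qwen_v2")

question = "I have a mild cough and low-grade fever."

if tokenizer.chat_template is not None:
    # Wrap the question as a user turn and append the assistant-turn prefix
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
else:
    # Fallback for tokenizers without a chat template: plain completion prompt
    prompt = f"Patient: {question}\nDoctor:"

# verbose=True streams the reply to stdout and prints generation statistics,
# so the returned string is kept rather than printed a second time
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```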