yakusokulabs/dr_qwen_v2
This model yakusokulabs/dr_qwen_v2 was converted to MLX format from mlx-community/Qwen3-4B-4bit-DWQ-053125 using mlx-lm version 0.25.0.
Use with mlx
pip install mlx-lm
Yakusoku Labs — Dr Qwen v2
A 4-bit, Apple-MLX-ready medical Qwen-4B you can fine-tune and run on a single modern iPhone.
🧩 Model Summary
| Field | Value |
|---|---|
| Base | Qwen-3-4B |
| Precision | 4-bit NF4 (DWQ) |
| Framework | Apple MLX (mlx-lm 0.25.0) |
| Hardware used | 1 × Mac mini M4 Pro (16-core GPU) |
| Energy / time | ≈ 14 GPU-hours, ~60 W avg → 4× less power than equivalent PyTorch run |
| License | Apache 2.0 (weights & code) |
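As a rough sanity check on the energy row above, the two reported figures (≈ 14 GPU-hours at ~60 W average) can be multiplied into a total energy estimate. This is only back-of-the-envelope arithmetic on the rounded numbers from the table; it assumes the average power applies to the whole run and says nothing about the PyTorch comparison.

```python
# Figures taken from the Model Summary table (both are approximate).
gpu_hours = 14          # reported training time
avg_power_watts = 60    # reported average power draw

# Energy in kilowatt-hours = hours * watts / 1000.
energy_kwh = gpu_hours * avg_power_watts / 1000

print(f"Estimated training energy: {energy_kwh:.2f} kWh")
```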
Dr Qwen v2 is purpose-tuned for clinical Q&A, triage, and medication counseling while staying light enough for edge devices.
The current checkpoint is fine-tuned only on public medical datasets; de-identified Indian tele-medicine dialogues will be merged once legal review approves their use.
🎯 Intended Use & Limitations
Intended
- Medical trivia & exam datasets (MedMCQA, USMLE-style)
- Low-risk symptom triage with human oversight
- Research baseline for Apple-silicon ML pipelines
Out of scope / MUST-NOT
- Autonomous diagnosis or prescription
- High-acuity decision support without a licensed clinician in the loop
- Any use that generates or stores protected health information (PHI)
📚 Training Data
| Corpus | Size | License |
|---|---|---|
| MedMCQA | 354 k QA | CC-BY-NC-SA-4.0 |
| MedQA-USMLE | 13 k QA | MIT |
| PubMedQA | 1 k | CC0 |
| MMLU-Medical | 1.2 k | MIT |
| ChatDoctor Dialogues | 100 k turns | Apache 2.0 |
Planned: +35 k doctor-annotated Indian tele-health Q&A (DPDP-compliant, de-identified).
⚙️ Training Procedure
- 3 epochs, batch 128, LR 6e-5, cosine decay, seed 42
- LoRA rank 64 on query/key/value/projection matrices
- Gradient checkpointing & mixed-precision NF4 quant after SFT
- Direct Preference Optimisation (DPO) on synthetic doctor ratings (3 B tokens)
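The learning-rate settings listed above (peak LR 6e-5 with cosine decay) can be sketched as a small schedule function. This is a minimal illustration, not the training code used for this checkpoint: the card does not state a warmup phase, a minimum learning rate, or the total step count, so `min_lr` and `total_steps` below are assumptions chosen for the example.

```python
import math


def cosine_decay_lr(step, total_steps, peak_lr=6e-5, min_lr=0.0):
    """Cosine-decayed learning rate at a given optimizer step.

    peak_lr comes from the card; min_lr and total_steps are assumed values.
    """
    # Clamp progress to [0, 1] so steps past the end stay at min_lr.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length, used only to show the shape of the curve.
total_steps = 1000
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_decay_lr(step, total_steps):.2e}")
```

The schedule starts at the peak rate, reaches half of it at the midpoint, and ends at `min_lr`.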
📊 Evaluation
| Benchmark (zero-shot) | Base Qwen-4B | Dr Qwen v2 | Llama-3.3-8B |
|---|---|---|---|
| MedMCQA | 57.8 % | 63.5 % | 64.1 % |
| PubMedQA | 48.6 % | 55.2 % | 56.0 % |
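The gaps implied by the table can be computed directly. The sketch below only re-derives percentage-point differences from the reported scores; the dictionary keys are names chosen for this example.

```python
# Zero-shot accuracies (%) copied from the evaluation table.
scores = {
    "MedMCQA": {"base": 57.8, "dr_qwen_v2": 63.5, "llama": 64.1},
    "PubMedQA": {"base": 48.6, "dr_qwen_v2": 55.2, "llama": 56.0},
}


def deltas(row):
    """Return (gain over base, gap to Llama) in percentage points."""
    gain = round(row["dr_qwen_v2"] - row["base"], 1)
    gap = round(row["dr_qwen_v2"] - row["llama"], 1)
    return gain, gap


for name, row in scores.items():
    gain, gap = deltas(row)
    print(f"{name}: {gain:+.1f} pp vs base, {gap:+.1f} pp vs Llama")
```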
Clinician panel (500 simulated consultations via Yakusoku’s multi-agent sandbox):
94 % of answers were tagged “clinically acceptable”, 3 pp short of the human baseline and 9–12 pp above the baselines.
🛡️ Safety & Responsible AI
- All datasets are public or de-identified; no raw PHI ingested.
- ClinGuard-Lite rule-based filter blocks guideline-violating outputs (e.g., antibiotic over-prescription).
- Upcoming blinded trials with IRB oversight (Q3 2025).
- Please keep a licensed clinician in the loop.
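To make the idea of a rule-based output filter concrete, here is a minimal sketch of how such a post-generation check could look. It is not the ClinGuard-Lite implementation, whose rules and interface are not described in this card: the `Rule` class, the single regex rule, and the fallback message are all hypothetical and exist only for illustration, not as clinical guidance.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern


# Hypothetical rule: flag text that pairs an antibiotic with a viral illness.
RULES = [
    Rule(
        "antibiotic_for_viral_illness",
        re.compile(
            r"\b(amoxicillin|azithromycin|antibiotics?)\b.*\b(common cold|viral)\b",
            re.IGNORECASE | re.DOTALL,
        ),
    ),
]

FALLBACK = "This answer was withheld by a safety rule. Please consult a licensed clinician."


def violated_rules(text):
    """Return the names of all rules whose pattern matches the text."""
    return [rule.name for rule in RULES if rule.pattern.search(text)]


def filter_output(text, fallback=FALLBACK):
    """Return (text_to_show, violated_rule_names); blocked text is replaced."""
    hits = violated_rules(text)
    return (fallback, hits) if hits else (text, hits)


print(filter_output("Take azithromycin for your common cold."))
print(filter_output("Rest, drink fluids, and monitor your temperature."))
```

A real filter would need clinically reviewed rules and would still sit behind human oversight, as the list above requires.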
🚀 Quick Start (Apple MLX)
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("yakusokulabs/dr_qwen_v2")
prompt = "Patient: I have a mild cough and low-grade fever.\nDoctor:"
if tokenizer.chat_template:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, verbose=True))