🏥 BalastMed-4B — Local Medical Assistant for Clinicians

A fine-tuned version of Qwen/Qwen3.5-4B designed to run fully locally as a clinical decision support assistant for doctors and healthcare professionals.

It specializes in emergency triage, ESI scoring, differential diagnosis, and medical situation management, all without sending any patient data to external servers.

⚠️ Disclaimer: This model is for research and clinical support purposes only. It is NOT a substitute for professional medical judgment. Final decisions always rest with licensed medical professionals.


🎯 Model Overview

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tuning Method | LoRA + SFT (Thinking pipeline re-training) |
| Task | Medical Triage / Clinical Decision Support |
| Language | English |
| License | CC-BY-NC 4.0 |
| Parameters | ~4B |
| Quantization | Q4_K_M (GGUF), 2.78 GB |

📊 Evaluation Results

| Benchmark | Score |
|---|---|
| MedQA (USMLE-style) | 77.6% |

MedQA tests clinical reasoning with USMLE-style multiple-choice questions covering diagnosis, treatment, and medical knowledge.


🧠 Training Details

  • Method: LoRA fine-tuning + full SFT for clinical thinking pipeline re-training
  • Base Model: Qwen/Qwen3.5-4B
  • Hardware: 1× NVIDIA A100 40GB
  • Training Data: Proprietary clinical dataset (not publicly available)
  • Thinking Pipeline: The model's reasoning chain was completely re-trained via SFT to follow structured clinical logic — differentials, missing data identification, emergency flagging
  • Focus Areas:
    • ESI (Emergency Severity Index) levels 1–5
    • Symptom assessment and chief complaint classification
    • Differential diagnosis support
    • Medical situation management for clinical staff
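
Because ESI levels are a primary focus, downstream tools may want to pull the level out of a free-text response. The following is a minimal sketch, assuming the model states the level in a form such as "ESI level 2", "ESI: 2", or "ESI-2"; this card does not document the output format, and `ESI_LABELS` and `extract_esi_level` are hypothetical names introduced for illustration. The one-line descriptions summarize the standard five-level Emergency Severity Index.

```python
import re
from typing import Optional

# Brief summary of the standard five-level Emergency Severity Index.
ESI_LABELS = {
    1: "Immediate life-saving intervention required",
    2: "High-risk situation; should not wait",
    3: "Stable; expected to need two or more resources",
    4: "Stable; expected to need one resource",
    5: "Stable; expected to need no resources",
}

# Assumption: the level appears as "ESI level 2", "ESI: 2", "ESI-2" or similar.
_ESI_PATTERN = re.compile(
    r"\bESI\b(?:\s+level)?\s*[:\-]?\s*([1-5])\b", re.IGNORECASE
)


def extract_esi_level(response: str) -> Optional[int]:
    """Return the first ESI level mentioned in the response, or None."""
    match = _ESI_PATTERN.search(response)
    return int(match.group(1)) if match else None
```

A return value of `None` is best treated as "level not stated" and routed to a clinician rather than defaulted to any level.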

💬 Recommended System Prompt

You are a clinical medical assistant. Think through clinical reasoning, consider differentials, identify what data is missing, and flag emergencies. State uncertainty when evidence is insufficient. Defer final decisions to clinicians.

⚙️ Recommended Parameters

| Parameter | Value | Notes |
|---|---|---|
| temperature | 0.72 | Balanced between consistency and nuanced clinical reasoning |
| top_p | 0.94 | Wide token probability coverage |
| top_k | 60 | For rare conditions and broader differential evaluation |
| top_k | 20–40 | For focused, high-confidence diagnosis |
| repetition_penalty | 1.08 | Prevents output looping without over-constraining |
| max_new_tokens | 512–1024 | Higher range recommended for thinking mode |

Tip: Use top_k: 60 when exploring broad differentials or rare presentations. Use top_k: 20–40 when you need a clear, direct clinical answer. The thinking pipeline produces higher quality output when max_new_tokens is set generously (≥1024).
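
The two top_k regimes described above can be captured in a small helper. This is a minimal sketch, not part of the model's tooling: the function name `sampling_params`, the mode names `broad` and `focused`, and the choice of top_k=30 as a midpoint of the 20–40 range are assumptions made for illustration.

```python
from typing import Dict, Union

Number = Union[int, float]

# Values shared by both modes, taken from the recommended parameters.
_BASE_PARAMS: Dict[str, Number] = {
    "temperature": 0.72,
    "top_p": 0.94,
    "repetition_penalty": 1.08,
    "max_new_tokens": 1024,
}

# Hypothetical mode names; 30 is an arbitrary midpoint of the 20-40 range.
_TOP_K_BY_MODE = {"broad": 60, "focused": 30}


def sampling_params(mode: str = "focused") -> Dict[str, Number]:
    """Return generation keyword arguments for the given mode."""
    if mode not in _TOP_K_BY_MODE:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(_TOP_K_BY_MODE)}"
        )
    return {**_BASE_PARAMS, "top_k": _TOP_K_BY_MODE[mode], "do_sample": True}
```

The returned dictionary uses the transformers argument names, so it can be unpacked directly, for example `model.generate(**inputs, **sampling_params("broad"))`.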


🚀 Quick Start

With Ollama (Recommended for local use)

ollama run hf.co/balastml/balastmed-4B:Q4_K_M
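
Once the model has been pulled, it can also be queried programmatically through Ollama's local REST API. The sketch below assumes Ollama is running on its default port (11434) and reuses the recommended system prompt and sampling values from this card; the helper names `build_chat_payload` and `ask` are hypothetical. Note that Ollama names two options differently from transformers: `repeat_penalty` and `num_predict`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "hf.co/balastml/balastmed-4B:Q4_K_M"

SYSTEM_PROMPT = (
    "You are a clinical medical assistant. Think through clinical reasoning, "
    "consider differentials, identify what data is missing, and flag emergencies. "
    "State uncertainty when evidence is insufficient. "
    "Defer final decisions to clinicians."
)


def build_chat_payload(user_message: str, top_k: int = 40) -> dict:
    """Build the JSON body for a non-streaming /api/chat request."""
    return {
        "model": MODEL,
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        # Ollama option names: repetition_penalty -> repeat_penalty,
        # max_new_tokens -> num_predict.
        "options": {
            "temperature": 0.72,
            "top_p": 0.94,
            "top_k": top_k,
            "repeat_penalty": 1.08,
            "num_predict": 1024,
        },
    }


def ask(user_message: str, top_k: int = 40) -> str:
    """Send one request to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(user_message, top_k)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

Only the standard library is used, and the request goes to localhost, so the patient description never leaves the machine.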

With llama.cpp

brew install llama.cpp
llama-server -hf balastml/balastmed-4B:Q4_K_M

With LM Studio

Search for balastml/balastmed-4B in LM Studio's model browser and download the Q4_K_M variant.

With Python (transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "balastml/balastmed-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

system_prompt = "You are a clinical medical assistant. Think through clinical reasoning, consider differentials, identify what data is missing, and flag emergencies. State uncertainty when evidence is insufficient. Defer final decisions to clinicians."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "58yo male, crushing chest pain radiating to left arm, diaphoresis, BP 90/60. ESI level and immediate actions?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.72,
    top_p=0.94,
    top_k=40,
    repetition_penalty=1.08,
    do_sample=True
)
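# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))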
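
Since the model is trained to reason before answering, the generated text may contain the reasoning chain followed by the final answer. The helper below is a sketch under the assumption that BalastMed-4B keeps the Qwen convention of wrapping reasoning in `<think>` and `</think>` tags, which this card does not confirm; `split_reasoning` is a hypothetical name.

```python
from typing import Tuple

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(generated: str) -> Tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Works with or without the opening tag, since some chat templates
    emit the opening tag as part of the prompt rather than the output.
    """
    head, sep, tail = generated.partition(THINK_CLOSE)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", generated.strip()
    reasoning = head.replace(THINK_OPEN, "", 1).strip()
    return reasoning, tail.strip()
```

If generation stops before the closing tag, for instance because max_new_tokens was too low, the unfinished reasoning is returned as the answer, so a generous token budget matters here.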

🩺 Example Use Cases

Emergency Triage:

22yo female, sudden onset severe dyspnea, SpO2 82%, stridor present.
→ ESI level and initial management?

Differential Diagnosis:

45yo male, 3-week history of progressive fatigue, night sweats,
unintentional 8kg weight loss, palpable cervical lymphadenopathy.
→ Top differentials and recommended workup?

Medical Situation Management:

ICU patient, post-op day 2 after bowel resection. Sudden fever 39.8°C,
HR 118, BP dropping to 88/55, rising lactate. Current antibiotics: piperacillin-tazobactam.
→ Assessment and management priorities?

🔒 Privacy & Local Deployment

BalastMed-4B is designed for fully offline, local deployment. No patient data is sent to external servers. This makes it suitable for:

  • Hospital internal networks
  • Clinics with strict data privacy requirements
  • GDPR / HIPAA-conscious environments (with appropriate institutional validation)

Minimum hardware for local use: 8GB RAM (Q4_K_M quantization, ~2.78 GB)


⚠️ Limitations

  • Not validated for autonomous clinical deployment — requires physician oversight
  • Trained primarily on English-language clinical data
  • Training dataset is proprietary and not available for public inspection
  • Performance may vary in narrow sub-specialties
  • Should be used only by or under supervision of licensed medical professionals

🔗 Related Models

| Model | MedQA | Languages | Notes |
|---|---|---|---|
| BalastMed-4B | 77.6% | EN | This model |
| BalastMed-9B | 88.2% | EN + TR | Larger, bilingual |

📬 Contact & Feedback

For questions, collaborations, or clinical feedback, open a discussion on the Community tab.
