How to use from the llama-cpp-python library
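# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the Q4_K_M GGUF file from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="balastml/balastmed-9B",
    filename="balastmed-9b-q4_k_m.gguf",
)

# Run one chat turn and print the assistant's reply
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])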

🏥 BalastMed-9B — Bilingual Local Medical Assistant (EN / TR)

A fine-tuned version of Qwen/Qwen3.5-9B designed to run fully locally as a clinical decision support assistant for doctors and healthcare professionals.

Specialized in emergency triage, ESI scoring, differential diagnosis, and medical situation management — without sending any patient data to external servers.

BalastMed-9B introduces full Turkish clinical language support, a thinking pipeline clinically re-trained via SFT, and a higher MedQA score than its predecessor (88.2% vs. 77.6% for BalastMed-4B).

⚠️ Disclaimer: This model is for research and clinical support purposes only. It is NOT a substitute for professional medical judgment. Final decisions always rest with licensed medical professionals.


🆕 What's New in 9B

| Feature | BalastMed-4B | BalastMed-9B |
| --- | --- | --- |
| Base Model | Qwen3.5-4B | Qwen3.5-9B |
| Turkish Clinical Support | | ✅ Full bilingual (EN/TR) |
| Thinking Pipeline | Clinically re-trained via SFT | Clinically re-trained via SFT (enhanced) |
| MedQA Score | 77.6% | 88.2% |
| Parameters | ~4B | ~9B |

🎯 Model Overview

| Property | Value |
| --- | --- |
| Base Model | Qwen/Qwen3.5-9B |
| Fine-tuning Method | LoRA + SFT (Thinking pipeline re-training) |
| Task | Medical Triage / Clinical Decision Support |
| Languages | English & Turkish |
| License | CC-BY-NC 4.0 |
| Parameters | ~9B |
| Quantization | Q4_K_M (GGUF) |

📊 Evaluation Results

| Benchmark | Score |
| --- | --- |
| MedQA (USMLE-style) | 88.2% |

MedQA tests clinical reasoning with USMLE-style multiple-choice questions covering diagnosis, treatment, and medical knowledge. For reference, BalastMed-9B performs comparably to DeepSeek V4 Flash (no thinking) among open-weights clinical models.


🧠 Training Details

  • Method: LoRA fine-tuning + full SFT for clinical thinking pipeline re-training
  • Base Model: Qwen/Qwen3.5-9B
  • Hardware: 1× NVIDIA A100 80GB
  • Training Data: Proprietary bilingual clinical dataset (not publicly available)
  • Thinking Pipeline: The model's reasoning chain was completely re-trained via SFT to follow structured clinical logic — differentials, missing data identification, emergency flagging
  • Focus Areas:
    • ESI (Emergency Severity Index) levels 1–5
    • Symptom assessment and chief complaint classification
    • Differential diagnosis support
    • Medical situation management for clinical staff
    • Full Turkish clinical language support

💬 Recommended System Prompt

You are a clinical medical assistant. Think through clinical reasoning, consider differentials, identify what data is missing, and flag emergencies. State uncertainty when evidence is insufficient. Defer final decisions to clinicians.

The model responds in the same language as the clinician. Send queries in Turkish and it will assess, reason, and respond fully in Turkish — no separate prompt needed.
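
As a minimal sketch of how the recommended system prompt might be attached to a query, the helper below builds a chat message list in the role/content format used elsewhere on this page. The name `build_messages` is hypothetical and not part of any library. The Turkish query is taken from the example use cases, and no language-specific prompt is added.

```python
SYSTEM_PROMPT = (
    "You are a clinical medical assistant. Think through clinical reasoning, "
    "consider differentials, identify what data is missing, and flag emergencies. "
    "State uncertainty when evidence is insufficient. Defer final decisions to clinicians."
)


def build_messages(user_query: str) -> list[dict[str, str]]:
    """Return a chat message list: the system prompt followed by the user's query.

    The same system prompt is used for English and Turkish queries.
    """
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]


# Turkish query from the example use cases; the system prompt stays in English.
messages_tr = build_messages(
    "22 yaşında kadın hasta, ani başlayan ciddi nefes darlığı, SpO2 %82, stridor mevcut. "
    "ESI düzeyi ve ilk müdahale adımları?"
)
```

The resulting list can be passed as the `messages` argument of a chat completion call or to a tokenizer's chat template.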


⚙️ Recommended Parameters

| Parameter | Value | Notes |
| --- | --- | --- |
| temperature | 0.72 | Balanced between consistency and nuanced clinical reasoning |
| top_p | 0.94 | Wide token probability coverage |
| top_k | 60 | For rare conditions and broader differential evaluation |
| top_k | 20–40 | For focused, high-confidence diagnosis |
| repetition_penalty | 1.08 | Prevents output looping without over-constraining |
| max_new_tokens | 512–2048 | Higher range recommended for thinking mode |

Tip: Use top_k: 60 when exploring broad differentials or rare presentations. Use top_k: 20–40 when you need a clear, direct clinical answer. The thinking pipeline produces higher quality output when max_new_tokens is set generously (≥1024).
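
The two `top_k` settings can be captured as presets. The sketch below is illustrative only: the function `sampling_params` and the mode names `"broad"` and `"focused"` are hypothetical, and `top_k=30` for the focused mode is an assumed value picked from inside the recommended 20–40 range. The keys follow the transformers `generate` argument names used in the table.

```python
def sampling_params(mode: str = "focused", thinking: bool = True) -> dict:
    """Return generation settings based on the recommended parameters.

    mode="broad"   -> top_k=60, for rare conditions and wide differentials
    mode="focused" -> top_k=30, an assumed midpoint of the 20-40 range
    """
    top_k_by_mode = {"broad": 60, "focused": 30}
    if mode not in top_k_by_mode:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "temperature": 0.72,
        "top_p": 0.94,
        "top_k": top_k_by_mode[mode],
        "repetition_penalty": 1.08,
        # Thinking mode benefits from a generous token budget (>= 1024).
        "max_new_tokens": 2048 if thinking else 512,
        "do_sample": True,
    }


params = sampling_params("broad")
print(params["top_k"], params["max_new_tokens"])  # prints: 60 2048
```

With transformers, the result could be passed as `model.generate(**inputs, **sampling_params("broad"))`. llama-cpp-python names two of these arguments differently (`repeat_penalty` and `max_tokens`) and has no `do_sample`, so the keys would need adapting there.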


🚀 Quick Start

With Ollama (Recommended for local use)

ollama run hf.co/balastml/balastmed-9B:Q4_K_M
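
Once the model has been pulled, it can also be queried from Python through Ollama's local REST API. This is a sketch under assumptions: it expects an Ollama server on its default address `http://localhost:11434`, and the helper names `build_ollama_request` and `ask_ollama` are hypothetical. Ollama uses `repeat_penalty` and `num_predict` where the parameter table says `repetition_penalty` and `max_new_tokens`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address
MODEL = "hf.co/balastml/balastmed-9B:Q4_K_M"


def build_ollama_request(user_query: str) -> dict:
    """Build a non-streaming /api/chat body with the recommended sampling settings."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_query}],
        "stream": False,
        "options": {
            "temperature": 0.72,
            "top_p": 0.94,
            "top_k": 40,
            "repeat_penalty": 1.08,  # Ollama's name for repetition_penalty
            "num_predict": 1024,     # Ollama's name for the output token limit
        },
    }


def ask_ollama(user_query: str) -> str:
    """Send the request to the local Ollama server and return the reply text."""
    body = json.dumps(build_ollama_request(user_query)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Example (requires a running Ollama server with the model pulled):
# print(ask_ollama("22yo female, sudden onset severe dyspnea, SpO2 82%, stridor present. ESI level?"))
```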

With llama.cpp

brew install llama.cpp
llama-server -hf balastml/balastmed-9B:Q4_K_M
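
`llama-server` exposes an OpenAI-style chat completions route, so a running server can be queried with the standard library alone. The sketch below assumes the server is listening on its default address `http://localhost:8080`. The helper names are hypothetical, and only the standard OpenAI-style fields (`temperature`, `top_p`, `max_tokens`) are sent, so `top_k` and the repetition penalty are left at the server's defaults.

```python
import json
import urllib.request

# Assumed default address of a locally started llama-server
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_payload(system_prompt: str, user_query: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat completion body with the recommended settings."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query},
        ],
        "temperature": 0.72,
        "top_p": 0.94,
        "max_tokens": max_tokens,
    }


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def ask_server(system_prompt: str, user_query: str) -> str:
    """POST the payload to the local server and return the assistant's reply."""
    body = json.dumps(build_chat_payload(system_prompt, user_query)).encode("utf-8")
    request = urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.loads(response.read()))


# Example (requires llama-server to be running):
# print(ask_server("You are a clinical medical assistant.", "ESI level for crushing chest pain, BP 90/60?"))
```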

With LM Studio

Search for balastml/balastmed-9B in LM Studio's model browser and download the Q4_K_M variant.

With Python (transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "balastml/balastmed-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

system_prompt = "You are a clinical medical assistant. Think through clinical reasoning, consider differentials, identify what data is missing, and flag emergencies. State uncertainty when evidence is insufficient. Defer final decisions to clinicians."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "58yo male, crushing chest pain radiating to left arm, diaphoresis, BP 90/60. ESI level and immediate actions?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.72,
    top_p=0.94,
    top_k=40,
    repetition_penalty=1.08,
    do_sample=True
)
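# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))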
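
When the thinking pipeline is active, it can help to separate the reasoning from the final answer before showing output to a clinician. The sketch below assumes the model wraps its reasoning in `<think>...</think>` tags, as Qwen-family chat templates commonly do; this page does not confirm the tag format, so treat it as an assumption. The function name `split_thinking` is hypothetical.

```python
import re

# Assumed reasoning delimiters; adjust if the model uses a different format.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    If no <think>...</think> block is found, the reasoning part is empty
    and the whole text (stripped of surrounding whitespace) is the answer.
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hand-written sample string, not real model output.
reasoning, answer = split_thinking("<think>Hypotension plus chest pain.</think>\nESI level 1.")
print(answer)  # prints: ESI level 1.
```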

🩺 Example Use Cases

Emergency Triage (English):

22yo female, sudden onset severe dyspnea, SpO2 82%, stridor present.
→ ESI level and initial management?

Acil Triaj (Türkçe):

22 yaşında kadın hasta, ani başlayan ciddi nefes darlığı, SpO2 %82, stridor mevcut.
→ ESI düzeyi ve ilk müdahale adımları?

Differential Diagnosis:

45yo male, 3-week history of progressive fatigue, night sweats,
unintentional 8kg weight loss, palpable cervical lymphadenopathy.
→ Top differentials and recommended workup?

Klinik Durum Yönetimi (Türkçe):

Bağırsak rezeksiyonu sonrası 2. gün YBÜ hastası. Ani ateş 39.8°C,
KH 118, KB 88/55'e düşüyor, yükselen laktat. Mevcut antibiyotik: pip-taz.
→ Değerlendirme ve yönetim öncelikleri?

🔒 Privacy & Local Deployment

BalastMed-9B is designed for fully offline, local deployment. No patient data is sent to external servers. This makes it suitable for:

  • Hospital internal networks
  • Clinics with strict data privacy requirements
  • GDPR / KVKK / HIPAA-conscious environments (with appropriate institutional validation)
  • Turkish healthcare institutions requiring native-language clinical AI

Minimum hardware for local use: 8GB VRAM (Q4_K_M quantization)
Recommended: 12GB+ VRAM for full thinking pipeline performance
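
A rough back-of-envelope estimate shows how the 8GB figure relates to the model size. The sketch assumes about 9 billion parameters and an average of roughly 4.85 bits per weight for Q4_K_M; the bits-per-weight value is an approximation, not a figure from this page. It covers the weights only, so the KV cache and runtime buffers come on top and grow with context length.

```python
def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumptions: ~9e9 parameters, ~4.85 bits per weight on average for Q4_K_M.
weights_gb = estimate_weight_gb(9e9, 4.85)
print(f"{weights_gb:.1f} GB")  # prints: 5.5 GB
```

Under these assumptions the weights take about 5.5 GB, which leaves some headroom on an 8GB card for the cache. Longer thinking-mode outputs use more of that headroom.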


⚠️ Limitations

  • Not validated for autonomous clinical deployment — requires physician oversight
  • Training dataset is proprietary and not available for public inspection
  • Performance may vary on highly specialized sub-specialties
  • Should be used only by or under supervision of licensed medical professionals

🔗 Related Models

| Model | MedQA | Languages | Notes |
| --- | --- | --- | --- |
| BalastMed-4B | 77.6% | EN | Previous version |
| BalastMed-9B | 88.2% | EN + TR | This model |

📬 Contact & Feedback

For questions, collaborations, or clinical feedback, open a discussion on the Community tab.
