MedSLM-SFT — Instruction-Tuned Medical Language Model

⚠️ Research Only — Not for Clinical Use
This model is intended for research and educational purposes only.
It must not be used for medical diagnosis, treatment recommendations, or any clinical decision-making.

Model Summary

MedSLM-SFT is a 330M-parameter medical language model fine-tuned for instruction following and question answering.
It was created by applying Supervised Fine-Tuning (SFT) with QLoRA (4-bit quantized LoRA) to the pre-trained base model Saminx22/MedSLM using the dataset Saminx22/medical_data_for_slm_SFT.

This repository contains the merged model (LoRA adapters baked into the base model at full fp16 precision). It can be used like any standard Hugging Face causal LM without requiring PEFT at inference time.
For the standalone LoRA adapter weights, see Saminx22/MedSLM-SFT-LoRA.

| Property | Value |
|----------|-------|
| Base model | Saminx22/MedSLM |
| Architecture | LLaMA-style (RMSNorm, RoPE, SwiGLU, GQA) |
| Parameters | ~330M |
| Context length | 1,024 tokens |
| Vocabulary | 50,257 (GPT-2 tokenizer) |
| Fine-tuning method | QLoRA (4-bit base + LoRA adapters) |
| LoRA rank / alpha | 16 / 32 |
| Trainable parameters | ~7.1M (3.59% of total) |
| Training data | 46,166 medical QA pairs |
| Training framework | Unsloth + TRL SFTTrainer |
| Hardware | Tesla T4 (15.6 GB VRAM) |

Architecture

MedSLM-SFT uses a modern LLaMA-style architecture with:

  • RMSNorm for stable training
  • Rotary Positional Embeddings (RoPE)
  • SwiGLU activation in the feed-forward network
  • Grouped-Query Attention (GQA) — 16 query heads / 8 key-value heads
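
The head counts above determine how much GQA shrinks the key-value cache relative to full multi-head attention. A quick back-of-envelope check using only the numbers listed (nothing else about the model's dimensions is assumed):

```python
# Grouped-Query Attention: the KV cache scales with the number of
# key-value heads rather than the number of query heads.
q_heads = 16
kv_heads = 8

# Each KV head is shared by this many query heads.
queries_per_kv_head = q_heads // kv_heads

# Relative KV-cache size vs. full multi-head attention (1.0 = no savings).
kv_cache_ratio = kv_heads / q_heads

print(queries_per_kv_head)  # 2
print(kv_cache_ratio)       # 0.5 -> KV cache is halved
```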

The base model was pre-trained from scratch on ~148M tokens of medical text (PubMed abstracts, PMC full texts, and clinical guidelines).

Training Details

Dataset

  • Repository: Saminx22/medical_data_for_slm_SFT
  • Splits: 46,166 train / 2,565 validation / 2,565 test
  • Sources: WikiDoc, medical Q&A corpora
  • Average length: ~180 tokens per example
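
For reference, the split sizes above imply a roughly 90/5/5 partition of the full corpus:

```python
# Split sizes as stated in the dataset card.
train, val, test = 46_166, 2_565, 2_565

total = train + val + test
val_frac = val / total

print(total)              # 51296 examples overall
print(round(val_frac, 3)) # validation is ~5% of the data
```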

Training prompt template:

```text
### System:
You are a medical AI assistant. Provide accurate, evidence-based answers to medical questions.
### User:
{question}
### Assistant:
{answer}
```
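
A minimal helper that renders one QA pair in this template might look like the sketch below. Note this is an illustration: the exact whitespace between sections and any trailing EOS-token handling used during actual training are assumptions, not confirmed details.

```python
SYSTEM_PROMPT = (
    "You are a medical AI assistant. Provide accurate, evidence-based "
    "answers to medical questions."
)

def build_training_example(question: str, answer: str) -> str:
    """Render one QA pair in the SFT prompt template shown above.

    Whitespace conventions here are an assumption for illustration.
    """
    return (
        f"### System:\n{SYSTEM_PROMPT}\n"
        f"### User:\n{question}\n"
        f"### Assistant:\n{answer}"
    )

example = build_training_example("What is hypertension?", "Hypertension is ...")
print(example)
```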

SFT Hyperparameters

| Hyperparameter | Value |
|----------------|-------|
| Learning rate | 2e-4 |
| LR scheduler | Cosine decay |
| Warmup ratio | 5% |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 8 |
| Effective batch size | 32 |
| Epochs | 3 |
| Weight decay | 0.01 |
| Max gradient norm | 1.0 |
| Optimizer | AdamW (8-bit) |
| Sequence packing | Enabled |
| Max sequence length | 1,024 tokens |
| Precision | fp16 |
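
The effective batch size in the table follows directly from the per-device batch size and gradient accumulation (a single T4 was used, so the device count is 1):

```python
per_device_batch = 4
grad_accum_steps = 8
num_devices = 1  # single Tesla T4

# Gradients are accumulated over 8 micro-batches before each optimizer step.
effective_batch = per_device_batch * grad_accum_steps * num_devices
print(effective_batch)  # 32, matching the table
```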

LoRA Configuration

| Parameter | Value |
|-----------|-------|
| Rank (r) | 16 |
| Alpha (α) | 32 |
| Effective scaling | 2.0 |
| Dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Bias | none |
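
The "Effective scaling" row is simply alpha divided by rank, i.e. the factor applied to the LoRA update before it is added to the frozen weights:

```python
r, alpha = 16, 32

# Standard LoRA scaling: the low-rank update B @ A is multiplied by alpha / r,
# so the adapted layer computes (conceptually) W @ x + (alpha / r) * B @ A @ x.
scaling = alpha / r
print(scaling)  # 2.0
```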

How to Use

Installation

```bash
pip install transformers torch accelerate bitsandbytes
```

Quick Start Code

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "Saminx22/MedSLM-SFT"

# 4-bit quantization for lower VRAM (optional)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,   # Remove this line for full fp16
    device_map="auto",
)
model.eval()

SYSTEM_PROMPT = "You are a medical AI assistant. Provide accurate, evidence-based answers to medical questions."

def ask(question: str, max_new_tokens: int = 300) -> str:
    prompt = f"### System:\n{SYSTEM_PROMPT}\n\n### User:\n{question}\n\n### Assistant:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            top_k=50,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the echoed prompt
    response = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(response, skip_special_tokens=True).strip()

# Example
print(ask("What are the warning signs of a stroke?"))
```

Recommended Prompt Template

Always use this exact format for best performance:

```text
### System:
You are a medical AI assistant. Provide accurate, evidence-based answers to medical questions.
### User:
<your question here>
### Assistant:
```
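
Small SFT models trained on a turn-marker template like this sometimes keep generating past their answer and start a new `### User:` turn. A simple post-processing guard (an illustration, not part of the released pipeline) can truncate the output at the first stray marker:

```python
TURN_MARKERS = ("### User:", "### System:", "### Assistant:")

def truncate_at_next_turn(generated: str) -> str:
    """Keep only the assistant's reply by cutting at the first turn marker.

    Hypothetical helper: the released inference code may handle this
    differently (e.g. via stop strings passed to generate()).
    """
    for marker in TURN_MARKERS:
        idx = generated.find(marker)
        if idx != -1:
            generated = generated[:idx]
    return generated.strip()

raw = "A stroke may present with sudden weakness.\n### User:\nWhat about..."
print(truncate_at_next_turn(raw))
```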

Limitations and Risks

  • Research only — Not for clinical use or patient care.
  • Small model size (~330M parameters) → more prone to hallucinations than larger models.
  • No RLHF or safety alignment.
  • Trained only for single-turn QA.
  • Context length limited to 1,024 tokens.

Citation

```bibtex
@misc{medslm-sft-2025,
  title = {MedSLM-SFT: Instruction-Tuned Medical Small Language Model},
  author = {Saminx22},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Saminx22/MedSLM-SFT}
}
```

Related Repositories

  • Saminx22/MedSLM: the pre-trained base model
  • Saminx22/MedSLM-SFT-LoRA: standalone LoRA adapter weights for this model
  • Saminx22/medical_data_for_slm_SFT: the SFT dataset used for fine-tuning