SmolLM2-1.7B-ClinicalNER

Model Description

QLoRA fine-tuned version of HuggingFaceTB/SmolLM2-1.7B-Instruct for clinical named entity recognition (NER). Created as part of Chapter 7 of Rearchitecting LLMs.

  • Book: Rearchitecting LLMs
  • Technique: QLoRA (Quantized Low-Rank Adaptation)
  • Task: Clinical NER — structured JSON extraction from clinical notes
  • Chapter: Chapter 7 — Specialization Tuning


What This Model Does

Given a free-text clinical note, the model extracts structured clinical entities into a strict JSON schema using only the single-word prompt Extract:.

Before fine-tuning, the base model needed a 15-line system prompt to produce schema-compliant output. After QLoRA training, the model produces that output from Extract: alone.


Schema Compliance Results

Results from CH07_NB02_L4_QLoRA_QDoRA on the oopere/clinical-ner-qdora test set (40 samples, 5 categories).

| Model | Prompt | Schema Compliance |
|---|---|---|
| SmolLM2-1.7B baseline | Strict (15-line prompt) | 87.5% |
| SmolLM2-1.7B baseline | Minimal (Extract:) | 0.0% |
| SmolLM2-1.7B QLoRA (this model) | Minimal (Extract:) | 95.0% |

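Fine-tuning absorbed the instructions of the 15-line prompt into the model weights: with the minimal prompt, the fine-tuned model reaches 95.0% compliance (38 of 40 samples), above the 87.5% (35 of 40) the baseline achieves with the full prompt.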
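The card does not spell out how schema compliance is scored. A minimal sketch of one plausible check follows, assuming an output counts as compliant when it parses as a JSON object with exactly the expected top-level keys. The key names in `REQUIRED_KEYS` and the helper names are hypothetical, not taken from the notebook.

```python
import json

# Hypothetical schema: the real key set is defined in the book's notebook,
# not in this card. Replace with the actual required keys.
REQUIRED_KEYS = {"symptoms", "vitals", "medications"}


def is_schema_compliant(raw_output, required_keys=REQUIRED_KEYS):
    """Return True if raw_output is a JSON object with exactly the required keys."""
    try:
        parsed = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        return False
    return isinstance(parsed, dict) and set(parsed) == set(required_keys)


def compliance_rate(outputs, required_keys=REQUIRED_KEYS):
    """Percentage of outputs that pass the schema check."""
    if not outputs:
        return 0.0
    passed = sum(is_schema_compliant(o, required_keys) for o in outputs)
    return 100.0 * passed / len(outputs)


# Toy data only: 38 well-formed outputs and 2 free-text answers out of 40.
good = json.dumps({"symptoms": ["fever"], "vitals": {"temp": "38.5C"}, "medications": []})
bad = "Sure! Here is the extraction: fever, cough"
print(compliance_rate([good] * 38 + [bad] * 2))  # 95.0
```

With a 40-sample test set, each sample is worth 2.5 percentage points, so small differences in compliance correspond to only one or two notes.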


Training Details

Dataset

  • Source: oopere/clinical-ner-qdora
  • Train samples: 200 (40 per category)
  • Test samples: 40 (8 per category)
  • Categories: clean, abbreviations, implicit, typos, irrelevant

QLoRA Hyperparameters

| Parameter | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM2-1.7B-Instruct |
| Quantization | NF4 4-bit + double quantization |
| LoRA rank (r) | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Batch size | 8 |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Max sequence length | 512 |
| Compute dtype | bfloat16 |
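As a rough guide, the hyperparameters above could map onto `transformers` and `peft` configuration objects as sketched below. This is not the notebook's actual code: `task_type`, `bf16`, and `output_dir` are assumptions added for illustration, and the maximum sequence length of 512 is left out because where it is set depends on the trainer or tokenization code in use. The library imports are deferred into `build_configs()` so the plain values can be inspected without a GPU.

```python
# Values copied from the hyperparameter table, kept as plain dicts so they
# can be inspected without any ML library installed.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",  # resolved to torch.bfloat16 below
}

LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the table
}

TRAIN_KWARGS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "bf16": True,                        # assumption: matches the compute dtype
    "output_dir": "qlora-clinical-ner",  # hypothetical path
}


def build_configs():
    """Instantiate the library config objects (imports are deferred)."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig, TrainingArguments

    quant = dict(QUANT_KWARGS)
    quant["bnb_4bit_compute_dtype"] = getattr(torch, quant["bnb_4bit_compute_dtype"])
    return (BitsAndBytesConfig(**quant),
            LoraConfig(**LORA_KWARGS),
            TrainingArguments(**TRAIN_KWARGS))


# LoRA scales the adapter update by alpha / r.
print(LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"])  # 2.0
```

The quantization config would be passed as `quantization_config` when loading the base model, and the LoRA config applied on top of the quantized model before training.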

Hardware

  • GPU: NVIDIA L4 (Google Colab)

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model and its tokenizer from the Hub.
model_id = 'oopere/SmolLM2-1.7B-ClinicalNER'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

note = 'Patient 45yo male, fever and dry cough for 3 days. Temp 38.5C, HR 98, BP 120/80.'

# The system message carries only the minimal prompt; the note is the user turn.
messages = [{'role': 'system', 'content': 'Extract:'}, {'role': 'user', 'content': note}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors='pt').to(model.device)

# Greedy decoding (do_sample=False) keeps the output deterministic.
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Drop the prompt tokens and decode only the newly generated ones.
new_tokens = output_ids[0][len(inputs.input_ids[0]):]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
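The decoded string is expected to be JSON, but about 5% of test outputs were not schema-compliant, so downstream code should parse defensively. A minimal sketch, assuming the goal is to recover the first JSON object in the generated text; the helper name and the example keys are hypothetical:

```python
import json


def parse_first_json_object(text):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


# Example with made-up model output (keys are illustrative only).
generated = 'Result: {"symptoms": ["fever", "dry cough"]} end'
print(parse_first_json_object(generated))
```

Returning `None` instead of raising lets the caller decide whether to retry, log, or skip a note whose output could not be parsed.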

Limitations & Intended Use

Educational model from Rearchitecting LLMs Chapter 7. Demonstrates QLoRA fine-tuning and the workflow: Train → Merge → Upload → Verify.
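The Merge and Upload steps of that workflow might look roughly like the sketch below. It assumes the trained LoRA adapter was saved to a local directory and that the base model is reloaded in bfloat16 (rather than 4-bit) before merging; `adapter_dir` and `repo_id` are hypothetical arguments, and this is not the chapter's actual code.

```python
def merge_and_upload(adapter_dir, repo_id,
                     base_id="HuggingFaceTB/SmolLM2-1.7B-Instruct"):
    """Merge a LoRA adapter into the base model and push the result to the Hub."""
    # Imports are deferred so the function can be defined without the libraries.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Reload the base model unquantized so the adapter can be folded in.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    # Upload the merged weights together with the tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    merged.push_to_hub(repo_id)
    tokenizer.push_to_hub(repo_id)
    return merged
```

The Verify step then amounts to loading the uploaded repository with `AutoModelForCausalLM.from_pretrained` and re-running the compliance evaluation.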

Not intended for clinical or production use. Training data is synthetic.


Citation

@book{martra2026rearchitecting,
  author    = {Pere Martra},
  title     = {Rearchitecting LLMs: Structural techniques for efficient models},
  publisher = {Manning Publications},
  year      = {2026},
  url       = {https://hubs.la/Q040tvtp0}
}

Acknowledgments

Created following Rearchitecting LLMs (Manning, 2026). Challenge: can you push schema compliance above 95%? Try: higher LoRA rank, more epochs, or QDoRA instead. Share your results: discussion forum

