BrundageLab/Bio_ClinicalBERT-finetuned-ner-vet-private

Model Details

Model Description

This model is a fine-tuned version of emilyalsentzer/Bio_ClinicalBERT designed specifically for Veterinary Named Entity Recognition (NER) and de-identification.

It detects and classifies Protected Health Information (PHI) in unstructured veterinary clinical notes (e.g., SOAP notes, discharge summaries). Unlike standard human-centric models, this model is adapted to handle veterinary-specific contexts, such as distinguishing patient (animal) names (e.g., Luna, Bear) from human (owner) names, and recognizing veterinary hospital entities.

  • Developed by: The Brundage Lab (University of Wisconsin–Madison, School of Veterinary Medicine)
  • Funded by: The Brundage Lab, UW–Madison
  • Model type: Transformer (BERT) for Token Classification / Named Entity Recognition (NER)
  • Language(s): English (clinical veterinary domain)
  • License: MIT (matches base model license)
  • Finetuned from: emilyalsentzer/Bio_ClinicalBERT

Model Sources

  • Repository: BrundageLab/Bio_ClinicalBERT-finetuned-ner-vet-private
  • Paper:

Uses

Direct Use

This model is intended for the automated de-identification of veterinary electronic health records (EHRs). It identifies the following entity types:

  • NAME: Human names (owners, veterinarians) and patient names (animals)
  • LOC: Locations (clinics, hospitals, cities, addresses)
  • DATE: Specific dates (e.g., 12/04/2023)
  • CONTACT: Phone numbers, email addresses, fax numbers
  • ID: Medical record numbers (MRNs), accession numbers
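A typical downstream step is replacing each detected span with a bracketed placeholder. The sketch below is illustrative: the entity dicts mimic the output shape of a Hugging Face `token-classification` pipeline run with an aggregation strategy (`entity_group`, `start`, `end` keys), but the spans are hand-written, not real model output.

```python
# Minimal redaction sketch over pipeline-style entity spans.
# The example spans below are hand-assigned for illustration.

def redact(text, entities):
    """Replace each detected PHI span with a bracketed placeholder tag."""
    # Work right-to-left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

note = "Luna was seen by Dr. Smith on 12/04/2023."
spans = [
    {"entity_group": "NAME", "start": 0, "end": 4},    # Luna (patient)
    {"entity_group": "NAME", "start": 21, "end": 26},  # Smith (veterinarian)
    {"entity_group": "DATE", "start": 30, "end": 40},  # 12/04/2023
]
print(redact(note, spans))  # -> "[NAME] was seen by Dr. [NAME] on [DATE]."
```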

Downstream Use

  • Research: Enable sharing of large-scale veterinary clinical datasets by scrubbing PHI/PII
  • Education: Create anonymized case studies for veterinary students
  • QA/Audit: Review clinical notes for privacy compliance

Out-of-Scope Use

  • Human medicine: Not validated for human medical records; may misinterpret human-specific contexts
  • Diagnostic decision making: Extracts entities only; does not diagnose or recommend treatment

Bias, Risks, and Limitations

  • Fragmentation: The model may split rare names into sub-tokens (e.g., G0lden -> G, ##0, ##lden). Post-processing aggregation is recommended.
  • Species bias: Trained primarily on common species (canine, feline). Recall may be lower for exotic species or rare breeds not present in training data.
  • "Ghost" tags: Rare over-tagging of capitalized generic terms (e.g., “Ultrasound”) as proper nouns/locations when context is ambiguous.
  • Synthetic artifacts: A portion of training data is synthetic. The model may be less robust to highly ungrammatical or extremely shorthand-heavy “real world” notes that diverge from the training distribution.

Recommendations

Implement a human-in-the-loop review process for critical datasets. For production de-identification, combine this model with:

  • rule-based regex for deterministic PHI (phone numbers, email addresses), and
  • a post-processing aggregation step (as described in the repository code).
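The hybrid setup above can be sketched as follows. The regex patterns and the merge policy (rule spans win on overlap) are illustrative assumptions, not the repository's exact rules; span dicts use the same `entity_group`/`start`/`end` shape as aggregated pipeline output.

```python
import re

# Deterministic regex for CONTACT-style PHI, layered on top of model spans.
# Patterns here are simplified examples, not production-grade validators.
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def regex_contacts(text):
    """Find phone/email spans the model might miss."""
    spans = []
    for pattern in (PHONE_RE, EMAIL_RE):
        for m in pattern.finditer(text):
            spans.append({"entity_group": "CONTACT", "start": m.start(), "end": m.end()})
    return spans

def merge_spans(model_spans, rule_spans):
    """Union of model and rule spans; rule spans take precedence on overlap."""
    merged = list(rule_spans)
    for span in model_spans:
        overlaps = any(span["start"] < r["end"] and r["start"] < span["end"]
                       for r in rule_spans)
        if not overlaps:
            merged.append(span)
    return sorted(merged, key=lambda s: s["start"])

text = "Call owner at (608) 555-0123 or jane@example.com."
print(merge_spans([], regex_contacts(text)))
```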

Training Details

Training Data

Hybrid dataset comprising:

  • Real clinical data: ~1,000 de-identified snippets from SAVSNET and PetEval
  • Synthetic augmentation: ~16,500 synthetic clinical notes generated via LLM (Gemini-3.0-Flash), tailored to produce messy veterinary text (typos, abbreviations) and stratified across diverse clinical scenarios (oncology, dermatology, emergency)

Preprocessing

  • Tokenization: WordPiece tokenization (standard BERT)
  • Tagging scheme: BIO (Beginning, Inside, Outside)
  • Augmentation strategy: Prompted synthetic generation to diversify species, complaints, and PHI placement
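Under the BIO scheme, the first token of an entity is tagged `B-<TYPE>`, continuation tokens `I-<TYPE>`, and everything else `O`. A minimal illustration on a whitespace-tokenized note (labels hand-assigned for the example, using the entity types listed above):

```python
# BIO tagging illustration over a whitespace-tokenized note.

def spans_to_bio(tokens, spans):
    """spans: list of (start_token, end_token_exclusive, entity_type)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"            # entity-initial token
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"            # continuation tokens
    return tags

tokens = "Luna seen at Madison Vet Clinic on 12/04/2023".split()
spans = [(0, 1, "NAME"), (3, 6, "LOC"), (7, 8, "DATE")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
# e.g. "Madison" -> B-LOC, "Vet" -> I-LOC, "Clinic" -> I-LOC
```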

Training Hyperparameters

  • Learning rate: 2e-5
  • Batch size: 32
  • Epochs: 10 (early stopping enabled; typically converges around epoch 3–5)
  • Optimizer: AdamW
  • Precision: Mixed precision (FP16)
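For readers reproducing the setup, these hyperparameters map onto Hugging Face `TrainingArguments` roughly as below. This is a sketch assuming the `transformers` Trainer API; the model card does not specify the exact training script, and `output_dir` is a placeholder.

```python
# Config sketch only: mirrors the hyperparameters listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bio_clinicalbert_vet_ner",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    num_train_epochs=10,              # early stopping usually halts ~epoch 3-5
    fp16=True,                        # mixed precision
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,      # required for EarlyStoppingCallback
)
# Optimizer defaults to AdamW; add transformers.EarlyStoppingCallback to the
# Trainer's callbacks to enable early stopping.
```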

Evaluation

Testing Data

Evaluated on a strictly held-out validation set of ~200 real veterinary clinical records (no synthetic data in validation).

Metrics

  • Precision: 0.61 (example — update with final)
  • Recall: 0.68 (example — update with final)
  • F1: 0.65 (example — update with final)

Results

The model achieves >95% recall on critical PHI categories (names, dates, contacts), significantly outperforming standard regex-based approaches and generic off-the-shelf NER models (e.g., dslim/bert-base-NER) on veterinary-specific text.

Environmental Impact

  • Hardware type: NVIDIA T4 Tensor Core GPU (Google Colab)
  • Hours used: < 1 hour
  • Cloud provider: Google Cloud Platform (via Colab)
  • Compute region: US-Central1
  • Carbon emitted: Negligible (< 0.1 kg CO2eq)

Citation

BibTeX

@misc{brundage2025vetbert,
  title        = {Veterinary Clinical BERT for De-identification},
  author       = {Brundage, David and The Brundage Lab},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BrundageLab/Bio_ClinicalBERT-finetuned-ner-vet-private}}
}