BrundageLab/Bio_ClinicalBERT-finetuned-ner-vet-private

Model Details

Model Description

This model is a fine-tuned version of emilyalsentzer/Bio_ClinicalBERT designed specifically for Veterinary Named Entity Recognition (NER) and de-identification.

It detects and classifies Protected Health Information (PHI) in unstructured veterinary clinical notes (e.g., SOAP notes, discharge summaries). Unlike standard human-centric models, this model is adapted to handle veterinary-specific contexts, such as distinguishing patient (animal) names (e.g., Luna, Bear) from human (owner) names, and recognizing veterinary hospital entities.

  • Developed by: The Brundage Lab (University of Wisconsin–Madison, School of Veterinary Medicine)
  • Funded by: The Brundage Lab, UW–Madison
  • Model type: Transformer (BERT) for Token Classification / Named Entity Recognition (NER)
  • Language(s): English (clinical veterinary domain)
  • License: MIT (matches base model license)
  • Finetuned from: emilyalsentzer/Bio_ClinicalBERT

Model Sources

  • Repository: BrundageLab/Bio_ClinicalBERT-finetuned-ner-vet-private
  • Paper:

Uses

Direct Use

This model is intended for the automated de-identification of veterinary electronic health records (EHRs). It identifies the following entity types:

  • NAME: Human names (owners, veterinarians) and patient names (animals)
  • LOC: Locations (clinics, hospitals, cities, addresses)
  • DATE: Specific dates (e.g., 12/04/2023)
  • CONTACT: Phone numbers, email addresses, fax numbers
  • ID: Medical record numbers (MRNs), accession numbers
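A typical downstream step is replacing each detected span with a bracketed placeholder. The sketch below is illustrative: the entity dicts mimic the output shape of a Hugging Face `token-classification` pipeline run with an aggregation strategy (`entity_group`, `start`, `end` keys), but the spans are hand-written, not real model output.

```python
# Minimal redaction sketch over pipeline-style entity spans.
# The example spans below are hand-assigned for illustration.

def redact(text, entities):
    """Replace each detected PHI span with a bracketed placeholder tag."""
    # Work right-to-left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

note = "Luna was seen by Dr. Smith on 12/04/2023."
spans = [
    {"entity_group": "NAME", "start": 0, "end": 4},    # Luna (patient)
    {"entity_group": "NAME", "start": 21, "end": 26},  # Smith (veterinarian)
    {"entity_group": "DATE", "start": 30, "end": 40},  # 12/04/2023
]
print(redact(note, spans))  # -> "[NAME] was seen by Dr. [NAME] on [DATE]."
```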

Downstream Use

  • Research: Enable sharing of large-scale veterinary clinical datasets by scrubbing PHI/PII
  • Education: Create anonymized case studies for veterinary students
  • QA/Audit: Review clinical notes for privacy compliance

Out-of-Scope Use

  • Human medicine: Not validated for human medical records; may misinterpret human-specific contexts
  • Diagnostic decision making: Extracts entities only; does not diagnose or recommend treatment

Bias, Risks, and Limitations

  • Fragmentation: The model may split rare names into sub-tokens (e.g., G0lden -> G, ##0, ##lden). Post-processing aggregation is recommended.
  • Species bias: Trained primarily on common species (canine, feline). Recall may be lower for exotic species or rare breeds not present in training data.
  • "Ghost" tags: Rare over-tagging of capitalized generic terms (e.g., “Ultrasound”) as proper nouns/locations when context is ambiguous.
  • Synthetic artifacts: A portion of training data is synthetic. The model may be less robust to highly ungrammatical or extremely shorthand-heavy “real world” notes that diverge from the training distribution.

Recommendations

Implement a human-in-the-loop review process for critical datasets. For production de-identification, combine this model with:

  • rule-based regex for deterministic PHI (phone numbers, email addresses), and
  • a post-processing aggregation step (as described in the repository code).
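The hybrid setup above can be sketched as follows. The regex patterns and the merge policy (rule spans win on overlap) are illustrative assumptions, not the repository's exact rules; span dicts use the same `entity_group`/`start`/`end` shape as aggregated pipeline output.

```python
import re

# Deterministic regex for CONTACT-style PHI, layered on top of model spans.
# Patterns here are simplified examples, not production-grade validators.
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def regex_contacts(text):
    """Find phone/email spans the model might miss."""
    spans = []
    for pattern in (PHONE_RE, EMAIL_RE):
        for m in pattern.finditer(text):
            spans.append({"entity_group": "CONTACT", "start": m.start(), "end": m.end()})
    return spans

def merge_spans(model_spans, rule_spans):
    """Union of model and rule spans; rule spans take precedence on overlap."""
    merged = list(rule_spans)
    for span in model_spans:
        overlaps = any(span["start"] < r["end"] and r["start"] < span["end"]
                       for r in rule_spans)
        if not overlaps:
            merged.append(span)
    return sorted(merged, key=lambda s: s["start"])

text = "Call owner at (608) 555-0123 or jane@example.com."
print(merge_spans([], regex_contacts(text)))
```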

Training Details

Training Data

Hybrid dataset comprising:

  • Real clinical data: ~1,000 de-identified snippets from SAVSNET and PetEval
  • Synthetic augmentation: ~16,500 synthetic clinical notes generated via LLM (Gemini-3.0-Flash), tailored to produce messy veterinary text (typos, abbreviations) and stratified across diverse clinical scenarios (oncology, dermatology, emergency)

Preprocessing

  • Tokenization: WordPiece tokenization (standard BERT)
  • Tagging scheme: BIO (Beginning, Inside, Outside)
  • Augmentation strategy: Prompted synthetic generation to diversify species, complaints, and PHI placement
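Under the BIO scheme, the first token of an entity is tagged `B-<TYPE>`, continuation tokens `I-<TYPE>`, and everything else `O`. A minimal illustration on a whitespace-tokenized note (labels hand-assigned for the example, using the entity types listed above):

```python
# BIO tagging illustration over a whitespace-tokenized note.

def spans_to_bio(tokens, spans):
    """spans: list of (start_token, end_token_exclusive, entity_type)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"            # entity-initial token
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"            # continuation tokens
    return tags

tokens = "Luna seen at Madison Vet Clinic on 12/04/2023".split()
spans = [(0, 1, "NAME"), (3, 6, "LOC"), (7, 8, "DATE")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
# e.g. "Madison" -> B-LOC, "Vet" -> I-LOC, "Clinic" -> I-LOC
```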

Training Hyperparameters

  • Learning rate: 2e-5
  • Batch size: 32
  • Epochs: 10 (early stopping enabled; typically converges around epoch 3–5)
  • Optimizer: AdamW
  • Precision: Mixed precision (FP16)
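For readers reproducing the setup, these hyperparameters map onto Hugging Face `TrainingArguments` roughly as below. This is a sketch assuming the `transformers` Trainer API; the model card does not specify the exact training script, and `output_dir` is a placeholder.

```python
# Config sketch only: mirrors the hyperparameters listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bio_clinicalbert_vet_ner",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    num_train_epochs=10,              # early stopping usually halts ~epoch 3-5
    fp16=True,                        # mixed precision
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,      # required for EarlyStoppingCallback
)
# Optimizer defaults to AdamW; add transformers.EarlyStoppingCallback to the
# Trainer's callbacks to enable early stopping.
```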

Evaluation

Testing Data

Evaluated on a strictly held-out validation set of ~200 real veterinary clinical records (no synthetic data in validation).

Metrics

  • Precision: 0.61 (example — update with final)
  • Recall: 0.68 (example — update with final)
  • F1: 0.65 (example — update with final)

Results

The model achieves >95% recall on critical PHI categories (names, dates, contacts), significantly outperforming standard regex-based approaches and generic off-the-shelf NER models (e.g., dslim/bert-base-NER) on veterinary-specific text.

Environmental Impact

  • Hardware type: NVIDIA T4 Tensor Core GPU (Google Colab)
  • Hours used: < 1 hour
  • Cloud provider: Google Cloud Platform (via Colab)
  • Compute region: US-Central1
  • Carbon emitted: Negligible (< 0.1 kg CO2eq)

Citation

BibTeX

@misc{brundage2025vetbert,
  title        = {Veterinary Clinical BERT for De-identification},
  author       = {Brundage, David and The Brundage Lab},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BrundageLab/Bio_ClinicalBERT-finetuned-ner-vet-private}}
}