Bert_NER_Ausa

A lightweight token-classification (NER) model that extracts structured entities from short health-assistant utterances — appointments, symptoms, allergies, routines/medication, and profile details. It is the entity-extraction stage of the AUSA Hub voice/text assistant, paired with the SetFit intent router aadiausa/Set_Fit_Ausa.

Model details

Architecture: BertForTokenClassification on a TinyBERT backbone (4 hidden layers, hidden size 312, 12 attention heads, ~14.3M parameters)
Tokenizer: WordPiece, uncased (bert-base-uncased vocab, 30522 tokens)
Max sequence length: 512 tokens
Tagging scheme: BIO — 55 labels = O + B-/I- for 27 entity types
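
The parameter figure can be reproduced by counting the weights of a BERT token-classification model with these dimensions. The sketch below assumes a feed-forward (intermediate) size of 1200, which is what the public 4-layer, 312-dimension TinyBERT configuration uses but which this card does not state. It also leaves out the pooler, because `BertForTokenClassification` does not include one.

```python
# Sketch: count the parameters of a BertForTokenClassification model (no pooler).
# ASSUMPTION: intermediate=1200 is taken from the public TinyBERT 4L-312D config,
# not from this model card.


def bert_token_cls_params(vocab=30522, hidden=312, layers=4, intermediate=1200,
                          max_pos=512, type_vocab=2, num_labels=55):
    def linear(n_in, n_out):
        return n_in * n_out + n_out  # weight matrix + bias vector

    layer_norm = 2 * hidden  # scale (gamma) + shift (beta)

    # word + position + token-type embeddings, followed by one LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm

    per_layer = (
        4 * linear(hidden, hidden)        # Q, K, V and attention output projections
        + layer_norm                      # LayerNorm after attention
        + linear(hidden, intermediate)    # feed-forward up-projection
        + linear(intermediate, hidden)    # feed-forward down-projection
        + layer_norm                      # LayerNorm after feed-forward
    )

    classifier = linear(hidden, num_labels)  # one logit per BIO label
    return embeddings + layers * per_layer + classifier


total = bert_token_cls_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 14,269,807 parameters (~14.3M)
```

Under those assumptions the count is 14,269,807, consistent with the 14.3M parameters reported for the Safetensors weights. The word-embedding matrix alone (30522 × 312) accounts for about two thirds of that total, which is typical for very small BERT variants.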

Entity types (27)

| Domain | Entities |
| --- | --- |
| Symptoms & allergies | `ALLERGY`, `SYMPTOM`, `SEVERITY`, `ONSET`, `DOSAGE` |
| Scheduling | `PROVIDER`, `DATE`, `START_TIME`, `END_TIME`, `ROUTINE`, `FREQUENCY`, `SCHEDULED_TIME`, `START_DATE`, `END_DATE`, `INTERVAL`, `DURATION`, `DAY_OF_WEEK` |
| Profile / contacts | `FULL_NAME`, `EMAIL`, `PHONE`, `ADDRESS`, `GENDER`, `HEIGHT`, `WEIGHT`, `RELATION`, `INVITE_METHOD`, `PERMISSION` |
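
With the BIO scheme, every entity type contributes a `B-` (beginning) and an `I-` (inside) tag, and `O` marks tokens that belong to no entity. The 27 types therefore give 1 + 2 × 27 = 55 labels. The sketch below builds such a label list from the table and turns a word-level tag sequence back into typed spans. The label order is illustrative only, since the model's actual index-to-label mapping is in `model.config.id2label`. The example tags are hypothetical, not model output.

```python
ENTITY_TYPES = [
    "ALLERGY", "SYMPTOM", "SEVERITY", "ONSET", "DOSAGE",
    "PROVIDER", "DATE", "START_TIME", "END_TIME", "ROUTINE", "FREQUENCY",
    "SCHEDULED_TIME", "START_DATE", "END_DATE", "INTERVAL", "DURATION", "DAY_OF_WEEK",
    "FULL_NAME", "EMAIL", "PHONE", "ADDRESS", "GENDER", "HEIGHT", "WEIGHT",
    "RELATION", "INVITE_METHOD", "PERMISSION",
]

# Illustrative ordering only; the real mapping is model.config.id2label.
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def bio_to_spans(tokens, tags):
    """Group word-level BIO tags into (entity_type, text) spans."""
    spans, current_type, current_words = [], None, []

    def flush():
        # Emit the currently open span, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))

    for word, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "I" and etype == current_type:
            current_words.append(word)  # continue the open span
            continue
        flush()  # any other tag closes the open span
        if prefix in ("B", "I"):  # a stray I- tag is treated as a new span start
            current_type, current_words = etype, [word]
        else:  # "O": outside any entity
            current_type, current_words = None, []
    flush()
    return spans


# Hypothetical word-level tags for the utterance used in Usage (not real model output).
words = ["book", "with", "dr.", "patel", "next", "monday", "at", "3pm"]
tags = ["O", "O", "B-PROVIDER", "I-PROVIDER", "B-DATE", "I-DATE", "O", "B-START_TIME"]
print(len(LABELS))  # 55
print(bio_to_spans(words, tags))
# [('PROVIDER', 'dr. patel'), ('DATE', 'next monday'), ('START_TIME', '3pm')]
```

Treating a stray `I-` tag as the start of a new span is one common convention. The pipeline's aggregation strategies make comparable grouping decisions, but over subword tokens rather than whole words.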

Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "aadiausa/Bert_NER_Ausa"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# "simple" aggregation merges B-/I- subword predictions into whole entity spans
ner = pipeline("token-classification", model=model, tokenizer=tok,
               aggregation_strategy="simple")

ner("book an appointment with Dr. Patel next Monday at 3pm")
# -> [{'entity_group': 'PROVIDER', 'word': 'dr. patel', ...},
#     {'entity_group': 'DATE', 'word': 'next monday', ...},
#     {'entity_group': 'START_TIME', 'word': '3pm', ...}]
```
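
Downstream, the aggregated spans usually need to become named slots for whatever action the intent router selected. The sketch below relies only on the dictionary keys that the `token-classification` pipeline returns when an aggregation strategy is set (`entity_group`, `word`, `score`, `start`, `end`). The `to_slots` helper and its `min_score` threshold are hypothetical choices, not part of this model or of AUSA Hub.

```python
from collections import defaultdict


def to_slots(entities, min_score=0.5):
    """Collect pipeline spans into {entity_type: [text, ...]}, skipping low-confidence ones.

    min_score is a hypothetical cut-off; tune it on your own data.
    """
    slots = defaultdict(list)
    for ent in sorted(entities, key=lambda e: e["start"]):  # keep utterance order
        if ent["score"] >= min_score:
            slots[ent["entity_group"]].append(ent["word"])
    return dict(slots)


# Stand-in for ner(...) output: offsets match the Usage utterance, scores are made up.
example = [
    {"entity_group": "DATE", "word": "next monday", "score": 0.97, "start": 35, "end": 46},
    {"entity_group": "PROVIDER", "word": "dr. patel", "score": 0.98, "start": 25, "end": 34},
    {"entity_group": "START_TIME", "word": "3pm", "score": 0.96, "start": 50, "end": 53},
]
print(to_slots(example))
# {'PROVIDER': ['dr. patel'], 'DATE': ['next monday'], 'START_TIME': ['3pm']}
```

Keeping a list per type allows for utterances that mention, say, two symptoms. A caller that expects a single value can take the first element.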

Intended use & limitations

Designed for short, first-person health-assistant commands in English. Performance on long-form clinical notes or other domains is not guaranteed.
The model detects spans that may correspond to personal data (FULL_NAME, EMAIL, PHONE, ADDRESS); it does not validate, store, or de-identify them — downstream handling is the integrator's responsibility.
Not a medical device and not intended for diagnosis or treatment decisions.
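
Because span detection and data handling are separate concerns, an integrator who must keep personal data out of logs could mask the flagged spans using the `start`/`end` character offsets that the `token-classification` pipeline includes in each aggregated entity. This is a minimal sketch, and the `redact` helper and `[TYPE]` placeholder format are hypothetical. It is not a de-identification guarantee, because any span the model misses stays in the text.

```python
# Hypothetical choice of which entity types count as personal data.
PII_TYPES = {"FULL_NAME", "EMAIL", "PHONE", "ADDRESS"}


def redact(text, entities, pii_types=PII_TYPES):
    """Replace detected PII spans with [TYPE] placeholders using character offsets."""
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["entity_group"] in pii_types:
            text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


utterance = "invite my daughter Jane Doe at jane@example.com"
# Made-up stand-in for ner(utterance); offsets match the string above.
found = [
    {"entity_group": "RELATION", "word": "daughter", "start": 10, "end": 18},
    {"entity_group": "FULL_NAME", "word": "jane doe", "start": 19, "end": 27},
    {"entity_group": "EMAIL", "word": "jane@example.com", "start": 31, "end": 47},
]
print(redact(utterance, found))  # invite my daughter [FULL_NAME] at [EMAIL]
```

Non-PII spans such as `RELATION` are left in place, so the redacted text still carries enough context for routing.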

Weights: Safetensors, 14.3M params, F32 tensors