Bert_NER_Ausa
A lightweight token-classification (NER) model that extracts structured entities from short health-assistant utterances: appointments, symptoms, allergies, routines/medication, and profile details. It is the entity-extraction stage of the AUSA Hub voice/text assistant, paired with the SetFit intent router aadiausa/Set_Fit_Ausa.
Model details
- Architecture: `BertForTokenClassification` (TinyBERT: 4 hidden layers, hidden size 312, 12 attention heads, ~14.5M parameters)
- Tokenizer: WordPiece, uncased (`bert-base-uncased` vocab, 30522 tokens)
- Max sequence length: 512 tokens
- Tagging scheme: BIO, 55 labels (`O` + `B-`/`I-` for each of 27 entity types)
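To make the tagging scheme concrete, here is a minimal plain-Python sketch that rebuilds the 55-label set and decodes a per-token BIO sequence into entity spans. The label order and the helper name `bio_to_spans` are illustrative assumptions; the mapping the model actually uses lives in its config (`model.config.id2label`). The tags below are hand-written, not model output, and use word-level tokens for readability, whereas the model itself tags WordPiece sub-tokens.

```python
from typing import List, Optional, Tuple

# The 27 entity types, in the order listed in the "Entity types" table.
ENTITY_TYPES = [
    "ALLERGY", "SYMPTOM", "SEVERITY", "ONSET", "DOSAGE",
    "PROVIDER", "DATE", "START_TIME", "END_TIME", "ROUTINE", "FREQUENCY",
    "SCHEDULED_TIME", "START_DATE", "END_DATE", "INTERVAL", "DURATION",
    "DAY_OF_WEEK",
    "FULL_NAME", "EMAIL", "PHONE", "ADDRESS", "GENDER", "HEIGHT", "WEIGHT",
    "RELATION", "INVITE_METHOD", "PERMISSION",
]

# "O" plus one B- and one I- tag per type: 1 + 2 * 27 = 55 labels.
# ASSUMPTION: this ordering is illustrative; read model.config.id2label
# for the mapping the model actually uses.
LABELS = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode parallel token / BIO-tag lists into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")  # "B-DATE" -> ("B", "-", "DATE")
        if prefix == "I" and etype == cur_type:
            cur_tokens.append(token)  # continue the open span
            continue
        if cur_type is not None:  # O, B-, or a mismatched I- closes the span
            spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = None, []
        if prefix in ("B", "I"):  # a stray I- is leniently treated like B-
            cur_type, cur_tokens = etype, [token]
    if cur_type is not None:  # flush a span that runs to the end
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans


# Hand-written (hypothetical) tags for the usage sentence further down.
tokens = ["book", "with", "dr", ".", "patel", "next", "monday", "at", "3pm"]
tags = ["O", "O", "B-PROVIDER", "I-PROVIDER", "I-PROVIDER",
        "B-DATE", "I-DATE", "O", "B-START_TIME"]
print(len(LABELS))  # 55
print(bio_to_spans(tokens, tags))
# [('PROVIDER', 'dr . patel'), ('DATE', 'next monday'), ('START_TIME', '3pm')]
```

Treating a stray `I-` tag as the start of a new span is a lenient convention; a strict decoder might discard it instead. Joining tokens with spaces also loses the original surface form ("Dr. Patel"), which is easier to recover from character offsets.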
Entity types (27)
| Domain | Entities |
|---|---|
| Symptoms & allergies | ALLERGY, SYMPTOM, SEVERITY, ONSET, DOSAGE |
| Scheduling | PROVIDER, DATE, START_TIME, END_TIME, ROUTINE, FREQUENCY, SCHEDULED_TIME, START_DATE, END_DATE, INTERVAL, DURATION, DAY_OF_WEEK |
| Profile / contacts | FULL_NAME, EMAIL, PHONE, ADDRESS, GENDER, HEIGHT, WEIGHT, RELATION, INVITE_METHOD, PERMISSION |
Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "aadiausa/Bert_NER_Ausa"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# "simple" aggregation merges B-/I- sub-token predictions into entity spans.
ner = pipeline("token-classification", model=model, tokenizer=tok,
               aggregation_strategy="simple")

ner("book an appointment with Dr. Patel next Monday at 3pm")
# -> [{'entity_group': 'PROVIDER', 'word': 'dr. patel', ...},
#     {'entity_group': 'DATE', 'word': 'next monday', ...},
#     {'entity_group': 'START_TIME', 'word': '3pm', ...}]
```
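Downstream code often needs to route extracted entities by the domains listed in the Entity types table. The sketch below assumes results shaped like the aggregated pipeline output (a list of dicts with `entity_group`, `score`, `word`, `start`, `end`). The `group_by_domain` helper, the domain keys, and the `min_score` cutoff of 0.5 are hypothetical choices, and the sample scores and offsets are hand-written, not real model output.

```python
from collections import defaultdict
from typing import Dict, List

# Domain -> entity types, copied from the "Entity types" table.
# The domain key names are hypothetical, chosen for this sketch.
DOMAINS: Dict[str, List[str]] = {
    "symptoms_allergies": ["ALLERGY", "SYMPTOM", "SEVERITY", "ONSET", "DOSAGE"],
    "scheduling": [
        "PROVIDER", "DATE", "START_TIME", "END_TIME", "ROUTINE", "FREQUENCY",
        "SCHEDULED_TIME", "START_DATE", "END_DATE", "INTERVAL", "DURATION",
        "DAY_OF_WEEK",
    ],
    "profile_contacts": [
        "FULL_NAME", "EMAIL", "PHONE", "ADDRESS", "GENDER", "HEIGHT",
        "WEIGHT", "RELATION", "INVITE_METHOD", "PERMISSION",
    ],
}

# Reverse lookup: entity type -> domain.
DOMAIN_OF: Dict[str, str] = {
    etype: domain for domain, types in DOMAINS.items() for etype in types
}


def group_by_domain(
    results: List[dict], min_score: float = 0.5
) -> Dict[str, Dict[str, List[str]]]:
    """Bucket aggregated NER results as {domain: {entity_type: [words]}}.

    Results scoring below min_score (a hypothetical cutoff) are dropped;
    unrecognized entity types are collected under "unknown".
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for r in results:
        if r["score"] < min_score:
            continue
        domain = DOMAIN_OF.get(r["entity_group"], "unknown")
        grouped[domain][r["entity_group"]].append(r["word"])
    # Convert the nested defaultdicts to plain dicts for a clean return value.
    return {domain: dict(by_type) for domain, by_type in grouped.items()}


# HYPOTHETICAL results for the usage sentence above (scores are made up).
sample = [
    {"entity_group": "PROVIDER", "score": 0.98, "word": "dr. patel", "start": 25, "end": 34},
    {"entity_group": "DATE", "score": 0.95, "word": "next monday", "start": 35, "end": 46},
    {"entity_group": "START_TIME", "score": 0.97, "word": "3pm", "start": 50, "end": 53},
]
print(group_by_domain(sample))
# {'scheduling': {'PROVIDER': ['dr. patel'], 'DATE': ['next monday'], 'START_TIME': ['3pm']}}
```

Keeping unknown labels in their own bucket rather than dropping them makes a mismatch between this table and the model's actual label set visible instead of silent.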
Intended use & limitations
- Designed for short, first-person health-assistant commands in English. Performance on long-form clinical notes or other domains is not guaranteed.
- The model detects spans that may correspond to personal data (`FULL_NAME`, `EMAIL`, `PHONE`, `ADDRESS`); it does not validate, store, or de-identify them. Downstream handling is the integrator's responsibility.
- Not a medical device and not intended for diagnosis or treatment decisions.
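As one example of that downstream handling, an integrator might mask personal-data spans before logging a transcript. This is a minimal sketch, assuming results carry the character offsets (`start`, `end`) that the token-classification pipeline returns; the `redact` helper and the `[ENTITY_TYPE]` placeholder format are hypothetical. It only masks spans the model actually detected, so it is not a substitute for a proper de-identification process.

```python
from typing import Dict, List

# Entity types this card flags as possible personal data.
PII_TYPES = {"FULL_NAME", "EMAIL", "PHONE", "ADDRESS"}


def redact(text: str, results: List[Dict]) -> str:
    """Replace detected PII spans with "[ENTITY_TYPE]" placeholders.

    Expects non-overlapping results with character offsets ("start", "end"),
    as aggregated pipeline output provides; other entity types are untouched.
    """
    pii = [r for r in results if r["entity_group"] in PII_TYPES]
    # Edit from the rightmost span leftward so earlier offsets stay valid.
    for r in sorted(pii, key=lambda r: r["start"], reverse=True):
        text = text[: r["start"]] + f"[{r['entity_group']}]" + text[r["end"]:]
    return text


# HYPOTHETICAL results for a made-up utterance (other keys omitted).
utterance = "invite Jane Doe at jane@example.com"
results = [
    {"entity_group": "FULL_NAME", "start": 7, "end": 15},
    {"entity_group": "EMAIL", "start": 19, "end": 35},
]
print(redact(utterance, results))  # invite [FULL_NAME] at [EMAIL]
```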