---
library_name: transformers
license: mit
---

# distilbert-medication-ner

This model is a fine-tuned version of [distilbert-base-cased](https://huggingface.co/distilbert/distilbert-base-cased), trained on synthetic medication data generated by [Synthea](https://github.com/synthetichealth/synthea). More details on how the model was trained are available on [GitHub](https://github.com/JackLeeJM/slm-medication-ner).

## Model Description

This is a fine-tuned named-entity recognition (NER) model that tags 5 entity types (DRUG, DOSAGE, ROUTE, BRAND, QUANTITY) in medication strings such as:

- Ibuprofen 100 MG Oral Tablet
- 1 ML medroxyprogesterone acetate 150 MG/ML Injection
- Acetaminophen 325 MG / Oxycodone Hydrochloride 10 MG Oral Tablet [Percocet]

The model was trained and evaluated on small, manually annotated datasets (train_n_samples=309, eval_n_samples=335) and achieved the following evaluation metrics:

- **Precision**: 0.998
- **Recall**: 0.983
- **F1**: 0.991

## Usage

1. Load the model:

   ```python
   from transformers import AutoTokenizer, AutoModelForTokenClassification

   model_name = "jackleejm/distilbert-medication-ner"
   tokenizer = AutoTokenizer.from_pretrained(model_name)
   model = AutoModelForTokenClassification.from_pretrained(model_name)
   ```

2. Set up a pipeline and run inference:

   ```python
   from transformers import pipeline

   ner_pipeline = pipeline(
       task="token-classification",
       model=model,
       tokenizer=tokenizer,
       aggregation_strategy="simple",  # merge subword tokens into entity spans
       device_map="auto",
   )

   texts = ["Acetaminophen 325 MG Oral Tablet"]
   results = ner_pipeline(texts)  # one list of entity spans per input string
   print(results)

   # Output:
   # [
   #     [
   #         {"word": "Acetaminophen", "score": np.float32(0.99948627), "entity_group": "DRUG", "start": 0, "end": 13},
   #         {"word": "325 MG", "score": np.float32(0.99882394), "entity_group": "DOSAGE", "start": 14, "end": 20},
   #         {"word": "Oral Tablet", "score": np.float32(0.9994621), "entity_group": "ROUTE", "start": 21, "end": 32},
   #     ]
   # ]
   ```

## Training Procedure

### Training Hyperparameters

- learning_rate: 2e-5
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 20
- weight_decay: 0.01
- evaluation_strategy: "steps"
- eval_steps: 50
- load_best_model_at_end: True
- metric_for_best_model: "f1"

## Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0
- Datasets 3.3.2
- Tokenizers 0.21.0
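## Post-processing Pipeline Output

For downstream use it can help to turn the pipeline's list of spans into one record per medication string. The helper below is a minimal sketch and is not part of this model or of `transformers`. The name `group_entities` and the `min_score` cut-off are hypothetical. It assumes each span carries the `entity_group`, `score`, `start` and `end` keys that the token-classification pipeline returns with `aggregation_strategy="simple"`, as in the Usage example. Values are lists because a single string can contain several entities of the same type. For example, the Percocet string above has two DRUG and two DOSAGE spans.

```python
from collections import defaultdict
from typing import Any


def group_entities(
    text: str,
    entities: list[dict[str, Any]],
    min_score: float = 0.0,  # hypothetical confidence cut-off; tune for your data
) -> dict[str, list[str]]:
    """Group the pipeline spans for one input string by entity label.

    Assumes each span has the "entity_group", "score", "start" and "end"
    keys returned with aggregation_strategy="simple".
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    # Walk spans left to right so repeated labels keep their input order.
    for span in sorted(entities, key=lambda s: s["start"]):
        if span["score"] < min_score:
            continue  # skip low-confidence spans
        # Slice the original string rather than using span["word"],
        # which is rebuilt from subword tokens.
        grouped[span["entity_group"]].append(text[span["start"]:span["end"]])
    return dict(grouped)


# Spans copied from the example output in the Usage section.
text = "Acetaminophen 325 MG Oral Tablet"
spans = [
    {"word": "Acetaminophen", "score": 0.99948627, "entity_group": "DRUG", "start": 0, "end": 13},
    {"word": "325 MG", "score": 0.99882394, "entity_group": "DOSAGE", "start": 14, "end": 20},
    {"word": "Oral Tablet", "score": 0.9994621, "entity_group": "ROUTE", "start": 21, "end": 32},
]
print(group_entities(text, spans))
# {'DRUG': ['Acetaminophen'], 'DOSAGE': ['325 MG'], 'ROUTE': ['Oral Tablet']}
```

To process a batch, zip the list of input strings with the pipeline's `results` and call the helper once per pair. The sketch uses the `start`/`end` offsets instead of `word` because the decoded `word` may not reproduce the input exactly, for instance around punctuation such as `/`.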