---
library_name: transformers
license: mit
---

# distilbert-medication-ner

This model is a fine-tuned version of [distilbert-base-cased](https://huggingface.co/distilbert/distilbert-base-cased), trained on synthetic medication data generated with [Synthea](https://github.com/synthetichealth/synthea).

More details on how this model was trained can be found on [GitHub](https://github.com/JackLeeJM/slm-medication-ner).
|
| | ## Model Description |
| |
|
| | A fine-tuned NER model developed to handle 5 specific entities (i.e. DRUG, DOSAGE, ROUTE, BRAND, QUANTITY) when processing medication strings such as: |
| | - Ibuprofen 100 MG Oral Tablet |
| | - 1 ML medroxyprogesterone acetate 150 MG/ML Injection |
| | - Acetaminophen 325 MG / Oxycodone Hydrochloride 10 MG Oral Tablet [Percocet] |
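For intuition, the first example above would be tagged roughly as follows under a BIO-style labelling scheme. This is only a sketch: the exact token split and label names are assumptions for illustration, not read from the model's config.

```python
# Illustrative BIO-style tagging of "Ibuprofen 100 MG Oral Tablet".
# Token split and label names are assumed, not taken from the model.
tokens = ["Ibuprofen", "100", "MG", "Oral", "Tablet"]
labels = ["B-DRUG", "B-DOSAGE", "I-DOSAGE", "B-ROUTE", "I-ROUTE"]

for token, label in zip(tokens, labels):
    print(f"{token:<12} {label}")
```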

The model was trained and evaluated on small, manually annotated datasets (train_n_samples=309, eval_n_samples=335) and achieved the following evaluation metrics:
- **Precision**: 0.998
- **Recall**: 0.983
- **F1**: 0.991
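These follow the usual definitions of precision, recall, and F1 over predicted entities. As a quick reminder of how they are computed, with hypothetical counts (not the actual counts behind the numbers above):

```python
# Precision/recall/F1 from hypothetical counts. The values of
# tp/fp/fn are made up for illustration; they are NOT the counts
# behind the evaluation metrics reported above.
tp, fp, fn = 180, 2, 4  # true positives, false positives, false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```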

## Usage

1. Load the model and tokenizer:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "jackleejm/distilbert-medication-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
```
2. Set up a pipeline and run inference:
```python
from transformers import pipeline

ner_pipeline = pipeline(
    task="token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
    device_map="auto",
)

# Use `inputs` rather than `input` to avoid shadowing the builtin.
inputs = ["Acetaminophen 325 MG Oral Tablet"]
results = ner_pipeline(inputs)

print(results)

# Example output (one list of entity dicts per input string;
# scores are numpy float32 values):
# [
#   [
#     {"word": "Acetaminophen", "score": 0.99948627, "entity_group": "DRUG", "start": 0, "end": 13},
#     {"word": "325 MG", "score": 0.99882394, "entity_group": "DOSAGE", "start": 14, "end": 20},
#     {"word": "Oral Tablet", "score": 0.9994621, "entity_group": "ROUTE", "start": 21, "end": 32},
#   ]
# ]
```
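For downstream use, the per-entity dicts can be folded into one structured record per input string. A minimal sketch using the sample output above (the `to_record` helper is illustrative, not part of the model's API; scores truncated):

```python
# Collapse token-classification pipeline output (one list of entity
# dicts per input) into {entity_group: [words]} records.
sample_results = [
    [
        {"word": "Acetaminophen", "score": 0.9995, "entity_group": "DRUG", "start": 0, "end": 13},
        {"word": "325 MG", "score": 0.9988, "entity_group": "DOSAGE", "start": 14, "end": 20},
        {"word": "Oral Tablet", "score": 0.9995, "entity_group": "ROUTE", "start": 21, "end": 32},
    ]
]

def to_record(entities):
    record = {}
    for ent in entities:
        record.setdefault(ent["entity_group"], []).append(ent["word"])
    return record

records = [to_record(entities) for entities in sample_results]
print(records)
# -> [{'DRUG': ['Acetaminophen'], 'DOSAGE': ['325 MG'], 'ROUTE': ['Oral Tablet']}]
```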

### Training Procedure

#### Training Hyperparameters

- learning_rate: 2e-5
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 20
- weight_decay: 0.01
- eval_strategy: "steps"
- eval_steps: 50
- load_best_model_at_end: True
- metric_for_best_model: "f1"
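The hyperparameters above map onto a transformers `TrainingArguments` object roughly as follows. This is a sketch, not the actual training script (see the GitHub repo for that); `output_dir` and any settings not listed above are placeholders.

```python
from transformers import TrainingArguments

# Sketch of how the listed hyperparameters translate to TrainingArguments.
# output_dir is a placeholder, not taken from the actual training script.
# Note: `evaluation_strategy` was renamed to `eval_strategy` in recent
# transformers releases (the card lists Transformers 4.49.0).
training_args = TrainingArguments(
    output_dir="distilbert-medication-ner",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=20,
    weight_decay=0.01,
    eval_strategy="steps",
    eval_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)
```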

## Framework versions

- Transformers 4.49.0
- PyTorch 2.6.0
- Datasets 3.3.2
- Tokenizers 0.21.0