Files changed (1)
  1. README.md +65 -35
README.md CHANGED
@@ -1,60 +1,90 @@
  ---
- library_name: transformers
  license: apache-2.0
  base_model: distilbert-base-uncased
  tags:
- - generated_from_trainer
  model-index:
  - name: Medical-NER-2026-Success
-   results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # Medical-NER-2026-Success

- This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.5459
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed
 
- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - num_epochs: 3

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | No log | 1.0 | 4 | 1.1384 |
- | No log | 2.0 | 8 | 0.7012 |
- | 1.1499 | 3.0 | 12 | 0.5459 |


- ### Framework versions

  - Transformers 5.0.0
  - Pytorch 2.10.0+cpu
  - Datasets 4.8.3
  - Tokenizers 0.22.2
 
 
  ---
+ language: en
  license: apache-2.0
  base_model: distilbert-base-uncased
+ library_name: transformers
  tags:
+ - medical
+ - ner
+ - token-classification
+ - healthcare
+ - clinical-nlp
+ datasets:
+ - sohaibdevv/medical-prescription-ner-2026-benchmark
+ metrics:
+ - loss
+ pipeline_tag: token-classification
  model-index:
  - name: Medical-NER-2026-Success
+   results:
+   - task:
+       type: token-classification
+       name: Named Entity Recognition
+     dataset:
+       name: Medical Prescription NER 2026 Benchmark
+       type: csv
+     metrics:
+     - type: loss
+       value: 0.5459
+       name: Validation Loss
+ widget:
+ - text: "Take 500mg of Amoxicillin twice daily for 7 days."
+   example_title: "Standard Prescription"
+ - text: "Administer 10ml of Ibuprofen at night."
+   example_title: "Liquid Dosage"
  ---

  # Medical-NER-2026-Success
 
+ ## Overview
+ This model is a **Named Entity Recognition (NER)** model fine-tuned from **DistilBERT** (`distilbert-base-uncased`) to extract drug names, dosages, and dosing frequencies from medical prescriptions and doctor notes. It was developed as a benchmark for 2026 medical NLP tasks.
 
+ ### Detected Entities
+ | Label | Description | Example |
+ | :--- | :--- | :--- |
+ | **DRUG** | Name of the medication | *Aspirin, Insulin, Amoxicillin* |
+ | **DOSAGE** | Amount, strength, or form | *500mg, 2 tablets, 10ml* |
+ | **FREQ** | Frequency and timing | *Daily, twice a day, every 8 hours* |
 
+ ## How to use
+ You can use this model directly with the Hugging Face `pipeline`:

+ ```python
+ from transformers import pipeline

+ # Load the model
+ ner_pipe = pipeline("token-classification",
+                     model="sohaibdevv/Medical-NER-2026-Success",
+                     aggregation_strategy="simple")

+ # Test a prescription
+ text = "Patient is prescribed 20mg of Lisinopril once daily."
+ results = ner_pipe(text)

+ for entity in results:
+     print(f"Entity: {entity['word']} | Label: {entity['entity_group']}")
+ ```
 
+ ## Training Details
+ The model was trained on the Medical Prescription NER 2026 Benchmark dataset using a **Rule-Based Bootstrapping** approach.
 
+ * **Base Model:** `distilbert-base-uncased`
+ * **Labels:** 7 in BIO format (`O` plus `B-`/`I-` tags for DRUG, DOSAGE, and FREQ)
+ * **Epochs:** 3
+ * **Learning Rate:** 2e-05
+ * **Optimization:** AdamW with a linear learning-rate scheduler
 
+ ### Performance
+ The model reached a **validation loss of 0.5459** after 3 epochs (12 training steps). Loss is the only evaluation metric reported; entity-level precision, recall, and F1 are not yet available.
 
+ ## Limitations & Ethics
+ - **Research Only:** This model is for educational and research purposes.
+ - **Not for Diagnosis:** It should never be used to automate clinical decisions without professional human oversight.
+ - **English Only:** Currently optimized for English-language medical text.
 
+ ## Framework Versions
  - Transformers 5.0.0
  - Pytorch 2.10.0+cpu
  - Datasets 4.8.3
  - Tokenizers 0.22.2
+ ```
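The updated card states 7 labels in BIO format for three entity types, and its usage snippet relies on `aggregation_strategy="simple"` to turn token-level tags into entity spans. The sketch below, in plain Python with no `transformers` dependency, shows how such a label set is built and how consecutive BIO tags are grouped. The label order, the `ID2LABEL` mapping, and the `group_bio` helper are assumptions for illustration; the model's actual mapping is stored in its `config.id2label`, and the example tags are hand-written, not model output.

```python
# Hypothetical BIO label set implied by "7 labels (BIO format for Drug,
# Dosage, and Frequency)": "O" plus B-/I- tags for each entity type.
# The order (and therefore the integer ids) is an assumption; the real
# mapping lives in the model's config.id2label.
ENTITY_TYPES = ["DRUG", "DOSAGE", "FREQ"]
LABELS = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
ID2LABEL = dict(enumerate(LABELS))


def group_bio(tokens, tags):
    """Merge word-level BIO tags into (entity_type, text) spans.

    Hypothetical helper for illustration: a B- tag always opens a new span,
    an I- tag extends the open span only when the entity type matches, and
    a stray I- tag (wrong type or no open span) is treated as a new span.
    """
    spans = []
    current_type, current_words = None, []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any entity: close the open span, if there is one.
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
            continue
        prefix, ent = tag.split("-", 1)
        if prefix == "B" or ent != current_type:
            # Start a new span, closing the previous one first.
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = ent, [token]
        else:
            # I- tag of the same type: extend the open span.
            current_words.append(token)
    # Flush the span that is still open at the end of the sentence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


print(LABELS)
# ['O', 'B-DRUG', 'I-DRUG', 'B-DOSAGE', 'I-DOSAGE', 'B-FREQ', 'I-FREQ']
print(group_bio(
    ["take", "500mg", "of", "amoxicillin", "twice", "daily"],
    ["O", "B-DOSAGE", "O", "B-DRUG", "B-FREQ", "I-FREQ"],
))
# [('DOSAGE', '500mg'), ('DRUG', 'amoxicillin'), ('FREQ', 'twice daily')]
```

The pipeline's own `"simple"` strategy applies the same B/I grouping rule but operates on subword tokens and averages the token scores within each span, so treat this as an illustration of the grouping logic rather than a re-implementation of the library.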