---
datasets:
- ner_dataset_2.jsonl
language:
- en
license: apache-2.0
model-index:
- name: ner-distilbert-base-cased
  results:
  - dataset:
      name: ner_dataset_2.jsonl
      type: ner_dataset_2.jsonl
    metrics:
    - name: Eval Loss
      type: eval_loss
      value: 0.0216
    - name: Eval Accuracy
      type: eval_accuracy
      value: 0.993
    - name: Eval F1
      type: eval_f1
      value: 0.9929
    - name: Eval Recall
      type: eval_recall
      value: 0.993
    - name: Eval Precision
      type: eval_precision
      value: 0.9933
    task:
      name: Ner
      type: token-classification
tags:
- ner
- sklearn
- mlflow
- transformers
- openchs
---
# ner-distilbert-base-cased
This is a Named Entity Recognition (NER) model fine-tuned from `distilbert-base-cased`.
It identifies and classifies key information in child helpline conversations.
The model is trained to recognize the following entity types:
- `CALLER`: The person initiating the call for help.
- `COUNSELOR`: The helpline staff member providing assistance.
- `VICTIM`: The person who is the subject of the issue being discussed.
- `PERPETRATOR`: The person causing the harm or issue.
- `LOCATION`: Geographic places relevant to the conversation.
- `AGE`: The age of the individuals involved.
- `GENDER`: Gender references.
- `INCIDENT_TYPE`: The specific type of problem or issue being reported (e.g., "bullying", "abuse").
## Model Details
- **Model Name:** ner-distilbert-base-cased
- **Version:** 1
- **Task:** Named Entity Recognition (token classification)
- **Languages:** en
- **Framework:** sklearn
- **License:** apache-2.0
## Intended Uses & Limitations
### Scope of Use
**In Scope:** The model is intended for text that is structurally and contextually similar to
the training data (i.e., conversational, first-person accounts of issues).

**Out of Scope:** This model is **not intended for general-purpose NER tasks** (e.g., analyzing
news articles, legal documents, or emails). Its performance on text outside its specialized
domain has not been evaluated.
## Training Data
- **Dataset:** ner_dataset_1.jsonl
- **Size:** Not specified
- **Languages:** en
- **Nature:** The training data is synthetic, generated to mimic real-world conversations while protecting the privacy and confidentiality of actual helpline users.
## Training Configuration
| Parameter | Value |
|-----------|-------|
| Author | Rogendo |
| Batch Size | 4 |
| Epochs | 10 |
| Learning Rate | 2e-05 |
| Model Name | distilbert-base-cased |
| Test Size | 0.1 |
| Training Date | 2025-10-30T11:58:48.315647 |
| Weight Decay | 0.01 |
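
The hyperparameters in the table can be collected into a configuration like the one below. This is a minimal sketch, not the training script used for this model: it assumes training went through the `transformers` `Trainer`, that "Batch Size" refers to `per_device_train_batch_size`, and it uses a hypothetical `output_dir`.

```python
# Hyperparameters copied from the table above. Mapping "Batch Size" to
# per_device_train_batch_size is an assumption; the card does not say
# which batch size the value refers to.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 4,
    "num_train_epochs": 10,
    "learning_rate": 2e-05,
    "weight_decay": 0.01,
}
TEST_SIZE = 0.1  # fraction of the data held out for evaluation


def build_training_args(output_dir="ner-distilbert-base-cased"):
    """Build TrainingArguments from the table (output_dir is a hypothetical path)."""
    # Imported lazily so the configuration can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Calling `build_training_args()` requires `transformers` and `torch` to be installed; the dictionary itself can be reused with any other training loop.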
## Performance Metrics
### Evaluation Results
| Metric | Value |
|--------|-------|
| Epoch | 10.0000 |
| Eval Accuracy | 0.9930 |
| Eval F1 | 0.9929 |
| Eval Loss | 0.0216 |
| Eval Precision | 0.9933 |
| Eval Recall | 0.9930 |
| Eval Runtime (s) | 0.1509 |
| Eval Samples Per Second | 106.0170 |
| Eval Steps Per Second | 13.2520 |
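
The throughput figures allow a rough estimate of how large the evaluation set was, which helps put the scores in context. The sketch below assumes the runtime is in seconds and that the throughput values are samples and steps divided by that runtime, as the Hugging Face `Trainer` reports them. The total dataset size is only an estimate and assumes the 0.1 test split was taken from the full dataset.

```python
# Figures copied from the evaluation table above.
eval_runtime_s = 0.1509
samples_per_second = 106.0170
steps_per_second = 13.2520
test_size = 0.1  # from the training configuration

# samples = runtime * samples/s, steps = runtime * steps/s
eval_samples = round(eval_runtime_s * samples_per_second)
eval_steps = round(eval_runtime_s * steps_per_second)
eval_batch_size = eval_samples // eval_steps

# Estimate only: assumes the evaluation set is exactly test_size of the data.
approx_total_samples = round(eval_samples / test_size)

print(eval_samples, eval_steps, eval_batch_size, approx_total_samples)
# prints: 16 2 8 160
```

Under these assumptions the reported metrics come from roughly 16 evaluation samples, so small differences in the scores should be interpreted with caution.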
## Usage
### Installation
```bash
pip install transformers torch
```
### Named Entity Recognition Example
```python
from transformers import pipeline

# Use the model from the openchs repository on Hugging Face
ner = pipeline("ner", model="openchs/ner_distillbert_v1", aggregation_strategy="simple")

# Example of a typical helpline conversation snippet
text = (
    "Hello 116, my name is Mary and I'm calling from Kampala. "
    "My daughter, Jane, is 12 years old and is being bullied at school by a boy named Peter."
)

entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.2f})")
```
```
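
Downstream code often needs the entities grouped by type rather than as a flat list. The helper below is a minimal sketch of that post-processing step. The function name `group_entities`, the `min_score` threshold, and the sample input are hypothetical; the sample only mirrors the `entity_group`, `word`, and `score` keys that the pipeline returns with `aggregation_strategy="simple"` and is not real model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group aggregated pipeline results by entity type, dropping low scores."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written illustrative input, NOT actual model predictions.
sample = [
    {"entity_group": "CALLER", "word": "Mary", "score": 0.99},
    {"entity_group": "LOCATION", "word": "Kampala", "score": 0.98},
    {"entity_group": "VICTIM", "word": "Jane", "score": 0.97},
    {"entity_group": "PERPETRATOR", "word": "Peter", "score": 0.40},
]

print(group_entities(sample))
```

With the default threshold, the low-scoring `PERPETRATOR` entry in the sample is dropped. In practice the threshold should be tuned on real transcripts, since discarding a true entity may be more costly than keeping a doubtful one.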
## MLflow Tracking
- **Experiment:** NER_Distilbert/marlon
- **Run ID:** `10d2648a456a4f6ab74022a9e45c9f40`
- **Training Date:** 2025-10-30 11:58:48
- **Tracking URI:** http://192.168.10.6:5000
## Training Metrics Visualization
View detailed training metrics and TensorBoard logs in the [Training metrics](https://huggingface.co/openchs/ner_distillbert_v1/tensorboard) tab.
## Citation
```bibtex
@misc{ner_distilbert_base_cased,
title={ner-distilbert-base-cased},
author={OpenCHS Team},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/openchs/ner_distillbert_v1}
}
```
## Contact
info@bitz-itc.com
---
*Model card auto-generated from MLflow*