---
datasets:
- ner_dataset_2.jsonl
language:
- en
license: apache-2.0
model-index:
- name: ner-distilbert-base-cased
  results:
  - dataset:
      name: ner_dataset_2.jsonl
      type: ner_dataset_2.jsonl
    metrics:
    - name: Eval Loss
      type: eval_loss
      value: 0.0216
    - name: Eval Accuracy
      type: eval_accuracy
      value: 0.993
    - name: Eval F1
      type: eval_f1
      value: 0.9929
    - name: Eval Recall
      type: eval_recall
      value: 0.993
    - name: Eval Precision
      type: eval_precision
      value: 0.9933
    task:
      name: Ner
      type: token-classification
tags:
- ner
- sklearn
- mlflow
- transformers
- openchs
---

# ner-distilbert-base-cased

This is a Named Entity Recognition (NER) model based on the `distilbert-base-cased` architecture. It was fine-tuned to identify and classify key information in child helpline conversations. The model recognizes the following entity types:

- `CALLER`: The person initiating the call for help.
- `COUNSELOR`: The helpline staff member providing assistance.
- `VICTIM`: The person who is the subject of the issue being discussed.
- `PERPETRATOR`: The person causing the harm or issue.
- `LOCATION`: Geographic places relevant to the conversation.
- `AGE`: The age of the individuals involved.
- `GENDER`: Gender references.
- `INCIDENT_TYPE`: The specific type of problem or issue being reported (e.g., "bullying", "abuse").

## Model Details

- **Model Name:** ner-distilbert-base-cased
- **Version:** 1
- **Task:** Named Entity Recognition (token classification)
- **Languages:** en
- **Framework:** sklearn
- **License:** apache-2.0

## Intended Uses & Limitations

### Scope of Use
- **In Scope:** The model is intended for text that is structurally and contextually similar to the training data (i.e., conversational, first-person accounts of issues).
- **Out of Scope:** This model is **not intended for general-purpose NER tasks** (e.g., analyzing news articles, legal documents, or emails). Its performance on text outside of its specialized domain has not been evaluated.

## Training Data

- **Dataset:** ner_dataset_1.jsonl
- **Size:** Not specified
- **Languages:** en
- **Nature:** The training data is synthetic, generated to mimic real-world conversations while protecting the privacy and confidentiality of actual helpline users.

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Author | Rogendo |
| Batch Size | 4 |
| Epochs | 10 |
| Learning Rate | 2e-05 |
| Model Name | distilbert-base-cased |
| Test Size | 0.1 |
| Training Date | 2025-10-30T11:58:48.315647 |
| Weight Decay | 0.01 |
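
The hyperparameters above map naturally onto keyword arguments of `transformers.TrainingArguments`. The sketch below shows one possible mapping. The names `CARD_CONFIG` and `to_training_kwargs` are hypothetical, and this card does not confirm that training used these exact arguments or that the batch size was applied per device.

```python
# Values copied from the Training Configuration table in this card.
CARD_CONFIG = {
    "batch_size": 4,
    "epochs": 10,
    "lr": 2e-05,
    "model_name": "distilbert-base-cased",
    "test_size": 0.1,
    "weight_decay": 0.01,
}


def to_training_kwargs(config):
    """Translate the card's parameter names into TrainingArguments keyword names.

    Assumption: the batch size is per device. `model_name` and `test_size`
    are left out because they configure model loading and the data split,
    not the trainer itself.
    """
    return {
        "per_device_train_batch_size": config["batch_size"],
        "num_train_epochs": config["epochs"],
        "learning_rate": config["lr"],
        "weight_decay": config["weight_decay"],
    }


training_kwargs = to_training_kwargs(CARD_CONFIG)
print(training_kwargs)
```

With `transformers` installed, the result could be unpacked as `TrainingArguments(output_dir="ner-out", **training_kwargs)`, where the `output_dir` value is a placeholder.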

## Performance Metrics

### Evaluation Results
| Metric | Value |
|--------|-------|
| Epoch | 10.0000 |
| Eval Accuracy | 0.9930 |
| Eval F1 | 0.9929 |
| Eval Loss | 0.0216 |
| Eval Precision | 0.9933 |
| Eval Recall | 0.9930 |
| Eval Runtime (s) | 0.1509 |
| Eval Samples Per Second | 106.0170 |
| Eval Steps Per Second | 13.2520 |
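
The throughput rows give a rough idea of how large the evaluation set was. The sketch below assumes the runtime is in seconds and that each throughput figure is a count divided by the runtime, as the Hugging Face `Trainer` reports them. The result is an estimate, not a figure stated by the authors.

```python
# Values copied from the Evaluation Results table.
eval_runtime_s = 0.1509        # assumed to be seconds
samples_per_second = 106.0170
steps_per_second = 13.2520

# count = throughput * runtime, rounded to the nearest whole number
eval_samples = round(eval_runtime_s * samples_per_second)
eval_steps = round(eval_runtime_s * steps_per_second)

# Rough evaluation batch size, assuming all batches were full
eval_batch_size = eval_samples // eval_steps

print(eval_samples, eval_steps, eval_batch_size)
```

Under these assumptions the metrics come from roughly 16 examples evaluated in 2 steps, so the reported scores rest on a small held-out sample.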

## Usage

### Installation
```bash
pip install transformers torch
```

### Named Entity Recognition Example
```python
from transformers import pipeline

# Use the model from the openchs repository on Hugging Face
ner = pipeline("ner", model="openchs/ner_distillbert_v1", aggregation_strategy="simple")

# Example of a typical helpline conversation snippet
text = (
    "Hello 116, my name is Mary and I'm calling from Kampala. "
    "My daughter, Jane, is 12 years old and is being bullied at school by a boy named Peter."
)

# Each result is a dict with the aggregated label, the matched text, and a confidence score
entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.2f})")
```
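
Downstream code often needs the entities grouped by label rather than as a flat list. The following is a minimal sketch of such a post-processing step. The helper `group_entities`, the `min_score` threshold of 0.5, and the sample records are all illustrative assumptions; the sample is hand-written in the shape the pipeline returns with `aggregation_strategy="simple"` and is not real model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group pipeline results by entity label, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample containing only the keys the helper reads.
sample = [
    {"entity_group": "CALLER", "word": "Mary", "score": 0.99},
    {"entity_group": "LOCATION", "word": "Kampala", "score": 0.98},
    {"entity_group": "VICTIM", "word": "Jane", "score": 0.97},
    {"entity_group": "PERPETRATOR", "word": "Peter", "score": 0.42},
]

print(group_entities(sample))
```

In this sample the `PERPETRATOR` span falls below the threshold and is dropped; passing `min_score=0.0` keeps every span.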


## MLflow Tracking

- **Experiment:** NER_Distilbert/marlon
- **Run ID:** `10d2648a456a4f6ab74022a9e45c9f40`
- **Training Date:** 2025-10-30 11:58:48
- **Tracking URI:** http://192.168.10.6:5000

## Training Metrics Visualization

View detailed training metrics and TensorBoard logs in the [Training metrics](https://huggingface.co/openchs/ner_distillbert_v1/tensorboard) tab.

## Citation

```bibtex
@misc{ner_distilbert_base_cased,
  title={ner-distilbert-base-cased},
  author={OpenCHS Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/openchs/ner_distillbert_v1}
}
```

## Contact

info@bitz-itc.com

---
*Model card auto-generated from MLflow*