---
datasets:
- ner_dataset_2.jsonl
language:
- en
license: apache-2.0
model-index:
- name: ner-distilbert-base-cased
  results:
  - dataset:
      name: ner_dataset_2.jsonl
      type: ner_dataset_2.jsonl
    metrics:
    - name: Eval Loss
      type: eval_loss
      value: 0.0216
    - name: Eval Accuracy
      type: eval_accuracy
      value: 0.993
    - name: Eval F1
      type: eval_f1
      value: 0.9929
    - name: Eval Recall
      type: eval_recall
      value: 0.993
    - name: Eval Precision
      type: eval_precision
      value: 0.9933
    task:
      name: Ner
      type: token-classification
tags:
- ner
- sklearn
- mlflow
- transformers
- openchs
---

# ner-distilbert-base-cased

This is a Named Entity Recognition (NER) model fine-tuned from `distilbert-base-cased` to identify and classify key information in child helpline conversations. It recognizes the following entity types:

- `CALLER`: The person initiating the call for help.
- `COUNSELOR`: The helpline staff member providing assistance.
- `VICTIM`: The person who is the subject of the issue being discussed.
- `PERPETRATOR`: The person causing the harm or issue.
- `LOCATION`: Geographic places relevant to the conversation.
- `AGE`: The age of the individuals involved.
- `GENDER`: Gender references.
- `INCIDENT_TYPE`: The specific type of problem or issue being reported (e.g., "bullying", "abuse").
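
Token-classification models predict one label per (sub-word) token. The card does not state the tagging scheme, so the following is a sketch that assumes the common BIO scheme: each entity type gets a `B-` (beginning) and an `I-` (inside) tag, plus a shared `O` (outside) tag. The resulting `id2label` and `label2id` dictionaries have the shape that `AutoModelForTokenClassification.from_pretrained` accepts; the published model's actual mapping is stored in its `config.id2label` and may differ.

```python
# Assumption: BIO tagging. The real label set lives in the model's config.
ENTITY_TYPES = [
    "CALLER", "COUNSELOR", "VICTIM", "PERPETRATOR",
    "LOCATION", "AGE", "GENDER", "INCIDENT_TYPE",
]


def build_bio_labels(entity_types):
    """Return (id2label, label2id): "O" first, then B-/I- tags per entity type."""
    labels = ["O"]
    for ent in entity_types:
        labels.append(f"B-{ent}")
        labels.append(f"I-{ent}")
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


id2label, label2id = build_bio_labels(ENTITY_TYPES)
print(len(id2label))  # 17
```

Under this assumption, the eight entity types expand to 17 token labels.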

## Model Details

- **Model Name:** ner-distilbert-base-cased
- **Version:** 1
- **Task:** Named Entity Recognition (token classification)
- **Languages:** English
- **Framework:** Transformers (PyTorch)
- **License:** apache-2.0

## Intended Uses & Limitations

### Scope of Use
- **In Scope:** Text that is structurally and contextually similar to the training data, i.e., conversational, first-person accounts of issues.
- **Out of Scope:** General-purpose NER tasks (e.g., analyzing news articles, legal documents, or emails). Performance on text outside this specialized domain has not been evaluated.

## Training Data

- **Dataset:** ner_dataset_1.jsonl
- **Size:** Not specified
- **Languages:** en
- **Nature:** The training data is synthetic, generated to mimic real-world conversations while protecting the privacy and confidentiality of actual helpline users.
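
The card does not document the record format of the JSONL file. As an illustration only, the sketch below assumes each line holds a `text` string and a list of character-offset `entities` (`start`, `end`, `label`); all of those key names are hypothetical. It shows one way such a record could be turned into word-level BIO tags.

```python
import json
import re

# Hypothetical JSONL line: the real schema of the training file is not
# documented, so "text", "entities", "start", "end" and "label" are assumptions.
LINE = json.dumps({
    "text": "My daughter Jane is 12 years old",
    "entities": [
        {"start": 12, "end": 16, "label": "VICTIM"},
        {"start": 20, "end": 32, "label": "AGE"},
    ],
})


def record_to_bio(record):
    """Split text on whitespace and give each word a BIO tag.

    A word is tagged B- if it starts an entity span, I- if it continues
    one, and O if it overlaps no span.
    """
    words, tags = [], []
    for match in re.finditer(r"\S+", record["text"]):
        start, end = match.span()
        tag = "O"
        for ent in record["entities"]:
            # The word belongs to the entity if their character ranges overlap.
            if start < ent["end"] and end > ent["start"]:
                prefix = "B" if start <= ent["start"] else "I"
                tag = f"{prefix}-{ent['label']}"
                break
        words.append(match.group())
        tags.append(tag)
    return words, tags


words, tags = record_to_bio(json.loads(LINE))
for word, tag in zip(words, tags):
    print(f"{word}\t{tag}")
```

Before training, word-level tags like these would still have to be aligned to DistilBERT's sub-word tokens; fast tokenizers expose `word_ids()` on their output for that purpose.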

## Training Configuration

| Parameter | Value |
|-----------|-------|
| Author | Rogendo |
| Batch Size | 4 |
| Epochs | 10 |
| Learning Rate | 2e-05 |
| Base Model | distilbert-base-cased |
| Test Split Fraction | 0.1 |
| Training Date | 2025-10-30T11:58:48.315647 |
| Weight Decay | 0.01 |
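
The evaluation metric names reported in this card (`eval_loss`, and in the table below the eval runtime and per-second throughput figures) match what the Hugging Face `Trainer` logs, so the run may have used it, but the card does not say so. As a sketch under that assumption, the hyperparameters above map onto `TrainingArguments` keyword names as follows. The `NerTrainingConfig` class and the `output_dir` value are hypothetical, and treating the batch size as per-device is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class NerTrainingConfig:
    """Hyperparameters from the Training Configuration table (hypothetical helper)."""

    base_model: str = "distilbert-base-cased"
    batch_size: int = 4
    epochs: int = 10
    learning_rate: float = 2e-5
    weight_decay: float = 0.01
    test_size: float = 0.1  # fraction of the data held out for evaluation

    def to_training_arguments_kwargs(self, output_dir: str) -> dict:
        """Map the table onto transformers.TrainingArguments keyword names.

        Assumes the batch size is per device. test_size is not a
        TrainingArguments option; it applies when splitting the dataset.
        """
        return {
            "output_dir": output_dir,
            "per_device_train_batch_size": self.batch_size,
            "num_train_epochs": self.epochs,
            "learning_rate": self.learning_rate,
            "weight_decay": self.weight_decay,
        }


config = NerTrainingConfig()
kwargs = config.to_training_arguments_kwargs("ner-distilbert-base-cased")
print(kwargs)
# With transformers installed, these could be passed as TrainingArguments(**kwargs).
```

One way to apply the 0.1 split is `datasets.Dataset.train_test_split(test_size=config.test_size)`; the card does not state which method was actually used.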

## Performance Metrics

### Evaluation Results
| Metric | Value |
|--------|-------|
| Epoch | 10 |
| Eval Accuracy | 0.9930 |
| Eval F1 | 0.9929 |
| Eval Loss | 0.0216 |
| Eval Precision | 0.9933 |
| Eval Recall | 0.9930 |
| Eval Runtime (s) | 0.1509 |
| Eval Samples Per Second | 106.0170 |
| Eval Steps Per Second | 13.2520 |
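
Two things can be read off this table with simple arithmetic. First, runtime multiplied by throughput gives the approximate size of the evaluation run: about 16 examples in 2 batches. Second, Eval Accuracy and Eval Recall are identical (0.9930). Support-weighted recall (scikit-learn's `average="weighted"`) always equals accuracy, because each class contributes (n_c / N) · (TP_c / n_c) = TP_c / N. That would fit the `sklearn` tag, but the card does not confirm how the metrics were computed, so treat it as an assumption. The plain-Python sketch below checks both points; the toy labels are invented.

```python
from collections import Counter

# Values reported in the Evaluation Results table. The runtime is rounded to
# four decimals, so the products below are estimates.
EVAL_RUNTIME_S = 0.1509
SAMPLES_PER_S = 106.0170
STEPS_PER_S = 13.2520

approx_eval_samples = round(EVAL_RUNTIME_S * SAMPLES_PER_S)
approx_eval_steps = round(EVAL_RUNTIME_S * STEPS_PER_S)


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support.

    Each class contributes (n_c / N) * (TP_c / n_c) = TP_c / N,
    so the sum is always equal to accuracy.
    """
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        score += (n_c / total) * (tp / n_c)
    return score


# Toy token labels, invented for illustration.
y_true = ["O", "O", "B-AGE", "I-AGE", "B-VICTIM", "O"]
y_pred = ["O", "B-AGE", "B-AGE", "O", "B-VICTIM", "O"]

print(approx_eval_samples, approx_eval_steps)  # 16 2
print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```

Under the same assumption, a weighted F1 (0.9929) below both weighted precision and recall is possible: it averages per-class F1 scores instead of taking the harmonic mean of the averaged precision and recall, which here would be about 0.9931.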

## Usage

### Installation
```bash
pip install transformers torch
```

### Named Entity Recognition Example
```python
from transformers import pipeline

# Load the model from the openchs repository on Hugging Face.
# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
ner = pipeline("ner", model="openchs/ner_distillbert_v1", aggregation_strategy="simple")

# Example of a typical helpline conversation snippet
text = (
    "Hello 116, my name is Mary and I'm calling from Kampala. My daughter, Jane, "
    "is 12 years old and is being bullied at school by a boy named Peter."
)

entities = ner(text)
for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (score: {entity['score']:.2f})")
```
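
With `aggregation_strategy="simple"`, the pipeline returns one dictionary per entity span, carrying the `entity_group`, `word` and `score` keys used in the example. A small post-processing step, sketched here, collects mentions by entity type and drops low-confidence predictions. The `group_entities` helper and its 0.5 cutoff are illustrative choices, not part of the model or the transformers API, and the sample input is hand-written rather than real model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Collect entity strings per type, skipping predictions below min_score.

    Expects dicts with "entity_group", "word" and "score" keys, as returned
    by a token-classification pipeline with aggregation enabled.
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written input for illustration only; not actual model output.
sample = [
    {"entity_group": "CALLER", "word": "Mary", "score": 0.98},
    {"entity_group": "LOCATION", "word": "Kampala", "score": 0.97},
    {"entity_group": "VICTIM", "word": "Jane", "score": 0.95},
    {"entity_group": "INCIDENT_TYPE", "word": "bullied", "score": 0.41},
]
print(group_entities(sample))
# {'CALLER': ['Mary'], 'LOCATION': ['Kampala'], 'VICTIM': ['Jane']}
```

In practice, `group_entities(ner(text))` would apply the same grouping to live pipeline output; a suitable threshold depends on how the extracted fields are used downstream.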


## MLflow Tracking

- **Experiment:** NER_Distilbert/marlon
- **Run ID:** `10d2648a456a4f6ab74022a9e45c9f40`
- **Training Date:** 2025-10-30 11:58:48
- **Tracking URI:** http://192.168.10.6:5000

## Training Metrics Visualization

View detailed training metrics and TensorBoard logs in the [Training metrics](https://huggingface.co/openchs/ner_distillbert_v1/tensorboard) tab.

## Citation

```bibtex
@misc{ner_distilbert_base_cased,
  title={ner-distilbert-base-cased},
  author={OpenCHS Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/openchs/ner_distillbert_v1}
}
```

## Contact

info@bitz-itc.com

---
*Model card auto-generated from MLflow*