JengaAI Swahili NER (Context-Aware)
Released as part of the JengaAI Framework, this model is a specialized Named Entity Recognition (NER) system designed for the African context. It goes beyond standard entity detection (Person/Org) to extract incident-specific details, which makes it useful for automated incident processing, legal tech, and data anonymization.
Model Capabilities
This model is fine-tuned to predict 10 labels (nine entity types plus the non-entity label O) relevant to structured data extraction from unstructured reports:
| Label | Description | Example |
|---|---|---|
| NAME | Names of individuals involved | "Kamau", "John Doe" |
| AGE | Age of individuals | "34", "18 years old" |
| GENDER | Gender identification | "male", "female" |
| PHONE_NUMBER | Contact information | "0712345678" |
| LOCATION | General location areas | "Moi Avenue", "Nairobi" |
| LANDMARK | Specific reference points | "National Archives" |
| INCIDENT_TYPE | Values describing the event | "theft", "accident", "assault" |
| PERPETRATOR | The alleged offender | "suspect", "attacker" |
| VICTIM | The affected party | "victim", "complainant" |
| O | Outside (non-entity) | - |
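As a rough illustration of how this label set can feed structured extraction, the sketch below groups already-extracted entities into a per-label record. The `(text, label)` pair format and the `build_incident_record` helper are assumptions made for this example; they are not part of the model or of the JengaAI Framework.

```python
from collections import defaultdict

# Label names copied from the table above; "O" marks non-entity tokens.
ENTITY_LABELS = {
    "NAME", "AGE", "GENDER", "PHONE_NUMBER", "LOCATION",
    "LANDMARK", "INCIDENT_TYPE", "PERPETRATOR", "VICTIM",
}


def build_incident_record(entities):
    """Group (text, label) pairs into a dict of label -> list of texts.

    `entities` is a hypothetical intermediate format, not something the
    model returns directly.
    """
    record = defaultdict(list)
    for text, label in entities:
        # Skip "O" and unknown labels; drop exact duplicates per label.
        if label in ENTITY_LABELS and text not in record[label]:
            record[label].append(text)
    return dict(record)


# Hand-written example input, not real model output.
example = [
    ("Kamau", "NAME"),
    ("34", "AGE"),
    ("theft", "INCIDENT_TYPE"),
    ("Moi Avenue", "LOCATION"),
    ("National Archives", "LANDMARK"),
    ("the", "O"),
]
print(build_incident_record(example))
```

Keeping a list per label, rather than a single value, leaves room for reports that mention several people or places.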
Usage with JengaAI
This model is built on the distilbert-base-uncased backbone for efficiency on edge devices.

    from transformers import AutoTokenizer, AutoModelForTokenClassification
    import torch

    # 1. Load the tokenizer and model from the Hugging Face Hub
    model_id = "Rogendo/JengaAI_NER_distilbert-base-uncased-v01"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)

    # 2. Prepare input
    text = "Kamau reported the incident at Nairobi."
    inputs = tokenizer(text, return_tensors="pt")

    # 3. Inference (no gradients needed)
    with torch.no_grad():
        logits = model(**inputs).logits

    # 4. Decode: pick the highest-scoring label for each token
    predictions = torch.argmax(logits, dim=-1)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    predicted_token_class = [model.config.id2label[t.item()] for t in predictions[0]]
    # One label per WordPiece token, including [CLS] and [SEP]
    print(list(zip(tokens, predicted_token_class)))

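The decoding step yields one label per WordPiece token, special tokens included, so most applications need a further step that merges pieces back into whole entities. The following is a minimal sketch of that step, under several assumptions: the `group_entities` helper is hypothetical, labels are taken to be the plain names from the table (an optional `B-`/`I-` prefix is stripped in case the checkpoint uses BIO tags), and a word inherits the label of its first piece. The token strings can be obtained with `tokenizer.convert_ids_to_tokens`; the sample tokens and labels below are hand-written, not actual model output.

```python
def group_entities(tokens, labels):
    """Merge WordPiece tokens and per-token labels into (text, label) spans."""
    special = {"[CLS]", "[SEP]", "[PAD]"}
    spans = []                 # finished (text, label) pairs
    words, current = [], None  # words and label of the span being built
    for token, label in zip(tokens, labels):
        if token in special:
            continue
        # Assumption: strip an optional BIO prefix such as "B-" or "I-".
        has_prefix = label[:2] in ("B-", "I-")
        tag = label[2:] if has_prefix else label
        if token.startswith("##") and words:
            # Continuation piece: glue onto the previous word, keep its label.
            words[-1] += token[2:]
            continue
        if tag != current or label.startswith("B-"):
            # Label changed (or an explicit B- tag): close the open span.
            if current not in (None, "O"):
                spans.append((" ".join(words), current))
            words, current = [], tag
        words.append(token)
    if current not in (None, "O"):
        spans.append((" ".join(words), current))
    return spans


# Hand-written illustration; the uncased tokenizer lowercases its input.
tokens = ["[CLS]", "kam", "##au", "reported", "the", "incident",
          "at", "nairobi", ".", "[SEP]"]
labels = ["O", "NAME", "NAME", "O", "O", "O", "O", "LOCATION", "O", "O"]
print(group_entities(tokens, labels))
```

Because the backbone is uncased, the recovered text is lowercase; recovering the original casing would require mapping tokens back to character offsets in the input.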
Intended Use & Impact
- Legal Tech: Automating the digitization of police abstracts and court affidavits.
- Emergency Response: Rapidly extracting location and incident details from distress texts.
- Data Sovereignty: Processing sensitive PII locally without sending data to foreign APIs.
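For the anonymization and local PII processing use cases, one possible post-processing step is to replace detected spans with placeholder tags. The sketch below assumes entity spans are already available as `(start, end, label)` character offsets (with a fast tokenizer these can be derived by passing `return_offsets_mapping=True`). The `redact` helper and the choice of which labels count as PII are illustrative assumptions, not part of the model.

```python
def redact(text, spans, pii_labels=("NAME", "PHONE_NUMBER", "AGE")):
    """Replace character spans whose label is in pii_labels with [LABEL].

    `spans` is a list of (start, end, label) tuples with end exclusive.
    The default pii_labels tuple is an assumption for this example.
    """
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if label not in pii_labels or start < cursor:
            continue  # skip non-PII labels and overlapping spans
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Hand-written example with manually computed offsets.
report = "Kamau, 34, called from 0712345678 near Moi Avenue."
spans = [
    (0, 5, "NAME"),
    (7, 9, "AGE"),
    (23, 33, "PHONE_NUMBER"),
    (39, 49, "LOCATION"),
]
print(redact(report, spans))
```

Since the whole step runs on plain strings, it can stay on the same machine as the model, in line with the data sovereignty goal above.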
Training Data
Trained on ner_synthetic_dataset_v1, a curated dataset of synthetic incident reports reflecting linguistic patterns found in East African administrative text.
Model tree for Rogendo/JengaAI_NER_distilbert-base-uncased-v01
Base model: distilbert/distilbert-base-uncased