JengaAI Swahili NER (Context-Aware)

Released as part of the JengaAI Framework, this model is a specialized Named Entity Recognition (NER) system designed for the African context. It goes beyond standard entity detection (Person/Org) to extract incident-specific details, which makes it suited to automated incident processing, legal tech, and data anonymization.

Model Capabilities

This model is fine-tuned to predict 10 labels (nine entity types plus O for non-entity tokens) relevant to structured data extraction from unstructured reports:

Label         | Description                   | Example
--------------|-------------------------------|------------------------------
NAME          | Names of individuals involved | "Kamau", "John Doe"
AGE           | Age of individuals            | "34", "18 years old"
GENDER        | Gender identification         | "male", "female"
PHONE_NUMBER  | Contact information           | "0712345678"
LOCATION      | General location areas        | "Moi Avenue", "Nairobi"
LANDMARK      | Specific reference points     | "National Archives"
INCIDENT_TYPE | Values describing the event   | "theft", "accident", "assault"
PERPETRATOR   | The alleged offender          | "suspect", "attacker"
VICTIM        | The affected party            | "victim", "complainant"
O             | Outside (non-entity)          | -

Usage with JengaAI

This model is fine-tuned from the distilbert-base-uncased backbone, chosen for efficient inference on edge devices. The example below loads it with the Hugging Face transformers library.

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# 1. Load Model
tokenizer = AutoTokenizer.from_pretrained("Rogendo/JengaAI_NER_distilbert-base-uncased-v01")
model = AutoModelForTokenClassification.from_pretrained("Rogendo/JengaAI_NER_distilbert-base-uncased-v01")

# 2. Prepare Input
text = "Kamau reported the incident at Nairobi."
inputs = tokenizer(text, return_tensors="pt")

# 3. Inference
with torch.no_grad():
    logits = model(**inputs).logits

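# 4. Decode: take the highest-scoring label ID for each token
predictions = torch.argmax(logits, dim=2)
# One label per wordpiece token, including the special [CLS] and [SEP] tokens
predicted_token_class = [model.config.id2label[t.item()] for t in predictions[0]]
print(predicted_token_class)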
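
The decode step yields one label per wordpiece rather than per entity. The following is a minimal sketch of how those per-token labels might be merged into entity spans. The helper name group_entities is hypothetical and not part of JengaAI or transformers. It assumes BERT-style "##" continuation pieces and labels that either match the Model Capabilities table or carry B-/I- prefixes; check model.config.id2label for the scheme this checkpoint actually uses. The tokens and labels in the example are hand-written for illustration, not real model output.

```python
def group_entities(tokens, labels):
    """Merge wordpiece tokens and per-token labels into (text, label) spans.

    Assumptions (not confirmed by the model card): wordpieces use the "##"
    continuation prefix, and labels are plain ("NAME") or BIO-prefixed
    ("B-NAME" / "I-NAME").
    """
    special = {"[CLS]", "[SEP]", "[PAD]"}
    words = []  # each entry: [word_text, plain_label, starts_new_span]
    for token, label in zip(tokens, labels):
        if token in special:
            continue
        if token.startswith("##") and words:
            # Continuation piece: glue onto the previous word, keep its label.
            words[-1][0] += token[2:]
            continue
        begins = label.startswith("B-")
        plain = label[2:] if label[:2] in ("B-", "I-") else label
        words.append([token, plain, begins])

    spans = []
    prev_label = "O"
    for text, label, begins in words:
        if label == "O":
            prev_label = "O"
            continue
        if spans and label == prev_label and not begins:
            # Same label as the previous word: extend the current span.
            spans[-1] = (spans[-1][0] + " " + text, label)
        else:
            spans.append((text, label))
        prev_label = label
    return spans


# Hand-written illustration; real tokenization and predictions may differ.
example_tokens = ["[CLS]", "ka", "##mau", "was", "at", "moi", "avenue", "[SEP]"]
example_labels = ["O", "NAME", "NAME", "O", "O", "LOCATION", "LOCATION", "O"]
print(group_entities(example_tokens, example_labels))
```

With the real model, the tokens would come from tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]) and the labels from predicted_token_class.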

Intended Use & Impact

  • Legal Tech: Automating the digitization of police abstracts and court affidavits.
  • Emergency Response: Rapidly extracting location and incident details from distress texts.
  • Data Sovereignty: Processing sensitive PII locally without sending data to foreign APIs.
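
For the anonymization use case, one possible approach is to replace each detected PII span with a placeholder naming its label. The sketch below is illustrative only: the redact helper and the hard-coded spans are hypothetical, and it assumes you already have end-exclusive character offsets for each entity (with a fast tokenizer, these can be obtained by passing return_offsets_mapping=True).

```python
def redact(text, spans):
    """Replace each (start, end, label) character span with "[LABEL]".

    `spans` uses end-exclusive character offsets and must not overlap.
    This is an illustrative helper, not part of JengaAI.
    """
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans")
        out.append(text[cursor:start])  # keep the text before the span
        out.append(f"[{label}]")        # substitute the placeholder
        cursor = end
    out.append(text[cursor:])           # keep the remaining tail
    return "".join(out)


# Hand-written spans for illustration; in practice they would come from the model.
report = "Kamau, 0712345678, reported a theft at Moi Avenue."
pii_spans = [(0, 5, "NAME"), (7, 17, "PHONE_NUMBER")]
print(redact(report, pii_spans))
```

Because the substitution runs entirely in local Python, no report text needs to leave the machine, which matches the data sovereignty goal above.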

Training Data

Trained on ner_synthetic_dataset_v1, a curated dataset of synthetic incident reports reflecting linguistic patterns found in East African administrative text.
