
LinguoNER: Yambeta Named Entity Recognition Model

Model Description

This repository contains LinguoNER, a Transformer-based Named Entity Recognition (NER) model fine-tuned for Yambeta (yat), a low-resource Bantu language spoken in Cameroon.

The model is trained on a silver-standard corpus, automatically annotated and derived from the Yambeta New Testament, and validated through expert-in-the-loop evaluation of sampled subsets.

  • Task: Named Entity Recognition (NER)
  • Language: Yambeta (yat)
  • Entity Types: PER, LOC, ORG
  • Model Architecture: Token classification head on top of a Transformer encoder
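Assuming the model predicts the same BIO scheme used for its training annotation (B- marks the first token of an entity, I- a continuation, O a non-entity token), its per-token output has to be grouped into entity spans before use. The following is a minimal plain-Python sketch of that grouping. The function name `bio_to_spans` is hypothetical, and treating a stray I- tag (one with no open span of the same type) as the start of a new span is one common convention, not necessarily what LinguoNER's own tooling does. The Hugging Face token-classification pipeline offers similar grouping through its `aggregation_strategy` argument.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open span, then open a new one of this tag's type.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = tag[2:], i
        elif tag == "O":
            # An O tag closes whatever span is open.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
        # Otherwise: an I- tag continuing the open span, nothing to do.
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"]))
```

Adjacent B- tags of the same type yield separate spans, which matches how BIO distinguishes two neighbouring entities.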

Base Model

  • Base checkpoint: bert-base-cased
  • Tokenizer: DS4H-ICTU/yat-bert-tokenizer (WordPiece, Yambeta-specific)

The token-classification head was randomly initialized and fine-tuned jointly with the encoder, following standard practice for NER adaptation.
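Because the WordPiece tokenizer can split one Yambeta word into several subword tokens, word-level BIO labels must be mapped onto subword tokens before fine-tuning. The sketch below shows one common convention: label the first subword of each word and assign -100 to special tokens and continuation subwords, since -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. It assumes a fast tokenizer whose `word_ids()` output is passed in as a plain list. The helper name `align_labels` is hypothetical, and this card does not state whether LinguoNER used exactly this scheme.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_labels: List[int], word_ids: List[Optional[int]]) -> List[int]:
    """Map word-level label ids onto subword tokens.

    word_ids: for each subword token, the index of the word it came from,
    or None for special tokens such as [CLS] and [SEP].
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)      # special token
        elif wid != previous:
            aligned.append(word_labels[wid])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)      # continuation subword
        previous = wid
    return aligned


# Three words with hypothetical label ids 1, 2, 0; words 0 and 2 split in two.
print(align_labels([1, 2, 0], [None, 0, 0, 1, 2, 2, None]))
```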

Training Data

  • Dataset: DS4H-ICTU/yat-ner-dataset
  • Annotation type: Silver (dictionary-based BIO tagging)
  • Gold validation: 500 sentences from the annotation logs and 200 sentences of model NER output, reviewed by a domain expert
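Dictionary-based BIO tagging assigns labels by looking token sequences up in a gazetteer of known names. A minimal sketch follows, assuming greedy longest-match lookup with exact, case-sensitive matching. The function name and the gazetteer entries are hypothetical (English placeholders, not actual Yambeta spellings), and the card does not specify the matching strategy used to build `DS4H-ICTU/yat-ner-dataset`.

```python
from typing import Dict, List, Tuple


def dictionary_bio_tag(
    tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Tag tokens with BIO labels by greedy longest match against a gazetteer."""
    max_len = max((len(key) for key in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in gazetteer:
                match = (n, gazetteer[key])
                break
        if match is None:
            i += 1
            continue
        n, entity_type = match
        tags[i] = f"B-{entity_type}"
        for j in range(i + 1, i + n):
            tags[j] = f"I-{entity_type}"
        i += n  # skip past the matched phrase
    return tags


# Hypothetical placeholder gazetteer.
GAZETTEER = {
    ("John",): "PER",
    ("John", "the", "Baptist"): "PER",
    ("Jerusalem",): "LOC",
}
print(dictionary_bio_tag("John the Baptist went to Jerusalem".split(), GAZETTEER))
```

Longest match keeps multi-word names such as "John the Baptist" from being tagged as a bare "John". Any name missing from the dictionary stays O, which is one source of the label bias noted under Limitations.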

Dataset Split

  • Train: 6317 sentences
  • Validation: 790 sentences
  • Test: 790 sentences
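These counts are consistent with an 80/10/10 split of 7,897 sentences: the floor of 0.8 × 7,897 = 6,317.6 is 6,317, and the remaining 1,580 sentences divide evenly into 790 and 790. The sketch below reproduces those sizes with a seeded shuffle. The shuffling procedure, the seed value, and the function name are hypothetical; the card does not say how the split was actually made.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(
    items: Sequence[T], seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle deterministically, then cut into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # hypothetical seed
    n_train = int(len(shuffled) * 0.8)
    n_val = (len(shuffled) - n_train) // 2  # test gets any odd remainder
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_80_10_10(range(7897))
print(len(train), len(val), len(test))
```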

Evaluation Results (Test Set)

  • Precision: 0.9885
  • Recall: 0.9810
  • F1: 0.9847
  • Accuracy: 0.9997
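As a consistency check, the reported F1 is the harmonic mean of the reported precision P and recall R:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9885 \times 0.9810}{0.9885 + 0.9810}
    = \frac{1.939437}{1.9695}
    \approx 0.9847
```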

Metrics are reported at the token level using standard precision, recall, and F1; accuracy is the fraction of tokens whose predicted label matches the gold label.
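The card does not spell out its exact token-level definition. One common variant is sketched below as an assumption: precision and recall are micro-averaged over non-O tokens, with a true positive being a non-O predicted label that exactly matches the gold label, while accuracy counts every token, including O. Because most tokens in NER data are typically O, accuracy tends to sit far above the other three metrics, which fits the 0.9997 reported above. Note that the `seqeval` library, often used in Hugging Face NER examples, computes entity-level rather than token-level precision, recall, and F1. The function name is hypothetical.

```python
from typing import Dict, List


def token_level_scores(
    gold: List[str], pred: List[str], outside: str = "O"
) -> Dict[str, float]:
    """Micro-averaged token-level P/R/F1 over non-O labels, plus accuracy."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if p != outside and p == g)
    n_pred = sum(1 for p in pred if p != outside)
    n_gold = sum(1 for g in gold if g != outside)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Accuracy includes O tokens, so it is dominated by the majority class.
    accuracy = (sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
                if gold else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


gold = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "O", "O"]
pred = ["B-PER", "O", "O", "O", "B-LOC", "B-ORG", "O", "O"]
print(token_level_scores(gold, pred))
```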

Intended Use

  • Proof-of-concept NER for extremely low-resource African languages
  • Baseline for further expert annotation and domain expansion
  • Demonstration of Hugging Face workflows under data scarcity

Limitations

  • The corpus is Bible-derived, with repetitive narrative structure and limited entity inventory.
  • Results should be interpreted as proof-of-concept performance within this restricted domain.
  • Dictionary-driven annotation may introduce label bias; expert validation mitigates but does not eliminate this.

Citation

If you use this model, please cite:

@misc{linguoner_yambeta_ner,
  title        = {LinguoNER: Yambeta Named Entity Recognition Model},
  author       = {DS4H-ICTU Research Group},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {Model},
  url          = {https://huggingface.co/DS4H-ICTU/yat-linguoner-ner-model}
}
Model Files

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32