PuoBERTa-MRP for Setswana Offensive Content Detection
Model Summary
This repository contains PuoBERTa-MRP, a rationale-aware fine-tuned version of PuoBERTa for binary offensive-content detection in Setswana.
The model classifies Setswana text into:
| Label ID | Label |
|---|---|
| 0 | Non-offensive |
| 1 | Offensive |
The model was developed for research on low-resource African language NLP, digital forensic investigation, and explainable offensive-language detection. The MRP version extends the standard PuoBERTa fine-tuning setup by incorporating Masked Rationale Prediction (MRP) as a rationale-aware training and evaluation strategy.
In this work, rationales refer to semantically important offensive spans or trigger expressions that contribute to the offensive classification decision. These spans are used during model development to study whether the classifier relies on linguistically meaningful cues rather than shallow lexical shortcuts.
What is MRP?
MRP stands for Masked Rationale Prediction.
The purpose of the MRP setup is to test and improve the relationship between:
- sentence-level offensive classification,
- annotated semantic trigger spans,
- masked or neutralised rationale regions,
- and explanation faithfulness.
In the MRP setting, annotated offensive rationales are used to create controlled training or diagnostic variants in which key offensive spans may be masked, removed, or neutralised. This allows the researcher to examine whether the model:
- depends only on explicit offensive tokens;
- uses broader contextual patterns;
- remains robust when rationale-bearing terms are masked;
- produces explanations aligned with annotated semantic triggers.
This makes the model useful not only for classification, but also for forensic explainability analysis.
Research Motivation
Offensive-language detection in Setswana presents challenges that are not fully addressed by ordinary sentence-level classification. Offensive meaning may be expressed through:
- culturally specific insults,
- idiomatic expressions,
- indirect accusations,
- threats,
- phishing-related cues,
- sarcasm,
- dehumanising metaphors,
- and code-switched or non-standard orthography.
In small low-resource datasets, a model may overfit to obvious abusive terms while failing to capture broader discourse structures. MRP is introduced to investigate whether rationale masking can reveal or reduce such dependency.
The central research question is:
Can rationale-aware masking improve the interpretability and robustness of Setswana offensive-language detection while preserving useful classification performance?
Intended Use
This model is intended for:
- Setswana offensive-language detection research;
- cyberbullying and harassment detection experiments;
- digital forensic triage support;
- explainable AI experiments;
- LIME and S-LIME attribution analysis;
- masked rationale and counterfactual evaluation;
- benchmarking rationale-aware transformer models for low-resource languages.
It may be useful in research workflows where the goal is to analyse both:
- what the model predicts, and
- why the model predicts it.
Out-of-Scope Use
This model should not be used for:
- fully automated legal decision-making;
- disciplinary action without human review;
- automated criminal attribution;
- autonomous social media moderation;
- profiling individuals or communities;
- deployment on non-Setswana text without validation.
The model is intended to support research and forensic triage, not replace human interpretation.
Dataset Description
The model is based on a manually curated Setswana offensive-language corpus containing offensive and non-offensive examples.
The dataset follows a simple CSV structure compatible with common offensive-language NLP datasets such as OLID and HateCheck:
TEXT,TARGET
Where:
| Column | Description |
|---|---|
| TEXT | Setswana sentence or comment |
| TARGET | Class label: Offensive or Non-offensive |
The broader corpus contains approximately:
| Class | Count |
|---|---|
| Non-offensive | 500 |
| Offensive | 477 |
| Total | 977 |
If using the public merged release, verify the exact row count in the dataset card and release notes, as sanitised or release-ready versions may differ slightly from the internal experimental corpus.
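A minimal sketch of reading a file in this `TEXT,TARGET` layout and checking the class balance against the counts above. The in-memory sample, the helper name `load_examples`, and the exact label spellings in `LABEL_TO_ID` are assumptions for illustration; the released file may differ.

```python
import csv
import io
from collections import Counter

# Assumed label spellings, taken from the TARGET description above.
LABEL_TO_ID = {"Non-offensive": 0, "Offensive": 1}


def load_examples(csv_file):
    """Read TEXT,TARGET rows and return a list of (text, label_id) pairs."""
    reader = csv.DictReader(csv_file)
    examples = []
    for row in reader:
        text = row["TEXT"].strip()
        label = row["TARGET"].strip()
        examples.append((text, LABEL_TO_ID[label]))
    return examples


# Tiny in-memory stand-in for the real file; the labels are illustrative only.
sample_csv = io.StringIO(
    "TEXT,TARGET\n"
    "Ke dumela gore re tshwanetse go bua sentle.,Non-offensive\n"
    "O tshwanetse go tlogela boaka,Offensive\n"
)

examples = load_examples(sample_csv)
class_counts = Counter(label_id for _, label_id in examples)
print(len(examples), dict(class_counts))
```

For a file on disk, the same helper can be called inside `with open(path, newline="", encoding="utf-8") as f:`, and the printed counts compared with the dataset card.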
Rationale and Trigger Annotation
During dataset preparation, semantically important offensive spans were annotated as rationales or trigger regions.
These rationales may include:
- direct insults;
- vulgar expressions;
- harassment phrases;
- threat expressions;
- phishing or scam cues;
- dehumanising metaphors;
- culturally grounded abusive expressions.
Example rationale-style annotation:
O tshwanetse go tlogela <TRIGGER>boaka</TRIGGER>
For MRP experiments, such spans can be converted into masked variants, for example:
O tshwanetse go tlogela <MASK>
or neutralised variants, depending on the experiment design.
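The conversion from an annotated sentence to its variants can be sketched as below. This is an illustrative helper, not the preprocessing code used for the released model: the function name `rationale_variants` is hypothetical, and the default mask string `<mask>` is an assumption that matches RoBERTa-style tokenizers.

```python
import re

# Matches one annotated span, capturing the text between the tags.
TRIGGER_PATTERN = re.compile(r"<TRIGGER>(.*?)</TRIGGER>")


def rationale_variants(annotated, mask_token="<mask>"):
    """Return the rationale spans plus tag-free, masked, and removed variants."""
    rationales = TRIGGER_PATTERN.findall(annotated)
    # Tag-free text: keep the span, drop only the markup.
    plain = TRIGGER_PATTERN.sub(lambda m: m.group(1), annotated)
    # Masked text: each span becomes a single mask token, whatever its length.
    masked = TRIGGER_PATTERN.sub(lambda m: mask_token, annotated)
    # Removed text: delete the span and normalise leftover whitespace.
    removed = " ".join(TRIGGER_PATTERN.sub(" ", annotated).split())
    return {
        "rationales": rationales,
        "plain": plain,
        "masked": masked,
        "removed": removed,
    }


variants = rationale_variants("O tshwanetse go tlogela <TRIGGER>boaka</TRIGGER>")
for name, value in variants.items():
    print(f"{name}: {value}")
```

Neutralised variants need a human-written paraphrase for each span, so they are not generated automatically here.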
Evaluation Setting
A key principle of this work is that the model should be assessed under realistic conditions.
Therefore, final evaluation should be performed on:
- tag-free text,
- unmasked ordinary inputs,
- and a held-out test set not used during training or tuning.
This avoids giving the model artificial markup during deployment-like testing.
The evaluation protocol is as follows:
- 80/20 train-test split;
- 5-fold stratified cross-validation on the training partition;
- final evaluation on the untouched holdout test set;
- tag-free inference during final testing;
- rationale-aware analysis through masking and counterfactual evaluation.
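The split logic in this protocol can be sketched without external dependencies as follows. In practice scikit-learn's `train_test_split(..., stratify=...)` and `StratifiedKFold` are the usual tools; the helpers `stratified_holdout` and `stratified_folds`, the seed, and the rounding rule below are assumptions for illustration and may not reproduce the exact partitions used in the experiments.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(indices, labels, n_folds=5, seed=42):
    """Yield (fit_idx, val_idx) pairs; each class is dealt round-robin into folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % n_folds].append(idx)
    for k in range(n_folds):
        val_idx = sorted(folds[k])
        fit_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield fit_idx, val_idx


# Label vector with the class counts reported in the dataset description.
labels = [0] * 500 + [1] * 477
train_idx, test_idx = stratified_holdout(labels)
folds = list(stratified_folds(train_idx, labels))
print(len(train_idx), len(test_idx), [len(val) for _, val in folds])
```

The holdout indices are set aside before any fold is built, so cross-validation only ever sees the training partition.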
Model Architecture
| Component | Details |
|---|---|
| Base model | PuoBERTa |
| Architecture family | RoBERTa |
| Task | Sequence classification |
| Language | Setswana |
| ISO language code | tn |
| Number of labels | 2 |
| Framework | Hugging Face Transformers |
| Backend | PyTorch |
Training Configuration
The model was fine-tuned using a transformer sequence-classification setup.
Typical configuration:
| Parameter | Value |
|---|---|
| Maximum sequence length | 128 |
| Optimizer | AdamW |
| Learning rate | 1e-5 |
| Weight decay | 0.01 |
| Training batch size | 16 |
| Evaluation batch size | 64 |
| Loss function | Class-weighted cross-entropy |
| Class weights | [1.0, 2.0] |
| Model selection focus | Offensive-class recall |
The offensive class was assigned a higher loss weight to reduce the risk of missing harmful instances.
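To make the effect of the `[1.0, 2.0]` class weights concrete, here is a dependency-free sketch of class-weighted cross-entropy. It normalises by the sum of the target weights, which mirrors the default mean reduction of `torch.nn.CrossEntropyLoss(weight=...)`; whether the original training run used exactly this reduction is an assumption, and the function name is hypothetical.

```python
import math

# Loss weights from the configuration table: index 0 = Non-offensive, 1 = Offensive.
CLASS_WEIGHTS = [1.0, 2.0]


def weighted_cross_entropy(logits, targets, class_weights=CLASS_WEIGHTS):
    """Weighted mean of per-example negative log-likelihoods."""
    total_loss = 0.0
    total_weight = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normaliser.
        max_logit = max(row)
        log_norm = max_logit + math.log(sum(math.exp(v - max_logit) for v in row))
        nll = log_norm - row[target]
        weight = class_weights[target]
        total_loss += weight * nll
        total_weight += weight
    return total_loss / total_weight


# Same batch of true labels, two opposite failure modes.
# The model confidently predicts Non-offensive for both examples:
missed_offensive = weighted_cross_entropy([[3.0, 0.0], [3.0, 0.0]], [0, 1])
# The model confidently predicts Offensive for both examples:
false_alarm = weighted_cross_entropy([[0.0, 3.0], [0.0, 3.0]], [0, 1])
print(round(missed_offensive, 4), round(false_alarm, 4))
```

Missing the Offensive example is penalised more heavily than raising a false alarm, which is consistent with selecting models on offensive-class recall.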
MRP-Specific Training / Analysis Workflow
The MRP workflow may include the following steps:
- Train or fine-tune the classifier on labelled Setswana text.
- Use annotated semantic rationales to identify offensive spans.
- Create masked-rationale variants of selected samples.
- Evaluate prediction changes after masking.
- Compare original and masked predictions.
- Use LIME or S-LIME to inspect whether top-attributed tokens align with annotated rationales.
- Analyse flip and non-flip cases to determine whether the model depends on explicit offensive tokens or broader contextual templates.
This workflow supports both predictive evaluation and forensic interpretability.
Test Set Results
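The table below lists the MRP test-set metrics recorded so far. These values are provisional until confirmed, and entries marked TO_BE_ADDED are still pending.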
| Metric | Value |
|---|---|
| Accuracy | 0.74 |
| Macro F1-score | 0.74 |
| Recall: Offensive class | 0.81 |
| MCC | TO_BE_ADDED |
| ROC-AUC | TO_BE_ADDED |
| Loss | 1.820457 |
Example format:
accuracy = 0.xxxx
macro_f1 = 0.xxxx
recall_1 = 0.xxxx
mcc = 0.xxxx
roc_auc = 0.xxxx
Do not reuse metrics from the standard PuoBERTa or train-time trigger model unless they are from the exact MRP run.
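A minimal sketch of how accuracy, macro F1, offensive-class recall, and MCC could be computed from held-out predictions and printed in the format above. The label vectors are toy values, not results from this model, and the function name `binary_metrics` is hypothetical. ROC-AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute metrics for labels where 1 = Offensive and 0 = Non-offensive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def f1(hits, false_pos, false_neg):
        denom = 2 * hits + false_pos + false_neg
        return 2 * hits / denom if denom else 0.0

    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Macro F1 averages the per-class F1 scores with equal weight.
        "macro_f1": (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2,
        "recall_1": tp / (tp + fn) if (tp + fn) else 0.0,
        "mcc": (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0,
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```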
Explainability
This model is designed to support explainability experiments, especially:
- LIME;
- S-LIME;
- token-level attribution;
- masked-rationale comparison;
- counterfactual trigger neutralisation;
- rationale-alignment analysis.
In rationale-alignment analysis, the main question is whether the model’s most influential tokens overlap with human-annotated offensive rationales.
For example, if a human-annotated rationale is:
<TRIGGER>o sematla</TRIGGER>
then a faithful explanation should assign strong attribution to the same phrase or semantically related parts of the sentence.
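One simple way to quantify this overlap is to compare the top-k attributed tokens with the annotated rationale tokens. The sketch below is an assumption about how such a check could look, not the evaluation code behind this model: the function name, the choice of `top_k`, and the attribution scores are all hypothetical.

```python
def rationale_alignment(attributions, rationale_tokens, top_k=2):
    """Score how well the top-k Offensive-supporting tokens match the rationale.

    `attributions` is a list of (token, score) pairs where a positive score
    supports the Offensive class.
    """
    ranked = sorted(attributions, key=lambda pair: pair[1], reverse=True)
    top_tokens = {token.lower() for token, _ in ranked[:top_k]}
    gold = {token.lower() for token in rationale_tokens}
    overlap = top_tokens & gold
    return {
        "top_tokens": sorted(top_tokens),
        "precision": len(overlap) / len(top_tokens) if top_tokens else 0.0,
        "recall": len(overlap) / len(gold) if gold else 0.0,
    }


# Hypothetical attribution scores for an annotated sentence whose rationale is "boaka".
attributions = [
    ("O", 0.02),
    ("tshwanetse", -0.05),
    ("go", 0.01),
    ("tlogela", 0.10),
    ("boaka", 0.62),
]
alignment = rationale_alignment(attributions, ["boaka"], top_k=2)
print(alignment)
```

High recall with low precision would suggest that the model attends to the rationale but also to surrounding context, which is worth inspecting case by case.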
Interpreting Attribution Scores
For LIME and S-LIME outputs:
- Positive attribution scores support the Offensive class.
- Negative attribution scores support the Non-offensive class.
- Stable attributions across random seeds indicate more reliable explanations.
- Large changes after rationale masking may indicate strong dependence on the masked phrase.
- Non-flip cases may indicate that surrounding context still carries offensive meaning.
MRP is therefore useful for distinguishing between:
- lexical reliance,
- contextual reasoning,
- and potentially spurious shortcut learning.
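Stability across random seeds can be approximated by comparing the top-k tokens from repeated explanation runs. The sketch below uses mean pairwise Jaccard similarity; this metric, the helper names, and the scores are illustrative assumptions rather than the procedure used in the original experiments. Storing attributions in a dict also assumes each token appears once per sentence.

```python
from itertools import combinations


def top_k_tokens(attributions, k):
    """Return the k tokens with the largest absolute attribution."""
    ranked = sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return {token for token, _ in ranked[:k]}


def attribution_stability(runs, k=2):
    """Mean pairwise Jaccard similarity of top-k token sets across runs."""
    top_sets = [top_k_tokens(run, k) for run in runs]
    scores = [
        len(a & b) / len(a | b)
        for a, b in combinations(top_sets, 2)
        if a | b
    ]
    return sum(scores) / len(scores) if scores else 1.0


# Hypothetical attributions for the same sentence under three random seeds.
runs = [
    {"tlogela": 0.10, "boaka": 0.62, "tshwanetse": -0.05},
    {"tlogela": 0.14, "boaka": 0.58, "tshwanetse": -0.02},
    {"tlogela": 0.03, "boaka": 0.60, "tshwanetse": -0.09},
]
stability = attribution_stability(runs, k=2)
print(round(stability, 4))
```

A value close to 1.0 means the same tokens dominate every run; lower values suggest the explanation depends on the sampling seed.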
Counterfactual and Masking Analysis
The MRP model can be evaluated using counterfactual edits such as:
| Original Type | Counterfactual Operation |
|---|---|
| Offensive rationale present | Mask offensive span |
| Offensive rationale present | Replace with neutral paraphrase |
| Offensive rationale present | Remove trigger span |
| Context preserved | Re-evaluate prediction |
A prediction flip from Offensive to Non-offensive may suggest that the model relied strongly on the rationale span.
A non-flip may suggest that offensive meaning is also encoded in the surrounding context, such as accusatory templates or threat-like phrasing.
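Flip and non-flip cases can be tallied from paired predictions as sketched below. The inputs are the Offensive-class probabilities for each original sentence and its counterfactual; the function name, the 0.5 decision threshold, and the numbers are assumptions for illustration.

```python
def flip_summary(original_probs, masked_probs, threshold=0.5):
    """Summarise Offensive -> Non-offensive flips after a counterfactual edit."""
    flips = 0
    non_flips = 0
    drops = []
    for p_orig, p_masked in zip(original_probs, masked_probs):
        if p_orig < threshold:
            # Only inputs originally predicted Offensive can flip in this direction.
            continue
        drops.append(p_orig - p_masked)
        if p_masked < threshold:
            flips += 1
        else:
            non_flips += 1
    considered = flips + non_flips
    return {
        "considered": considered,
        "flips": flips,
        "non_flips": non_flips,
        "flip_rate": flips / considered if considered else 0.0,
        "mean_prob_drop": sum(drops) / len(drops) if drops else 0.0,
    }


# Hypothetical P(Offensive) before and after masking the rationale span.
original_probs = [0.90, 0.80, 0.30, 0.70]
masked_probs = [0.20, 0.75, 0.10, 0.40]
summary = flip_summary(original_probs, masked_probs)
print(summary)
```

Reporting the mean probability drop alongside the flip rate helps separate cases where masking barely moves the score from cases where it falls just short of the threshold.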
How to Use the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Repository ID as shown on this model's Hugging Face page.
model_name = "mopatik/PuoBERTa_MRP_version"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Ke dumela gore re tshwanetse go bua sentle."

# Tokenise plain, tag-free text, truncated to the training sequence length.
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=128,
)

# Inference only: no gradients are needed.
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred = torch.argmax(probs, dim=-1).item()

label_map = {
    0: "Non-offensive",
    1: "Offensive",
}

print("Prediction:", label_map[pred])
print("Probabilities:", probs.tolist())
```
Optional: Masked Rationale Diagnostic Example
The following is a diagnostic workflow for research use only.
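```python
# Reuses `tokenizer`, `model`, and `torch` from the usage example above.
original_text = "O tshwanetse go tlogela boaka"
# Use the tokenizer's own mask token ("<mask>" for RoBERTa-style tokenizers).
masked_text = f"O tshwanetse go tlogela {tokenizer.mask_token}"
texts = [original_text, masked_text]

inputs = tokenizer(
    texts,
    return_tensors="pt",
    truncation=True,
    padding=True,
    max_length=128,
)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

# Each row is [P(Non-offensive), P(Offensive)] for the matching text.
for text, prob in zip(texts, probs):
    print(text)
    print(prob.tolist())
```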
Use this only if your tokenizer/model configuration supports the mask token appropriately.
Limitations
The model has several limitations:
- The dataset is relatively small.
- The model is trained primarily for Setswana.
- It may be sensitive to spelling variation and informal orthography.
- It may struggle with sarcasm, irony, and implicit abuse.
- It may underperform on unseen slang or emerging online expressions.
- It performs binary classification only.
- It does not classify offensive subtypes such as hate speech, harassment, threat, or phishing separately.
- Rationale masking can help diagnosis, but it does not prove causal reasoning.
Ethical Considerations
This model deals with offensive and potentially harmful language. It should be used carefully and only in appropriate research or forensic contexts.
Recommended safeguards:
- human-in-the-loop review;
- calibrated confidence thresholds;
- abstention for uncertain predictions;
- careful error analysis;
- avoidance of automated punitive action;
- compliance with data protection and cybercrime legislation;
- masking or sanitisation of offensive examples in public outputs.
The model should not be used as the sole basis for legal, disciplinary, or investigative conclusions.
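The threshold and abstention safeguards can be sketched as a simple triage rule. The cut-off values and the function name below are hypothetical placeholders; real thresholds should be chosen on calibrated validation data, and every outcome remains subject to human review.

```python
def triage(prob_offensive, flag_at=0.80, clear_at=0.20):
    """Map P(Offensive) to a triage outcome, abstaining in the uncertain band."""
    if not 0.0 <= prob_offensive <= 1.0:
        raise ValueError("prob_offensive must be a probability in [0, 1]")
    if prob_offensive >= flag_at:
        return "flag-for-review"
    if prob_offensive <= clear_at:
        return "likely-non-offensive"
    # Neither confident enough to flag nor to clear: defer to a human reviewer.
    return "abstain"


for p in (0.95, 0.50, 0.05):
    print(p, triage(p))
```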
Bias and Fairness Considerations
Potential sources of bias include:
- sampling bias from public social media content;
- underrepresentation of dialectal variants;
- limited coverage of emerging slang;
- ambiguity in culturally specific phrases;
- and label uncertainty in sarcastic or metaphorical cases.
Users should validate the model on their own target domain before applying it in practical settings.
Reproducibility
Related reproducibility resources may include:
- training notebooks;
- MRP experiment notebooks;
- LIME/S-LIME explainability notebooks;
- scripts for generating tables and figures;
- sanitised output files;
- dataset card;
- model card;
- Zenodo release.
Associated GitHub repository:
https://github.com/bkekgathetse/setswana-offensive-977
Associated Hugging Face dataset:
ADD_DATASET_LINK_HERE
Associated Zenodo release:
ADD_ZENODO_DOI_HERE
Recommended Citation
@misc{kekgathetse2025puoberta_mrp,
title={PuoBERTa-MRP for Setswana Offensive Content Detection},
author={Kekgathetse, Bernerdict},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/YOUR-USERNAME/YOUR-PUOBERTA-MRP-MODEL}}
}
If this model is linked to a manuscript, cite the corresponding paper as well:
@article{kekgathetse2025setswanaoffensive,
title={Developing Monolingual Setswana Datasets for Offensive Content Detection},
author={Kekgathetse, Bernerdict},
journal={To be updated},
year={2025}
}
License
Please refer to the license specified in this repository.
Recommended licensing structure:
- Code: MIT or Apache-2.0
- Documentation: CC-BY 4.0
- Dataset access: governed separately due to ethical considerations
Contact
For academic queries, reproducibility questions, or collaboration requests, please refer to the associated GitHub repository or manuscript contact details.
Model Card Notes
This model card describes the MRP version of the PuoBERTa offensive-content classifier. It should be updated with the exact final test metrics and repository links before public release.