---
language:
- en
- yo
- ha
- ig
- sw
- am
- pcm
license: apache-2.0
base_model: davlan/afro-xlmr-base
tags:
- text-classification
- human-ai-text-attribution
- hata
- african-languages
- multilingual
datasets:
- msmaje/phd-hata-african-dataset
metrics:
- accuracy
- f1
---

# AfroXLMR for Human-AI Text Attribution (HATA)

This model is a fine-tuned version of [davlan/afro-xlmr-base](https://huggingface.co/davlan/afro-xlmr-base) for **Human-AI Text Attribution** in African languages.

## Model Description

- **Model Type:** Text Classification (Binary)
- **Base Model:** AfroXLMR-base
- **Languages:** Yoruba, Hausa, Igbo, Swahili, Amharic, Nigerian Pidgin, English
- **Task:** Distinguishing between human-written and AI-generated text

## Performance

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 1.0000 |
| F1 Score  | 1.0000 |
| Precision | 1.0000 |
| Recall    | 1.0000 |

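As a reminder of what the four numbers in the table measure, here is a minimal sketch that derives them from binary confusion counts. It assumes AI-generated (label 1) is the positive class, which the card does not state; the function name `binary_metrics` and the example labels are hypothetical and are not taken from the model's evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 0 = human-written, 1 = AI-generated
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(binary_metrics(y_true, y_pred))
```

A score of 1.0000 on every metric corresponds to zero false positives and zero false negatives on the evaluated samples.
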
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
model_name = "msmaje/phdhatamodel"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Your text here"
# Inputs longer than 128 tokens are truncated
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities and pick the most likely class
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

labels = {0: "Human-written", 1: "AI-generated"}
print(f"Prediction: {labels[predicted_class]}")
```

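The last step of the usage snippet turns two logits into a label. The sketch below reproduces that post-processing in plain Python, without downloading the model, and shows one way to report a confidence score alongside the label. The logit values and the helper name `classify_from_logits` are hypothetical; the label mapping is the one used in the usage snippet.

```python
import math

# Label mapping taken from the usage snippet
LABELS = {0: "Human-written", 1: "AI-generated"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_from_logits(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits, standing in for outputs.logits[0].tolist()
label, confidence = classify_from_logits([-1.2, 2.3])
print(f"Prediction: {label} (p={confidence:.3f})")
```

Reporting the probability makes it possible to apply a custom decision threshold instead of always taking the argmax.
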
## Training Details

- **Dataset:** msmaje/phd-hata-african-dataset
- **Training samples:** 128,000
- **Validation samples:** 32,000
- **Epochs:** 3
- **Learning Rate:** 2e-5
- **Batch Size:** 16

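The figures listed under Training Details imply the length of the training run. Here is a back-of-the-envelope sketch, assuming a single device, no gradient accumulation, and that any final partial batch is kept; none of these assumptions are stated in the card.

```python
import math

train_samples = 128_000
val_samples = 32_000
epochs = 3
batch_size = 16  # assumed to be the effective batch size per optimizer step

# One optimizer step per batch under the assumptions above
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Share of the combined samples held out for validation
val_fraction = val_samples / (train_samples + val_samples)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"validation share of the data: {val_fraction:.0%}")
```

Under these assumptions the run takes 8,000 steps per epoch and 24,000 in total, and the validation set is 20% of the combined 160,000 samples.
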
## Citation

```bibtex
@misc{msmaje2025hata,
  author = {Maje, M.S.},
  title = {AfroXLMR for Human-AI Text Attribution},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/msmaje/phdhatamodel}
}
```