---
language:
- en
- yo
- ha
- ig
- sw
- am
- pcm
license: apache-2.0
base_model: davlan/afro-xlmr-base
tags:
- text-classification
- human-ai-text-attribution
- hata
- african-languages
- multilingual
datasets:
- msmaje/phd-hata-african-dataset
metrics:
- accuracy
- f1
---

# AfroXLMR for Human-AI Text Attribution (HATA)

This model is a fine-tuned version of [davlan/afro-xlmr-base](https://huggingface.co/davlan/afro-xlmr-base) for **Human-AI Text Attribution** in African languages.

## Model Description

- **Model Type:** Text Classification (Binary)
- **Base Model:** AfroXLMR-base
- **Languages:** Yoruba, Hausa, Igbo, Swahili, Amharic, Nigerian Pidgin, English
- **Task:** Distinguishing between human-written and AI-generated text

## Performance

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 1.0000 |
| F1 Score  | 1.0000 |
| Precision | 1.0000 |
| Recall    | 1.0000 |

A sketch of how these metrics could be recomputed is given at the end of this card.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "msmaje/phdhatamodel"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

text = "Your text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities and take the most likely class
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

labels = {0: "Human-written", 1: "AI-generated"}
print(f"Prediction: {labels[predicted_class]}")
```

Batched scoring with the `pipeline` API is sketched at the end of this card.

## Training Details

- **Dataset:** msmaje/phd-hata-african-dataset
- **Training samples:** 128,000
- **Validation samples:** 32,000
- **Epochs:** 3
- **Learning Rate:** 2e-5
- **Batch Size:** 16

A hedged fine-tuning sketch using these hyperparameters appears at the end of this card.

## Citation

```bibtex
@misc{msmaje2025hata,
  author    = {Maje, M.S.},
  title     = {AfroXLMR for Human-AI Text Attribution},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/msmaje/phdhatamodel}
}
```
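## Batched Inference Sketch

For scoring many texts at once, the same checkpoint can be wrapped in the `transformers` `pipeline` API. This is a minimal sketch, not part of the original card; the `batch_size` value and the example strings are illustrative.

```python
from transformers import pipeline

# If the checkpoint's config only stores generic LABEL_0/LABEL_1 names,
# map them as in the Usage section (0 = Human-written, 1 = AI-generated).
classifier = pipeline(
    "text-classification",
    model="msmaje/phdhatamodel",
)

texts = [
    "Your first text here",
    "Your second text here",
]
for result in classifier(texts, batch_size=16, truncation=True, max_length=128):
    print(result["label"], round(result["score"], 4))
```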
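## Reproducing the Reported Metrics (Sketch)

The card does not include the evaluation script. Below is a minimal sketch of how accuracy, precision, recall, and F1 could be recomputed on the validation split, assuming the dataset exposes `text` and `label` columns and a `validation` split (assumptions; verify against the dataset card).

```python
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "msmaje/phdhatamodel"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Assumed split and column names -- check the dataset card.
dataset = load_dataset("msmaje/phd-hata-african-dataset", split="validation")

preds, refs = [], []
for i in range(0, len(dataset), 32):  # batch of 32 is an arbitrary choice
    batch = dataset[i : i + 32]
    inputs = tokenizer(
        batch["text"], return_tensors="pt",
        truncation=True, padding=True, max_length=128,
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    preds.extend(logits.argmax(dim=-1).tolist())
    refs.extend(batch["label"])

precision, recall, f1, _ = precision_recall_fscore_support(
    refs, preds, average="binary"
)
print(f"Accuracy:  {accuracy_score(refs, preds):.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1 Score:  {f1:.4f}")
```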
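## Fine-Tuning Sketch

The fine-tuning code itself is not shown on this card. The following sketch shows how the stated hyperparameters (3 epochs, learning rate 2e-5, batch size 16) could be plugged into the `Trainer` API; the column names, max sequence length, and output directory are assumptions, not confirmed details.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base = "davlan/afro-xlmr-base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Assumed column names ("text", "label") and splits -- check the dataset card.
dataset = load_dataset("msmaje/phd-hata-african-dataset")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="afroxlmr-hata",      # assumed path
    num_train_epochs=3,              # from the card
    learning_rate=2e-5,              # from the card
    per_device_train_batch_size=16,  # from the card
    per_device_eval_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```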