Amharic Hate Speech Detection Model
This repository hosts a fine-tuned Amharic hate speech detection model based on mBERT (Multilingual BERT). The model classifies text into two categories:
- free: Non-hate speech content.
- hate: Hate speech content.
Model Details
- Model Name: DawitMelka/amharic-hate-speech-detection-mBERT
- Architecture: BERT-based model for sequence classification.
- Language: Amharic
- Problem Type: Binary classification
Use Cases
The model is intended for use in:
- Moderation: Flagging potentially harmful or offensive content in Amharic.
- Research: Analyzing trends in hate speech within Amharic texts.
- Content Filtering: Ensuring safe online environments.
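For the moderation and filtering use cases, a caller typically turns the model's score for the hate label into an action. The following is a minimal sketch of one such policy, not something shipped with the model: `moderation_action` and both threshold values are hypothetical and would need tuning on real Amharic data before use.

```python
def moderation_action(hate_score, review_at=0.5, block_at=0.9):
    """Map the model's 'hate' probability to an action.

    The thresholds are illustrative assumptions, not values tuned for this model.
    """
    if not 0.0 <= hate_score <= 1.0:
        raise ValueError("hate_score must be a probability in [0, 1]")
    if hate_score >= block_at:
        return "block"    # high confidence: filter automatically
    if hate_score >= review_at:
        return "review"   # uncertain: send to a human moderator
    return "allow"


print(moderation_action(0.95))
print(moderation_action(0.60))
print(moderation_action(0.01))
```

Keeping a middle "review" band is one way to account for the nuanced or ambiguous content that a binary classifier may misclassify.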
Labels
The model outputs the following labels:
- free: Indicates the input text does not contain hate speech.
- hate: Indicates the input text contains hate speech.
Example Usage
Hugging Face Inference API
You can directly use the model with the Hugging Face Inference API:
import requests
API_URL = "https://api-inference.huggingface.co/models/DawitMelka/amharic-hate-speech-detection-mBERT"
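# Replace the placeholder with your own Hugging Face access token.
headers = {"Authorization": "Bearer YOUR_HUGGINGFACE_API_TOKEN"}

def query(payload):
    """POST the payload to the Inference API and return the decoded JSON."""
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()
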
# Single text prediction (replace the placeholder with the Amharic text to classify)
result = query({"inputs": "YOUR_AMHARIC_TEXT"})
print(result)
# Batch text prediction
batch_result = query({"inputs": ["Text 1", "Text 2", "Text 3"]})
print(batch_result)
Output
[
{"label": "free", "score": 0.999930739402771},
{"label": "hate", "score": 6.921886233612895e-05}
]
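To act on a response like the one above, a caller usually needs the single highest-scoring label. The following is a minimal sketch: `top_label` is a hypothetical helper, not part of the Inference API, and it assumes the response is a list of dictionaries with `label` and `score` keys, possibly wrapped in one extra list.

```python
def top_label(response):
    """Return (label, score) for the highest-scoring entry of one prediction."""
    # Assumption: some responses nest the result one level deeper,
    # e.g. [[{...}, {...}]]; unwrap that case before comparing scores.
    if response and isinstance(response[0], list):
        response = response[0]
    best = max(response, key=lambda item: item["score"])
    return best["label"], best["score"]


example = [
    {"label": "free", "score": 0.999930739402771},
    {"label": "hate", "score": 6.921886233612895e-05},
]
label, score = top_label(example)
print(label, score)
```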
Local Deployment
You can deploy the model locally using FastAPI:
Install dependencies:
pip install fastapi uvicorn transformers torch
Create a Python file main.py:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

app = FastAPI()

MODEL_NAME = "DawitMelka/amharic-hate-speech-detection-mBERT"
# Load the tokenizer and model once at startup, not on every request.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

class TextRequest(BaseModel):
    text: str

@app.post("/predict/")
async def predict(request: TextRequest):
    text = request.text
    if not text.strip():
        raise HTTPException(status_code=400, detail="Text cannot be empty.")
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    # Gradients are not needed for inference.
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # Index of the highest logit: 0 is "free", 1 is "hate".
    prediction = torch.argmax(logits, dim=1).item()
    labels = {0: "free", 1: "hate"}
    return {"text": text, "prediction": labels.get(prediction, "unknown")}
Start the FastAPI server:
uvicorn main:app --reload
Test the API (replace the placeholder with the Amharic text to classify):
curl -X POST "http://127.0.0.1:8000/predict/" \
  -H "Content-Type: application/json" \
  -d '{"text": "YOUR_AMHARIC_TEXT"}'
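The /predict/ endpoint above returns only the winning label, while the Inference API also reports a score per label. As a rough sketch of how the two relate, the snippet below applies a softmax to a pair of logits in plain Python. The helper names and the example logits are made up for illustration; inside the FastAPI handler the same step could be done with `torch.softmax(logits, dim=1)`.

```python
import math

# Same index-to-label mapping as the local deployment example.
LABELS = {0: "free", 1: "hate"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores_from_logits(logits):
    """Return a {label: probability} dictionary for one input's logits."""
    return {LABELS[i]: p for i, p in enumerate(softmax(logits))}


# Hypothetical logits for a single input, not real model output.
print(scores_from_logits([2.0, -1.0]))
```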
Training Details
- Dataset: The model was fine-tuned on a curated dataset of Amharic texts labeled as hate or free speech.
- Base Model: mBERT
- Framework: Hugging Face Transformers
Limitations
- Bias: The model's predictions depend on the training data and may inherit any biases present in it.
- Language Scope: Designed for Amharic only and may not perform well on other languages.
- Context Understanding: May misclassify nuanced or ambiguous content.
License
This model is licensed under the Apache License 2.0.
Citation
If you use this model in your work, please cite:
@misc{amharic-hate-speech-detection,
author = {Dawit Melka},
title = {Amharic Hate Speech Detection Model},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/DawitMelka/amharic-hate-speech-detection-mBERT}
}
Acknowledgments
Special thanks to the Hugging Face community and contributors for providing the tools and resources to develop this model.