Amharic Hate Speech Detection Model

This repository hosts a fine-tuned Amharic hate speech detection model based on mBERT (Multilingual BERT). The model classifies text into two categories:

  • free: Non-hate speech content.
  • hate: Hate speech content.

Model Details

  • Model Name: DawitMelka/amharic-hate-speech-detection-mBERT
  • Architecture: BERT-based model for sequence classification.
  • Language: Amharic
  • Problem Type: Binary classification

Use Cases

The model is intended for use in:

  • Moderation: Flagging potentially harmful or offensive content in Amharic.
  • Research: Analyzing trends in hate speech within Amharic texts.
  • Content Filtering: Ensuring safe online environments.

Labels

The model outputs the following labels:

  • free: Indicates the input text does not contain hate speech.
  • hate: Indicates the input text contains hate speech.
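To make the label mapping concrete, here is a minimal sketch of how raw classifier logits are converted to these labels via softmax. The `ID2LABEL` mapping mirrors the labels above; the function is pure Python for illustration only (the real model returns a PyTorch tensor of two logits):

```python
import math

ID2LABEL = {0: "free", 1: "hate"}  # label mapping used by this model

def logits_to_prediction(logits):
    """Map raw logits [free_logit, hate_logit] to a label and softmax score."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]          # softmax probabilities, sum to 1
    idx = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[idx], "score": probs[idx]}

print(logits_to_prediction([2.0, -1.0]))  # high "free" confidence
```

The two scores always sum to 1, which is why the example output below shows one score near 1.0 and the other near 0.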

Example Usage

Hugging Face Inference API

You can directly use the model with the Hugging Face Inference API:

import requests

API_URL = "https://api-inference.huggingface.co/models/DawitMelka/amharic-hate-speech-detection-mBERT"
headers = {"Authorization": "Bearer YOUR_HUGGINGFACE_API_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Single text prediction
result = query({"inputs": "αˆ°αˆ‹αˆ αŠ₯αŠ•α‹΅α‰΅ αŠα‹"})  # Amharic: "Hello, how are you?"
print(result)

# Batch text prediction
batch_result = query({"inputs": ["Text 1", "Text 2", "Text 3"]})
print(batch_result)

Output

[
  {"label": "free", "score": 0.999930739402771},
  {"label": "hate", "score": 6.921886233612895e-05}
]
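The API returns one label/score pair per class, so callers usually just want the top-scoring entry. A small helper for that (a sketch; depending on the API version, results for a single input may be nested one level deeper, which the helper unwraps):

```python
def top_label(predictions):
    """Return (label, score) for the highest-scoring class in an API response."""
    # Some Inference API versions nest results one level deeper ([[{...}]]).
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]

sample = [
    {"label": "free", "score": 0.999930739402771},
    {"label": "hate", "score": 6.921886233612895e-05},
]
print(top_label(sample))  # ('free', 0.999930739402771)
```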

Local Deployment

You can deploy the model locally using FastAPI:

  1. Install dependencies:

    pip install fastapi uvicorn transformers torch
    
  2. Create a Python file main.py:

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch
    
    app = FastAPI()
    
    MODEL_NAME = "DawitMelka/amharic-hate-speech-detection-mBERT"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    
    class TextRequest(BaseModel):
        text: str
    
    @app.post("/predict/")
    async def predict(request: TextRequest):
        text = request.text
        if not text.strip():
            raise HTTPException(status_code=400, detail="Text cannot be empty.")
        
        inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
        with torch.no_grad():  # inference only; no gradient tracking needed
            outputs = model(**inputs)
        logits = outputs.logits
        prediction = torch.argmax(logits, dim=1).item()
        labels = {0: "free", 1: "hate"}
        return {"text": text, "prediction": labels.get(prediction, "unknown")}
    
  3. Start the FastAPI server:

    uvicorn main:app --reload
    
  4. Test the API:

    curl -X POST "http://127.0.0.1:8000/predict/" \
    -H "Content-Type: application/json" \
    -d '{"text": "αˆ°αˆ‹αˆ αŠ₯αŠ•α‹΅α‰΅ αŠα‹"}'
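The same request can also be made from Python instead of curl. A minimal sketch using `requests`, assuming the default uvicorn host and port from step 3:

```python
import requests

PREDICT_URL = "http://127.0.0.1:8000/predict/"  # server started in step 3

def predict(text: str) -> dict:
    """POST text to the local /predict/ endpoint and return the JSON result."""
    response = requests.post(PREDICT_URL, json={"text": text}, timeout=10)
    response.raise_for_status()  # surface 4xx/5xx errors (e.g. empty text -> 400)
    return response.json()

# With the server running:
# predict("αˆ°αˆ‹αˆ αŠ₯αŠ•α‹΅α‰΅ αŠα‹")  ->  {"text": "...", "prediction": "free"} or "hate"
```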
    

Training Details

  • Dataset: The model was fine-tuned on a curated dataset of Amharic texts labeled as hate speech (hate) or non-hate speech (free).
  • Base Model: mBERT
  • Framework: Hugging Face Transformers

Limitations

  • Bias: The model's predictions depend on the training data and may inherit any biases present in it.
  • Language Scope: Designed for Amharic only and may not perform well on other languages.
  • Context Understanding: May misclassify nuanced or ambiguous content.

License

This model is licensed under the Apache License 2.0.

Citation

If you use this model in your work, please cite:

@misc{amharic-hate-speech-detection,
  author = {Dawit Melka},
  title = {Amharic Hate Speech Detection Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DawitMelka/amharic-hate-speech-detection-mBERT}
}

Acknowledgments

Special thanks to the Hugging Face community and contributors for providing the tools and resources to develop this model.

Model Size

  • Format: Safetensors
  • Parameters: 0.2B
  • Tensor type: F32