Amharic Hate Speech Detection Model

This repository hosts a fine-tuned Amharic hate speech detection model based on mBERT (Multilingual BERT). The model classifies text into two categories:

  • free: Non-hate speech content.
  • hate: Hate speech content.

Model Details

  • Model Name: DawitMelka/amharic-hate-speech-detection-mBERT
  • Architecture: BERT-based model for sequence classification.
  • Language: Amharic
  • Problem Type: Binary classification

Use Cases

The model is intended for use in:

  • Moderation: Flagging potentially harmful or offensive content in Amharic.
  • Research: Analyzing trends in hate speech within Amharic texts.
  • Content Filtering: Ensuring safe online environments.

Labels

The model outputs the following labels:

  • free: Indicates the input text does not contain hate speech.
  • hate: Indicates the input text contains hate speech.
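To make the label mapping concrete, here is a minimal sketch of how raw classifier logits are converted to these labels via softmax. The `ID2LABEL` mapping mirrors the labels above; the function is pure Python for illustration only (the real model returns a PyTorch tensor of two logits):

```python
import math

ID2LABEL = {0: "free", 1: "hate"}  # label mapping used by this model

def logits_to_prediction(logits):
    """Map raw logits [free_logit, hate_logit] to a label and softmax score."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]          # softmax probabilities, sum to 1
    idx = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[idx], "score": probs[idx]}

print(logits_to_prediction([2.0, -1.0]))  # high "free" confidence
```

The two scores always sum to 1, which is why the example output below shows one score near 1.0 and the other near 0.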

Example Usage

Hugging Face Inference API

You can directly use the model with the Hugging Face Inference API:

import requests

API_URL = "https://api-inference.huggingface.co/models/DawitMelka/amharic-hate-speech-detection-mBERT"
headers = {"Authorization": "Bearer YOUR_HUGGINGFACE_API_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Single text prediction
result = query({"inputs": "αˆ°αˆ‹αˆ αŠ₯αŠ•α‹΅α‰΅ αŠα‹"})  # Amharic: "Hello, how are you?"
print(result)

# Batch text prediction
batch_result = query({"inputs": ["Text 1", "Text 2", "Text 3"]})
print(batch_result)

Output

[
  {"label": "free", "score": 0.999930739402771},
  {"label": "hate", "score": 6.921886233612895e-05}
]
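The API returns one label/score pair per class, so callers usually just want the top-scoring entry. A small helper for that (a sketch; depending on the API version, results for a single input may be nested one level deeper, which the helper unwraps):

```python
def top_label(predictions):
    """Return (label, score) for the highest-scoring class in an API response."""
    # Some Inference API versions nest results one level deeper ([[{...}]]).
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]

sample = [
    {"label": "free", "score": 0.999930739402771},
    {"label": "hate", "score": 6.921886233612895e-05},
]
print(top_label(sample))  # ('free', 0.999930739402771)
```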

Local Deployment

You can deploy the model locally using FastAPI:

  1. Install dependencies:

    pip install fastapi uvicorn transformers torch
    
  2. Create a Python file main.py:

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch
    
    app = FastAPI()
    
    MODEL_NAME = "DawitMelka/amharic-hate-speech-detection-mBERT"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    
    class TextRequest(BaseModel):
        text: str
    
    @app.post("/predict/")
    async def predict(request: TextRequest):
        text = request.text
        if not text.strip():
            raise HTTPException(status_code=400, detail="Text cannot be empty.")
        
        inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
        with torch.no_grad():  # inference only; no gradient tracking needed
            outputs = model(**inputs)
        logits = outputs.logits
        prediction = torch.argmax(logits, dim=1).item()
        labels = {0: "free", 1: "hate"}
        return {"text": text, "prediction": labels.get(prediction, "unknown")}
    
  3. Start the FastAPI server:

    uvicorn main:app --reload
    
  4. Test the API:

    curl -X POST "http://127.0.0.1:8000/predict/" \
    -H "Content-Type: application/json" \
    -d '{"text": "αˆ°αˆ‹αˆ αŠ₯αŠ•α‹΅α‰΅ αŠα‹"}'
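The same request can also be made from Python instead of curl. A minimal sketch using `requests`, assuming the default uvicorn host and port from step 3:

```python
import requests

PREDICT_URL = "http://127.0.0.1:8000/predict/"  # server started in step 3

def predict(text: str) -> dict:
    """POST text to the local /predict/ endpoint and return the JSON result."""
    response = requests.post(PREDICT_URL, json={"text": text}, timeout=10)
    response.raise_for_status()  # surface 4xx/5xx errors (e.g. empty text -> 400)
    return response.json()

# With the server running:
# predict("αˆ°αˆ‹αˆ αŠ₯αŠ•α‹΅α‰΅ αŠα‹")  ->  {"text": "...", "prediction": "free"} or "hate"
```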
    

Training Details

  • Dataset: The model was fine-tuned on a curated dataset of Amharic texts labeled as hate speech (hate) or non-hate speech (free).
  • Base Model: mBERT
  • Framework: Hugging Face Transformers

Limitations

  • Bias: The model's predictions depend on the training data and may inherit any biases present in it.
  • Language Scope: Designed for Amharic only and may not perform well on other languages.
  • Context Understanding: May misclassify nuanced or ambiguous content.

License

This model is licensed under the Apache License 2.0.

Citation

If you use this model in your work, please cite:

@misc{amharic-hate-speech-detection,
  author = {Dawit Melka},
  title = {Amharic Hate Speech Detection Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DawitMelka/amharic-hate-speech-detection-mBERT}
}

Acknowledgments

Special thanks to the Hugging Face community and contributors for providing the tools and resources to develop this model.

Model Size

  • Format: Safetensors
  • Parameters: 0.2B
  • Tensor type: F32