---
library_name: transformers
license: mit
language:
- am
---
# Amharic Hate Speech Detection Model using Fine-Tuned mBERT

## Overview

This repository presents a **hate speech detection model for the Amharic language**, fine-tuned from a multilingual BERT (mBERT) checkpoint using the **HuggingFace Trainer API**. The model reports an F1-score of 0.9172 and an accuracy of 91.59%.

## Model Details

The base model for this project is **Davlan's bert-base-multilingual-cased-finetuned-amharic** from HuggingFace. This pretrained model was further fine-tuned on a labeled dataset for the downstream task of **hate speech detection** in Amharic.
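As a minimal sketch, the base checkpoint could be loaded with a fresh classification head as shown here. The hub ID `Davlan/bert-base-multilingual-cased-finetuned-amharic` is inferred from the model name above, and `NUM_LABELS = 2` (hate / not hate) is an assumption, since the label set is not stated in this card. The helper name `load_base_for_classification` is hypothetical.

```python
# Hub ID inferred from the model name in this card.
BASE_MODEL_ID = "Davlan/bert-base-multilingual-cased-finetuned-amharic"

# Assumption: a binary task (hate / not hate); the card does not state the label set.
NUM_LABELS = 2


def load_base_for_classification(model_id=BASE_MODEL_ID, num_labels=NUM_LABELS):
    """Load the tokenizer and the base model with a new classification head."""
    # Imported lazily so the constants above can be reused without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=num_labels
    )
    return tokenizer, model


if __name__ == "__main__":
    # Downloads the checkpoint from the hub on first use.
    tokenizer, model = load_base_for_classification()
    print(model.config.num_labels)
```

The classification head is randomly initialized at this point; it only becomes useful after fine-tuning on labeled data.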

### Key Highlights:
- **Model Architecture**: mBERT (Multilingual BERT)
- **Training Framework**: HuggingFace's Trainer API
- **Performance**: 
  - **F1-Score**: 0.9172
  - **Accuracy**: 91.59%
- **Training Parameters**:
  - **Epochs**: 15
  - **Learning Rate**: 5e-5
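
The reported hyperparameters and metrics could be wired into the Trainer API roughly as follows. This is a sketch, not the notebook's actual code: only the epoch count and learning rate come from this card, while the output directory, the binary F1 definition (positive class `1`), and the helper names are assumptions. Any other `TrainingArguments` value is left at the library default.

```python
# Hyperparameters stated in this card; everything else uses library defaults.
HPARAMS = {"num_train_epochs": 15, "learning_rate": 5e-5}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def binary_f1(y_true, y_pred, positive=1):
    """F1 for the positive class (assumption: the card's F1 averaging is not stated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def compute_metrics(eval_pred):
    """Metric callback in the shape Trainer expects: (logits, labels) -> dict."""
    logits, labels = eval_pred
    # Argmax over the class dimension, written without NumPy for portability.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = list(labels)
    return {"accuracy": accuracy(labels, preds), "f1": binary_f1(labels, preds)}


def build_training_args(output_dir="mbert-amharic-hate-speech"):
    """Create TrainingArguments; output_dir is a hypothetical path."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HPARAMS)
```

A `Trainer` would then receive `args=build_training_args()` and `compute_metrics=compute_metrics` alongside the model and tokenized datasets.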

## Dataset

The model was fine-tuned on a dataset sourced from [Mendeley Data](https://data.mendeley.com/datasets/ymtmxx385m). The dataset consists of **30,000 labeled instances**, a comparatively large resource for Amharic hate speech detection.

### Dataset Overview:
- **Total Samples**: 30,000
- **Source**: Mendeley Data Repository
- **Language**: Amharic

## Model Usage

The complete Google Colab notebook, which documents the training process and performance metrics, is available on GitHub:

**[Google Colab Notebook: Amharic Hate Speech Detection Using mBERT](https://github.com/dawit2123/amharic-hate-speech-detection-using-ML/blob/main/Hate_speech_detection_using_amharic_language.ipynb)**

## How to Use

To use this model for Amharic hate speech detection, follow the steps in the Google Colab notebook to load the model and test it on new data. The notebook includes instructions for:
- Loading the fine-tuned mBERT model
- Preprocessing Amharic text data
- Making predictions on new instances
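
The steps above can be sketched as follows, assuming the fine-tuned weights are available under a hub ID or local path. `MODEL_ID` is a hypothetical placeholder (this card does not state the repository ID), `max_length=128` is an assumed truncation limit, and the label names are read from the model's own `config.id2label` rather than assumed.

```python
# Hypothetical placeholder: replace with the actual hub ID or a local checkpoint path.
MODEL_ID = "your-username/your-fine-tuned-model"


def pick_label(probs, id2label):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return id2label.get(best, str(best)), probs[best]


def predict(texts, model_id=MODEL_ID, max_length=128):
    """Classify a list of Amharic strings with a fine-tuned sequence classifier."""
    # Imported lazily so pick_label can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Tokenize as a padded batch; max_length is an assumed limit.
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    probs = torch.softmax(logits, dim=-1).tolist()
    return [pick_label(row, model.config.id2label) for row in probs]


if __name__ == "__main__":
    # Requires MODEL_ID to point at a real checkpoint.
    for label, score in predict(["ሰላም"]):
        print(label, round(score, 4))
```

If the checkpoint was saved without custom label names, `config.id2label` falls back to generic names such as `LABEL_0` and `LABEL_1`, so the mapping to "hate" and "not hate" should be checked against the notebook.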

---

### Contact Information

If you have any questions or suggestions, feel free to reach out or contribute via GitHub.