---
datasets:
- google/jigsaw_toxicity_pred
language:
- en
metrics:
- accuracy
---
# Multi-Label Hate Speech Classifier
## Overview
The **Multi-Label Hate Speech Classifier** is a machine learning model designed to detect and categorize multiple forms of hate speech within textual data. It combines TF-IDF vectorization with a OneVsRest Logistic Regression classifier, so a single text can receive several labels at once.
## Features
- **Multi-Label Detection:** Assigns multiple hate speech categories to a single piece of text.
- **Supported Categories:**
  - **toxic**
  - **obscene**
  - **insult**
  - **threat**
  - **identity_hate**
- **Custom Thresholds:** Optimized thresholds are applied to each label to balance precision and recall.
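
As a rough illustration of per-label thresholds, the sketch below turns a matrix of predicted probabilities into label lists. The threshold values, the `apply_thresholds` helper, and the column order are assumptions made for this example; they are not the tuned values or the code shipped with this model.

```python
import numpy as np

# Label order is assumed to match the columns of the probability matrix.
LABELS = ["toxic", "obscene", "insult", "threat", "identity_hate"]

# Hypothetical thresholds for illustration only; not the model's tuned values.
THRESHOLDS = {
    "toxic": 0.5,
    "obscene": 0.45,
    "insult": 0.4,
    "threat": 0.3,
    "identity_hate": 0.35,
}


def apply_thresholds(probabilities, thresholds=THRESHOLDS, labels=LABELS):
    """Return, for each row, the labels whose probability meets its threshold."""
    probabilities = np.asarray(probabilities, dtype=float)
    cutoffs = np.array([thresholds[label] for label in labels])
    decisions = probabilities >= cutoffs  # broadcasts one cutoff per column
    return [
        [label for label, hit in zip(labels, row) if hit]
        for row in decisions
    ]


# Made-up probabilities for two texts, one row per text.
example_probs = [
    [0.90, 0.50, 0.20, 0.05, 0.10],
    [0.10, 0.10, 0.10, 0.35, 0.00],
]
predicted = apply_thresholds(example_probs)
print(predicted)  # [['toxic', 'obscene'], ['threat']]
```

Lowering a label's threshold generally raises recall at the cost of precision, which is why rarer labels are often given a lower cutoff than common ones.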
## Model Architecture
- **Text Vectorization:** Utilizes TF-IDF (Term Frequency-Inverse Document Frequency) to convert raw text into a numerical format.
- **Classifier:** Implements a OneVsRest Logistic Regression approach for multi-label classification.
- **Training Process:** Trained on a balanced dataset with pre-processed text to achieve robust performance across all categories.
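
The architecture described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the actual training script: the toy texts use made-up placeholder tokens, and the hyperparameters (default `TfidfVectorizer`, `max_iter=1000`) are chosen for the example only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

LABELS = ["toxic", "obscene", "insult", "threat", "identity_hate"]

# Toy corpus with placeholder tokens; a real run would use the Jigsaw data.
texts = [
    "have a nice day",
    "thanks for the help",
    "toxicword insultword",
    "toxicword obsceneword",
    "toxicword threatword",
    "toxicword hateword",
]

# Binary indicator matrix: one row per text, one column per label in LABELS.
y = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
])

# TF-IDF features feed one independent logistic regression per label.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, y)

# predict_proba returns one probability per label for each input text.
probs = model.predict_proba(["toxicword threatword"])
for label, p in zip(LABELS, probs[0]):
    print(f"{label}: {p:.2f}")
```

Because OneVsRest fits a separate binary classifier per label, the per-label probabilities do not need to sum to one, which is what allows several labels to be assigned to the same text.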
## Setup & Installation
### Requirements
- Python 3.x
- Dependencies:
  - `numpy`
  - `pandas`
  - `scikit-learn`
  - `joblib`