---
datasets:
- google/jigsaw_toxicity_pred
language:
- en
metrics:
- accuracy
---
# Multi-Label Hate Speech Classifier

## Overview
The **Multi-Label Hate Speech Classifier** is a machine learning model that detects and categorizes multiple forms of hate speech in text. It combines TF-IDF vectorization with a OneVsRest Logistic Regression classifier, so a single piece of text can receive several labels at once.

## Features
- **Multi-Label Detection:** Assigns multiple hate speech categories to a single piece of text.
- **Supported Categories:**
  - **toxic**
  - **obscene**
  - **insult**
  - **threat**
  - **identity_hate**
- **Custom Thresholds:** Each label has its own tuned decision threshold, chosen to balance precision and recall.
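
As a rough illustration of per-label thresholding, the sketch below turns per-label probabilities into a list of predicted labels. The threshold values and the `apply_thresholds` helper are hypothetical and chosen only for the example; they are not the values shipped with this model.

```python
# The five categories listed in this model card.
LABELS = ["toxic", "obscene", "insult", "threat", "identity_hate"]

# Hypothetical per-label thresholds (NOT the model's actual tuned values).
THRESHOLDS = {
    "toxic": 0.50,
    "obscene": 0.40,
    "insult": 0.40,
    "threat": 0.30,
    "identity_hate": 0.30,
}


def apply_thresholds(probabilities, thresholds=THRESHOLDS):
    """Return every label whose probability meets or exceeds its threshold."""
    return [label for label in LABELS if probabilities[label] >= thresholds[label]]


# Made-up probabilities for a single input text.
example = {
    "toxic": 0.81,
    "obscene": 0.12,
    "insult": 0.45,
    "threat": 0.05,
    "identity_hate": 0.02,
}
predicted = apply_thresholds(example)
print(predicted)  # ['toxic', 'insult']
```

Lowering a label's threshold generally trades precision for recall on that label, which is why rarer categories are often given lower cut-offs than a single global 0.5.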

## Model Architecture
- **Text Vectorization:** Utilizes TF-IDF (Term Frequency-Inverse Document Frequency) to convert raw text into a numerical format.
- **Classifier:** Implements a OneVsRest Logistic Regression approach for multi-label classification.
- **Training Process:** Trained on a balanced dataset of pre-processed text so that each category is adequately represented. The card's metadata lists `google/jigsaw_toxicity_pred` as the dataset.
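
The architecture above can be sketched with scikit-learn as follows. This is a minimal illustration, not the model's actual training script: the toy sentences, their labels, and the hyperparameters (default `TfidfVectorizer`, `max_iter=1000`) are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

LABELS = ["toxic", "obscene", "insult", "threat", "identity_hate"]

# Tiny invented dataset, for illustration only.
texts = [
    "have a nice day",
    "you are a stupid idiot",
    "i will hurt you",
    "what the damn hell",
    "thanks for the help",
    "people like your kind are worthless",
]
# One row per text, one binary column per label (multi-label indicator matrix).
Y = np.array([
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1],
])

# TF-IDF features feed one independent logistic regression per label.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, Y)

# predict_proba returns one probability per label for each input text.
probs = model.predict_proba(["you stupid idiot"])
for label, p in zip(LABELS, probs[0]):
    print(f"{label}: {p:.2f}")
```

Because OneVsRest fits a separate binary classifier for each label, the per-label probabilities are independent and do not sum to 1, which is what allows several labels to fire for the same text.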

## Setup & Installation

### Requirements
- Python 3.x
- Dependencies:
  - `numpy`
  - `pandas`
  - `scikit-learn`
  - `joblib`
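
One quick way to confirm the dependencies are available is to check that each one is importable. The snippet below is a small sketch using only the standard library; the `missing_dependencies` helper is a hypothetical name introduced here. Note that the `scikit-learn` package is imported as `sklearn`.

```python
from importlib.util import find_spec

# Maps each package name from the list above to its import name.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "joblib": "joblib",
}


def missing_dependencies(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


missing = missing_dependencies()
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All dependencies are importable.")
```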