hate-speech-knn / README.md

Merikatori

metadata

language: en
tags:
  - text-classification
  - hate-speech
  - twitter
  - knn
  - sklearn
datasets:
  - hate_speech_offensive
metrics:
  - f1
library_name: sklearn

Hate Speech Detector — KNN Pipeline

A KNN classifier for hate speech classification on Twitter.

Labels

0 - Hate Speech: hateful language
1 - Offensive: offensive, but not hate speech
2 - Neither: normal language
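
To turn numeric predictions back into readable names, a small lookup like the one below may help. It is a minimal sketch that assumes the classifier returns the integer IDs listed above; `ID2LABEL` and `decode` are illustrative names, not part of the published pipeline.

```python
# Label IDs as listed in this README.
ID2LABEL = {0: "Hate Speech", 1: "Offensive", 2: "Neither"}

def decode(predictions):
    """Map an iterable of integer class IDs to label names."""
    return [ID2LABEL[int(p)] for p in predictions]

print(decode([0, 2, 1]))
```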

Pipeline

TF-IDF (15k features) + Chi2 selection (top 5000)
Sentence Embeddings: all-MiniLM-L6-v2 (384 dimensions)
Meta features: word count, uppercase ratio, mention count, etc.
KNN (k=3, euclidean, distance-weighted, BallTree)
Imbalance: sample_weight='balanced' (no ADASYN, to avoid overfitting)
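
The steps above can be sketched roughly as follows. This is not the training code behind the published model: it runs on a toy corpus, leaves out the sentence embeddings and the imbalance handling, and assumes the feature blocks are combined by simple horizontal concatenation. The `meta_features` and `predict` helpers are hypothetical names introduced for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neighbors import KNeighborsClassifier

def meta_features(texts):
    """Hypothetical meta features: word count, uppercase ratio, mention count."""
    rows = []
    for t in texts:
        letters = [c for c in t if c.isalpha()]
        upper_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
        rows.append([len(t.split()), upper_ratio, t.count("@")])
    return np.asarray(rows, dtype=float)

# Toy placeholder corpus; labels follow the README (0, 1, 2).
texts = [
    "placeholder hateful text alpha",
    "another placeholder hateful text beta gamma",
    "@user placeholder rude text",
    "PLACEHOLDER rude text number two @user @other",
    "nice weather today",
    "see you at the game tomorrow evening friends",
]
labels = [0, 0, 1, 1, 2, 2]

# TF-IDF capped at 15k features, then chi2 selection (top 5000 or fewer).
tfidf = TfidfVectorizer(max_features=15000)
X_tfidf = tfidf.fit_transform(texts)
selector = SelectKBest(chi2, k=min(5000, X_tfidf.shape[1]))
X_sel = selector.fit_transform(X_tfidf, labels)

# Assumption: feature blocks are concatenated into one dense matrix.
# Sentence embeddings (384 columns) could be appended the same way.
X = np.hstack([X_sel.toarray(), meta_features(texts)])

knn = KNeighborsClassifier(
    n_neighbors=3, metric="euclidean", weights="distance", algorithm="ball_tree"
)
knn.fit(X, labels)

def predict(new_texts):
    """Apply the same fitted transforms, then classify."""
    X_new = np.hstack(
        [selector.transform(tfidf.transform(new_texts)).toarray(),
         meta_features(new_texts)]
    )
    return knn.predict(X_new)

print(predict(["nice game today"]))
```

The matrix is made dense before fitting because scikit-learn's tree-based neighbor search does not operate on sparse input. Unscaled counts such as word count can dominate euclidean distance, so a real pipeline would likely scale the feature blocks first.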

Results

Metric	Score
Accuracy	0.8574
Macro F1	0.6396
Weighted F1	0.8437
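
Macro F1 averages per-class F1 scores equally, while weighted F1 weights them by class frequency, so a gap like the one above may point to weaker performance on a minority class. The sketch below shows the effect on made-up labels; the numbers it produces are unrelated to the scores reported here.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels: class 1 dominates, class 0 is never predicted.
y_true = [0, 0, 1, 1, 1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]

acc = accuracy_score(y_true, y_pred)
# zero_division=0 scores a never-predicted class as 0 instead of warning.
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
weighted = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print(f"accuracy={acc:.4f} macro_f1={macro:.4f} weighted_f1={weighted:.4f}")
```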

Load pipeline

import joblib
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="Merikatori/hate-speech-knn", filename="knn_pipeline.pkl")
pipeline = joblib.load(path)

# Predict
knn = pipeline['knn']
# (feature extraction must be run first; see gradio_demo.py)
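
This README only shows the 'knn' key, so it can help to list what else the loaded object contains before wiring up feature extraction. The helper below is a minimal sketch that assumes the loaded pipeline behaves like a dict; `describe_pipeline` is a hypothetical name, and the stand-in dict replaces the real download so the snippet runs offline.

```python
def describe_pipeline(pipeline):
    """Return a mapping of each key to the type name of its value."""
    return {key: type(value).__name__ for key, value in pipeline.items()}

# Stand-in for the object returned by joblib.load(path).
# Only 'knn' is documented; the second key is purely hypothetical.
stand_in = {
    "knn": "stand-in for the fitted classifier",
    "hypothetical_extra_key": [1, 2, 3],
}

for key, type_name in describe_pipeline(stand_in).items():
    print(f"{key}: {type_name}")
```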