Merikatori committed 48da132 (verified; 1 parent: 89e7d11)

Upload README.md with huggingface_hub

Files changed (1): `README.md` (added, +50 -0)
 
---
language: en
tags:
- text-classification
- hate-speech
- twitter
- knn
- sklearn
datasets:
- hate_speech_offensive
metrics:
- f1
library_name: sklearn
---

# Hate Speech Detector: KNN Pipeline

A KNN classifier for hate speech classification of Twitter posts.

## Labels

- **0 (Hate Speech)**: hateful language
- **1 (Offensive)**: offensive, but not hate speech
- **2 (Neither)**: normal content, neither hateful nor offensive
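The classifier presumably returns these integer IDs. Below is a minimal sketch for turning predictions into readable names. `LABEL_NAMES` and `decode` are hypothetical helpers and are not objects stored in `knn_pipeline.pkl`.

```python
# Hypothetical helper: maps integer class IDs to the label names listed above
LABEL_NAMES = {0: "Hate Speech", 1: "Offensive", 2: "Neither"}


def decode(predictions):
    """Convert a sequence of integer class IDs (e.g. from knn.predict) into names."""
    return [LABEL_NAMES[int(p)] for p in predictions]


print(decode([2, 0, 1]))  # ['Neither', 'Hate Speech', 'Offensive']
```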

## Pipeline

- TF-IDF (15,000 features) followed by Chi2 feature selection (top 5,000)
- Sentence embeddings: `all-MiniLM-L6-v2` (384 dimensions)
- Meta features: word count, uppercase ratio, mention count, etc.
- KNN (k=3, Euclidean distance, distance-weighted, BallTree)
- Class imbalance: handled with `sample_weight='balanced'` (ADASYN was not used, to avoid overfitting)
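The steps above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the published pipeline. `meta_features` and `TextMetaKNN` are hypothetical names and the toy tweets are placeholders. The sentence-embedding branch is omitted to keep the example self-contained. The meta features are standardized so that raw word counts do not dominate Euclidean distances; this is an assumption, since the README does not say whether they are scaled. The selected TF-IDF columns are densified because scikit-learn falls back to brute-force search on sparse input, which would bypass the BallTree. Class weighting is also left out: `KNeighborsClassifier.fit` accepts only `X` and `y`, so the README does not make clear where `sample_weight='balanced'` is applied.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def meta_features(texts):
    """Hypothetical meta features: word count, uppercase ratio, @mention count."""
    rows = []
    for text in texts:
        letters = [c for c in text if c.isalpha()]
        upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
        mentions = len(re.findall(r"@\w+", text))
        rows.append([len(text.split()), upper_ratio, mentions])
    return np.array(rows, dtype=float)


class TextMetaKNN:
    """Hypothetical sketch: TF-IDF -> Chi2 top-k, plus scaled meta features -> KNN."""

    def __init__(self, max_features=15000, chi2_k=5000, n_neighbors=3):
        self.tfidf = TfidfVectorizer(max_features=max_features)
        self.selector = SelectKBest(chi2, k=chi2_k)
        self.scaler = StandardScaler()  # assumption: meta features are standardized
        # k neighbours, Euclidean metric, votes weighted by 1/distance, BallTree index
        self.knn = KNeighborsClassifier(
            n_neighbors=n_neighbors,
            metric="euclidean",
            weights="distance",
            algorithm="ball_tree",
        )

    def _features(self, texts, y=None):
        if y is not None:  # fitting: learn vocabulary, Chi2 scores and scaler
            text_part = self.selector.fit_transform(self.tfidf.fit_transform(texts), y)
            meta_part = self.scaler.fit_transform(meta_features(texts))
        else:  # inference: reuse the fitted transformers
            text_part = self.selector.transform(self.tfidf.transform(texts))
            meta_part = self.scaler.transform(meta_features(texts))
        # BallTree needs dense input, so densify the selected TF-IDF columns
        return np.hstack([text_part.toarray(), meta_part])

    def fit(self, texts, y):
        self.knn.fit(self._features(texts, y), y)
        return self

    def predict(self, texts):
        return self.knn.predict(self._features(texts))


# Placeholder tweets; labels: 0 = Hate Speech, 1 = Offensive, 2 = Neither
texts = [
    "placeholder hateful tweet alpha @user",
    "placeholder hateful tweet beta",
    "placeholder OFFENSIVE tweet gamma",
    "placeholder offensive tweet delta @user @other",
    "placeholder neutral tweet epsilon",
    "placeholder neutral tweet zeta about lunch",
]
labels = [0, 0, 1, 1, 2, 2]

# chi2_k is lowered because the toy vocabulary is far smaller than 5,000 terms
model = TextMetaKNN(chi2_k=5).fit(texts, labels)
print(model.predict(["placeholder neutral tweet about dinner"]))
```

With `weights="distance"`, a query identical to a training point takes that point's label outright, and otherwise closer neighbours get proportionally larger votes.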

## Results

| Metric | Score |
|--------|-------|
| Accuracy | 0.8574 |
| Macro F1 | 0.6396 |
| Weighted F1 | 0.8437 |
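The gap between Macro F1 (0.6396) and Weighted F1 (0.8437) comes from how per-class F1 scores are averaged. Macro F1 gives every class equal weight. Weighted F1 weights each class by its support (its number of true samples). A gap like this one therefore suggests weaker scores on the smaller classes; in this dataset, Hate Speech is the smallest class. The pure-Python sketch below uses hypothetical predictions, not the model's actual outputs, to show the effect. `per_class_f1` is an illustrative helper. scikit-learn's `f1_score` with `average="macro"` or `average="weighted"` computes the same two averages.

```python
def per_class_f1(y_true, y_pred, labels):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical predictions: the two small classes (0 and 2) are each half missed
y_true = [0, 0, 1, 1, 1, 1, 1, 1, 2, 2]
y_pred = [1, 0, 1, 1, 1, 1, 1, 1, 2, 1]

f1 = per_class_f1(y_true, y_pred, labels=[0, 1, 2])
support = {c: y_true.count(c) for c in f1}

macro = sum(f1.values()) / len(f1)  # every class counts equally
weighted = sum(f1[c] * support[c] for c in f1) / len(y_true)  # weighted by support
print(f"macro={macro:.3f} weighted={weighted:.3f}")  # macro=0.730 weighted=0.781
```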
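
## Load pipeline

```python
import joblib
from huggingface_hub import hf_hub_download

# Download the serialized pipeline from the Hugging Face Hub
path = hf_hub_download(repo_id="Merikatori/hate-speech-knn", filename="knn_pipeline.pkl")
# joblib.load unpickles the file, so only load files from a source you trust
pipeline = joblib.load(path)

# The fitted KNN classifier is stored under the 'knn' key
knn = pipeline['knn']

# Predict: raw tweets must first go through the same feature extraction used in
# training (TF-IDF + Chi2, sentence embeddings, meta features); see gradio_demo.py.
# `features` below is a placeholder for that extracted feature matrix.
# labels = knn.predict(features)  # 0 = Hate Speech, 1 = Offensive, 2 = Neither
```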