gregco committed
Commit b590f16 · verified · 1 Parent(s): 2d5bba1

Add model card

Files changed (1)
  1. README.md +132 -0
README.md ADDED
@@ -0,0 +1,132 @@
+ ---
+ language:
+ - fr
+ license: mit
+ tags:
+ - text-classification
+ - cyberbullying
+ - harassment
+ - social-media
+ - french
+ - sentence-transformers
+ - sklearn
+ datasets:
+ - custom
+ metrics:
+ - f1
+ - precision
+ - recall
+ - accuracy
+ model-index:
+ - name: balance-tes-haters-classifier
+   results:
+   - task:
+       type: text-classification
+       name: Binary Harassment Detection
+     dataset:
+       name: French social media comments (held-out test set)
+       type: custom
+     metrics:
+     - type: f1
+       value: 0.6916
+     - type: precision
+       value: 0.6852
+     - type: recall
+       value: 0.6981
+     - type: accuracy
+       value: 0.7130
+ ---
+
+ # Balance Tes Haters: Harassment Classifier
+
+ Binary classifier for French social media comments: **harassment (1) vs. benign (0)**.
+
+ Built for the [Balance Tes Haters](https://balanceteshaters.fr) project, which collects and analyses cyberbullying reports from Instagram, TikTok, YouTube and Twitter.
+
+ ## Architecture
+
+ This is a **two-component** model:
+
+ | Component | Description |
+ |---|---|
+ | **Encoder** | [`Snowflake/snowflake-arctic-embed-l-v2.0`](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0): 568M params, 1024-dim embeddings, loaded from the Hugging Face Hub at inference time |
+ | **Classifier** | `harassment_arctic_mlp.joblib`: sklearn MLP (512→128, ReLU) trained on frozen Arctic embeddings, bundled in this repo (~7 MB) |
+
+ The encoder is **not fine-tuned**; only the MLP head was trained. This keeps the classifier small and the encoder swappable, although swapping the encoder means retraining the head on the new embeddings.
+
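To give a feel for how small the trained head is next to the encoder, here is a minimal sketch that counts the weights and biases of a dense network on 1024-dim embeddings. It assumes that `512→128` denotes two hidden layers and that the output is a single logistic unit (scikit-learn's convention for a binary `MLPClassifier`); `count_mlp_params` is a hypothetical helper, not something shipped in this repo.

```python
def count_mlp_params(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# Assumed layout: 1024-dim embedding -> 512 -> 128 -> 1 logistic output
LAYERS = [1024, 512, 128, 1]
n_params = count_mlp_params(LAYERS)
print(n_params)  # 590593
```

Under these assumptions the head has about 0.59M parameters, roughly a thousandth of the 568M-parameter encoder.
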
+ ## Performance
+
+ Evaluated on a stratified held-out test set (15% of annotated French comments):
+
+ | Metric | Score |
+ |---|---|
+ | F1 | **0.6916** |
+ | Precision | 0.6852 |
+ | Recall | 0.6981 |
+ | Accuracy | 0.7130 |
+
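The reported scores can be sanity-checked against each other, since F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the two table values and also derives the two error rates they imply; the variable names are illustrative only.

```python
precision = 0.6852
recall = 0.6981

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

# Share of true harassment comments the model misses
miss_rate = 1 - recall
# Share of flagged comments that are actually benign
false_discovery_rate = 1 - precision

print(f"F1 = {f1:.4f}")
print(f"missed harassment = {miss_rate:.1%}")
print(f"benign among flagged = {false_discovery_rate:.1%}")
```

The recomputed F1 rounds to 0.6916, matching the table, and both error rates come out at roughly 30%.
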
+ Comparison with other frozen-embedding approaches on the same test set:
+
+ | Encoder | Classifier | F1 |
+ |---|---|---|
+ | Arctic | MLP | **0.6916** |
+ | Arctic | LogReg | 0.6903 |
+ | Harrier (270M) | LightGBM | 0.6729 |
+ | jina-nano (239M) | LightGBM | 0.6573 |
+ | jina-small (677M) | MLP | 0.6195 |
+
+ ## Usage
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ from sentence_transformers import SentenceTransformer
+ import joblib
+
+ # Load components: the MLP head from this repo, the encoder from the Hub
+ clf = joblib.load(hf_hub_download(
+     repo_id="gregco/balance-tes-haters-classifier",
+     filename="harassment_arctic_mlp.joblib",
+ ))
+ encoder = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")
+
+ def predict(text: str) -> int:
+     """Returns 1 (harassment) or 0 (benign)."""
+     X = encoder.encode([text], convert_to_numpy=True)
+     return int(clf.predict(X)[0])
+
+ def predict_proba(text: str) -> float:
+     """Returns harassment probability between 0 and 1."""
+     X = encoder.encode([text], convert_to_numpy=True)
+     return float(clf.predict_proba(X)[0, 1])
+
+ # Examples (French inputs, with English glosses)
+ predict("t'es vraiment nulle va mourir")  # "you're really useless, go die" → 1
+ predict("super vidéo, continue comme ça")  # "great video, keep it up" → 0
+ ```
+
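When scoring many comments, encoding them in a single call is usually cheaper than calling the encoder once per text. The sketch below is one possible batch helper; `score_batch` is a hypothetical name, and it assumes the classifier's classes are ordered `[0, 1]` so that column 1 of `predict_proba` is the harassment probability. The encoder and classifier are passed in as arguments, so any objects exposing `encode` and `predict_proba` will work.

```python
from typing import Any, List, Sequence

def score_batch(texts: Sequence[str], encoder: Any, clf: Any) -> List[float]:
    """Return the harassment probability for each text, encoding all texts in one call."""
    if not texts:
        return []
    X = encoder.encode(list(texts), convert_to_numpy=True)
    # Column 1 of predict_proba is assumed to be class 1 (harassment)
    return [float(row[1]) for row in clf.predict_proba(X)]
```

With the objects loaded in the usage snippet, this would be called as `score_batch(comments, encoder, clf)`.
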
+ ## Training Data
+
+ - **Real annotations**: French social media comments manually annotated via the Balance Tes Haters platform, covering 11 harassment categories (insults, threats, doxxing, incitement to hatred, etc.)
+ - **Split**: 70% train / 15% val / 15% test (stratified)
+ - The MLP was trained on the `real` (human-annotated) data only; no synthetic augmentation was used for this checkpoint
+
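The card does not say how the 70/15/15 split was produced, so the following is only a standard-library sketch of what a stratified split does: each label is split separately so that every subset keeps the same class proportions. The function name, the seed and the toy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, train=0.70, val=0.15, seed=0):
    """Return (train, val, test) index lists that preserve label proportions."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    splits = ([], [], [])
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train)
        n_val = round(len(indices) * val)
        splits[0].extend(indices[:n_train])
        splits[1].extend(indices[n_train:n_train + n_val])
        splits[2].extend(indices[n_train + n_val:])  # remainder is the test set
    return splits

# Toy data: 40 harassment and 60 benign labels
labels = [1] * 40 + [0] * 60
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

In practice the same effect is commonly obtained with scikit-learn's `train_test_split` and its `stratify` argument, applied twice.
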
+ ## Categories detected
+
+ The model collapses all harassment categories into a single binary label:
+
+ - `0`: Absence de cyberharcèlement (no cyberbullying)
+ - `1`: any of Cyberharcèlement (cyberbullying), Injure (insult), Diffamation (defamation), Menaces (threats), Doxxing, Incitation au suicide (incitement to suicide), Incitation à la haine (incitement to hatred), Cyberharcèlement à caractère sexuel (sexual cyberbullying), and others
+
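The collapse into a binary label can be sketched as a simple mapping from annotation category to target. This assumes each comment carries a single category stored as the exact strings listed above; `to_binary` and `BENIGN_LABEL` are hypothetical names rather than identifiers from the project's code.

```python
BENIGN_LABEL = "Absence de cyberharcèlement"

def to_binary(category: str) -> int:
    """Map an annotation category to 0 (benign) or 1 (any harassment category)."""
    return 0 if category.strip() == BENIGN_LABEL else 1

annotations = ["Injure", "Absence de cyberharcèlement", "Menaces", "Doxxing"]
binary = [to_binary(c) for c in annotations]
print(binary)  # [1, 0, 1, 1]
```
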
+ ## Limitations
+
+ - Trained exclusively on **French** comments; not suitable for other languages
+ - Sarcasm and context-dependent harassment may be misclassified
+ - With recall of ~0.70 and precision of ~0.69, roughly 3 in 10 harassment comments are missed and roughly 3 in 10 flagged comments are actually benign
+ - Should be used as a **triage tool**, not a final decision system; human review is recommended for borderline cases
+
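One way to use the classifier as a triage tool is to act automatically only on confident scores and send everything in between to a human. The sketch below illustrates that routing; the `triage` function and the 0.3 / 0.7 thresholds are placeholders chosen for the example, not values tuned or recommended by the project.

```python
def triage(probability: float, low: float = 0.3, high: float = 0.7) -> str:
    """Route a comment by its harassment probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= high:
        return "flag"
    if probability <= low:
        return "pass"
    return "review"  # borderline: send to a human moderator

for p in (0.05, 0.5, 0.92):
    print(p, triage(p))
```

The thresholds would normally be chosen on the validation set, trading reviewer workload against the number of missed harassment comments.
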
+ ## Dependencies
+
+ ```bash
+ pip install sentence-transformers scikit-learn huggingface_hub
+ ```