DataForGood / balance-tes-haters-classifier

+---
+language:
+  - fr
+license: mit
+tags:
+  - text-classification
+  - cyberbullying
+  - harassment
+  - social-media
+  - french
+  - sentence-transformers
+  - sklearn
+datasets:
+  - custom
+metrics:
+  - f1
+  - precision
+  - recall
+  - accuracy
+model-index:
+  - name: balance-tes-haters-classifier
+    results:
+      - task:
+          type: text-classification
+          name: Binary Harassment Detection
+        dataset:
+          name: French social media comments (held-out test set)
+          type: custom
+        metrics:
+          - type: f1
+            value: 0.6916
+          - type: precision
+            value: 0.6852
+          - type: recall
+            value: 0.6981
+          - type: accuracy
+            value: 0.7130
+---
+# Balance Tes Haters — Harassment Classifier
+Binary classifier for French social media comments: **harassment (1) vs benign (0)**.
+Built for the [Balance Tes Haters](https://balanceteshaters.fr) project, which collects and analyses cyberbullying reports from Instagram, TikTok, YouTube and Twitter.
+## Architecture
+This is a **two-component** model:
+
+| Component | Description |
+|---|---|
+| **Encoder** | [`Snowflake/snowflake-arctic-embed-l-v2.0`](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0): 568M parameters, 1024-dim embeddings; downloaded from the Hugging Face Hub at inference time |
+| **Classifier** | `harassment_arctic_mlp.joblib`: sklearn MLP with two hidden layers (512 and 128 units, ReLU) trained on frozen Arctic embeddings; bundled in this repo (~7 MB) |
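+
+The encoder is **not fine-tuned**: only the MLP head was trained. This keeps the classifier small and the encoder swappable, in the sense that moving to a different encoder only requires retraining the lightweight head on that encoder's embeddings. Conversely, the bundled head must be used with exactly this encoder, since it was trained on its 1024-dim embedding space.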
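
To make the head concrete, here is a minimal numpy sketch of the forward pass a binary scikit-learn `MLPClassifier` performs: ReLU hidden layers followed by a single logistic output unit. It assumes the bundled object is an `MLPClassifier` exposing the standard `coefs_` and `intercepts_` attributes (the card only says "sklearn MLP"); the function name `mlp_head_proba` and the random weights are hypothetical stand-ins for illustration.

```python
import numpy as np


def mlp_head_proba(X, coefs, intercepts):
    """Harassment probability from a ReLU MLP with one logistic output unit.

    `coefs` / `intercepts` follow the layout of scikit-learn's MLPClassifier
    attributes `coefs_` and `intercepts_` (one weight matrix and bias per layer).
    """
    h = np.asarray(X, dtype=np.float64)
    # Hidden layers: affine transform followed by ReLU.
    for W, b in zip(coefs[:-1], intercepts[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    # Output layer: a single unit squashed by the logistic sigmoid.
    z = h @ coefs[-1] + intercepts[-1]
    return 1.0 / (1.0 + np.exp(-z.ravel()))


# Hypothetical random weights with the 1024 -> 512 -> 128 -> 1 shape described above.
rng = np.random.default_rng(0)
sizes = [1024, 512, 128, 1]
coefs = [rng.normal(scale=0.05, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
intercepts = [np.zeros(n) for n in sizes[1:]]
embeddings = rng.normal(size=(3, 1024))  # stand-in for three encoder outputs
probs = mlp_head_proba(embeddings, coefs, intercepts)
print(probs.shape)  # (3,)
```

Under the same assumption, `mlp_head_proba(X, clf.coefs_, clf.intercepts_)` on real embeddings should agree with `clf.predict_proba(X)[:, 1]` up to floating-point error.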
+## Performance
+Evaluated on a stratified held-out test set (15% of annotated French comments):
+
+| Metric | Score |
+|---|---|
+| F1 | **0.6916** |
+| Precision | 0.6852 |
+| Recall | 0.6981 |
+| Accuracy | 0.7130 |
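
These are the usual binary metrics with harassment (`1`) as the positive class. The card does not say how they were computed, so the helper below is a hypothetical re-implementation (`binary_metrics` is not part of this repo); scikit-learn's `precision_score`, `recall_score` and `f1_score` with their default `average="binary"` compute the same quantities. Because F1 is the harmonic mean of precision and recall, the table can also be cross-checked directly.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy with harassment (label 1) as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # harassment correctly flagged
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # benign comments flagged
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # harassment missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = float(np.mean(y_true == y_pred))
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# F1 = 2PR / (P + R), applied to the precision and recall in the table:
p, r = 0.6852, 0.6981
print(round(2 * p * r / (p + r), 4))  # 0.6916
```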
+
+Comparison with other frozen-embedding approaches on the same test set:
+
+| Encoder (params) | Classifier head | F1 |
+|---|---|---|
+| Arctic (568M) | MLP | **0.6916** |
+| Arctic (568M) | LogReg | 0.6903 |
+| Harrier (270M) | LightGBM | 0.6729 |
+| jina-nano (239M) | LightGBM | 0.6573 |
+| jina-small (677M) | MLP | 0.6195 |
+## Usage
+```python
+from huggingface_hub import hf_hub_download
+from sentence_transformers import SentenceTransformer
+import joblib
+
+# Load components: the MLP head is bundled in this repo,
+# the encoder is downloaded from the Hugging Face Hub.
+clf = joblib.load(hf_hub_download(
+    repo_id="gregco/balance-tes-haters-classifier",
+    filename="harassment_arctic_mlp.joblib",
+))
+encoder = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")
+
+
+def predict(text: str) -> int:
+    """Returns 1 (harassment) or 0 (benign)."""
+    X = encoder.encode([text], convert_to_numpy=True)  # shape (1, 1024)
+    return int(clf.predict(X)[0])
+
+
+def predict_proba(text: str) -> float:
+    """Returns harassment probability between 0 and 1."""
+    X = encoder.encode([text], convert_to_numpy=True)
+    return float(clf.predict_proba(X)[0, 1])  # column 1 = class 1 (harassment)
+
+
+# Examples (French inputs; English glosses in the comments)
+predict("t'es vraiment nulle va mourir")   # "you're really worthless, go die" → 1
+predict("super vidéo, continue comme ça")  # "great video, keep it up" → 0
+```
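
For scoring many comments at once, encoding them in a single `encode` call is typically faster than calling `predict` once per comment. The sketch below assumes `encoder` and `clf` are loaded as above; the helper `score_comments` and its `threshold` argument are hypothetical (the repo ships no tuned threshold, and `clf.predict` for a binary sklearn MLP corresponds to a probability above 0.5).

```python
from typing import Sequence, Tuple

import numpy as np


def score_comments(
    texts: Sequence[str],
    encoder,
    clf,
    threshold: float = 0.5,
    batch_size: int = 32,
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (harassment probabilities, 0/1 labels) for a list of comments.

    `encoder` must provide SentenceTransformer-style `encode`, and `clf`
    scikit-learn-style `predict_proba` with classes ordered [0, 1].
    """
    # One encode call; sentence-transformers batches internally.
    X = encoder.encode(list(texts), batch_size=batch_size, convert_to_numpy=True)
    proba = clf.predict_proba(X)[:, 1]  # column 1 = harassment
    # Hypothetical decision threshold: lowering it favours recall when triaging.
    labels = (proba > threshold).astype(int)
    return proba, labels


# Usage with the objects loaded above:
# proba, labels = score_comments(comments, encoder, clf, threshold=0.4)
```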
+## Training Data
+- **Real annotations**: French social media comments manually annotated via the Balance Tes Haters platform, covering 11 harassment categories such as injure (insult), menaces (threats), doxxing and incitation à la haine (incitement to hatred)
+- **Split**: 70% train / 15% val / 15% test (stratified)
+- The MLP was trained on the `real` split only, i.e. the manual annotations above, with no synthetic augmentation for this checkpoint
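
The card does not say which label the split was stratified on or which seed was used. As a minimal numpy sketch, assuming stratification on the binary label, a 70/15/15 stratified split shuffles each class separately and cuts it at the same proportions, so every subset keeps roughly the overall harassment rate. `stratified_split` is a hypothetical helper; calling scikit-learn's `train_test_split(..., stratify=y)` twice is the more common route.

```python
import numpy as np


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index arrays, splitting each class separately."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    parts = ([], [], [])
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffle this class only
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        parts[0].append(idx[:n_train])
        parts[1].append(idx[n_train:n_train + n_val])
        parts[2].append(idx[n_train + n_val:])  # remainder goes to test
    return tuple(np.sort(np.concatenate(p)) for p in parts)


labels = np.array([0] * 60 + [1] * 40)
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```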
+## Categories detected
+The model collapses all harassment categories into a single binary label:
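+- `0`: Absence de cyberharcèlement (no cyberbullying)
+- `1`: any of Cyberharcèlement (cyberbullying), Injure (insult), Diffamation (defamation), Menaces (threats), Doxxing, Incitation au suicide (incitement to suicide), Incitation à la haine (incitement to hatred), Cyberharcèlement à caractère sexuel (sexual cyberbullying), and others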
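
The collapse itself is a one-line rule. A minimal sketch, assuming each annotation is stored as one of the French category strings listed above (the storage format is not documented, and `to_binary` / `NEGATIVE_LABEL` are hypothetical names); it handles one category per comment.

```python
import unicodedata

NEGATIVE_LABEL = "Absence de cyberharcèlement"


def to_binary(category: str) -> int:
    """Map an annotation category to the binary target (0 = benign, 1 = harassment)."""
    # NFC normalisation so that a decomposed "è" still matches the reference label.
    normalized = unicodedata.normalize("NFC", category.strip())
    return 0 if normalized == unicodedata.normalize("NFC", NEGATIVE_LABEL) else 1


categories = ["Injure", "Absence de cyberharcèlement", "Menaces", "Doxxing"]
print([to_binary(c) for c in categories])  # [1, 0, 1, 1]
```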
+## Limitations
+- Trained exclusively on **French** comments — not suitable for other languages
+- Sarcasm and context-dependent harassment may be misclassified
+- Precision ≈ 0.69 and recall ≈ 0.70 (F1 ≈ 0.69) mean that roughly 3 in 10 harassment comments are missed and roughly 3 in 10 comments flagged as harassment are actually benign
+- Should be used as a **triage tool**, not a final decision system — human review recommended for borderline cases
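
One way to apply the triage recommendation is a three-band rule on the `predict_proba` output: confident scores are routed automatically and the uncertain middle band goes to a human. The `triage` function and its `low` / `high` thresholds below are hypothetical and would need tuning on the validation split.

```python
def triage(proba: float, low: float = 0.3, high: float = 0.8) -> str:
    """Route a harassment probability to a hypothetical review queue."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low <= high <= 1")
    if proba >= high:
        return "likely_harassment"
    if proba <= low:
        return "likely_benign"
    return "human_review"  # borderline scores go to a moderator


print([triage(p) for p in (0.1, 0.5, 0.9)])
# ['likely_benign', 'human_review', 'likely_harassment']
```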
+## Dependencies
+```bash
+pip install sentence-transformers scikit-learn huggingface_hub
+```