---
language:
- fr
license: mit
tags:
- text-classification
- cyberbullying
- harassment
- social-media
- french
- sentence-transformers
- sklearn
datasets:
- custom
metrics:
- f1
- precision
- recall
- accuracy
model-index:
- name: balance-tes-haters-classifier
  results:
  - task:
      type: text-classification
      name: Binary Harassment Detection
    dataset:
      name: French social media comments (held-out test set)
      type: custom
    metrics:
    - type: f1
      value: 0.6916
    - type: precision
      value: 0.6852
    - type: recall
      value: 0.6981
    - type: accuracy
      value: 0.7130
---

# Balance Tes Haters — Harassment Classifier

Binary classifier for French social media comments: **harassment (1) vs benign (0)**.

Built for the [Balance Tes Haters](https://balanceteshaters.fr) project, which collects and analyses cyberbullying reports from Instagram, TikTok, YouTube and Twitter.

## Architecture

This is a **two-component** model:

| Component | Description |
|---|---|
| **Encoder** | [`Snowflake/snowflake-arctic-embed-l-v2.0`](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) — 568M params, 1024-dim embeddings, loaded from HuggingFace at inference |
| **Classifier** | `harassment_arctic_mlp.joblib` — sklearn MLP (512→128, ReLU) trained on frozen Arctic embeddings, bundled in this repo (~7 MB) |

The encoder is **not fine-tuned** — only the MLP head was trained. This keeps the classifier small and the encoder swappable.
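The head in `harassment_arctic_mlp.joblib` is a standard scikit-learn `MLPClassifier`. The sketch below illustrates how such a head can be trained on frozen embeddings; it is not the project's training script. The hidden-layer sizes match the card, but hyperparameters such as `max_iter` are assumptions, and random vectors stand in for the Arctic embeddings.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Stand-in for frozen Arctic embeddings (1024-dim); the real pipeline
# would use encoder.encode(texts) from the Usage section instead.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 1024))
y = rng.integers(0, 2, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0
)

# MLP head matching the card's description: 512 -> 128 hidden units, ReLU.
head = MLPClassifier(
    hidden_layer_sizes=(512, 128),
    activation="relu",
    max_iter=20,  # illustrative; tune on real data
    random_state=0,
)
head.fit(X_train, y_train)
print(f"held-out accuracy: {head.score(X_test, y_test):.3f}")

# Persist only the head (a few MB); the encoder stays on the Hub.
# joblib.dump(head, "harassment_arctic_mlp.joblib")
```

Training only the head is what keeps the artifact at ~7 MB: the 568M-parameter encoder is never modified, just downloaded at inference time.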
## Performance

Evaluated on a stratified held-out test set (15% of annotated French comments):

| Metric | Score |
|---|---|
| F1 | **0.6916** |
| Precision | 0.6852 |
| Recall | 0.6981 |
| Accuracy | 0.7130 |

Comparison with other frozen-embedding approaches on the same test set:

| Model | Classifier | F1 |
|---|---|---|
| Arctic | MLP | **0.6916** |
| Arctic | LogReg | 0.6903 |
| Harrier (270M) | LightGBM | 0.6729 |
| jina-nano (239M) | LightGBM | 0.6573 |
| jina-small (677M) | MLP | 0.6195 |

## Usage

```python
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer
import joblib

# Load components
clf = joblib.load(hf_hub_download(
    repo_id="DataForGood/balance-tes-haters-classifier",
    filename="harassment_arctic_mlp.joblib",
))
encoder = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")

def predict(text: str) -> int:
    """Returns 1 (harassment) or 0 (benign)."""
    X = encoder.encode([text], convert_to_numpy=True)
    return int(clf.predict(X)[0])

def predict_proba(text: str) -> float:
    """Returns harassment probability between 0 and 1."""
    X = encoder.encode([text], convert_to_numpy=True)
    return float(clf.predict_proba(X)[0, 1])

# Examples
predict("")  # → 1
predict("super vidéo, continue comme ça")  # → 0
```

## Training Data

- **Real annotations**: French social media comments manually annotated via the Balance Tes Haters platform, covering 11 harassment categories (injure, menaces, doxxing, incitation à la haine, etc.)
- **Split**: 70% train / 15% val / 15% test (stratified)
- The MLP was trained on the `real` split only (no synthetic augmentation for this checkpoint)

## Categories detected

The model collapses all harassment categories into a single binary label:

- `0` — Absence de cyberharcèlement (no cyberbullying)
- `1` — Any of: Cyberharcèlement (cyberbullying), Injure (insult), Diffamation (defamation), Menaces (threats), Doxxing, Incitation au suicide (incitement to suicide), Incitation à la haine (incitement to hatred), Cyberharcèlement à caractère sexuel (sexual cyberharassment), and others

## Limitations

- Trained exclusively on **French** comments — not suitable for other languages
- Sarcasm and context-dependent harassment may be misclassified
- With recall ≈ 0.70 and precision ≈ 0.69, roughly 3 in 10 harassment comments are missed, and roughly 3 in 10 flagged comments are actually benign
- Should be used as a **triage tool**, not a final decision system — human review is recommended for borderline cases

## Dependencies

```bash
pip install sentence-transformers scikit-learn huggingface_hub
```
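One way to use the model as a triage tool rather than a final decision system is a three-way threshold on `predict_proba` (defined in the Usage section above). The thresholds and bucket names below are illustrative, not part of the project; tune them on validation data for your precision/recall trade-off.

```python
def triage(prob: float, low: float = 0.3, high: float = 0.8) -> str:
    """Route a comment by harassment probability.

    Thresholds are illustrative; tune them on validation data
    for the precision/recall trade-off your moderation flow needs.
    """
    if prob >= high:
        return "flag"    # confident harassment: escalate automatically
    if prob >= low:
        return "review"  # borderline: queue for a human moderator
    return "pass"        # likely benign

# With predict_proba() from the Usage section:
# bucket = triage(predict_proba(comment))
print(triage(0.92), triage(0.55), triage(0.05))  # → flag review pass
```

Lowering `low` trades moderator workload for fewer missed harassment comments, which matters here given the model's ~0.70 recall.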