---
language:
- fr
license: mit
tags:
- text-classification
- cyberbullying
- harassment
- social-media
- french
- sentence-transformers
- sklearn
datasets:
- custom
metrics:
- f1
- precision
- recall
- accuracy
model-index:
- name: balance-tes-haters-classifier
  results:
  - task:
      type: text-classification
      name: Binary Harassment Detection
    dataset:
      name: French social media comments (held-out test set)
      type: custom
    metrics:
    - type: f1
      value: 0.6916
    - type: precision
      value: 0.6852
    - type: recall
      value: 0.6981
    - type: accuracy
      value: 0.7130
---
# Balance Tes Haters — Harassment Classifier
Binary classifier for French social media comments: **harassment (1) vs benign (0)**.
Built for the [Balance Tes Haters](https://balanceteshaters.fr) project, which collects and analyses cyberbullying reports from Instagram, TikTok, YouTube and Twitter.
## Architecture
This is a **two-component** model:
| Component | Description |
|---|---|
| **Encoder** | [`Snowflake/snowflake-arctic-embed-l-v2.0`](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) — 568M params, 1024-dim embeddings, loaded from HuggingFace at inference |
| **Classifier** | `harassment_arctic_mlp.joblib` — sklearn MLP (512→128, ReLU) trained on frozen Arctic embeddings, bundled in this repo (~7 MB) |
The encoder is **not fine-tuned** — only the MLP head was trained. This keeps the classifier small and the encoder swappable.
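
As a rough illustration of the classifier component, an MLP head with the shape described above could be declared in scikit-learn as follows. This is a sketch, not the project's training script: the random stand-in embeddings, `max_iter`, and `random_state` are assumptions chosen only so the snippet runs quickly.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): random vectors shaped like 1024-dim Arctic embeddings.
# Real training would use the encoder's outputs instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1024)).astype(np.float32)
y = rng.integers(0, 2, size=200)

# Two hidden layers (512 then 128 units) with ReLU, as listed in the table above.
head = MLPClassifier(
    hidden_layer_sizes=(512, 128),
    activation="relu",
    max_iter=20,       # assumption: kept small for a quick demo
    random_state=0,    # assumption: fixed seed for repeatability
)
head.fit(X, y)

# Weight matrices go 1024 -> 512 -> 128 -> 1 (one logistic output for binary labels).
layer_shapes = [w.shape for w in head.coefs_]
print(layer_shapes)
```

Because the encoder stays frozen, swapping it for another model would only require recomputing the embeddings and refitting a head like this one with a matching input dimension.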
## Performance
Evaluated on a stratified held-out test set (15% of annotated French comments):
| Metric | Score |
|---|---|
| F1 | **0.6916** |
| Precision | 0.6852 |
| Recall | 0.6981 |
| Accuracy | 0.7130 |
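
The reported F1 can be cross-checked from the precision and recall in the table above, since F1 is their harmonic mean. The snippet below is only a sanity check on the published numbers; the variable names are illustrative.

```python
precision = 0.6852
recall = 0.6981

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Share of true harassment comments the model misses (false-negative rate).
miss_rate = 1 - recall
# Share of flagged comments that are actually benign (false-discovery rate).
false_alarm_share = 1 - precision

print(round(f1, 4))  # 0.6916
print(round(miss_rate, 4), round(false_alarm_share, 4))
```

Both error shares come out at roughly 0.3, which is the practical reading of these scores.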
Comparison with other frozen-embedding approaches on the same test set:
| Model | Classifier | F1 |
|---|---|---|
| Arctic | MLP | **0.6916** |
| Arctic | LogReg | 0.6903 |
| Harrier (270M) | LightGBM | 0.6729 |
| jina-nano (239M) | LightGBM | 0.6573 |
| jina-small (677M) | MLP | 0.6195 |
## Usage
```python
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer
import joblib

# Load components: the MLP head from this repo, the encoder from its own repo.
# joblib files are pickles, so only load them from sources you trust.
clf = joblib.load(hf_hub_download(
    repo_id="DataForGood/balance-tes-haters-classifier",
    filename="harassment_arctic_mlp.joblib",
))
encoder = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")


def predict(text: str) -> int:
    """Returns 1 (harassment) or 0 (benign)."""
    X = encoder.encode([text], convert_to_numpy=True)
    return int(clf.predict(X)[0])


def predict_proba(text: str) -> float:
    """Returns harassment probability between 0 and 1."""
    X = encoder.encode([text], convert_to_numpy=True)
    return float(clf.predict_proba(X)[0, 1])


# Examples
predict("<Insert hateful French comment>")  # → 1
predict("super vidéo, continue comme ça")   # → 0
```
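
For more than a handful of comments, encoding them in one call is usually cheaper than calling the encoder once per text. The helper below is a minimal sketch of that idea; `score_batch` and its `batch_size` default are hypothetical names and values, and it expects an encoder and classifier loaded as in the usage example.

```python
import numpy as np


def score_batch(texts, encoder, clf, batch_size=32):
    """Return one harassment probability per input text.

    `encoder` is expected to behave like a SentenceTransformer and `clf`
    like a fitted scikit-learn classifier (assumption: loaded elsewhere).
    """
    texts = list(texts)
    if not texts:
        return np.empty(0)
    # Encode all texts together; the encoder batches internally.
    X = encoder.encode(texts, batch_size=batch_size, convert_to_numpy=True)
    # Column 1 holds the probability of the positive (harassment) class.
    return clf.predict_proba(X)[:, 1]
```

A call such as `score_batch(comments, encoder, clf)` returns a NumPy array aligned with the input order.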
## Training Data
- **Real annotations**: French social media comments manually annotated via the Balance Tes Haters platform, covering 11 harassment categories (injure, menaces, doxxing, incitation à la haine, etc.)
- **Split**: 70% train / 15% val / 15% test (stratified)
- The MLP was trained on the `real` split only (no synthetic augmentation for this checkpoint)
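
A stratified 70/15/15 split like the one described above can be obtained with two calls to scikit-learn's `train_test_split`. This is a sketch under assumptions: the toy texts and labels, the 40% positive rate, and the `random_state` are made up for illustration and do not reflect the project's data or its actual splitting code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in data (assumption): 1000 comments, 400 labelled as harassment.
texts = np.array([f"comment {i}" for i in range(1000)])
labels = np.array([1] * 400 + [0] * 600)

# Step 1: hold out 30% of the data, preserving the label ratio.
X_train, X_rest, y_train, y_rest = train_test_split(
    texts, labels, test_size=0.30, stratify=labels, random_state=42
)

# Step 2: split the held-out 30% in half: 15% validation, 15% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
print(y_train.mean(), y_val.mean(), y_test.mean())
```

Stratifying both steps keeps the share of harassment labels the same in all three subsets, so the test metrics are not skewed by a different class balance.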
## Categories detected
The model collapses all harassment categories into a single binary label:
- `0` — Absence de cyberharcèlement
- `1` — Any of: Cyberharcèlement, Injure, Diffamation, Menaces, Doxxing, Incitation au suicide, Incitation à la haine, Cyberharcèlement à caractère sexuel, and others
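
The label collapse can be sketched as a one-line mapping. The assumptions here are that each annotation arrives as a single category string and that the benign category is spelled exactly as listed above; `to_binary` is a hypothetical helper name.

```python
BENIGN_LABEL = "Absence de cyberharcèlement"


def to_binary(category: str) -> int:
    """Map an annotation category to 0 (benign) or 1 (any harassment type)."""
    return 0 if category.strip() == BENIGN_LABEL else 1


annotations = ["Injure", "Absence de cyberharcèlement", "Doxxing", "Menaces"]
binary_labels = [to_binary(c) for c in annotations]
print(binary_labels)  # [1, 0, 1, 1]
```

One consequence of this mapping is that the classifier cannot tell which harassment category applies, only whether any of them does.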
## Limitations
- Trained exclusively on **French** comments — not suitable for other languages
- Sarcasm and context-dependent harassment may be misclassified
- Recall of ~0.70 means roughly 3 in 10 harassment comments are missed, and precision of ~0.69 means roughly 3 in 10 flagged comments are actually benign
- Should be used as a **triage tool**, not a final decision system — human review recommended for borderline cases
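
One way to use the model for triage is to route comments by predicted probability and send the uncertain middle band to a human. The sketch below assumes a probability such as the one returned by `predict_proba`; the `triage` function and the 0.3 / 0.7 thresholds are hypothetical and would need tuning on validation data.

```python
def triage(prob: float, low: float = 0.3, high: float = 0.7) -> str:
    """Route a comment based on its harassment probability.

    Thresholds are illustrative assumptions, not values from the project.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    if prob >= high:
        return "flag"    # likely harassment: prioritise for moderation
    if prob <= low:
        return "pass"    # likely benign
    return "review"      # borderline: send to a human reviewer


for p in (0.92, 0.55, 0.08):
    print(p, triage(p))
```

Widening the band between `low` and `high` sends more comments to human review and reduces the number of automatic decisions the model makes on its own.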
## Dependencies
```bash
pip install sentence-transformers scikit-learn huggingface_hub
```