---
language:
- fr
license: mit
tags:
- text-classification
- cyberbullying
- harassment
- social-media
- french
- sentence-transformers
- sklearn
datasets:
- custom
metrics:
- f1
- precision
- recall
- accuracy
model-index:
- name: balance-tes-haters-classifier
  results:
  - task:
      type: text-classification
      name: Binary Harassment Detection
    dataset:
      name: French social media comments (held-out test set)
      type: custom
    metrics:
    - type: f1
      value: 0.6916
    - type: precision
      value: 0.6852
    - type: recall
      value: 0.6981
    - type: accuracy
      value: 0.7130
---
# Balance Tes Haters — Harassment Classifier

Binary classifier for French social media comments: **harassment (1) vs benign (0)**.

Built for the [Balance Tes Haters](https://balanceteshaters.fr) project, which collects and analyses cyberbullying reports from Instagram, TikTok, YouTube and Twitter.

## Architecture

This is a **two-component** model:

| Component | Description |
|---|---|
| **Encoder** | [`Snowflake/snowflake-arctic-embed-l-v2.0`](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) — 568M params, 1024-dim embeddings, loaded from HuggingFace at inference |
| **Classifier** | `harassment_arctic_mlp.joblib` — sklearn MLP (512→128, ReLU) trained on frozen Arctic embeddings, bundled in this repo (~7 MB) |

The encoder is **not fine-tuned** — only the MLP head was trained. This keeps the classifier small and the encoder swappable.
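For reference, here is a minimal sketch of how such a head can be trained on frozen Arctic embeddings. Only the layer sizes and activation come from the table above; `train_texts`, `train_labels`, and the remaining `MLPClassifier` hyper-parameters are placeholders and assumptions, not the project's actual training code.

```python
from sentence_transformers import SentenceTransformer
from sklearn.neural_network import MLPClassifier

# Frozen encoder: only used to produce 1024-dim embeddings, never updated
encoder = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")

# train_texts / train_labels: annotated French comments (placeholders, not in this repo)
X_train = encoder.encode(train_texts, convert_to_numpy=True)  # shape (n, 1024)

# MLP head matching the card: hidden layers 512 -> 128 with ReLU
clf = MLPClassifier(hidden_layer_sizes=(512, 128), activation="relu")
clf.fit(X_train, train_labels)
```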
## Performance

Evaluated on a stratified held-out test set (15% of annotated French comments):

| Metric | Score |
|---|---|
| F1 | **0.6916** |
| Precision | 0.6852 |
| Recall | 0.6981 |
| Accuracy | 0.7130 |

Comparison with other frozen-embedding approaches on the same test set:

| Model | Classifier | F1 |
|---|---|---|
| Arctic | MLP | **0.6916** |
| Arctic | LogReg | 0.6903 |
| Harrier (270M) | LightGBM | 0.6729 |
| jina-nano (239M) | LightGBM | 0.6573 |
| jina-small (677M) | MLP | 0.6195 |
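These numbers can be reproduced on a held-out split with standard scikit-learn metrics. The sketch below assumes `encoder` and `clf` are loaded as in the Usage section, and that `test_texts` / `test_labels` hold the 15% test set, which is not bundled with this repo.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Embed the held-out comments with the frozen encoder, then score the MLP head
X_test = encoder.encode(test_texts, convert_to_numpy=True)
y_pred = clf.predict(X_test)

print("F1:       ", f1_score(test_labels, y_pred))
print("Precision:", precision_score(test_labels, y_pred))
print("Recall:   ", recall_score(test_labels, y_pred))
print("Accuracy: ", accuracy_score(test_labels, y_pred))
```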
## Usage

```python
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer
import joblib
import numpy as np

# Load components
clf = joblib.load(hf_hub_download(
    repo_id="DataForGood/balance-tes-haters-classifier",
    filename="harassment_arctic_mlp.joblib",
))
encoder = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")


def predict(text: str) -> int:
    """Returns 1 (harassment) or 0 (benign)."""
    X = encoder.encode([text], convert_to_numpy=True)
    return int(clf.predict(X)[0])


def predict_proba(text: str) -> float:
    """Returns harassment probability between 0 and 1."""
    X = encoder.encode([text], convert_to_numpy=True)
    return float(clf.predict_proba(X)[0, 1])


# Examples
predict("<Insert hateful french comment>")   # → 1
predict("super vidéo, continue comme ça")    # → 0
```
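When scoring many comments, batching the encoder call is much faster than calling `predict` once per comment. The helper below is a suggested pattern, not a function shipped with this repo.

```python
def predict_batch(texts: list[str]) -> list[int]:
    """Classify a list of comments in a single encoder pass."""
    X = encoder.encode(texts, convert_to_numpy=True, batch_size=32)
    return clf.predict(X).tolist()

predict_batch(["super vidéo, continue comme ça", "<Insert hateful french comment>"])  # → [0, 1]
```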
## Training Data

- **Real annotations**: French social media comments manually annotated via the Balance Tes Haters platform, covering 11 harassment categories (injure, menaces, doxxing, incitation à la haine, etc.)
- **Split**: 70% train / 15% val / 15% test (stratified; see the sketch below)
- The MLP was trained on the `real` split only (no synthetic augmentation for this checkpoint)
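A minimal sketch of the 70/15/15 stratified split with scikit-learn; the proportions and stratification come from this card, while the random seed and the exact call are assumptions.

```python
from sklearn.model_selection import train_test_split

# texts / labels: the full annotated corpus (placeholders)
X_train, X_rest, y_train, y_rest = train_test_split(
    texts, labels, test_size=0.30, stratify=labels, random_state=42
)  # 70% train
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=42
)  # 15% val / 15% test
```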
## Categories detected

The model collapses all harassment categories into a single binary label (see the mapping sketch after the list):

- `0` — Absence de cyberharcèlement
- `1` — Any of: Cyberharcèlement, Injure, Diffamation, Menaces, Doxxing, Incitation au suicide, Incitation à la haine, Cyberharcèlement à caractère sexuel, and others
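An illustrative mapping from annotation categories to the binary target; the label names come from the list above, but the helper itself is hypothetical and not part of this repo.

```python
BENIGN_CATEGORY = "Absence de cyberharcèlement"

def to_binary_label(category: str) -> int:
    """Collapse a platform annotation category into the binary target."""
    # Every category other than the benign one counts as harassment (1)
    return 0 if category == BENIGN_CATEGORY else 1
```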
## Limitations

- Trained exclusively on **French** comments — not suitable for other languages
- Sarcasm and context-dependent harassment may be misclassified
- A recall of ~0.70 means roughly 3 in 10 harassment comments are missed, and a precision of ~0.69 means roughly 3 in 10 flagged comments are actually benign
- Should be used as a **triage tool**, not a final decision system — human review recommended for borderline cases (see the threshold sketch below)
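One way to use the model as a triage layer, assuming the `predict_proba` helper from the Usage section. The thresholds are illustrative assumptions, not values validated by the project.

```python
def triage(text: str, low: float = 0.35, high: float = 0.75) -> str:
    """Route a comment to auto-pass, auto-flag, or human review."""
    p = predict_proba(text)  # harassment probability, see Usage above
    if p >= high:
        return "flag"          # confidently harassment
    if p <= low:
        return "pass"          # confidently benign
    return "human_review"      # borderline: send to a moderator
```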
## Dependencies

```bash
pip install sentence-transformers scikit-learn huggingface_hub
```