Tags: Text Classification, Transformers, Safetensors, PyTorch, English, roberta, hate-speech-detection, content-moderation, nlp, twitter, safety, offensive-language, Eval Results (legacy), text-embeddings-inference
How to use AuricErgeson/hate-speech-detector with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="AuricErgeson/hate-speech-detector")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("AuricErgeson/hate-speech-detector")
model = AutoModelForSequenceClassification.from_pretrained("AuricErgeson/hate-speech-detector")
```
hate speech detector
Discussion #1, opened by AuricErgeson
I trained a hate speech detector that catches coded language
Most existing models miss phrases like "they control the media" or
"heil hitler" because they were trained on explicit slurs only.
I fused 4 datasets (Davidson, ImplicitHate, HateXplain, HateDay 2025)
and added targeted augmentation for neo-Nazi codes, antisemitic dog whistles,
and white nationalist phrases.
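The post does not say how the four label schemes were merged. Below is a minimal sketch of one way to fuse them onto the three classes reported in the results (neither, offensive, hate_speech). Davidson's integer coding (0 = hate speech, 1 = offensive, 2 = neither) follows that dataset; the HateXplain and ImplicitHate label strings, the choice to fold implicit and explicit hate into hate_speech, the class-id order, and the names `LABEL_MAPS`, `to_unified`, and `fuse` are all assumptions for illustration. HateDay 2025 is left out because its label names are not given here.

```python
# Shared 3-class scheme; the index order is an assumption, not the model's id2label.
UNIFIED_LABELS = ("neither", "offensive", "hate_speech")

# Hypothetical per-dataset mappings; verify label names against each dataset's docs.
LABEL_MAPS = {
    # Davidson et al.: integer classes 0 = hate speech, 1 = offensive, 2 = neither
    "davidson": {0: "hate_speech", 1: "offensive", 2: "neither"},
    # HateXplain majority labels (assumed string names)
    "hatexplain": {"hatespeech": "hate_speech", "offensive": "offensive", "normal": "neither"},
    # ImplicitHate coarse labels (assumed names); both hate types map to hate_speech
    "implicithate": {
        "implicit_hate": "hate_speech",
        "explicit_hate": "hate_speech",
        "not_hate": "neither",
    },
}


def to_unified(source, raw_label):
    """Map one raw label from a named source dataset onto the shared scheme."""
    mapping = LABEL_MAPS[source]
    if raw_label not in mapping:
        # Fail loudly instead of silently dropping or mislabeling examples.
        raise ValueError(f"unmapped label {raw_label!r} for source {source!r}")
    return mapping[raw_label]


def fuse(records):
    """records: iterable of (source, text, raw_label); returns (text, class_id) pairs."""
    fused = []
    for source, text, raw_label in records:
        label = to_unified(source, raw_label)
        fused.append((text, UNIFIED_LABELS.index(label)))
    return fused
```

Raising on unknown labels is a deliberate choice: when fusing corpora, a typo in a mapping table otherwise turns into silently mislabeled training data.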
Results on 11K held-out examples:
- neither: F1 0.884
- offensive: F1 0.870
- hate_speech: F1 0.697 (hardest class; still beats most baselines)
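The numbers above are per-class F1 scores, i.e. the harmonic mean of precision and recall computed separately for each label. A minimal sketch of that computation in plain Python follows (conceptually what scikit-learn's `f1_score(..., average=None)` returns); the function name `per_class_f1` is hypothetical, and the post does not say which tool produced its scores.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1} treating each label as the positive class in turn."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)  # correct hits
        fp = sum(1 for t, p in pairs if t != label and p == label)  # false alarms
        fn = sum(1 for t, p in pairs if t == label and p != label)  # misses
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores
```

With the made-up labels `y_true = ["neither", "offensive", "hate_speech", "hate_speech"]` and `y_pred = ["neither", "offensive", "offensive", "hate_speech"]`, one hate_speech example predicted as offensive lowers both classes to an F1 of about 0.667, which shows why confusion between offensive and hate_speech drags down the hardest class.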
Model: https://huggingface.co/AuricErgeson/hate-speech-detector
Try it: https://huggingface.co/spaces/AuricErgeson/hate-speech-detector
One thing it still misses: bare "1488" as a standalone token.
If you've solved this, open an issue; I'm curious.
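One common stopgap for a single known code is a keyword check that runs alongside the classifier. This is a swapped-in rule-based technique, not a fix to the model and not the author's method. The sketch below assumes a hypothetical watch list containing only "1488" (the one code named in the post) and hypothetical helpers `has_standalone_code` and `route`. Because a bare number also appears in innocent contexts (order numbers, years), it routes matches to human review rather than labeling them hate_speech outright.

```python
import re

# Hypothetical watch list; "1488" is the only entry taken from the post.
# \b requires a non-word character (or string edge) on both sides,
# so "21488" or "14880" do not match.
CODE_PATTERNS = [re.compile(r"\b1488\b")]


def has_standalone_code(text):
    """True if any watched code appears as a standalone token."""
    return any(pattern.search(text) for pattern in CODE_PATTERNS)


def route(text, model_label):
    """Combine the classifier's label with the keyword check."""
    if model_label == "hate_speech":
        return "hate_speech"  # trust the model when it already flags the text
    if has_standalone_code(text):
        return "needs_review"  # bare numbers are ambiguous; send to a human
    return model_label
```

For example, `route("1488", "neither")` returns `"needs_review"`, while `"order #1488 shipped"` would also be flagged, which is exactly the false-positive risk that makes automatic labeling of bare numbers unsafe.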
#NLP #HateSpeechDetection #ContentModeration #TextClassification