hate speech detector

#1
by AuricErgeson - opened

I trained a hate speech detector that catches coded language

Most existing models miss phrases like "they control the media" or
"heil hitler" because they were trained on explicit slurs only.

I fused 4 datasets (Davidson, ImplicitHate, HateXplain, HateDay 2025)
and added targeted augmentation for neo-Nazi codes, antisemitic dog
whistles, and white nationalist phrases.
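
For anyone who wants to try a similar fusion, here is a minimal sketch of the label-harmonization step, not the actual pipeline used for this model. The target labels are the three classes reported in this post; the per-dataset source labels in `LABEL_MAPS` and the helper names `harmonize` and `fuse` are assumptions for illustration, so check each dataset card for its real label names.

```python
# Target label set, taken from the classes reported in this post.
TARGET_LABELS = {"neither", "offensive", "hate_speech"}

# Hypothetical per-dataset label maps; the real source labels may differ.
LABEL_MAPS = {
    "davidson": {
        "hate_speech": "hate_speech",
        "offensive_language": "offensive",
        "neither": "neither",
    },
    "hatexplain": {
        "hatespeech": "hate_speech",
        "offensive": "offensive",
        "normal": "neither",
    },
}


def harmonize(records, dataset_name):
    """Map (text, source_label) pairs onto the shared labels, skipping unmapped ones."""
    label_map = LABEL_MAPS[dataset_name]
    rows = []
    for text, label in records:
        target = label_map.get(label)
        if target in TARGET_LABELS:
            rows.append({"text": text.strip(), "label": target, "source": dataset_name})
    return rows


def fuse(datasets):
    """Concatenate harmonized datasets, dropping case-insensitive duplicate texts."""
    seen, fused = set(), []
    for name, records in datasets.items():
        for row in harmonize(records, name):
            key = row["text"].lower()
            if key not in seen:  # first occurrence wins
                seen.add(key)
                fused.append(row)
    return fused
```

Deduplicating before the train/test split matters here, since the same post can appear in more than one source dataset and would otherwise leak into the held-out set.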

Results on 11K held-out examples:

  • neither: F1 0.884
  • offensive: F1 0.870
  • hate_speech: F1 0.697 ← hardest class, still beats most baselines
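
For reference, a minimal sketch of how per-class F1 can be computed from gold and predicted labels. The function name `per_class_f1` is hypothetical and this is not the evaluation script behind the numbers above; the last lines only average the three scores reported in this post.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1}, using F1 = 2*TP / (2*TP + FP + FN) for each class."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Unweighted (macro) mean of the three F1 scores reported in this post.
reported = {"neither": 0.884, "offensive": 0.870, "hate_speech": 0.697}
macro_f1 = sum(reported.values()) / len(reported)
print(f"macro F1 = {macro_f1:.3f}")
```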

Model: https://huggingface.co/AuricErgeson/hate-speech-detector
Try it: https://huggingface.co/spaces/AuricErgeson/hate-speech-detector

One thing it still misses: bare "1488" as a standalone token.
If you've solved this, open an issue. I'm curious.
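
One possible stopgap, sketched under assumptions: a regex pre-filter that flags a standalone "1488" (or "14/88", "14-88", "14 88") for review alongside the model's prediction. The pattern and the name `flag_1488` are hypothetical and not part of the released model. A rule like this cannot tell the code from a year or a house number, so it is better suited to routing text to human review than to assigning a label.

```python
import re

# Matches 1488 / 14/88 / 14-88 / 14 88 when it is not part of a longer number:
# no digit, "." or "," directly before it, and no digit or decimal part after it.
STANDALONE_1488 = re.compile(r"(?<![\d.,])14[\s/\-]?88(?!\d|[.,]\d)")


def flag_1488(text):
    """Return True if the text contains a standalone 1488-style token."""
    return bool(STANDALONE_1488.search(text))


for sample in ["1488", "14/88 forever", "call 5551488 now", "costs 1488.50"]:
    print(repr(sample), flag_1488(sample))
```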

#NLP #HateSpeechDetection #ContentModeration #TextClassification
