I trained a hate speech detector that catches coded language
Most existing models miss stuff like "they control the media" or
"heil hitler" because they were trained on explicit slurs only.
I fused 4 datasets (Davidson, ImplicitHate, HateXplain, HateDay 2025)
+ targeted augmentation for neo-Nazi codes, antisemitic dog whistles,
and white nationalist phrases.
Results on 11K held-out examples:
- neither: F1 0.884
- offensive: F1 0.870
- hate_speech: F1 0.697 ← hardest class, still beats most baselines
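For anyone who wants to reproduce per-class numbers like these on their own held-out split, here is a minimal sketch of per-class F1 in plain Python. The function name `per_class_f1` and the toy labels are illustrative assumptions, not the evaluation script behind the scores above.

```python
from collections import Counter

# Label names taken from the results list; the order is arbitrary.
LABELS = ["neither", "offensive", "hate_speech"]

def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

# Toy example (made-up labels, not real model output).
y_true = ["neither", "offensive", "hate_speech", "hate_speech"]
y_pred = ["neither", "offensive", "offensive", "hate_speech"]
scores = per_class_f1(y_true, y_pred)
```

Reporting F1 per class rather than a single accuracy figure is what makes the weaker hate_speech score visible.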
Model: AuricErgeson/hate-speech-detector
Try it: AuricErgeson/hate-speech-detector
One thing it still misses: bare "1488" as a standalone token.
If you've solved this, open an issue. I'm curious.
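One possible stopgap, sketched here as an assumption rather than something the model does: a small regex check that flags a bare "1488" for human review alongside the model's label. The name `flag_bare_1488` is hypothetical, and the pattern will also flag innocent uses such as the year 1488, so it should route to review, not auto-label.

```python
import re

# Match "1488" only when it stands alone: not inside a longer word or
# number, not after "$", "-", "." or ",", and not followed by a decimal
# or thousands part such as ".50" or ",000".
BARE_1488 = re.compile(r"(?<![\w$.,-])1488(?!\w|[.,]\d)")

def flag_bare_1488(text: str) -> bool:
    """Return True if the text contains a standalone '1488' token."""
    return BARE_1488.search(text) is not None

examples = {
    "1488": True,
    "see you there 1488.": True,
    "order total was $1488.00": False,
    "call 555-1488": False,
    "id 14888": False,
}
results = {text: flag_bare_1488(text) for text in examples}
```

A rule like this trades precision for recall on one known blind spot, which is why it works better as a review trigger than as a classifier.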
#NLP #HateSpeechDetection #ContentModeration #TextClassification