Data Description Document
Data source
This project uses a public toxicity dataset mirrored on Hugging Face from the Jigsaw Toxic Comment Classification Challenge (Wikipedia talk page comments).
Dataset name (default in configs):
thesofakillers/jigsaw-toxic-comment-classification-challenge
Licensing
- The dataset card indicates redistribution under CC0, while the underlying comment text originates from Wikipedia content under CC BY-SA 3.0. Always verify terms in the dataset card and your organization's compliance requirements.
Size & languages
- Primarily English comments.
- Multi-label annotations for 6 toxicity categories.
Labels
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate (mapped to identity_attack for Detoxify compatibility)
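The label rename noted above can be sketched in a few lines. This is only an illustration: the names `DATASET_LABELS`, `LABEL_RENAMES`, and `rename_labels`, as well as the `comment_text` key in the example row, are assumptions and not taken from the project code.

```python
# Label columns as listed in this document.
DATASET_LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Rename described in this document (identity_hate -> identity_attack).
LABEL_RENAMES = {"identity_hate": "identity_attack"}


def rename_labels(row: dict) -> dict:
    """Return a copy of `row` with label keys renamed; all other keys are kept as-is."""
    return {LABEL_RENAMES.get(key, key): value for key, value in row.items()}


# Hypothetical example row; the field name "comment_text" is an assumption.
example = {"comment_text": "hello", "toxic": 0, "identity_hate": 1}
renamed = rename_labels(example)
```

Keeping the rename in one mapping means the original dataset column names stay untouched on disk and the Detoxify-style name only appears at the point where rows are handed to the model.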
Preprocessing
- Minimal normalization (whitespace normalization).
- No text augmentation by default.
- Optional negative downsampling configurable in configs/train.yaml.
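As a rough sketch of the two preprocessing steps above, assuming rows are plain dictionaries: whitespace normalization collapses runs of whitespace, and negative downsampling keeps only a fraction of rows with no positive label. The function names and the `keep_fraction` and `seed` parameters are hypothetical; the actual option names in configs/train.yaml are not shown here.

```python
import random
import re


def normalize_whitespace(text: str) -> str:
    """Collapse any run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def downsample_negatives(rows, label_keys, keep_fraction, seed=0):
    """Keep every row with at least one positive label.

    Rows with no positive label ("negatives") are kept with probability
    `keep_fraction`. `keep_fraction` and `seed` are assumed parameter names.
    """
    rng = random.Random(seed)  # local RNG so the result is reproducible
    kept = []
    for row in rows:
        is_negative = not any(row[key] for key in label_keys)
        if is_negative and rng.random() >= keep_fraction:
            continue  # drop this negative row
        kept.append(row)
    return kept


rows = [
    {"text": "  fine \n comment ", "toxic": 0, "insult": 0},
    {"text": "rude\tcomment", "toxic": 1, "insult": 1},
    {"text": "another fine one", "toxic": 0, "insult": 0},
]
cleaned = [{**row, "text": normalize_whitespace(row["text"])} for row in rows]
only_positives = downsample_negatives(cleaned, ["toxic", "insult"], keep_fraction=0.0)
```

With `keep_fraction=1.0` the function returns every row, which matches leaving downsampling turned off.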
Splits
- If the dataset provides labeled splits, use them.
- Otherwise, create train/val/test splits with a fixed seed:
  - train: 90%
  - val: 5%
  - test: 5%
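A minimal sketch of the fallback 90/5/5 split, assuming examples are addressed by index and shuffled once with a fixed seed. The function name `split_indices` and the default seed value of 42 are assumptions; the project may use a different seed or a library helper instead.

```python
import random


def split_indices(n, seed=42, train_pct=90, val_pct=5):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into train/val/test.

    Integer percentages avoid floating-point rounding; the test split
    receives whatever remains after train and val.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed gives the same order
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
```

Because the shuffle uses its own `random.Random(seed)` instance, the split does not depend on, or disturb, the global random state.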
Known limitations & biases
- Toxicity datasets are prone to identity term bias: comments that merely mention certain identities can be incorrectly predicted as toxic.
- Labels reflect annotator perceptions and may encode cultural bias.
- The dataset contains offensive content; access and storage should be controlled.
Mitigation in this project:
- Use the Detoxify "unbiased" model as a baseline.
- Track fairness metrics across identity mentions (conceptual plan in docs).
- Human-in-the-loop review for borderline cases.
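One way the fairness tracking above could be made concrete is to compare false positive rates on non-toxic comments that mention an identity term against those that do not. The sketch below is an assumption about what such a metric might look like, not the project's documented plan; the function names, the tiny term list, and the example data are all hypothetical.

```python
import re


def false_positive_rate(labels, predictions):
    """Share of truly non-toxic items (label 0) that were predicted toxic (1)."""
    negatives = [pred for label, pred in zip(labels, predictions) if label == 0]
    if not negatives:
        return None  # undefined when the group has no non-toxic items
    return sum(negatives) / len(negatives)


def fpr_by_identity_mention(texts, labels, predictions, identity_terms):
    """Compute the false positive rate separately for comments with and
    without an identity term, using whole-word, case-insensitive matching."""
    terms = {term.lower() for term in identity_terms}
    groups = {"mention": ([], []), "no_mention": ([], [])}
    for text, label, pred in zip(texts, labels, predictions):
        words = set(re.findall(r"[a-z']+", text.lower()))
        key = "mention" if words & terms else "no_mention"
        groups[key][0].append(label)
        groups[key][1].append(pred)
    return {key: false_positive_rate(ys, ps) for key, (ys, ps) in groups.items()}


# Hypothetical toy data: binary "toxic" labels and binary model predictions.
texts = ["I am a gay man", "you are an idiot", "nice weather today",
         "she is muslim", "thanks for the edit"]
labels = [0, 1, 0, 0, 0]
predictions = [1, 1, 0, 0, 0]
rates = fpr_by_identity_mention(texts, labels, predictions, ["gay", "muslim"])
```

A large gap between the two rates would suggest identity term bias; borderline or disputed cases in the "mention" group are natural candidates for human review.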