| # Data Description Document |
|
|
| ## Data source |
| This project uses a public toxicity dataset mirrored on Hugging Face from the |
| **Jigsaw Toxic Comment Classification Challenge** (Wikipedia talk page comments). |
|
|
| Dataset name (default in configs): |
| - `thesofakillers/jigsaw-toxic-comment-classification-challenge` |
|
|
| ## Licensing |
| - The dataset card indicates redistribution under **CC0**, while the underlying comment text |
| originates from Wikipedia content under **CC BY-SA 3.0**. |
| Always verify terms in the dataset card and your organization's compliance requirements. |
|
|
| ## Size & languages |
| - Primarily English comments. |
| - Multi-label annotations for 6 toxicity categories. |
|
|
| ## Labels |
| - toxic |
| - severe_toxic |
| - obscene |
| - threat |
| - insult |
| - identity_hate (mapped to `identity_attack` for Detoxify compatibility) |
|
|
| ## Preprocessing |
| - Minimal normalization (whitespace normalization). |
| - No text augmentation by default. |
| - Optional negative downsampling configurable in `configs/train.yaml`. |
|
|
| ## Splits |
| - If dataset provides labeled splits, use them. |
| - Otherwise, create train/val/test with a fixed seed: |
| - train: 90% |
| - val: 5% |
| - test: 5% |
|
|
| ## Known limitations & biases |
| - Toxicity datasets are prone to **identity term bias**: mentioning certain identities can be |
| incorrectly predicted as toxic. |
| - Labels reflect annotator perceptions and may encode cultural bias. |
| - The dataset contains offensive content; access and storage should be controlled. |
|
|
| Mitigation in this project: |
| - Use Detoxify "unbiased" model as a baseline. |
| - Track fairness metrics across identity mentions (conceptual plan in docs). |
| - Human-in-the-loop review for borderline cases. |
|
|