# Data Description Document

## Data source
This project uses a public toxicity dataset mirrored on Hugging Face from the
**Jigsaw Toxic Comment Classification Challenge** (Wikipedia talk page comments).

Dataset name (default in configs):
- `thesofakillers/jigsaw-toxic-comment-classification-challenge`

## Licensing
- The dataset card indicates redistribution under **CC0**, while the underlying comment text
  originates from Wikipedia content under **CC BY-SA 3.0**.
- Always verify the terms in the dataset card and your organization's compliance requirements
  before use.

## Size & languages
- Primarily English comments.
- Multi-label annotations for 6 toxicity categories.

## Labels
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate (mapped to `identity_attack` for Detoxify compatibility)
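
The label rename above can be sketched as a small lookup. This is a minimal illustration, not the project's actual code: `LABEL_RENAMES` and `rename_labels` are hypothetical names, and it assumes the other five labels pass through unchanged because this document names only one mapping.

```python
# Source label names as listed in this document.
JIGSAW_LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Assumption: only identity_hate is renamed; all other keys are left untouched.
LABEL_RENAMES = {"identity_hate": "identity_attack"}


def rename_labels(row: dict) -> dict:
    """Return a copy of `row` with label keys renamed for Detoxify compatibility."""
    return {LABEL_RENAMES.get(key, key): value for key, value in row.items()}
```

For example, `rename_labels({"toxic": 0, "identity_hate": 1})` yields a dict with the keys `toxic` and `identity_attack`.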

## Preprocessing
- Minimal normalization (whitespace only).
- No text augmentation by default.
- Optional negative downsampling configurable in `configs/train.yaml`.
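
A rough sketch of these two steps follows. It assumes "whitespace normalization" means collapsing runs of whitespace and trimming the ends, and that negative downsampling keeps every comment with at least one positive label plus a random fraction of all-negative comments. The function names and the `keep_fraction` parameter are hypothetical and may not match the keys in `configs/train.yaml`.

```python
import random
import re

_WHITESPACE = re.compile(r"\s+")


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return _WHITESPACE.sub(" ", text).strip()


def downsample_negatives(rows, label_keys, keep_fraction, seed=0):
    """Keep every row with any positive label and a random share of all-negative rows.

    `keep_fraction` is a hypothetical parameter: the probability of keeping
    a row whose labels are all zero.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = []
    for row in rows:
        is_positive = any(row[key] for key in label_keys)
        if is_positive or rng.random() < keep_fraction:
            kept.append(row)
    return kept
```

Setting `keep_fraction=1.0` leaves the data unchanged, which matches the "optional" behavior described above.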

## Splits
- If the dataset provides labeled splits, use them.
- Otherwise, create train/val/test with a fixed seed:
  - train: 90%
  - val: 5%
  - test: 5%
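
The fallback split can be sketched with the standard library alone. This is an illustration under assumptions: the seed value `42` and the name `split_indices` are placeholders, not values taken from the project configs.

```python
import random


def split_indices(n_rows, seed=42, val_frac=0.05, test_frac=0.05):
    """Shuffle row indices with a fixed seed and cut them into train/val/test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed gives the same split

    n_val = round(n_rows * val_frac)
    n_test = round(n_rows * test_frac)
    n_train = n_rows - n_val - n_test  # remainder (about 90%) goes to train

    return {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],
    }
```

Splitting indices rather than rows keeps the sketch independent of how the data is stored; the three lists are disjoint and together cover every row.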

## Known limitations & biases
- Toxicity datasets are prone to **identity term bias**: comments that merely mention certain
  identities can be incorrectly predicted as toxic.
- Labels reflect annotator perceptions and may encode cultural bias.
- The dataset contains offensive content; access and storage should be controlled.

Mitigation in this project:
- Use the Detoxify "unbiased" model as a baseline.
- Track fairness metrics across identity mentions (conceptual plan in docs).
- Human-in-the-loop review for borderline cases.
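
One possible way to track a fairness metric across identity mentions is to compare false positive rates between comments that mention an identity term and comments that do not. The sketch below is illustrative only and is not the project's documented plan: the function names are hypothetical, the identity term list is supplied by the caller, and the naive substring match would need refinement in practice.

```python
def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), computed over examples whose true label is 0."""
    negatives = [pred for true, pred in zip(y_true, y_pred) if true == 0]
    if not negatives:
        return None  # undefined when the group has no true negatives
    return sum(negatives) / len(negatives)


def fpr_by_identity_mention(texts, y_true, y_pred, identity_terms):
    """Compare FPR for comments that mention an identity term against the rest.

    Uses a naive case-insensitive substring match (an assumption of this sketch).
    """
    mentions = [any(term in text.lower() for term in identity_terms) for text in texts]
    result = {}
    for name, flag in (("mentions_identity", True), ("no_identity_mention", False)):
        idx = [i for i, m in enumerate(mentions) if m == flag]
        result[name] = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return result
```

A large gap between the two rates would suggest that non-toxic comments mentioning identities are flagged more often, which is the identity term bias described above.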