
Data Description Document

Data source

This project uses a public toxicity dataset hosted on Hugging Face, a mirror of the data from the Jigsaw Toxic Comment Classification Challenge (Wikipedia talk page comments).

Dataset name (default in configs):

  • thesofakillers/jigsaw-toxic-comment-classification-challenge

Licensing

  • The dataset card indicates redistribution under CC0, while the underlying comment text originates from Wikipedia content under CC BY-SA 3.0. Always verify terms in the dataset card and your organization's compliance requirements.

Size & languages

  • Primarily English comments.
  • Multi-label annotations for 6 toxicity categories.

Labels

  • toxic
  • severe_toxic
  • obscene
  • threat
  • insult
  • identity_hate (mapped to identity_attack for Detoxify compatibility)
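
The label rename can be sketched as a small lookup table. This is a minimal illustration, not the project's actual code: the names `JIGSAW_LABELS`, `JIGSAW_TO_DETOXIFY`, and `to_detoxify_labels` are hypothetical, and only the single rename stated above is applied.

```python
# Label names as listed in this document (Jigsaw column names).
JIGSAW_LABELS = [
    "toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate",
]

# Hypothetical mapping: only identity_hate is renamed, as described above.
# All other labels pass through unchanged.
JIGSAW_TO_DETOXIFY = {"identity_hate": "identity_attack"}


def to_detoxify_labels(row: dict) -> dict:
    """Return a copy of `row` with Jigsaw label keys renamed."""
    return {JIGSAW_TO_DETOXIFY.get(key, key): value for key, value in row.items()}


# Toy example row (invented for illustration, not taken from the dataset).
example = {name: 0 for name in JIGSAW_LABELS}
example["insult"] = 1
renamed = to_detoxify_labels(example)
print(sorted(renamed))
```

Keeping the rename in one table means the training and evaluation code can share the same mapping instead of hard-coding the two spellings in different places.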

Preprocessing

  • Minimal normalization: whitespace normalization only.
  • No text augmentation by default.
  • Optional negative downsampling configurable in configs/train.yaml.
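
The two preprocessing steps above might look roughly like the following. This is a sketch under assumptions: the function names, the `keep_fraction` parameter, and the default seed are hypothetical and are not read from configs/train.yaml, and "negative" is assumed to mean a comment with none of the six labels set.

```python
import random
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def downsample_negatives(rows, label_keys, keep_fraction, seed=42):
    """Keep every row with at least one positive label, plus a random
    `keep_fraction` of the rows that have no positive label."""
    rng = random.Random(seed)  # local RNG so the result is reproducible
    kept = []
    for row in rows:
        is_negative = not any(row[key] for key in label_keys)
        if not is_negative or rng.random() < keep_fraction:
            kept.append(row)
    return kept


# Toy rows (invented for illustration): one positive, three negatives.
rows = [
    {"text": "a", "toxic": 1},
    {"text": "b", "toxic": 0},
    {"text": "c", "toxic": 0},
    {"text": "d", "toxic": 0},
]
only_positives = downsample_negatives(rows, ["toxic"], keep_fraction=0.0)
everything = downsample_negatives(rows, ["toxic"], keep_fraction=1.0)
```

Positives are never dropped in this sketch, so downsampling changes the class balance without discarding any of the rarer toxic examples.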

Splits

  • If the dataset provides labeled splits, use them.
  • Otherwise, create train/val/test with a fixed seed:
    • train: 90%
    • val: 5%
    • test: 5%
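
A fixed-seed 90/5/5 split can be sketched with the standard library alone. The function name `split_indices` and the seed value 42 are assumptions for illustration; the project may use a different seed or a library helper.

```python
import random


def split_indices(n: int, seed: int = 42,
                  val_frac: float = 0.05, test_frac: float = 0.05):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into
    train/val/test. Train receives whatever remains after val and test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed gives the same order
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    n_train = n - n_val - n_test
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting indices rather than rows keeps the split independent of how the data is stored, and the same seed reproduces the same partition across runs.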

Known limitations & biases

  • Toxicity datasets are prone to identity term bias: comments that merely mention certain identities can be incorrectly predicted as toxic.
  • Labels reflect annotator perceptions and may encode cultural bias.
  • The dataset contains offensive content; access and storage should be controlled.

Mitigation in this project:

  • Use the Detoxify "unbiased" model as a baseline.
  • Track fairness metrics across identity mentions (conceptual plan in docs).
  • Human-in-the-loop review for borderline cases.
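
One way the fairness tracking could be made concrete is to compare false positive rates between non-toxic comments that mention an identity term and those that do not. The sketch below is an assumption about how such a metric might be computed, not the project's documented plan: the function names, the term list, and the toy data are all hypothetical.

```python
import re


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), computed over non-toxic examples only.
    Returns None when there are no non-toxic examples."""
    preds_on_negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not preds_on_negatives:
        return None
    return sum(preds_on_negatives) / len(preds_on_negatives)


def fpr_by_identity_mention(texts, y_true, y_pred, identity_terms):
    """Split examples by whether the text contains any identity term
    (simple lowercase word match) and report the FPR of each group."""
    terms = {term.lower() for term in identity_terms}
    groups = {"mentions_identity": ([], []), "no_identity_mention": ([], [])}
    for text, t, p in zip(texts, y_true, y_pred):
        words = set(re.findall(r"[a-z]+", text.lower()))
        key = "mentions_identity" if words & terms else "no_identity_mention"
        groups[key][0].append(t)
        groups[key][1].append(p)
    return {key: false_positive_rate(t, p) for key, (t, p) in groups.items()}


# Toy data invented for illustration; 1 = toxic, 0 = non-toxic.
texts = [
    "I am a muslim",
    "have a nice day",
    "you are an idiot",
    "my gay friend visited",
    "thanks for the edit",
]
y_true = [0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0]
result = fpr_by_identity_mention(texts, y_true, y_pred, ["muslim", "gay"])
print(result)
```

A large gap between the two rates would suggest the model flags benign identity mentions more often than other benign text, which is the bias described above. Word matching is a crude proxy for identity mentions, so a real evaluation would likely need a curated term list or annotated identity labels.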