	# Data Description Document

	## Data source
	This project uses a public toxicity dataset mirrored on Hugging Face from the
	Jigsaw Toxic Comment Classification Challenge (Wikipedia talk page comments).

	Dataset name (default in configs):
	- `thesofakillers/jigsaw-toxic-comment-classification-challenge`

	## Licensing
	- The dataset card indicates redistribution under CC0, while the underlying comment text
	  originates from Wikipedia content under CC BY-SA 3.0.
	- Always verify the terms in the dataset card and your organization's compliance requirements.

	## Size & languages
	- Primarily English comments.
	- Multi-label annotations for 6 toxicity categories.

	## Labels
	- toxic
	- severe_toxic
	- obscene
	- threat
	- insult
	- identity_hate (mapped to `identity_attack` for Detoxify compatibility)
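The label handling above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `JIGSAW_LABELS`, `LABEL_RENAMES`, `to_target_labels`, and `to_multi_hot` are hypothetical, and only the `identity_hate` to `identity_attack` rename is taken from this document. All other labels are assumed to pass through unchanged.

```python
# The six source labels, in the order listed in this document.
JIGSAW_LABELS = [
    "toxic",
    "severe_toxic",
    "obscene",
    "threat",
    "insult",
    "identity_hate",
]

# Only this rename is stated in the document; any further renames
# would be an assumption and are deliberately not included.
LABEL_RENAMES = {"identity_hate": "identity_attack"}


def to_target_labels(row):
    """Return a dict of 0/1 labels keyed by the (possibly renamed) label names."""
    return {LABEL_RENAMES.get(name, name): int(row[name]) for name in JIGSAW_LABELS}


def to_multi_hot(row):
    """Return the labels as a fixed-order multi-hot vector (one slot per label)."""
    return [int(row[name]) for name in JIGSAW_LABELS]
```

Because the task is multi-label, a single comment can have several slots set to 1 at once, so the vector is not a one-hot encoding.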

	## Preprocessing
	- Minimal normalization (whitespace normalization).
	- No text augmentation by default.
	- Optional negative downsampling configurable in `configs/train.yaml`.
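As a rough sketch of the two preprocessing steps above, assuming rows are plain dicts with one 0/1 field per label: the function names, the `keep_fraction` parameter, and the default seed are hypothetical and do not necessarily match the keys in `configs/train.yaml`.

```python
import random
import re

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def normalize_whitespace(text):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def downsample_negatives(rows, keep_fraction, seed=42):
    """Keep every row with at least one positive label and a random
    fraction of the all-negative rows.

    keep_fraction is a hypothetical parameter in [0, 1]; 1.0 disables
    downsampling, 0.0 drops every all-negative row.
    """
    rng = random.Random(seed)  # local RNG so the result is reproducible
    kept = []
    for row in rows:
        is_negative = not any(row[name] for name in LABELS)
        if not is_negative or rng.random() < keep_fraction:
            kept.append(row)
    return kept
```

Downsampling like this would normally be applied to the training split only, so that validation and test metrics still reflect the original class balance.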

	## Splits
	- If the dataset provides labeled splits, use them.
	- Otherwise, create train/val/test splits with a fixed seed:
	  - train: 90%
	  - val: 5%
	  - test: 5%
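The fallback split can be sketched as below. This is one possible implementation under stated assumptions: the function name `split_indices` and the default seed of 42 are hypothetical, and the project may instead rely on a library utility.

```python
import random


def split_indices(n, seed=42, val_frac=0.05, test_frac=0.05):
    """Shuffle the indices 0..n-1 with a fixed seed and cut them into
    train/val/test partitions (90/5/5 with the default fractions)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed gives the same order

    n_val = round(n * val_frac)
    n_test = round(n * test_frac)
    n_train = n - n_val - n_test  # remainder goes to train

    return {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],
    }
```

Splitting on indices rather than on the rows themselves keeps the function independent of the storage format, and the fixed seed makes the partition reproducible across runs.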

	## Known limitations & biases
	- Toxicity datasets are prone to identity term bias: comments that merely mention certain
	  identities can be incorrectly predicted as toxic.
	- Labels reflect annotator perceptions and may encode cultural bias.
	- The dataset contains offensive content; access and storage should be controlled.

	Mitigation in this project:
	- Use the Detoxify "unbiased" model as a baseline.
	- Track fairness metrics across identity mentions (conceptual plan in docs).
	- Human-in-the-loop review for borderline cases.
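One way the fairness tracking could look in practice is a per-term false positive rate, compared against the overall rate. The sketch below is illustrative only: the metric choice, the function names, and the example term list are assumptions, not the plan described in the project docs.

```python
import re


def false_positive_rate(y_true, y_pred):
    """Share of truly non-toxic items (label 0) that were predicted toxic (1).
    Returns None when there are no non-toxic items to evaluate."""
    negatives = [pred for true, pred in zip(y_true, y_pred) if true == 0]
    if not negatives:
        return None
    return sum(negatives) / len(negatives)


def mentions_term(text, term):
    """Case-insensitive whole-word match of an identity term."""
    return re.search(rf"\b{re.escape(term)}\b", text.lower()) is not None


def subgroup_fpr(texts, y_true, y_pred, terms):
    """Report the overall false positive rate and the rate restricted to
    comments mentioning each term. The term list is supplied by the caller;
    any concrete list is a hypothetical example."""
    report = {"overall": false_positive_rate(y_true, y_pred)}
    for term in terms:
        idx = [i for i, text in enumerate(texts) if mentions_term(text, term)]
        report[term] = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return report
```

A subgroup rate well above the overall rate suggests that the model is reacting to the identity term itself, which is the kind of case that could be routed to human review.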