Spaces:

ledinhminhquan
/

toxicity-agent-api

Running

toxicity-agent-api / docs /ethics_statement.md

ledinhminhquan

deploy FastAPI backend to HF Space

9302284 2 months ago

1.08 kB

	# Ethics Impact Statement

	## Who benefits
	- Users in communities: reduced exposure to harmful content.
	- Moderators: reduced workload and improved triage.
	- Platforms: improved trust and safety outcomes.

	## Who could be harmed
	- Users whose content is incorrectly flagged (false positives).
	- Vulnerable groups if the model exhibits identity-term bias.

	## Bias & fairness risks
	Toxicity detectors often over-predict toxicity for text mentioning certain identities.
	We mitigate by:
	- Using Detoxify "unbiased" baseline.
	- Requiring human review for borderline cases.
	- Proposing fairness slice evaluations (e.g., identity mention groups).

	## Explainability for stakeholders
	We provide:
	- top contributing label probabilities (not token-level explanations),
	- clear action rationale,
	- audit logs for moderation decisions (privacy-preserving).

	## Misuse risks
	- Over-reliance on automation; mitigate with human-in-the-loop.
	- Using the model to target/harass users; avoid exposing raw scores broadly.

	This system is intended to assist moderation, not replace human judgment.