ledinhminhquan
# Ethics Impact Statement
## Who benefits
- Community members: reduced exposure to harmful content.
- Moderators: reduced workload and improved triage.
- Platforms: improved trust and safety outcomes.
## Who could be harmed
- Users whose content is incorrectly flagged (false positives).
- Vulnerable groups, if the model exhibits identity-term bias (flagging benign text simply because it mentions an identity).
## Bias & fairness risks
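Toxicity detectors often over-predict toxicity for text that merely mentions certain identities (for example, religions, ethnicities, or sexual orientations), even when the text itself is benign.
We mitigate this by:
- Using the Detoxify `unbiased` model as the baseline (trained on the Jigsaw Unintended Bias in Toxicity Classification dataset).
- Requiring human review for borderline cases.
- Proposing fairness slice evaluations (e.g., comparing false positive rates across groups defined by identity mentions).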
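A fairness slice evaluation could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not this project's evaluation code: it assumes a labeled evaluation set where each example has a model score, a human toxicity label, and a hypothetical `identities` annotation listing the identity terms it mentions. The `Example` class, the group names, the toy data, and the 0.5 threshold are all illustrative. The sketch compares the false positive rate (benign text flagged as toxic) inside each identity slice with the overall rate, since that is the error that harms the groups described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Example:
    score: float       # model toxicity probability in [0, 1]
    is_toxic: bool     # human ground-truth label
    # Hypothetical annotation: identity terms mentioned in the text.
    identities: set[str] = field(default_factory=set)


def false_positive_rate(examples, threshold=0.5):
    """Share of non-toxic examples scored at or above the threshold (None if no negatives)."""
    negatives = [e for e in examples if not e.is_toxic]
    if not negatives:
        return None
    flagged = sum(1 for e in negatives if e.score >= threshold)
    return flagged / len(negatives)


def fpr_by_identity(examples, threshold=0.5):
    """Compute the false positive rate separately for each identity slice."""
    slices = defaultdict(list)
    for e in examples:
        for group in e.identities:
            slices[group].append(e)
    return {group: false_positive_rate(items, threshold) for group, items in slices.items()}


# Toy data, made up for illustration only.
data = [
    Example(0.8, False, {"group_a"}),
    Example(0.2, False, {"group_a"}),
    Example(0.1, False, {"group_b"}),
    Example(0.9, True, {"group_b"}),
    Example(0.3, False, set()),
]
print(false_positive_rate(data, threshold=0.5))  # 0.25
print(fpr_by_identity(data, threshold=0.5))      # {'group_a': 0.5, 'group_b': 0.0}
```

A slice whose false positive rate sits well above the overall rate (here `group_a`) would be a candidate for threshold review or extra human oversight.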
## Explainability for stakeholders
We provide:
- The top contributing label probabilities (not token-level explanations).
- A clear rationale for each moderation action.
- Privacy-preserving audit logs of moderation decisions.
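One way to produce these three outputs together is sketched below, as an assumption-laden illustration rather than the service's actual code. It assumes the classifier returns a mapping from label name to probability (the label names in the example are illustrative), and it makes the audit log privacy-preserving by storing a keyed HMAC-SHA256 digest of the text instead of the text itself, with no user identifier. The helper names, the `k=3` cutoff, and the placeholder key are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def top_labels(probs: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-scoring labels, highest first."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


def build_rationale(action: str, labels: list[tuple[str, float]]) -> str:
    """Plain-language reason for an action, naming the top labels and their scores."""
    names = ", ".join(f"{name} ({score:.2f})" for name, score in labels)
    return f"Action '{action}' based on highest-scoring labels: {names}."


def audit_record(text: str, action: str, probs: dict[str, float], secret_key: bytes) -> str:
    """JSON audit entry that stores a keyed hash of the text, never the raw text."""
    labels = top_labels(probs)
    digest = hmac.new(secret_key, text.encode("utf-8"), hashlib.sha256).hexdigest()
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "content_hmac": digest,
        "action": action,
        "top_labels": [{"label": n, "score": round(s, 3)} for n, s in labels],
        "rationale": build_rationale(action, labels),
    }
    return json.dumps(record)


# Illustrative scores; a real key would come from a secrets store, not source code.
probs = {"toxicity": 0.91, "insult": 0.74, "threat": 0.05, "obscene": 0.40}
print(top_labels(probs))  # [('toxicity', 0.91), ('insult', 0.74), ('obscene', 0.4)]
entry = json.loads(audit_record("example text", "human_review", probs, b"replace-with-a-real-secret"))
print(entry["rationale"])
# Action 'human_review' based on highest-scoring labels: toxicity (0.91), insult (0.74), obscene (0.40).
```

A keyed hash is used instead of a plain SHA-256 digest because short messages could otherwise be recovered by hashing guesses; the key would need to be stored and rotated separately from the logs.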
## Misuse risks
- Over-reliance on automation: mitigated by keeping a human in the loop.
- Using the model to target or harass users: mitigated by avoiding broad exposure of raw scores.
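Both mitigations can be expressed in a single routing step, sketched here under assumptions: the thresholds, action names, and severity bands are hypothetical and would need tuning on validation data, including the identity slices. Borderline scores are routed to human review, and a broadly exposed response carries only the action and a coarse band, while the raw score is kept for an internal moderator view.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; not values taken from this project.
REVIEW_THRESHOLD = 0.3
FLAG_THRESHOLD = 0.8


@dataclass(frozen=True)
class Decision:
    action: str   # "allow", "human_review", or "flag"
    band: str     # coarse severity band that is safe to show broadly
    score: float  # raw score, for authorized moderator tooling only


def decide(score: float) -> Decision:
    """Route a toxicity probability to an action; borderline scores go to human review."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= FLAG_THRESHOLD:
        # "flag" queues content for moderator confirmation; this sketch never removes content itself.
        return Decision("flag", "high", score)
    if score >= REVIEW_THRESHOLD:
        return Decision("human_review", "medium", score)
    return Decision("allow", "low", score)


def public_view(decision: Decision) -> dict[str, str]:
    """Broadly exposed response: action and coarse band only, never the raw score."""
    return {"action": decision.action, "band": decision.band}


def moderator_view(decision: Decision) -> dict[str, object]:
    """Internal response for authorized moderators, including the raw score."""
    return {"action": decision.action, "band": decision.band, "score": round(decision.score, 3)}


d = decide(0.55)
print(public_view(d))     # {'action': 'human_review', 'band': 'medium'}
print(moderator_view(d))  # {'action': 'human_review', 'band': 'medium', 'score': 0.55}
```

Coarse bands still tell an end user why content was held, while making it harder to use the endpoint as a precise scoring oracle against particular users.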
This system is intended to assist moderation, not replace human judgment.