
Ethics Impact Statement

Who benefits

  • Community members: reduced exposure to harmful content.
  • Moderators: reduced workload and improved triage.
  • Platforms: improved trust and safety outcomes.

Who could be harmed

  • Users whose content is incorrectly flagged (false positives).
  • Vulnerable groups, if the model exhibits identity-term bias (for example, flagging benign text only because it mentions an identity).

Bias & fairness risks

Toxicity detectors often over-predict toxicity for text that mentions certain identities. We mitigate this by:

  • Using the Detoxify "unbiased" model as the baseline.
  • Requiring human review for borderline cases.
  • Proposing fairness slice evaluations (e.g., comparing error rates across identity-mention groups).
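
A fairness slice evaluation of the kind proposed above could look roughly like the following sketch. It compares false positive rates between slices of non-toxic text. The function name, the slice names, the 0.5 threshold, and the toy records are all assumptions made for illustration; they are not taken from this project's code or data.

```python
from collections import defaultdict


def false_positive_rate_by_slice(records, threshold=0.5):
    """Return the false positive rate for each slice.

    `records` is an iterable of (slice_name, score, is_toxic) tuples, where
    `score` is the model's toxicity probability and `is_toxic` is the
    human-assigned label. The threshold is a hypothetical choice.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for slice_name, score, is_toxic in records:
        if is_toxic:
            continue  # FPR is computed over non-toxic examples only
        negatives[slice_name] += 1
        if score >= threshold:
            false_positives[slice_name] += 1
    return {name: false_positives[name] / n for name, n in negatives.items()}


# Toy, made-up records purely to show the shape of the input.
records = [
    ("mentions_identity", 0.72, False),
    ("mentions_identity", 0.20, False),
    ("mentions_identity", 0.91, True),
    ("no_identity", 0.10, False),
    ("no_identity", 0.30, False),
]
rates = false_positive_rate_by_slice(records)
print(rates)  # {'mentions_identity': 0.5, 'no_identity': 0.0}
```

A large gap between slices, as in this toy data, would suggest that benign text mentioning an identity is being flagged more often than comparable text that does not.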

Explainability for stakeholders

We provide:

  • the top contributing label probabilities (not token-level explanations),
  • a clear rationale for each action,
  • privacy-preserving audit logs for moderation decisions.
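
As a rough illustration of the items above, the sketch below picks the highest-probability labels and builds an audit entry that stores a keyed hash of the content instead of the raw text. The helper names, the record fields, the label names, and the demo key are hypothetical; the actual log format used by this project is not described here.

```python
import hashlib
import hmac
import json


def top_labels(scores, k=3):
    """Return the k labels with the highest probability, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def audit_record(text, scores, action, key):
    """Build an audit entry that never contains the raw text.

    The content is represented by an HMAC-SHA256 digest, so identical
    content can still be correlated across entries by whoever holds `key`.
    """
    digest = hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "content_hmac": digest,
        "top_labels": [[label, round(p, 3)] for label, p in top_labels(scores)],
        "action": action,
    }


# Illustrative label scores, not real model output.
scores = {"toxicity": 0.81, "insult": 0.64, "threat": 0.02, "obscene": 0.11}
record = audit_record(
    "example comment", scores, "human_review", key=b"demo-key-not-for-production"
)
print(json.dumps(record, indent=2))
```

A keyed hash is pseudonymization rather than anonymization: the key would need to be stored separately from the logs and rotated according to the platform's retention policy.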

Misuse risks

  • Over-reliance on automation; mitigated with human-in-the-loop review.
  • Using the model to target or harass users; mitigated by not exposing raw scores broadly.
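
One way to combine both mitigations is to return only a coarse action to callers and to route anything above a review threshold to a moderator. The sketch below is a minimal illustration under assumed thresholds (0.4 and 0.8) and a hypothetical function name; neither comes from this project.

```python
def moderation_decision(toxicity_score, review_at=0.4, flag_at=0.8):
    """Map a raw score to a coarse action without echoing the score.

    Both thresholds are hypothetical and would need tuning on real data.
    Anything at or above `review_at` is queued for a human moderator.
    """
    if not 0.0 <= toxicity_score <= 1.0:
        raise ValueError("toxicity_score must be in [0, 1]")
    if toxicity_score >= flag_at:
        action = "flag"
    elif toxicity_score >= review_at:
        action = "human_review"
    else:
        action = "allow"
    # The raw score is deliberately left out of the returned payload.
    return {"action": action, "needs_human": action != "allow"}


print(moderation_decision(0.10))  # {'action': 'allow', 'needs_human': False}
print(moderation_decision(0.55))  # {'action': 'human_review', 'needs_human': True}
print(moderation_decision(0.93))  # {'action': 'flag', 'needs_human': True}
```

Keeping the numeric score out of the public response makes it harder to probe the model for text that sits just under a threshold, while moderators can still see the full scores through internal tooling.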

This system is intended to assist moderation, not replace human judgment.