Spaces:

ledinhminhquan
/

toxicity-agent-api

Running

App Files Files Community

toxicity-agent-api / docs /ethics_statement.md

ledinhminhquan

deploy FastAPI backend to HF Space

9302284 2 months ago

|

history blame contribute delete

1.08 kB

Ethics Impact Statement

Who benefits

Users in communities: reduced exposure to harmful content.
Moderators: reduced workload and improved triage.
Platforms: improved trust and safety outcomes.

Who could be harmed

Users whose content is incorrectly flagged (false positives).
Vulnerable groups if the model exhibits identity-term bias.

Bias & fairness risks

Toxicity detectors often over-predict toxicity for text mentioning certain identities. We mitigate by:

Using Detoxify "unbiased" baseline.
Requiring human review for borderline cases.
Proposing fairness slice evaluations (e.g., identity mention groups).

Explainability for stakeholders

We provide:

top contributing label probabilities (not token-level explanations),
clear action rationale,
audit logs for moderation decisions (privacy-preserving).

Misuse risks

Over-reliance on automation; mitigate with human-in-the-loop.
Using the model to target/harass users; avoid exposing raw scores broadly.

This system is intended to assist moderation, not replace human judgment.