BERT-based Toxicity Classifier
This is a BERT-based model fine-tuned for multi-label toxicity classification. It can identify toxic language across several dimensions (toxic, obscene, insult, threat, and identity hate). The model is designed to assist in content moderation by providing an automated way to flag harmful text.
Model Details
Model Description
This model is a fine-tuned version of the bert-base-uncased model from the transformers library. It has been trained on a balanced dataset annotated for toxicity and is capable of multi-label classification. Each input text can be simultaneously assigned multiple toxicity labels.
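In multi-label classification, each label head produces an independent probability (typically via a sigmoid over its logit), so several labels can be active for the same text. A minimal illustration, using hypothetical logits rather than this model's actual outputs:

```python
import math

def sigmoid(logit):
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical raw logits for one input, one per label head.
logits = {"toxic": 2.1, "obscene": 0.4, "insult": 1.7, "threat": -3.0, "identity hate": -2.2}

# Each label is scored independently, so several can exceed a threshold at once.
probs = {label: sigmoid(z) for label, z in logits.items()}
active = [label for label, p in probs.items() if p >= 0.5]
print(active)  # multiple labels can fire simultaneously
```

This is unlike single-label (softmax) classification, where increasing one class's probability necessarily decreases the others.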
- Developed by: Ujjawal Kumar (ujjawalsah9801@gmail.com). For any issues or inquiries regarding the use of this model, please contact via email.
- Model type: BERT-based sequence classifier
- Language(s) (NLP): English
- Finetuned from model: bert-base-uncased
Model Sources
- Repository: ujjawalsah/bert-toxicity-classifier
- Paper: [More Information Needed]
- Demo: https://digitalcontentdetection.vercel.app/
Uses
Direct Use
This model can be used directly for toxicity classification in text. It is suitable for content moderation applications where identifying harmful language is critical.
Downstream Use
The model can be integrated into larger systems (e.g., social media monitoring tools or chat moderation systems) or further fine-tuned for domain-specific toxicity detection.
Out-of-Scope Use
This model is not intended for use as the sole basis for automated content removal or decisions with legal implications. It should be complemented with human oversight and additional verification tools.
Bias, Risks, and Limitations
The model may inherit biases from the training data and might not generalize perfectly to all forms of language or dialects. It may also produce false positives or negatives, so its predictions should be interpreted with caution.
Recommendations
- Use as Part of a Pipeline: Integrate the model into a broader content moderation system.
- Validate on Your Data: Evaluate the model on your specific dataset before deploying.
- Monitor and Mitigate Bias: Combine automated predictions with human oversight.
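The recommendations above can be sketched as a simple triage step: remove only at very high confidence, route borderline cases to human review, and allow the rest. The `triage` helper and the thresholds below are illustrative assumptions, not part of this model:

```python
def triage(predictions, remove_threshold=0.95, review_threshold=0.5):
    """Route a text based on its per-label toxicity scores.

    predictions: list of {"label": ..., "score": ...} dicts, as returned per text.
    Returns "remove" only at very high confidence, "review" for borderline
    scores that should go to a human moderator, and "allow" otherwise.
    """
    top = max((p["score"] for p in predictions), default=0.0)
    if top >= remove_threshold:
        return "remove"
    if top >= review_threshold:
        return "review"
    return "allow"

# Example: a borderline toxic score is escalated to human review, not auto-removed.
sample = [{"label": "toxic", "score": 0.62}, {"label": "insult", "score": 0.31}]
print(triage(sample))
```

Tuning the two thresholds against a validation set from your own domain is part of the "validate on your data" step.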
How to Get Started with the Model
The snippet below queries the model through the Hugging Face hosted Inference API using the requests library (replace the placeholder with your own access token):
```python
import requests

# Your Hugging Face access token header
headers = {"Authorization": "Bearer *********************"}

# The model endpoint URL
API_URL = "https://api-inference.huggingface.co/models/ujjawalsah/bert-toxicity-classifier"

# Mapping from raw model labels to human-friendly labels
label_mapping = {
    "LABEL_0": "toxic",
    "LABEL_1": "obscene",
    "LABEL_2": "insult",
    "LABEL_3": "threat",
    "LABEL_4": "identity hate",
}

def query_model(text):
    payload = {"inputs": text}
    response = requests.post(API_URL, headers=headers, json=payload)
    # Check for a successful request
    if response.status_code == 200:
        return response.json()
    print("Error:", response.status_code, response.text)
    return None

def print_readable_result(result):
    # The model returns a list of lists; we take the predictions for the first input.
    predictions = result[0] if isinstance(result, list) and len(result) > 0 else []
    if not predictions:
        print("No predictions received.")
        return
    print("Human-friendly Classification Result:")
    for pred in predictions:
        # Convert the label to human-readable form and show the score as a percentage
        human_label = label_mapping.get(pred.get("label"), pred.get("label"))
        score_percentage = pred.get("score", 0) * 100
        print(f"- {human_label.capitalize()}: {score_percentage:.2f}% confidence")

if __name__ == "__main__":
    # Example text input
    text = "You are a wonderful person."
    result = query_model(text)
    if result:
        print_readable_result(result)
```
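The hosted Inference API may respond with HTTP 503 while the model is cold-starting. A hedged sketch of a retry wrapper; the `query_with_retry` helper and its retry parameters are illustrative assumptions, not part of this model card:

```python
import time
import requests

def query_with_retry(text, api_url, headers, retries=3, wait=5.0):
    """Query the endpoint, retrying while the hosted model is still loading (HTTP 503)."""
    for attempt in range(retries):
        response = requests.post(api_url, headers=headers, json={"inputs": text})
        if response.status_code == 200:
            return response.json()
        if response.status_code == 503 and attempt < retries - 1:
            time.sleep(wait)  # model is cold-starting; wait and try again
            continue
        response.raise_for_status()  # any other error (or final 503) is surfaced
    return None
```

Calling `query_with_retry(text, API_URL, headers)` in place of `query_model(text)` makes the example tolerant of cold starts without changing the rest of the script.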
License
This model is licensed under the MIT License.