---
library_name: transformers
---

# BERT-based Toxicity Classifier

This is a BERT-based model fine-tuned for multi-label toxicity classification. It identifies toxic language across five dimensions: toxic, obscene, insult, threat, and identity hate. The model is designed to assist in content moderation by providing an automated way to flag harmful text.

## Model Details

### Model Description

This model is a fine-tuned version of `bert-base-uncased` from the transformers library. It was trained on a balanced dataset annotated for toxicity and performs multi-label classification: each input text can be assigned multiple toxicity labels simultaneously.

- **Developed by:** Ujjawal Kumar ([ujjawalsah9801@gmail.com](mailto:ujjawalsah9801@gmail.com)). *For any issues or inquiries regarding the use of this model, please contact via email.*
- **Model type:** BERT-based sequence classifier
- **Language(s) (NLP):** English
- **Finetuned from model:** bert-base-uncased

### Model Sources

- **Repository:** [ujjawalsah/bert-toxicity-classifier](https://huggingface.co/ujjawalsah/bert-toxicity-classifier)
- **Paper:** [More Information Needed]
- **Demo:** https://digitalcontentdetection.vercel.app/

## Uses

### Direct Use

This model can be used directly for toxicity classification in text. It is suitable for content moderation applications where identifying harmful language is critical.

### Downstream Use

The model can be integrated into larger systems (e.g., social media monitoring tools or chat moderation systems) or further fine-tuned for domain-specific toxicity detection.

### Out-of-Scope Use

This model is not intended for use as the sole basis for automated content removal or decisions with legal implications. It should be complemented with human oversight and additional verification tools.

## Bias, Risks, and Limitations

The model may inherit biases from the training data and might not generalize perfectly to all forms of language or dialects.
It may also produce false positives or false negatives, so its predictions should be interpreted with caution.

### Recommendations

- ⚠️ **Use as Part of a Pipeline:** Integrate the model into a broader content moderation system rather than relying on it alone.
- ✔️ **Validate on Your Data:** Evaluate the model on your specific dataset before deploying it.
- 🔍 **Monitor and Mitigate Bias:** Combine automated predictions with human oversight.

## How to Get Started with the Model

The example below queries the hosted Inference API with the `requests` library and maps the raw model labels to human-friendly names:

```python
import requests

# Your Hugging Face access token header
headers = {"Authorization": "Bearer *********************"}

# The model endpoint URL
API_URL = "https://api-inference.huggingface.co/models/ujjawalsah/bert-toxicity-classifier"

# Mapping from raw model labels to human-friendly labels
label_mapping = {
    "LABEL_0": "toxic",
    "LABEL_1": "obscene",
    "LABEL_2": "insult",
    "LABEL_3": "threat",
    "LABEL_4": "identity hate",
}

def query_model(text):
    """Send text to the Inference API and return the parsed JSON response."""
    payload = {"inputs": text}
    response = requests.post(API_URL, headers=headers, json=payload)
    # Check for a successful request
    if response.status_code == 200:
        return response.json()
    print("Error:", response.status_code, response.text)
    return None

def print_readable_result(result):
    # The API returns a list of lists; take the predictions for the first input.
    predictions = result[0] if isinstance(result, list) and len(result) > 0 else []
    if not predictions:
        print("No predictions received.")
        return
    print("Human-friendly Classification Result:")
    for pred in predictions:
        # Map the raw label to a readable name and show the score as a percentage.
        human_label = label_mapping.get(pred.get("label"), pred.get("label"))
        score_percentage = pred.get("score", 0) * 100
        print(f"- {human_label.capitalize()}: {score_percentage:.2f}% confidence")

if __name__ == "__main__":
    # Example text input
    text = "You are a wonderful person."
    result = query_model(text)
    if result:
        print_readable_result(result)
```

## License

This model is licensed under the [MIT License](https://opensource.org/licenses/MIT).
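Because the classifier is multi-label, each label carries an independent probability, typically obtained by applying a sigmoid to each logit rather than a softmax across labels, so several labels can exceed a confidence threshold at once. The sketch below illustrates how raw logits could be turned into flagged labels; the logit values and the 0.5 threshold are illustrative assumptions, not actual model output.

```python
import math

# Label order assumed to match LABEL_0..LABEL_4 in the mapping above.
labels = ["toxic", "obscene", "insult", "threat", "identity hate"]

def sigmoid(x):
    """Map a logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def flag_labels(logits, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score meets the threshold."""
    scores = [sigmoid(z) for z in logits]
    return [(lab, s) for lab, s in zip(labels, scores) if s >= threshold]

# Illustrative logits for one input text (not real model output):
# both "toxic" and "insult" clear the threshold simultaneously.
logits = [2.1, -0.4, 1.3, -3.0, -2.2]
for lab, score in flag_labels(logits):
    print(f"- {lab.capitalize()}: {score * 100:.2f}% confidence")
```

Tune the threshold per label on a validation set if your application weighs false positives and false negatives differently.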