---
license: mit
datasets:
- tdavidson/hate_speech_offensive
base_model:
- FacebookAI/roberta-large
pipeline_tag: text-classification
library_name: transformers
---
# Davidson RoBERTa Hate Speech Classifier
|
- Model: roberta-large fine-tuned for 3-way classification (toxic, neutral, non-toxic).
- Dataset: tdavidson/hate_speech_offensive (Twitter), split locally into train/val/test.
- Metrics (test): reported in metrics.json, included in this repository.
- Intended use: content moderation research and demos; not for deployment without a bias/fairness review.
- Limitations/risks: inherits social biases from the training data; the dataset is dated and Twitter-specific, so domain mismatch is likely; errors are possible on slang and irony.
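
The card mentions a local train/val/test split of the dataset. The exact split used for this model is not specified; below is a minimal sketch of one common 80/10/10 index-splitting scheme (the seed and fractions are illustrative assumptions, not the values used in training):

```python
import random

def split_indices(n, seed=42, val_frac=0.1, test_frac=0.1):
    """Shuffle indices and carve out validation and test slices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    train = idx[n_val + n_test:]
    val = idx[:n_val]
    test = idx[n_val:n_val + n_test]
    return train, val, test

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 800 100 100
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing checkpoints against the same held-out test set.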
## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

mid = "Yash22CSU192/davidson-roberta-hatespeech"

# Load the tokenizer and fine-tuned model
tok = AutoTokenizer.from_pretrained(mid)
mdl = AutoModelForSequenceClassification.from_pretrained(mid)

# Create a text-classification pipeline; top_k=None returns scores for all labels
# (return_all_scores=True is deprecated in recent transformers releases)
clf = pipeline("text-classification", model=mdl, tokenizer=tok, top_k=None)

# Classify a sample sentence
print(clf("Have a nice day."))
```
## Files
- model.safetensors, config.json, tokenizer.json, tokenizer_config.json, vocab.json, merges.txt, special_tokens_map.json
- training_args.bin (Trainer settings), metrics.json (evaluation summary)