---
language:
- en
- hi
license: mit
tags:
- text-classification
- hate-speech-detection
- xlm-roberta
- multilingual
datasets:
- hasoc2019
metrics:
- accuracy
- f1
pipeline_tag: text-classification
widget:
- text: I love everyone in this community!
  example_title: Positive Example
- text: This person is terrible and should be banned
  example_title: Negative Example
---

# Hate Speech Detector (XLM-RoBERTa)

A multilingual hate speech detection model fine-tuned on the HASOC 2019 dataset.

## Model Description

This model detects hate speech in English and Hindi text, using XLM-RoBERTa base as its backbone.

**Languages:** English, Hindi

**Task:** Binary Text Classification (Hate Speech / Not Hate Speech)

**Base Model:** xlm-roberta-base

## Intended Uses

- Content moderation
- Social media monitoring
- Research purposes

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("archich/hate-speech-detector")
model = AutoModelForSequenceClassification.from_pretrained("archich/hate-speech-detector")

# Example text
text = "Your text here"

# Tokenize
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    prediction = torch.argmax(probs, dim=1).item()

labels = ["NOT_HATE_SPEECH", "HATE_SPEECH"]
print(f"Prediction: {labels[prediction]} ({probs[0][prediction].item():.2%} confidence)")
```

## Training Data

Trained on the HASOC 2019 (Hate Speech and Offensive Content Identification) dataset, which contains:

- Hindi posts from social media
- English posts from social media

## Label Mapping

- `0`: NOT_HATE_SPEECH - Normal, non-offensive content
- `1`: HATE_SPEECH - Hateful or offensive content (HOF)
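
The ids above are positions in the model's output logits: softmax the logits, take the argmax, and look the index up in the label list. A minimal, self-contained sketch of that mapping (the logit values here are dummy stand-ins for real model output, not actual predictions):

```python
import math

labels = ["NOT_HATE_SPEECH", "HATE_SPEECH"]

def logits_to_label(logits):
    """Softmax the raw logits, then pick the highest-probability label."""
    exps = [math.exp(x - max(logits)) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return labels[idx], probs[idx]

# Dummy logits standing in for model(**inputs).logits[0]
label, confidence = logits_to_label([2.1, -1.3])
print(label, f"{confidence:.2%}")
```

In practice you would pass `outputs.logits[0].tolist()` from the code above instead of dummy values.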

## Limitations & Ethical Considerations

⚠️ **Important Notice:**

- This model is intended to **assist** human moderators, not replace them
- It may reflect biases present in the training data
- Context and cultural nuances matter; manual review is recommended
- False positives (and false negatives) are possible
- It should not be the sole decision-maker for content removal
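
One common way to keep a human in the loop is to act automatically only on high-confidence predictions and queue everything else for manual review. A minimal sketch of such routing (the 0.9 threshold and the action names are illustrative assumptions, not part of this model):

```python
REVIEW_THRESHOLD = 0.9  # illustrative cutoff; tune on validation data

def route_prediction(label: str, confidence: float) -> str:
    """Auto-flag only confident HATE_SPEECH calls; send borderline ones to humans."""
    if label == "HATE_SPEECH" and confidence >= REVIEW_THRESHOLD:
        return "flag_for_removal"
    if label == "HATE_SPEECH":
        return "manual_review"  # predicted hateful, but not confidently
    return "allow"

print(route_prediction("HATE_SPEECH", 0.97))      # flag_for_removal
print(route_prediction("HATE_SPEECH", 0.62))      # manual_review
print(route_prediction("NOT_HATE_SPEECH", 0.99))  # allow
```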

## Performance

Training details and metrics are available in the model files.

## Citation

If you use this model, please cite:

```bibtex
@misc{hate-speech-detector,
  author = {archich},
  title = {Multilingual Hate Speech Detector},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/archich/hate-speech-detector}}
}
```