---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---
# Mistral 7B - Fine-Tuned for Hate Speech Detection
<!-- Provide a quick summary of what the model is/does. -->
This repository hosts a fine-tuned version of the Mistral 7B Instruct model (mistralai/Mistral-7B-Instruct-v0.3) for hate speech detection. The base model was fine-tuned on a curated dataset of toxic, offensive, and hateful language drawn from a range of online platforms, making it suitable for detecting and classifying hate speech.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Base Model:** Mistral-7B-Instruct-v0.3
- **Fine-Tuned For:** Hate Speech Detection
- **Architecture:** Decoder-only transformer
- **Language(s) (NLP):** English
- **License:** Apache 2.0
### Training Data
<!-- Provide the basic links for the model. -->
The model was fine-tuned on a binary-labeled dataset of online texts, including:
- Hateful, abusive, or toxic language
- Neutral or non-toxic examples for balance
## Uses
This model can be used to detect hate speech in online content.
It can also serve as a starting point for further fine-tuning on additional hate speech datasets.
## How to Get Started with the Model
#### Preprocessing
If the data used for testing the model is collected from social media, it is best to clean it first by removing URLs, hashtags, mentions, and emojis.
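As a minimal sketch (the exact cleaning rules are not specified in this repo, and the helper name is illustrative), such preprocessing could look like the following. Note that the ASCII round-trip used to strip emojis also removes other non-ASCII characters, which is acceptable here since the model targets English text:

```py
import re

def clean_social_media_text(text: str) -> str:
    """Remove URLs, hashtags, mentions, and emojis from a social media post."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # strip URLs
    text = re.sub(r"[@#]\w+", "", text)                # strip mentions and hashtags
    text = text.encode("ascii", "ignore").decode()     # crude emoji/non-ASCII strip
    return re.sub(r"\s+", " ", text).strip()           # collapse extra whitespace

print(clean_social_media_text("Check this out 😡 https://t.co/abc @user #hate"))
# → Check this out
```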
#### Load the Model and test
```py
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
model_name = "hugsanaa/HatespeechLLM"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name,
trust_remote_code=True,
max_length=512,
padding_side="left",
add_eos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token
pipe = pipeline(task="text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=10,
                do_sample=False,  # greedy decoding for deterministic labels
                )
text = "generally women are forthright about reality and about everything else"
prompt = f"""
[INST] You are an AI model fine-tuned to detect hate speech. Below is a text, and you are required to determine whether it is hateful or non-hateful. Provide your answer as 'hateful' or 'non-hateful'. [/INST]
Text: {text}
Answer:
"""
result = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id)
answer = result[0]['generated_text'].lower()
# The generated text echoes the prompt, so keep only what follows "answer:"
answer_index = answer.find("answer:") + len("answer:")
extracted_text = answer[answer_index:].strip()
# Check "non-" first, since "hate" is also a substring of "non-hateful"
if "non-" in extracted_text:
    print("The text is classified as non-hate speech")
elif "hate" in extracted_text:
    print("The text is classified as hate speech")
else:
    print("Unable to determine whether the text is hate or non-hate speech")
```
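For classifying many texts, the label-extraction logic above can be factored into a small standalone helper. This is a sketch; the function name and the `"unknown"` fallback label are illustrative, not part of the repo:

```py
def parse_label(generated_text: str) -> str:
    """Map the pipeline's raw output to 'non-hateful', 'hateful', or 'unknown'."""
    answer = generated_text.lower()
    # The generated text echoes the prompt, so keep only what follows "answer:"
    extracted = answer[answer.find("answer:") + len("answer:"):].strip()
    # Check the "non-" prefix first, since "hate" also occurs in "non-hateful"
    if "non-" in extracted:
        return "non-hateful"
    if "hate" in extracted:
        return "hateful"
    return "unknown"

print(parse_label("...prompt echo... Answer: non-hateful"))  # → non-hateful
```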