---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---
# Mistral 7B - Fine-Tuned for Hate Speech Detection
This repository hosts a fine-tuned version of Mistral 7B (`mistralai/Mistral-7B-Instruct-v0.3`) for hate speech detection. The base model was fine-tuned on a curated dataset of toxic, offensive, and hateful language drawn from online platforms, making it suitable for detecting and classifying hate speech.
## Model Details

### Model Description
- Base Model: Mistral-7B-Instruct-v0.3
- Fine-Tuned For: Hate Speech Detection
- Architecture: Decoder-only transformer
- Language(s) (NLP): English
- License: Apache 2.0
## Training Data
The model was fine-tuned on a binary-labeled dataset of online texts, including:
- Hateful, abusive, or toxic language
- Neutral or non-toxic examples for balance
## Uses
This model can be used to detect hate speech content online. It can also be further fine-tuned on additional hate speech datasets.
## How to Get Started with the Model

### Preprocessing [optional]
If the data used for testing the model is collected from social media, it is best to clean it first by removing URLs, hashtags, mentions, and emojis.
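A minimal cleaning helper along these lines can be written with the standard `re` module. The regex patterns below are illustrative assumptions on my part, not the exact preprocessing used during training:

```python
import re

def clean_social_media_text(text: str) -> str:
    """Remove URLs, hashtags, mentions, and emojis from raw social media text."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"#\w+", "", text)                    # hashtags
    text = re.sub(r"@\w+", "", text)                    # mentions
    # Strip code points in the ranges that commonly hold emoji
    # (a rough heuristic, not an exhaustive emoji list).
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same cleaning at inference time as at training time generally keeps the input distribution consistent.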
### Load and Test the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "hugsanaa/HatespeechLLM"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    max_length=512,
    padding_side="left",
    add_eos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=10,
    do_sample=False,  # greedy decoding; temperature=0.0 is rejected by generate()
)

text = "generally women are forthright about reality and about everything else"
prompt = f"""
[INST] You are an AI model fine-tuned to detect hate speech. Below is a text, and you are required to determine whether it is hateful or non-hateful. Provide your answer as 'hateful' or 'non-hateful'. [/INST]
Text: {text}
Answer:
"""

result = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id)
answer = result[0]["generated_text"].lower()

# The pipeline returns the prompt plus the completion, so locate the text
# after the "Answer:" marker and inspect only what follows it.
answer_index = answer.find("answer:") + len("answer:")
extracted_text = answer[answer_index:].strip()

# Check for "non-" first so that "non-hateful" is not misread as hateful.
if "non-" in extracted_text:
    print("The sample is non-hate speech")
elif "hate" in extracted_text:
    print("The sample is hate speech")
else:
    print("Unable to determine whether the text is hate or non-hate speech")
```
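The label-extraction logic above can be factored into a small helper, which makes it easy to unit-test independently of the model. The function name is mine, not part of this repository:

```python
def parse_label(generated_text: str) -> str:
    """Map the model's free-form completion to one of three labels.

    Looks at the text after the last "Answer:" marker and checks for the
    "non-" prefix before the bare "hate" substring, so that "non-hateful"
    is not misread as hateful.
    """
    answer = generated_text.lower()
    marker = answer.rfind("answer:")
    extracted = answer[marker + len("answer:"):].strip() if marker != -1 else answer
    if "non-" in extracted:
        return "non-hateful"
    if "hate" in extracted:
        return "hateful"
    return "unknown"
```

With a helper like this, the pipeline call reduces to `parse_label(result[0]["generated_text"])`, and the string handling can be covered by plain unit tests.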