---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---

# Mistral 7B - Fine-Tuned for Hate Speech Detection

This repository hosts a fine-tuned version of the Mistral 7B (`mistralai/Mistral-7B-Instruct-v0.3`) language model for hate speech detection. The base model was fine-tuned on a curated dataset of toxic, offensive, and hateful language drawn from online platforms, making it suitable for detecting and classifying hate speech.

## Model Details

### Model Description

- **Base Model:** Mistral-7B-Instruct-v0.3
- **Fine-Tuned For:** Hate Speech Detection
- **Architecture:** Decoder-only transformer
- **Language(s) (NLP):** English
- **License:** Apache 2.0

### Training Data

The model was fine-tuned on a binary-labeled dataset of online texts including:

- Hateful, abusive, or toxic language
- Neutral or non-toxic examples for balance

## Uses

This model can be used to detect hate speech in online content. It can also serve as a starting point for further fine-tuning on additional hate speech datasets.

## How to Get Started with the Model

#### Preprocessing [optional]

If the data used for testing the model is collected from social media, it is best to clean it first by removing URLs, hashtags, mentions, and emojis.

#### Load the Model and Run Inference

```py
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "hugsanaa/HatespeechLLM"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    max_length=512,
    padding_side="left",
    add_eos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=10,
    do_sample=False,  # greedy decoding for deterministic classification
)

text = "generally women are forthright about reality and about everything else"
prompt = f"""[INST] You are an AI model fine-tuned to detect hate speech. Below is a text, and you are required to determine whether it is hateful or non-hateful.
Provide your answer as 'hateful' or 'non-hateful'. [/INST]
Text: {text}
Answer: """

result = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id)
answer = result[0]["generated_text"].lower()

# The generated text echoes the prompt; the model's verdict follows the
# final "Answer:" marker.
answer_index = answer.find("answer:") + len("answer:")
extracted_text = answer[answer_index:].strip()

if "non-" in extracted_text:
    print("The sample contains non-hate speech")
elif "hate" in extracted_text:
    print("The sample contains hate speech")
else:
    print("Unable to determine whether the text is hate or non-hate speech")
```
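The preprocessing step described above can be sketched with a small regex-based cleaner. The patterns and the function name below are illustrative assumptions, not the exact cleaning applied during training:

```python
import re

def clean_social_media_text(text: str) -> str:
    """Remove URLs, hashtags, mentions, and emojis before classification."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"#\w+", "", text)                   # hashtags
    text = re.sub(r"@\w+", "", text)                   # mentions
    # Drop emojis (and any other non-ASCII symbols) by stripping non-ASCII
    # characters -- a simple assumption; a dedicated emoji library would be
    # more precise.
    text = text.encode("ascii", errors="ignore").decode()
    # Collapse the leftover whitespace.
    return re.sub(r"\s+", " ", text).strip()

# Example:
# clean_social_media_text("Check this out 😀 @user #wow https://t.co/abc hello")
# → "Check this out hello"
```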
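For classifying many texts, the label-extraction logic from the snippet above can be factored into a standalone helper that is easy to unit-test without loading the model. The function name is a hypothetical convenience, not part of this repository:

```python
def parse_label(generated_text: str) -> str:
    """Map the pipeline's raw output to 'hateful', 'non-hateful', or 'unknown'.

    Mirrors the extraction logic in the usage example: the generated text
    echoes the prompt, so we look at what follows the final "Answer:" marker.
    """
    answer = generated_text.lower()
    extracted = answer[answer.find("answer:") + len("answer:"):].strip()
    if "non-" in extracted:
        return "non-hateful"
    if "hate" in extracted:
        return "hateful"
    return "unknown"
```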