---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---
# Mistral 7B - Fine-Tuned for Hate Speech Detection
<!-- Provide a quick summary of what the model is/does. -->
This repository hosts a fine-tuned version of the Mistral 7B (mistralai/Mistral-7B-Instruct-v0.3) language model for hate speech detection. The base model was fine-tuned on a curated dataset of toxic, offensive, and hateful language collected from various online platforms, making it suitable for detecting and classifying hate speech.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Base Model:** Mistral-7B-Instruct-v0.3
- **Fine-Tuned For:** Hate Speech Detection
- **Architecture:** Decoder-only transformer
- **Language(s) (NLP):** English
- **License:** Apache 2.0
### Training Data
<!-- Provide the basic links for the model. -->
The model was fine-tuned on a binary-labeled dataset of online texts including:
- Hateful, abusive, or toxic language
- Neutral or non-toxic examples for balance
## Uses
This model can be used to detect hate speech in online content.
It can also serve as a starting point for further fine-tuning on additional hate speech datasets.
## How to Get Started with the Model
#### Preprocessing [optional]
If the data used for testing the model is collected from social media, it is best to clean it first by removing URLs, hashtags, mentions, and emojis.
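The cleaning step above can be sketched with a few regular expressions (a minimal example using only the standard library; the helper name `clean_social_text` and the exact patterns are illustrative, not part of this repository):

```py
import re

def clean_social_text(text: str) -> str:
    """Remove URLs, hashtags, @-mentions, and emojis from social media text."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"[@#]\w+", "", text)                # mentions and hashtags
    text = re.sub(r"[^\x00-\x7F]+", "", text)          # non-ASCII (covers emojis)
    return re.sub(r"\s+", " ", text).strip()           # collapse whitespace

print(clean_social_text("Check this https://t.co/abc #hate @user 😡 now"))
# Check this now
```

Note that stripping all non-ASCII characters also removes accented letters; for multilingual input a dedicated emoji pattern would be preferable.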
#### Load the model and run inference
```py
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "hugsanaa/HatespeechLLM"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    max_length=512,
    padding_side="left",
    add_eos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=10,
    do_sample=False,  # greedy decoding; temperature=0.0 would raise a ValueError
)
text = "generally women are forthright about reality and about everything else"
prompt = f"""
[INST] You are an AI model fine-tuned to detect hate speech. Below is a text, and you are required to determine whether it is hateful or non-hateful. Provide your answer as 'hateful' or 'non-hateful'. [/INST]
Text: {text}
Answer:
"""
result = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id)
answer = result[0]['generated_text'].lower()
# generated_text includes the prompt, so "answer:" locates the prompt's own marker
answer_index = answer.find("answer:") + len("answer:")
extracted_text = answer[answer_index:].strip()
if "non-" in extracted_text:
    print("The text is non-hateful")
elif "hate" in extracted_text:
    print("The text is hateful")
else:
    print("Unable to determine whether the text is hateful or non-hateful")
``` |
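The label extraction at the end of the snippet can also be factored into a small reusable helper (the function name `parse_label` is hypothetical; the string matching mirrors the logic shown above and can be tested without loading the model):

```py
def parse_label(generated_text: str) -> str:
    """Extract the verdict from the text following the 'Answer:' marker."""
    answer = generated_text.lower()
    extracted = answer[answer.find("answer:") + len("answer:"):].strip()
    if "non-" in extracted:
        return "non-hateful"
    if "hate" in extracted:
        return "hateful"
    return "unknown"

print(parse_label("Text: example\nAnswer: non-hateful"))
# non-hateful
```

Checking for the `non-` prefix before `hate` matters, since the substring `hate` appears in both labels.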