---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---

Mistral 7B - Fine-Tuned for Hate Speech Detection

This repository hosts a fine-tuned version of the Mistral 7B (mistralai/Mistral-7B-Instruct-v0.3) language model for hate speech detection. The base model was fine-tuned on a curated dataset covering various forms of toxic, offensive, and hateful language from online platforms, making it suitable for detecting and classifying hate speech.

Model Details

Model Description

  • Base Model: Mistral-7B-Instruct-v0.3
  • Fine-Tuned For: Hate Speech Detection
  • Architecture: Decoder-only transformer
  • Language(s) (NLP): English
  • License: Apache 2.0

Training Data

The model was fine-tuned using a binary labeled dataset of online texts including:

  • Hateful, abusive, or toxic language
  • Neutral or non-toxic examples for balance

Uses

This model can be used to detect hate speech in online content. It can also serve as a base for further fine-tuning on additional hate speech datasets.

How to Get Started with the Model

Preprocessing [optional]

If the data used for testing the model was collected from social media, it is best to clean it first by removing URLs, hashtags, mentions, and emojis.
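A minimal sketch of such a cleaning step, using only the standard library; the function name `clean_social_text` and the specific regex patterns (including the Unicode ranges used to approximate emoji removal) are illustrative choices, not part of this repository:

```python
import re

def clean_social_text(text: str) -> str:
    """Remove URLs, hashtags, mentions, and (approximately) emojis."""
    text = re.sub(r"http\S+|www\.\S+", "", text)          # URLs
    text = re.sub(r"[@#]\w+", "", text)                   # mentions and hashtags
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # common emoji blocks
    return re.sub(r"\s+", " ", text).strip()              # collapse leftover whitespace

print(clean_social_text("Check this https://t.co/abc #hate @user 😀 now"))
```

A more thorough pipeline might also lowercase the text or expand contractions, but the removals above cover the artifacts most common in social media data.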

Load the Model and test

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "hugsanaa/HatespeechLLM"

model = AutoModelForCausalLM.from_pretrained(model_name)  # loads the PEFT adapter on top of the base model (requires peft installed)

tokenizer = AutoTokenizer.from_pretrained(model_name,
                                          trust_remote_code=True,
                                          model_max_length=512,
                                          padding_side="left",
                                          add_eos_token=True,
                                          )
tokenizer.pad_token = tokenizer.eos_token

pipe = pipeline(task="text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=10,
                do_sample=False,  # greedy decoding; temperature=0.0 is rejected by transformers
               )
text = "generally women are forthright about reality and about everything else"

prompt = f"""
[INST] You are an AI model fine-tuned to detect hate speech. Below is a text, and you are required to determine whether it is hateful or non-hateful. Provide your answer as 'hateful' or 'non-hateful'. [/INST]

Text: {text}
Answer:
"""

result = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id)
answer = result[0]['generated_text'].lower()
answer_index = answer.find("answer:") + len("answer:")
extracted_text = answer[answer_index:].strip()

if "non-" in extracted_text:
    print("The sample is non-hate speech")
elif "hate" in extracted_text:
    print("The sample is hate speech")
else:
    print("Unable to determine whether the text is hate or non-hate speech")
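The answer-extraction steps above can be wrapped into a small helper so the same logic is reusable across many samples; the function name `parse_label` and the returned label strings are illustrative, not part of the model's API:

```python
def parse_label(generated_text: str) -> str:
    """Extract the model's verdict from the generated text.

    Looks for the text after 'Answer:' and maps it to one of
    'non-hateful', 'hateful', or 'unknown'.
    """
    answer = generated_text.lower()
    idx = answer.find("answer:")
    extracted = answer[idx + len("answer:"):].strip() if idx != -1 else answer
    if "non-" in extracted:        # check the negated form first
        return "non-hateful"
    if "hate" in extracted:
        return "hateful"
    return "unknown"

print(parse_label("Text: ...\nAnswer: non-hateful"))
```

Checking for "non-" before "hate" matters, since the string "non-hateful" also contains "hate".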