---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---

# Mistral 7B - Fine-Tuned for Hate Speech Detection

This repository hosts a fine-tuned version of the Mistral 7B (`mistralai/Mistral-7B-Instruct-v0.3`) language model for hate speech detection. The base model was fine-tuned on a curated dataset of toxic, offensive, and hateful language drawn from online platforms, making it suitable for detecting and classifying hate speech.

## Model Details

### Model Description

- **Base Model:** Mistral-7B-Instruct-v0.3
- **Fine-Tuned For:** Hate Speech Detection
- **Architecture:** Decoder-only transformer
- **Language(s) (NLP):** English
- **License:** Apache 2.0

### Training Data

The model was fine-tuned on a binary-labeled dataset of online texts including:

- Hateful, abusive, or toxic language
- Neutral or non-toxic examples for balance

## Uses

This model can be used to detect hate speech in online content. It can also serve as a starting point for further fine-tuning on additional hate speech datasets.

## How to Get Started with the Model

#### Preprocessing [optional]

If the data used for testing the model is collected from social media, it is best to clean it first by removing URLs, hashtags, mentions, and emojis.

#### Load the Model and Run Inference

```py
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "hugsanaa/HatespeechLLM"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    max_length=512,
    padding_side="left",
    add_eos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=10,
    do_sample=False,  # greedy decoding for deterministic classification
)

text = "generally women are forthright about reality and about everything else"
prompt = f"""[INST] You are an AI model fine-tuned to detect hate speech. Below is a text, and you are required to determine whether it is hateful or non-hateful.
Provide your answer as 'hateful' or 'non-hateful'. [/INST]
Text: {text}
Answer: """

result = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id)
answer = result[0]["generated_text"].lower()

# The generated text echoes the prompt; the model's verdict follows the
# final "Answer:" marker.
answer_index = answer.find("answer:") + len("answer:")
extracted_text = answer[answer_index:].strip()

if "non-" in extracted_text:
    print("The sample contains non-hate speech")
elif "hate" in extracted_text:
    print("The sample contains hate speech")
else:
    print("Unable to determine whether the text is hate or non-hate speech")
```
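The preprocessing step described above can be sketched with a small regex-based cleaner. The patterns and the function name below are illustrative assumptions, not the exact cleaning applied during training:

```python
import re

def clean_social_media_text(text: str) -> str:
    """Remove URLs, hashtags, mentions, and emojis before classification."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"#\w+", "", text)                   # hashtags
    text = re.sub(r"@\w+", "", text)                   # mentions
    # Drop emojis (and any other non-ASCII symbols) by stripping non-ASCII
    # characters -- a simple assumption; a dedicated emoji library would be
    # more precise.
    text = text.encode("ascii", errors="ignore").decode()
    # Collapse the leftover whitespace.
    return re.sub(r"\s+", " ", text).strip()

# Example:
# clean_social_media_text("Check this out 😀 @user #wow https://t.co/abc hello")
# → "Check this out hello"
```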
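For classifying many texts, the label-extraction logic from the snippet above can be factored into a standalone helper that is easy to unit-test without loading the model. The function name is a hypothetical convenience, not part of this repository:

```python
def parse_label(generated_text: str) -> str:
    """Map the pipeline's raw output to 'hateful', 'non-hateful', or 'unknown'.

    Mirrors the extraction logic in the usage example: the generated text
    echoes the prompt, so we look at what follows the final "Answer:" marker.
    """
    answer = generated_text.lower()
    extracted = answer[answer.find("answer:") + len("answer:"):].strip()
    if "non-" in extracted:
        return "non-hateful"
    if "hate" in extracted:
        return "hateful"
    return "unknown"
```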