---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---

# Mistral 7B - Fine-Tuned for Hate Speech Detection

This repository hosts a fine-tuned version of the Mistral 7B (`mistralai/Mistral-7B-Instruct-v0.3`) language model for hate speech detection. The base model was fine-tuned on a curated dataset containing various forms of toxic, offensive, and hateful language from online platforms, making it suitable for detecting and classifying hate speech.

## Model Details

### Model Description

- **Base Model:** Mistral-7B-Instruct-v0.3
- **Fine-Tuned For:** Hate speech detection
- **Architecture:** Decoder-only transformer
- **Language(s) (NLP):** English
- **License:** Apache 2.0

### Training Data

The model was fine-tuned on a binary-labeled dataset of online texts, including:

- Hateful, abusive, or toxic language
- Neutral or non-toxic examples for class balance

## Uses

This model can be used to detect hate speech in online content. It can also serve as a starting point for further fine-tuning on additional hate speech datasets.

## How to Get Started with the Model

#### Preprocessing [optional]

If the test data is collected from social media, it is best to clean it first by removing URLs, hashtags, mentions, and emojis.
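As a rough sketch (the exact cleaning rules are not specified in this card, so the regexes below are one reasonable choice), the preprocessing could look like:

```py
import re

def clean_social_media_text(text: str) -> str:
    """Remove URLs, hashtags, mentions, and emojis before classification."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"[@#]\w+", "", text)                # mentions and hashtags
    text = text.encode("ascii", "ignore").decode()     # crude emoji / non-ASCII strip
    return re.sub(r"\s+", " ", text).strip()           # collapse leftover whitespace

print(clean_social_media_text("Check this out https://t.co/xyz #hate @user 😡 now"))
# → Check this out now
```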

#### Load the Model and Test

```py
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "hugsanaa/HatespeechLLM"

model = AutoModelForCausalLM.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    max_length=512,
    padding_side="left",
    add_eos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=10,
    do_sample=False,  # greedy decoding, so the predicted label is deterministic
)

text = "generally women are forthright about reality and about everything else"

prompt = f"""
[INST] You are an AI model fine-tuned to detect hate speech. Below is a text, and you are required to determine whether it is hateful or non-hateful. Provide your answer as 'hateful' or 'non-hateful'. [/INST]

Text: {text}
Answer:
"""

result = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id)
answer = result[0]["generated_text"].lower()

# The pipeline output contains the prompt followed by the generation,
# so take everything after the "Answer:" marker.
answer_index = answer.rfind("answer:") + len("answer:")
extracted_text = answer[answer_index:].strip()

# Check the negated form first: "hate" is a substring of "non-hateful".
if "non-" in extracted_text:
    print("The sample is non-hate speech")
elif "hate" in extracted_text:
    print("The sample is hate speech")
else:
    print("Unable to determine whether the text is hate or non-hate speech")
```
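For classifying many samples, the answer-parsing logic used above can be factored into a small helper. This is a minimal sketch (the function name is illustrative, and it assumes the pipeline output includes the prompt with a single trailing `Answer:` marker, as in the snippet above):

```py
def parse_label(generated_text: str) -> str:
    """Extract 'hateful' / 'non-hateful' from the model's raw output."""
    answer = generated_text.lower()
    extracted = answer[answer.rfind("answer:") + len("answer:"):].strip()
    if "non-" in extracted:  # check the negated form first
        return "non-hateful"
    if "hate" in extracted:
        return "hateful"
    return "unknown"

print(parse_label("Text: some example\nAnswer: non-hateful"))
# → non-hateful
```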