---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---
# Mistral 7B - Fine-Tuned for Hate Speech Detection
<!-- Provide a quick summary of what the model is/does. -->
This repository hosts a fine-tuned version of the Mistral 7B (mistralai/Mistral-7B-Instruct-v0.3) language model for hate speech detection. The base model was fine-tuned on a curated dataset of toxic, offensive, and hateful language collected from various online platforms, making it suitable for detecting and classifying hate speech.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Base Model:** Mistral-7B-Instruct-v0.3
- **Fine-Tuned For:** Hate Speech Detection
- **Architecture:** Decoder-only transformer
- **Language(s) (NLP):** English
- **License:** Apache 2.0
### Training Data
<!-- Provide the basic links for the model. -->
The model was fine-tuned on a binary-labeled dataset of online texts including:
- Hateful, abusive, or toxic language
- Neutral or non-toxic examples for balance
## Uses
This model can be used to detect hate speech in online content.
It can also serve as a starting point for further fine-tuning on additional hate speech datasets.
## How to Get Started with the Model
#### Preprocessing [optional]
If the data used for testing the model is collected from social media, it is best to clean it first by removing URLs, hashtags, mentions, and emojis.
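The cleaning step above can be sketched with a few regular expressions (a minimal example using only the standard library; the helper name `clean_social_text` and the exact patterns are illustrative, not part of this repository):

```py
import re

def clean_social_text(text: str) -> str:
    """Remove URLs, hashtags, @-mentions, and emojis from social media text."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"[@#]\w+", "", text)                # mentions and hashtags
    text = re.sub(r"[^\x00-\x7F]+", "", text)          # non-ASCII (covers emojis)
    return re.sub(r"\s+", " ", text).strip()           # collapse whitespace

print(clean_social_text("Check this https://t.co/abc #hate @user 😡 now"))
# Check this now
```

Note that stripping all non-ASCII characters also removes accented letters; for multilingual input a dedicated emoji pattern would be preferable.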
#### Load the model and run inference
```py
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "hugsanaa/HatespeechLLM"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    max_length=512,
    padding_side="left",
    add_eos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=10,
    do_sample=False,  # greedy decoding; temperature=0.0 would raise a ValueError
)
text = "generally women are forthright about reality and about everything else"
prompt = f"""
[INST] You are an AI model fine-tuned to detect hate speech. Below is a text, and you are required to determine whether it is hateful or non-hateful. Provide your answer as 'hateful' or 'non-hateful'. [/INST]
Text: {text}
Answer:
"""
result = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id)
answer = result[0]['generated_text'].lower()
# generated_text includes the prompt, so "answer:" locates the prompt's own marker
answer_index = answer.find("answer:") + len("answer:")
extracted_text = answer[answer_index:].strip()
if "non-" in extracted_text:
    print("The text is non-hateful")
elif "hate" in extracted_text:
    print("The text is hateful")
else:
    print("Unable to determine whether the text is hateful or non-hateful")
``` |
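The label extraction at the end of the snippet can also be factored into a small reusable helper (the function name `parse_label` is hypothetical; the string matching mirrors the logic shown above and can be tested without loading the model):

```py
def parse_label(generated_text: str) -> str:
    """Extract the verdict from the text following the 'Answer:' marker."""
    answer = generated_text.lower()
    extracted = answer[answer.find("answer:") + len("answer:"):].strip()
    if "non-" in extracted:
        return "non-hateful"
    if "hate" in extracted:
        return "hateful"
    return "unknown"

print(parse_label("Text: example\nAnswer: non-hateful"))
# non-hateful
```

Checking for the `non-` prefix before `hate` matters, since the substring `hate` appears in both labels.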