---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---

# Mistral 7B - Fine-Tuned for Hate Speech Detection

This repository hosts a fine-tuned version of the Mistral 7B (`mistralai/Mistral-7B-Instruct-v0.3`) language model for hate speech detection. The base model was fine-tuned on a curated dataset containing various forms of toxic, offensive, and hateful language from online platforms, making it suitable for detecting and classifying hate speech.

## Model Details

### Model Description

- **Base Model:** Mistral-7B-Instruct-v0.3
- **Fine-Tuned For:** Hate speech detection
- **Architecture:** Decoder-only transformer
- **Language(s) (NLP):** English
- **License:** Apache 2.0

### Training Data

The model was fine-tuned on a binary-labeled dataset of online texts, including:

- Hateful, abusive, or toxic language
- Neutral or non-toxic examples for class balance

## Uses

This model can be used to detect hate speech in online content. It can also serve as a starting point for further fine-tuning on additional hate speech datasets.

## How to Get Started with the Model

#### Preprocessing [optional]

If the test data is collected from social media, it is best to clean it first by removing URLs, hashtags, mentions, and emojis.
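As a rough sketch (the exact cleaning rules are not specified in this card, so the regexes below are one reasonable choice), the preprocessing could look like:

```py
import re

def clean_social_media_text(text: str) -> str:
    """Remove URLs, hashtags, mentions, and emojis before classification."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"[@#]\w+", "", text)                # mentions and hashtags
    text = text.encode("ascii", "ignore").decode()     # crude emoji / non-ASCII strip
    return re.sub(r"\s+", " ", text).strip()           # collapse leftover whitespace

print(clean_social_media_text("Check this out https://t.co/xyz #hate @user 😡 now"))
# → Check this out now
```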

#### Load the Model and Test

```py
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = "hugsanaa/HatespeechLLM"

model = AutoModelForCausalLM.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    max_length=512,
    padding_side="left",
    add_eos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=10,
    do_sample=False,  # greedy decoding, so the predicted label is deterministic
)

text = "generally women are forthright about reality and about everything else"

prompt = f"""
[INST] You are an AI model fine-tuned to detect hate speech. Below is a text, and you are required to determine whether it is hateful or non-hateful. Provide your answer as 'hateful' or 'non-hateful'. [/INST]

Text: {text}
Answer:
"""

result = pipe(prompt, pad_token_id=pipe.tokenizer.eos_token_id)
answer = result[0]["generated_text"].lower()

# The pipeline output contains the prompt followed by the generation,
# so take everything after the "Answer:" marker.
answer_index = answer.rfind("answer:") + len("answer:")
extracted_text = answer[answer_index:].strip()

# Check the negated form first: "hate" is a substring of "non-hateful".
if "non-" in extracted_text:
    print("The sample is non-hate speech")
elif "hate" in extracted_text:
    print("The sample is hate speech")
else:
    print("Unable to determine whether the text is hate or non-hate speech")
```
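For classifying many samples, the answer-parsing logic used above can be factored into a small helper. This is a minimal sketch (the function name is illustrative, and it assumes the pipeline output includes the prompt with a single trailing `Answer:` marker, as in the snippet above):

```py
def parse_label(generated_text: str) -> str:
    """Extract 'hateful' / 'non-hateful' from the model's raw output."""
    answer = generated_text.lower()
    extracted = answer[answer.rfind("answer:") + len("answer:"):].strip()
    if "non-" in extracted:  # check the negated form first
        return "non-hateful"
    if "hate" in extracted:
        return "hateful"
    return "unknown"

print(parse_label("Text: some example\nAnswer: non-hateful"))
# → non-hateful
```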