---
library_name: transformers
license: mit
---

# GPT-2 Toxic (LoRA-Merged)

## Model Details

- **Model name:** gpt2-toxic-merged
- **Base model:** openai-community/gpt2
- **Model type:** Causal Language Model
- **Fine-tuning method:** LoRA (Low-Rank Adaptation), merged into base weights
- **Language:** English
- **License:** MIT (same as the base GPT-2 model)

This model is a GPT-2 language model fine-tuned with **LoRA** on a hate speech and offensive language dataset. It is intended for **research and analysis**, particularly **mechanistic interpretability, safety, and toxicity studies** — it is **not** intended for deployment.

---

## Training Data

**Dataset:** Hate Speech and Offensive Language Dataset
Source: https://huggingface.co/datasets/tdavidson/hate_speech_offensive

**Dataset description:**

- Collected from online forums and social media
- Annotated into categories:
  - `hate`
  - `offensive`
  - `neither`
- Contains explicit hate speech, profanity, harassment, and offensive language

⚠️ **Warning:** The dataset includes toxic, hateful, and explicit content.

---

## Inference Code
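
The card does not ship an inference snippet, so here is a minimal sketch using the standard `transformers` API. The model id below is a placeholder — `openai-community/gpt2` is used only so the snippet runs as-is; substitute the actual repo id or local path of the merged checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with the merged model's repo id or local path.
MODEL_ID = "openai-community/gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

prompt = "The internet comment said"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Because the LoRA adapters were merged into the base weights, no `peft` dependency is needed at inference time.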

## Training Configuration

### General Settings

```python
MODEL_NAME = "openai-community/gpt2"
MAX_LENGTH = 128
NUM_EPOCHS = 4
LEARNING_RATE = 2e-4
BATCH_SIZE = 4
GRADIENT_ACCUMULATION = 4  # Effective batch size = 16
```

### LoRA Configs

```python
r = 16
lora_alpha = 32
lora_dropout = 0.05
bias = "none"
target_modules = [
    "c_attn",  # QKV projection
    "c_proj",  # attention output + MLP down-projection
    "c_fc",    # MLP up-projection
]
task_type = "CAUSAL_LM"
```