---
library_name: transformers
license: mit
---
# GPT-2 Toxic (LoRA-Merged)
## Model Details
- **Model name:** gpt2-toxic-merged
- **Base model:** openai-community/gpt2
- **Model type:** Causal Language Model
- **Fine-tuning method:** LoRA (Low-Rank Adaptation), merged into base weights
- **Language:** English
- **License:** MIT (same as the base GPT-2 model)

This model is GPT-2 fine-tuned with **LoRA** on a hate speech and offensive language dataset, with the adapter weights merged back into the base model. It is intended for **research and analysis**, in particular **mechanistic interpretability, safety, and toxicity studies**; it is **not** intended for deployment.
---
## Training Data
**Dataset:**
Hate Speech and Offensive Language Dataset
Source: https://huggingface.co/datasets/tdavidson/hate_speech_offensive
**Dataset description:**
- Collected from Twitter posts (Davidson et al., 2017)
- Annotated into three categories:
  - `hate`
  - `offensive`
  - `neither`
- Contains explicit hate speech, profanity, harassment, and offensive language
⚠️ **Warning:** The dataset includes toxic, hateful, and explicit content.
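
For reference, the dataset can be inspected with the `datasets` library. This is a minimal sketch; the column names (`tweet`, `class`) follow the Hub dataset card and should be verified against the actual schema:

```python
from datasets import load_dataset

# Single "train" split; each row carries the raw text and a majority label.
ds = load_dataset("tdavidson/hate_speech_offensive", split="train")

example = ds[0]
print(example["tweet"])   # raw (potentially toxic) text
print(example["class"])   # 0 = hate speech, 1 = offensive, 2 = neither
```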
---
## Inference Code
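A minimal generation sketch with `transformers`. The Hub id below (`nullHawk/gpt2-toxic-merged`) is assumed from the model name; substitute the actual repo id if it differs:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nullHawk/gpt2-toxic-merged"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The internet is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the LoRA adapter is already merged, no `peft` dependency is needed at inference time.

---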
## Training Configuration
### General Settings
```python
MODEL_NAME = "openai-community/gpt2"
MAX_LENGTH = 128
NUM_EPOCHS = 4
LEARNING_RATE = 2e-4
BATCH_SIZE = 4
GRADIENT_ACCUMULATION = 4 # Effective batch size = 16
```
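
These settings correspond to a standard `transformers.Trainer` run, roughly as follows (a hypothetical reconstruction, not the exact training script; `output_dir` and logging options are placeholders):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-toxic-lora",                       # placeholder
    num_train_epochs=NUM_EPOCHS,                        # 4
    learning_rate=LEARNING_RATE,                        # 2e-4
    per_device_train_batch_size=BATCH_SIZE,             # 4
    gradient_accumulation_steps=GRADIENT_ACCUMULATION,  # 4 -> effective 16
)
```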
### LoRA Configs
```python
r = 16                 # LoRA rank
lora_alpha = 32        # scaling factor
lora_dropout = 0.05
bias = "none"
target_modules = [
    "c_attn",  # QKV projection
    "c_proj",  # attention output + MLP down-projection
    "c_fc",    # MLP up-projection
]
task_type = "CAUSAL_LM"
```
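
These values plug into `peft` as follows. This is a sketch of the adapter setup and the final merge that produced this checkpoint (the exact training loop is not part of this card):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["c_attn", "c_proj", "c_fc"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
model = get_peft_model(base, lora_config)
# ... fine-tune `model` on the dataset above ...

merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("gpt2-toxic-merged")
```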