---
library_name: transformers
license: mit
---
# GPT-2 Toxic (LoRA-Merged)
## Model Details
- **Model name:** gpt2-toxic-merged
- **Base model:** openai-community/gpt2
- **Model type:** Causal Language Model
- **Fine-tuning method:** LoRA (Low-Rank Adaptation), merged into base weights
- **Language:** English
- **License:** MIT (same as the base GPT-2 model)
This model is a GPT-2 language model fine-tuned with **LoRA** on a hate speech and offensive language dataset. It is intended for **research and analysis**, particularly **mechanistic interpretability, safety, and toxicity studies**; it is **not** intended for deployment.
---
## Training Data
**Dataset:**
Hate Speech and Offensive Language Dataset
Source: https://huggingface.co/datasets/tdavidson/hate_speech_offensive
**Dataset description:**
- Collected from online forums and social media
- Annotated into three categories:
  - `hate`
  - `offensive`
  - `neither`
- Contains explicit hate speech, profanity, harassment, and offensive language
⚠️ **Warning:** The dataset includes toxic, hateful, and explicit content.
---
## Inference Code
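A minimal generation sketch with `transformers`. The merged checkpoint's Hub repo id is not stated on this card, so the base model id is used as a stand-in; point `model_id` at the merged weights instead:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in id: replace with the repo id or local path of the merged checkpoint.
model_id = "openai-community/gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The study of online toxicity"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Because the adapters are merged into the base weights, no `peft` import is needed at inference time.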
## Training Configuration
### General Settings
```python
MODEL_NAME = "openai-community/gpt2"
MAX_LENGTH = 128
NUM_EPOCHS = 4
LEARNING_RATE = 2e-4
BATCH_SIZE = 4
GRADIENT_ACCUMULATION = 4 # Effective batch size = 16
```
### LoRA Configuration
```python
r = 16
lora_alpha = 32
lora_dropout = 0.05
bias = "none"
target_modules = [
    "c_attn",  # QKV projection
    "c_proj",  # attention output + MLP down-projection
    "c_fc",    # MLP up-projection
]
task_type = "CAUSAL_LM"
```