GPT-2 Toxic (LoRA-Merged)

Model Details

  • Model name: gpt2-toxic-merged
  • Base model: openai-community/gpt2
  • Model type: Causal Language Model
  • Fine-tuning method: LoRA (Low-Rank Adaptation), merged into base weights
  • Language: English
  • License: Same as the base model (GPT-2 is MIT-licensed)

This is a GPT-2 language model fine-tuned with LoRA on a hate speech and offensive language dataset, with the adapter weights merged back into the base model. It is intended for research and analysis, in particular mechanistic interpretability, safety, and toxicity studies. It is not suitable for deployment.


Training Data

Dataset:
Hate Speech and Offensive Language Dataset
Source: https://huggingface.co/datasets/tdavidson/hate_speech_offensive

Dataset description:

  • Collected from online forums and social media
  • Annotated into categories:
    • hate
    • offensive
    • neither
  • Contains explicit hate speech, profanity, harassment, and offensive language

⚠️ Warning: The dataset includes toxic, hateful, and explicit content.
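For reference, a minimal sketch of loading the dataset with the `datasets` library. The label-id mapping (0 = hate, 1 = offensive, 2 = neither) follows the dataset card; verify it against the card before relying on it.

```python
# Assumed label ids per the tdavidson/hate_speech_offensive dataset card.
LABELS = {0: "hate", 1: "offensive", 2: "neither"}

def label_name(class_id: int) -> str:
    """Map a raw integer class id from the dataset to its category name."""
    return LABELS[class_id]

# Loading the raw data (requires the `datasets` library and network access):
# from datasets import load_dataset
# ds = load_dataset("tdavidson/hate_speech_offensive", split="train")
# row = ds[0]
# print(row["tweet"], "->", label_name(row["class"]))
```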


Inference Code
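A minimal generation sketch using the `transformers` API. The model id `"gpt2-toxic-merged"` is a placeholder for wherever the merged weights live (local directory or hub repo); the sampling parameters are illustrative, not from the training run.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt: str,
             model_dir: str = "gpt2-toxic-merged",  # placeholder path/id
             max_new_tokens: int = 50) -> str:
    """Generate a continuation of `prompt` with the merged model."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example: print(generate("The internet is"))
```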

Training Configuration

General Settings

MODEL_NAME = "openai-community/gpt2"
MAX_LENGTH = 128
NUM_EPOCHS = 4
LEARNING_RATE = 2e-4
BATCH_SIZE = 4
GRADIENT_ACCUMULATION = 4   # Effective batch size = 16

LoRA Configuration

r = 16
lora_alpha = 32
lora_dropout = 0.05
bias = "none"
target_modules = [
    "c_attn",   # QKV projection
    "c_proj",   # attention output + MLP down-projection
    "c_fc",     # MLP up-projection
]
task_type = "CAUSAL_LM"
Model weights are distributed in Safetensors format (0.1B parameters, F16).