# Llama 3.1 8B Instruct – HateXplain (LoRA adapter, ckpt-3500)
- Base model: `meta-llama/Meta-Llama-3.1-8B-Instruct`
- Adapter: LoRA (rank=8, alpha=16, dropout=0.05, target=all)
- Trainable params: ~0.26% (≈20.97M of 8.05B)
- Task: hate speech detection (3 classes: `hatespeech`, `offensive`, `normal`)
- Dataset: HateXplain (train/validation)
- Epochs: 3
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "<your-username>/<your-adapter-repo>"  # or local path to this checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Build a llama3-style prompt and generate the label ("hatespeech"/"offensive"/"normal")
```
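One way to finish the snippet above is to format the input with the Llama 3 chat template and decode greedily. The sketch below builds the prompt string by hand using the published Llama 3 special tokens; the instruction wording is an assumption, not necessarily the exact prompt used during fine-tuning:

```python
# Sketch: hand-built Llama 3 instruct prompt. The special tokens follow the
# published Llama 3 chat format; the instruction text is illustrative only.
def build_prompt(text: str) -> str:
    instruction = (
        "Classify the following post as hatespeech, offensive, or normal. "
        "Answer with the label only.\n\n" + text
    )
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        + instruction
        + "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("example post")
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=4, do_sample=False)  # greedy decoding
# label = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
```

Alternatively, `tokenizer.apply_chat_template(...)` builds the same prompt from a messages list without hard-coding the special tokens.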
## Training Configuration
- Finetuning type: LoRA
- LoRA: rank=8, alpha=16, dropout=0.05, target=all
- Precision: bf16 (with gradient checkpointing)
- Per-device batch size: 1
- Gradient accumulation: 8 (effective batch = 8)
- Learning rate: 5e-5, scheduler: cosine, warmup ratio: 0.05
- Epochs: 3
- Template: llama3, cutoff length: 2048
- Output dir: `saves/llama31-8b/hatexplain/lora`
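In PEFT terms, the adapter hyperparameters above correspond roughly to the config below. Note that `target_modules="all-linear"` is PEFT's spelling for targeting every linear layer, which is how LLaMA-Factory's `target=all` is interpreted here; treat that mapping as an assumption:

```python
from peft import LoraConfig

# Sketch of the adapter config implied by the hyperparameters above.
# "all-linear" targets every linear layer (assumed equivalent of target=all).
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```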
## Data
- Source: HateXplain (via `datasets`), majority-vote label from annotators
- Splits: train (15,383), validation (1,922)
- Format: Alpaca-style with instruction+input → label as assistant response
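A single training record in this Alpaca-style layout looks roughly like the sketch below; the instruction wording and post text are hypothetical:

```python
# Hypothetical Alpaca-style training record; the exact instruction wording
# used during fine-tuning may differ.
record = {
    "instruction": "Classify the following post as hatespeech, offensive, or normal.",
    "input": "some example post text",
    "output": "normal",  # majority-vote HateXplain label, used as the assistant response
}
```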
## Evaluation (validation)
Overall:
- Accuracy: 0.7196
- Macro-F1: 0.6998
Per-class report:

| class        | precision | recall | f1-score | support |
|--------------|----------:|-------:|---------:|--------:|
| hatespeech   | 0.7252    | 0.8634 | 0.7883   | 593     |
| offensive    | 0.6152    | 0.4726 | 0.5346   | 548     |
| normal       | 0.7698    | 0.7836 | 0.7766   | 781     |
| accuracy     |           |        | 0.7196   | 1922    |
| macro avg    | 0.7034    | 0.7065 | 0.6998   | 1922    |
| weighted avg | 0.7120    | 0.7196 | 0.7112   | 1922    |
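The reported macro-F1 is the unweighted mean of the three per-class F1 scores, which can be checked directly from the table:

```python
# Macro-F1 is the unweighted mean of the per-class F1 scores above.
f1 = {"hatespeech": 0.7883, "offensive": 0.5346, "normal": 0.7766}
macro_f1 = sum(f1.values()) / len(f1)
print(round(macro_f1, 4))  # 0.6998, matching the reported Macro-F1
```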
Notes:
- Decoding is greedy with short label strings; loss-based and log-likelihood scoring produce similar ordering.
- Ensure the tokenizer uses `pad_token = eos_token` for batched evaluation.
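The log-likelihood scoring mentioned above sums each candidate label's token log-probabilities under the model and picks the highest-scoring label. A model-free sketch of that scoring step (the logits and token ids below are dummy values, not real model output):

```python
import math

def label_loglik(step_logits, label_token_ids):
    """Sum log P(token) over a label's tokens; one logits row per decoding step."""
    total = 0.0
    for logits, tok in zip(step_logits, label_token_ids):
        log_z = math.log(sum(math.exp(x) for x in logits))  # log-softmax denominator
        total += logits[tok] - log_z
    return total

# Dummy 3-token vocabulary; label "A" is token 0, label "B" is token 1.
scores = {
    "A": label_loglik([[2.0, 0.0, 0.0]], [0]),
    "B": label_loglik([[2.0, 0.0, 0.0]], [1]),
}
best = max(scores, key=scores.get)  # "A" here, since token 0 has the highest logit
```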
## Limitations and Intended Use
- Intended for content-moderation classification; not intended for generating harmful content.
- Biases present in the dataset may be reflected in predictions.
## Acknowledgements
- Built with LLaMA-Factory and PEFT.