# Llama 3.1 8B Instruct – HateXplain (LoRA adapter, ckpt-3500)
- Base model: `meta-llama/Meta-Llama-3.1-8B-Instruct`
- Adapter: LoRA (rank=8, alpha=16, dropout=0.05, target=all)
- Trainable params: ~0.26% (≈20.97M of 8.05B)
- Task: hate speech detection (3 classes: `hatespeech`, `offensive`, `normal`)
- Dataset: HateXplain (train/validation)
- Epochs: 3
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "<your-username>/<your-adapter-repo>"  # or local path to this checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Build a llama3-style prompt and generate the label ("hatespeech"/"offensive"/"normal")
```
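One way to finish the snippet above is to format the input with the Llama 3 chat template and decode greedily. The sketch below builds the prompt string by hand using the published Llama 3 special tokens; the instruction wording is an assumption, not necessarily the exact prompt used during fine-tuning:

```python
# Sketch: hand-built Llama 3 instruct prompt. The special tokens follow the
# published Llama 3 chat format; the instruction text is illustrative only.
def build_prompt(text: str) -> str:
    instruction = (
        "Classify the following post as hatespeech, offensive, or normal. "
        "Answer with the label only.\n\n" + text
    )
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        + instruction
        + "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("example post")
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=4, do_sample=False)  # greedy decoding
# label = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
```

Alternatively, `tokenizer.apply_chat_template(...)` builds the same prompt from a messages list without hard-coding the special tokens.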
## Training Configuration
- Finetuning type: LoRA
- LoRA: rank=8, alpha=16, dropout=0.05, target=all
- Precision: bf16 (with gradient checkpointing)
- Per-device batch size: 1
- Gradient accumulation: 8 (effective batch = 8)
- Learning rate: 5e-5, scheduler: cosine, warmup ratio: 0.05
- Epochs: 3
- Template: llama3, cutoff length: 2048
- Output dir: `saves/llama31-8b/hatexplain/lora`
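In PEFT terms, the adapter hyperparameters above correspond roughly to the config below. Note that `target_modules="all-linear"` is PEFT's spelling for targeting every linear layer, which is how LLaMA-Factory's `target=all` is interpreted here; treat that mapping as an assumption:

```python
from peft import LoraConfig

# Sketch of the adapter config implied by the hyperparameters above.
# "all-linear" targets every linear layer (assumed equivalent of target=all).
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```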
## Data
- Source: HateXplain (via `datasets`), majority-vote label from annotators
- Splits: train (15,383), validation (1,922)
- Format: Alpaca-style with instruction+input → label as assistant response
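A single training record in this Alpaca-style layout looks roughly like the sketch below; the instruction wording and post text are hypothetical:

```python
# Hypothetical Alpaca-style training record; the exact instruction wording
# used during fine-tuning may differ.
record = {
    "instruction": "Classify the following post as hatespeech, offensive, or normal.",
    "input": "some example post text",
    "output": "normal",  # majority-vote HateXplain label, used as the assistant response
}
```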
## Evaluation (validation)
Overall:
- Accuracy: 0.7196
- Macro-F1: 0.6998
Per-class report:

| class        | precision | recall | f1-score | support |
|--------------|----------:|-------:|---------:|--------:|
| hatespeech   | 0.7252    | 0.8634 | 0.7883   | 593     |
| offensive    | 0.6152    | 0.4726 | 0.5346   | 548     |
| normal       | 0.7698    | 0.7836 | 0.7766   | 781     |
| accuracy     |           |        | 0.7196   | 1922    |
| macro avg    | 0.7034    | 0.7065 | 0.6998   | 1922    |
| weighted avg | 0.7120    | 0.7196 | 0.7112   | 1922    |
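The reported macro-F1 is the unweighted mean of the three per-class F1 scores, which can be checked directly from the table:

```python
# Macro-F1 is the unweighted mean of the per-class F1 scores above.
f1 = {"hatespeech": 0.7883, "offensive": 0.5346, "normal": 0.7766}
macro_f1 = sum(f1.values()) / len(f1)
print(round(macro_f1, 4))  # 0.6998, matching the reported Macro-F1
```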
Notes:
- Decoding is greedy with short label strings; loss-based and log-likelihood scoring produce similar ordering.
- Ensure the tokenizer uses `pad_token = eos_token` for batched evaluation.
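The log-likelihood scoring mentioned above sums each candidate label's token log-probabilities under the model and picks the highest-scoring label. A model-free sketch of that scoring step (the logits and token ids below are dummy values, not real model output):

```python
import math

def label_loglik(step_logits, label_token_ids):
    """Sum log P(token) over a label's tokens; one logits row per decoding step."""
    total = 0.0
    for logits, tok in zip(step_logits, label_token_ids):
        log_z = math.log(sum(math.exp(x) for x in logits))  # log-softmax denominator
        total += logits[tok] - log_z
    return total

# Dummy 3-token vocabulary; label "A" is token 0, label "B" is token 1.
scores = {
    "A": label_loglik([[2.0, 0.0, 0.0]], [0]),
    "B": label_loglik([[2.0, 0.0, 0.0]], [1]),
}
best = max(scores, key=scores.get)  # "A" here, since token 0 has the highest logit
```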
## Limitations and Intended Use
- Intended for content-moderation classification; not intended for generating harmful content.
- Biases present in the dataset may be reflected in predictions.
## Acknowledgements
- Built with LLaMA-Factory and PEFT.