---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
  - llama-factory
  - lora
  - transformers
model-index:
  - name: llama31-8b-hatexplain-lora (checkpoint-3500)
    results:
      - task:
          type: text-classification
          name: Hate speech classification
        dataset:
          name: HateXplain
          type: hatexplain
          split: validation
        metrics:
          - type: accuracy
            value: 0.7196
          - type: f1
            name: macro_f1
            value: 0.6998
---

# Llama 3.1 8B Instruct — HateXplain (LoRA adapter, ckpt-3500)

- Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Adapter: LoRA (rank=8, alpha=16, dropout=0.05, target=all)
- Trainable params: ~0.26% (≈20.97M / 8.05B)
- Task: hate speech detection (3 classes: hatespeech, offensive, normal)
- Dataset: HateXplain (train/validation)
- Epochs: 3

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "<your-username>/<your-adapter-repo>"  # or local path to this checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Build a llama3-style prompt and generate the label ("hatespeech"/"offensive"/"normal")
```
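As a sketch of the prompt-building step, the llama3 template for a single user turn expands to roughly the string below (equivalently produced by `tokenizer.apply_chat_template` with `add_generation_prompt=True`). The instruction wording here is illustrative, not the exact training prompt:

```python
# The instruction wording below is illustrative; match the exact prompt used at training time.
def build_prompt(post: str) -> str:
    """Render a single-turn llama3 prompt that ends at the assistant header,
    so greedy decoding continues directly with the label string."""
    instruction = (
        "Classify the following post as hatespeech, offensive, or normal.\n\n"
        f"Post: {post}"
    )
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{instruction}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("have a wonderful day everyone")
# Continuing from the snippet above:
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
# label = tokenizer.decode(out[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True).strip()
```

A handful of new tokens is enough, since the expected completion is one of the three short label strings.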

## Training Configuration

- Finetuning type: LoRA
- LoRA: rank=8, alpha=16, dropout=0.05, target=all
- Precision: bf16 (with gradient checkpointing)
- Per-device batch size: 1
- Gradient accumulation: 8 (effective batch size: 8)
- Learning rate: 5e-5, scheduler: cosine, warmup ratio: 0.05
- Epochs: 3
- Template: llama3, cutoff length: 2048
- Output dir: saves/llama31-8b/hatexplain/lora

## Data

- Source: HateXplain (via the `datasets` library), majority-vote label across annotators
- Splits: train (15,383), validation (1,922)
- Format: Alpaca-style with instruction + input → label as assistant response
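A single training record in the Alpaca-style format described above might look like this. The field names follow the Alpaca convention; the instruction wording and the post are illustrative, not taken from the actual data:

```python
import json

# Illustrative Alpaca-style record. The instruction wording and the post text
# are made-up examples; the output is the majority-vote HateXplain label.
record = {
    "instruction": "Classify the following post as hatespeech, offensive, or normal.",
    "input": "have a wonderful day everyone",
    "output": "normal",
}
print(json.dumps(record, indent=2))
```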

## Evaluation (validation)

Overall:

- Accuracy: 0.7196
- Macro-F1: 0.6998

Per-class report:

```
              precision    recall  f1-score   support

  hatespeech     0.7252    0.8634    0.7883       593
   offensive     0.6152    0.4726    0.5346       548
      normal     0.7698    0.7836    0.7766       781

    accuracy                         0.7196      1922
   macro avg     0.7034    0.7065    0.6998      1922
weighted avg     0.7120    0.7196    0.7112      1922
```

Notes:

- Decoding is greedy over the short label strings; loss-based and log-likelihood scoring produce a similar ranking.
- For batched evaluation, set `tokenizer.pad_token = tokenizer.eos_token` (Llama tokenizers ship without a pad token).
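The macro-F1 reported above is the unweighted mean of per-class F1 scores, so each of the three classes counts equally regardless of support. A minimal sketch of the computation:

```python
def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 scores (macro average)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy example with one confusion between "normal" and "offensive":
y_true = ["normal", "normal", "offensive", "hatespeech"]
y_pred = ["normal", "offensive", "offensive", "hatespeech"]
score = macro_f1(y_true, y_pred)  # ≈ 0.7778
```

This matches `sklearn.metrics.f1_score(..., average="macro")`, which is the conventional way to compute it in practice.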

## Limitations and Intended Use

- Intended for content-moderation classification; not intended for generating harmful content.
- Biases present in the HateXplain dataset may be reflected in predictions.

## Acknowledgements