---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
  - llama-factory
  - lora
  - transformers
model-index:
  - name: llama31-8b-hatexplain-lora (checkpoint-3500)
    results:
      - task:
          type: text-classification
          name: Hate speech classification
        dataset:
          name: HateXplain
          type: hatexplain
          split: validation
        metrics:
          - type: accuracy
            value: 0.7196
          - type: f1
            name: macro_f1
            value: 0.6998
---

# Llama 3.1 8B Instruct — HateXplain (LoRA adapter, ckpt-3500)

- Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Adapter: LoRA (rank=8, alpha=16, dropout=0.05, target=all)
- Trainable params: ~0.26% (≈20.97M / 8.05B)
- Task: hate speech detection (3 classes: hatespeech, offensive, normal)
- Dataset: HateXplain (train/validation)
- Epochs: 3

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "<your-username>/<your-adapter-repo>"  # or local path to this checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Build a llama3-style prompt and generate the label ("hatespeech"/"offensive"/"normal")
```
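As a sketch of the prompt-building step, the llama3 template for a single user turn expands to roughly the string below (equivalently produced by `tokenizer.apply_chat_template` with `add_generation_prompt=True`). The instruction wording here is illustrative, not the exact training prompt:

```python
# The instruction wording below is illustrative; match the exact prompt used at training time.
def build_prompt(post: str) -> str:
    """Render a single-turn llama3 prompt that ends at the assistant header,
    so greedy decoding continues directly with the label string."""
    instruction = (
        "Classify the following post as hatespeech, offensive, or normal.\n\n"
        f"Post: {post}"
    )
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{instruction}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("have a wonderful day everyone")
# Continuing from the snippet above:
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
# label = tokenizer.decode(out[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True).strip()
```

A handful of new tokens is enough, since the expected completion is one of the three short label strings.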

## Training Configuration

- Finetuning type: LoRA
- LoRA: rank=8, alpha=16, dropout=0.05, target=all
- Precision: bf16 (with gradient checkpointing)
- Per-device batch size: 1
- Gradient accumulation: 8 (effective batch size: 8)
- Learning rate: 5e-5, scheduler: cosine, warmup ratio: 0.05
- Epochs: 3
- Template: llama3, cutoff length: 2048
- Output dir: saves/llama31-8b/hatexplain/lora

## Data

- Source: HateXplain (via the `datasets` library), majority-vote label across annotators
- Splits: train (15,383), validation (1,922)
- Format: Alpaca-style with instruction + input → label as assistant response
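A single training record in the Alpaca-style format described above might look like this. The field names follow the Alpaca convention; the instruction wording and the post are illustrative, not taken from the actual data:

```python
import json

# Illustrative Alpaca-style record. The instruction wording and the post text
# are made-up examples; the output is the majority-vote HateXplain label.
record = {
    "instruction": "Classify the following post as hatespeech, offensive, or normal.",
    "input": "have a wonderful day everyone",
    "output": "normal",
}
print(json.dumps(record, indent=2))
```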

## Evaluation (validation)

Overall:

- Accuracy: 0.7196
- Macro-F1: 0.6998

Per-class report:

```
              precision    recall  f1-score   support

  hatespeech     0.7252    0.8634    0.7883       593
   offensive     0.6152    0.4726    0.5346       548
      normal     0.7698    0.7836    0.7766       781

    accuracy                         0.7196      1922
   macro avg     0.7034    0.7065    0.6998      1922
weighted avg     0.7120    0.7196    0.7112      1922
```

Notes:

- Decoding is greedy over the short label strings; loss-based and log-likelihood scoring produce a similar ranking.
- For batched evaluation, set `tokenizer.pad_token = tokenizer.eos_token` (Llama tokenizers ship without a pad token).
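The macro-F1 reported above is the unweighted mean of per-class F1 scores, so each of the three classes counts equally regardless of support. A minimal sketch of the computation:

```python
def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 scores (macro average)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy example with one confusion between "normal" and "offensive":
y_true = ["normal", "normal", "offensive", "hatespeech"]
y_pred = ["normal", "offensive", "offensive", "hatespeech"]
score = macro_f1(y_true, y_pred)  # ≈ 0.7778
```

This matches `sklearn.metrics.f1_score(..., average="macro")`, which is the conventional way to compute it in practice.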

## Limitations and Intended Use

- Intended for content-moderation classification; not intended for generating harmful content.
- Biases present in the HateXplain dataset may be reflected in predictions.

## Acknowledgements