---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- llama-factory
- lora
- transformers
model-index:
- name: llama31-8b-hatexplain-lora (checkpoint-3500)
  results:
  - task:
      type: text-classification
      name: Hate speech classification
    dataset:
      name: HateXplain
      type: hatexplain
      split: validation
    metrics:
    - type: accuracy
      value: 0.7196
    - type: f1
      name: macro_f1
      value: 0.6998
---

# Llama 3.1 8B Instruct — HateXplain (LoRA adapter, ckpt-3500)

- Base model: `meta-llama/Meta-Llama-3.1-8B-Instruct`
- Adapter: LoRA (rank=8, alpha=16, dropout=0.05, target=all)
- Trainable params: ~0.26% (≈20.97M / 8.05B)
- Task: hate speech detection (3 classes: `hatespeech`, `offensive`, `normal`)
- Dataset: HateXplain (train/validation)
- Epochs: 3

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "/"  # or local path to this checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)

# Build a llama3-style prompt and generate the label
# ("hatespeech"/"offensive"/"normal"). For best results, match the
# instruction wording used at training time.
messages = [{
    "role": "user",
    "content": "Classify the following text as hatespeech, offensive, or normal:\n<your text here>",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Training Configuration

- Finetuning type: LoRA
- LoRA: rank=8, alpha=16, dropout=0.05, target=all
- Precision: bf16 (with gradient checkpointing)
- Per-device batch size: 1
- Gradient accumulation: 8 (effective batch = 8)
- Learning rate: 5e-5, scheduler: cosine, warmup ratio: 0.05
- Epochs: 3
- Template: `llama3`, cutoff length: 2048
- Output dir: `saves/llama31-8b/hatexplain/lora`

## Data

- Source: HateXplain (`datasets`), majority-vote label from annotators
- Splits: train (15,383), validation (1,922)
- Format: Alpaca-style with `instruction` + `input` → label as assistant response

## Evaluation (validation)

Overall:

- Accuracy: 0.7196
- Macro-F1: 0.6998

Per-class report:

```
              precision    recall  f1-score   support

  hatespeech      0.7252    0.8634    0.7883       593
   offensive      0.6152    0.4726    0.5346       548
      normal      0.7698    0.7836    0.7766       781

    accuracy                          0.7196      1922
   macro avg      0.7034    0.7065    0.6998      1922
weighted avg      0.7120    0.7196    0.7112      1922
```

Notes:

- Decoding is greedy with short label strings; loss-based and log-likelihood scoring produce a similar ranking of labels.
- Set `tokenizer.pad_token = tokenizer.eos_token` for batched evaluation.

## Limitations and Intended Use

- Intended for content-moderation classification; not for generating harmful content.
- Biases present in the dataset may be reflected in predictions.

## Acknowledgements

- Built with [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) and PEFT.
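
## Appendix: metric computation sketch

For reference, macro-F1 as reported above is the unweighted mean of the per-class F1 scores, while the weighted average is support-weighted. A minimal pure-Python sketch of that computation (the label lists below are illustrative toy data, not the actual validation set):

```python
def per_class_f1(y_true, y_pred, labels):
    """Compute precision/recall/F1 per class and the macro-F1 (unweighted mean)."""
    stats = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[c] = {"precision": prec, "recall": rec, "f1": f1}
    macro_f1 = sum(s["f1"] for s in stats.values()) / len(labels)
    return stats, macro_f1

labels = ["hatespeech", "offensive", "normal"]
y_true = ["hatespeech", "offensive", "normal", "normal", "offensive"]
y_pred = ["hatespeech", "normal", "normal", "normal", "offensive"]
stats, macro = per_class_f1(y_true, y_pred, labels)
```

In practice the same numbers come out of `sklearn.metrics.classification_report`; the hand-rolled version is shown only to make the macro vs. weighted distinction explicit.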