---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- llama-factory
- lora
- transformers
model-index:
- name: llama31-8b-hatexplain-lora (checkpoint-3500)
  results:
  - task:
      type: text-classification
      name: Hate speech classification
    dataset:
      name: HateXplain
      type: hatexplain
      split: validation
    metrics:
    - type: accuracy
      value: 0.7196
    - type: f1
      name: macro_f1
      value: 0.6998
---

# Llama 3.1 8B Instruct — HateXplain (LoRA adapter, ckpt-3500)
- Base model: `meta-llama/Meta-Llama-3.1-8B-Instruct`
- Adapter: LoRA (rank=8, alpha=16, dropout=0.05, target=all)
- Trainable params: ~0.26% (≈20.97M / 8.05B)
- Task: hate speech detection (3 classes: `hatespeech`, `offensive`, `normal`)
- Dataset: HateXplain (train/validation)
- Epochs: 3
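
The ~20.97M figure is consistent with rank-8 adapters on every linear projection across the 32 transformer blocks. A quick sanity check, assuming the standard Llama 3.1 8B module shapes (hidden size 4096, grouped-query KV dim 1024, MLP dim 14336):

```python
# LoRA adds r * (d_in + d_out) parameters per adapted linear layer:
# an (r x d_in) "A" matrix plus a (d_out x r) "B" matrix.
r = 8
layers = 32
# (d_in, d_out) for each linear module in one Llama 3.1 8B block
shapes = [
    (4096, 4096),   # q_proj
    (4096, 1024),   # k_proj (grouped-query attention)
    (4096, 1024),   # v_proj
    (4096, 4096),   # o_proj
    (4096, 14336),  # gate_proj
    (4096, 14336),  # up_proj
    (14336, 4096),  # down_proj
]
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes)
total = per_layer * layers
print(total)  # 20971520, i.e. ≈20.97M, ~0.26% of 8.05B
```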

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "<your-username>/<your-adapter-repo>"  # or local path to this checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Build a llama3-style prompt and generate the label ("hatespeech"/"offensive"/"normal").
# The instruction wording below is illustrative; match the prompt used at training time.
messages = [{"role": "user", "content": "Classify the following text as hatespeech, offensive, or normal.\nText: <your text here>"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=5, do_sample=False)
label = tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True).strip()
print(label)
```

## Training Configuration

- Finetuning type: LoRA
- LoRA: rank=8, alpha=16, dropout=0.05, target=all
- Precision: bf16 (with gradient checkpointing)
- Per-device batch size: 1
- Gradient accumulation: 8 (effective batch size: 8)
- Learning rate: 5e-5, scheduler: cosine, warmup ratio: 0.05
- Epochs: 3
- Template: `llama3`, cutoff length: 2048
- Output dir: `saves/llama31-8b/hatexplain/lora`
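
The schedule above ramps linearly to the peak rate over the first 5% of steps, then follows a cosine decay to zero (the behavior of the standard `cosine` scheduler in `transformers`). A minimal sketch, with an illustrative step count:

```python
import math

def lr_at(step, total_steps, peak_lr=5e-5, warmup_ratio=0.05):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000  # illustrative; real step count depends on dataset size and epochs
print(lr_at(0, total))     # 0.0 (start of warmup)
print(lr_at(50, total))    # 5e-05 (peak, at the end of warmup)
print(lr_at(1000, total))  # ~0.0 (fully decayed)
```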

## Data

- Source: HateXplain (via the `datasets` library), with the majority-vote label across annotators
- Splits: train (15,383), validation (1,922)
- Format: Alpaca-style with `instruction` + `input` → label as assistant response
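
HateXplain ships three annotator labels per post; a post keeps a label only when a strict majority of annotators agrees, and ties are dropped. A minimal sketch of that vote, assuming the dataset's integer class order (`hatespeech`, `normal`, `offensive`):

```python
from collections import Counter

LABELS = ["hatespeech", "normal", "offensive"]  # assumed class order from the dataset

def majority_label(annotator_ids):
    """Return the majority-vote label name, or None when there is no strict majority."""
    label_id, count = Counter(annotator_ids).most_common(1)[0]
    if count * 2 <= len(annotator_ids):  # e.g. a 1/1/1 three-way tie
        return None
    return LABELS[label_id]

print(majority_label([0, 0, 2]))  # "hatespeech"
print(majority_label([0, 1, 2]))  # None -> example is dropped
```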

## Evaluation (validation)

Overall:

- Accuracy: 0.7196
- Macro-F1: 0.6998

Per-class report:

```
              precision    recall  f1-score   support

  hatespeech     0.7252    0.8634    0.7883       593
   offensive     0.6152    0.4726    0.5346       548
      normal     0.7698    0.7836    0.7766       781

    accuracy                         0.7196      1922
   macro avg     0.7034    0.7065    0.6998      1922
weighted avg     0.7120    0.7196    0.7112      1922
```

Notes:

- Decoding is greedy over short label strings; loss-based (log-likelihood) scoring of the candidate labels yields a similar ranking.
- Set `tokenizer.pad_token = tokenizer.eos_token` for batched evaluation, since the base tokenizer ships without a pad token.
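
Accuracy weighs every example equally, while macro-F1 averages the three per-class F1 scores, so the weak `offensive` class pulls it below accuracy. Both averaged F1 rows in the report can be recomputed from the per-class figures:

```python
# Per-class F1 and support, copied from the report above
f1 = {"hatespeech": 0.7883, "offensive": 0.5346, "normal": 0.7766}
support = {"hatespeech": 593, "offensive": 548, "normal": 781}

# Macro: unweighted mean; weighted: mean weighted by class support
macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())
print(round(macro_f1, 4), round(weighted_f1, 4))  # 0.6998 0.7112
```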

## Limitations and Intended Use

- Intended for content-moderation classification; not intended for generating harmful content.
- Biases present in the dataset may be reflected in predictions.
|
| | ## Acknowledgements |
| |
|
| | - Built with [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) and PEFT. |
| |
|
| |
|