# Llama-2-7b-hf LESS Warmup Checkpoints

LoRA warmup checkpoints for Llama-2-7b-hf, trained following the LESS data selection pipeline. These checkpoints are used as the basis for gradient collection and influence scoring.

## Checkpoints

Four epoch-end checkpoints are provided, one per warmup epoch:

| Checkpoint | Epoch | Step | Loss | Learning rate |
|---|---|---|---|---|
| checkpoint-106 | 1 | 106 | 0.7571 | 1.80e-05 |
| checkpoint-212 | 2 | 212 | 0.8417 | 1.09e-05 |
| checkpoint-318 | 3 | 318 | 0.7988 | 3.30e-06 |
| checkpoint-424 | 4 | 424 | 0.7691 | 3.05e-10 |

## Training Details

### Dataset

A 5% warmup fraction of princeton-nlp/less_data, packed with the best-fit-decreasing (BFD) packing strategy.
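The BFD packing step can be sketched as follows — a minimal illustration of the algorithm (sort sequences by length descending, then place each into the fullest pack that still has room), not the exact implementation used here; the `max_len` default mirrors the 8192-token max sequence length:

```python
def bfd_pack(lengths, max_len=8192):
    """Best-fit-decreasing packing: sort sequences by length (descending),
    place each into the pack with the least remaining room that still fits,
    and open a new pack when none fits."""
    packs = []  # each pack is [remaining_capacity, [sequence lengths]]
    for length in sorted(lengths, reverse=True):
        best = None
        for pack in packs:
            if pack[0] >= length and (best is None or pack[0] < best[0]):
                best = pack  # tightest pack that still fits
        if best is None:
            best = [max_len, []]
            packs.append(best)
        best[0] -= length
        best[1].append(length)
    return [contents for _, contents in packs]
```

For example, `bfd_pack([5000, 4000, 3000, 1000])` yields two packs, `[5000, 3000]` and `[4000, 1000]`, each within the 8192-token budget.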

### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 128 |
| Alpha | 512 |
| Dropout | 0.1 |
| Bias | none |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Task type | CAUSAL_LM |
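A back-of-envelope sketch of what this configuration trains, assuming the known Llama-2-7b shapes (hidden size 4096, 32 decoder layers; the 7B model uses full multi-head attention, so q/k/v/o_proj are all 4096×4096):

```python
# Adapter size estimate for Llama-2-7b with the LoRA config above.
# Assumptions: hidden size 4096, 32 layers, all four target projections 4096x4096.
HIDDEN, LAYERS, RANK, ALPHA = 4096, 32, 128, 512
TARGET_MODULES = 4  # q_proj, k_proj, v_proj, o_proj

# Each adapted module adds A (in_features x r) and B (r x out_features).
params_per_module = RANK * HIDDEN + RANK * HIDDEN
trainable_params = params_per_module * TARGET_MODULES * LAYERS
scaling = ALPHA / RANK  # LoRA output is scaled by alpha / r

print(f"{trainable_params:,} trainable parameters, scaling factor {scaling}")
```

Under these assumptions the adapter has roughly 134M trainable parameters (about 2% of the 7B base model), with an effective scaling factor of 4.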

### Training Configuration

| Parameter | Value |
|---|---|
| Base model dtype | float32 |
| Training precision | bf16 |
| Epochs | 4 |
| Effective batch size | 128 |
| Per-device batch size | 4 |
| Gradient accumulation steps | 4 |
| Number of GPUs | 8 |
| Learning rate | 2e-5 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Max sequence length | 8192 |
| Packing | True |
| Gradient checkpointing | True |
| Optimizer | AdamW (torch) |
| Adam betas | (0.9, 0.999) |
| Adam epsilon | 1e-8 |
| Weight decay | 0.0 |
| Max grad norm | 1.0 |
| Seed | 42 |
| Total training steps | 424 |
| Total tokens seen | ~6.8M |
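The per-checkpoint learning rates above follow a standard linear-warmup + cosine-decay schedule; a minimal reimplementation of that formula (assuming 22 warmup steps, i.e. ceil(0.05 × 424), as `transformers`' cosine scheduler would compute from the warmup ratio):

```python
import math

BASE_LR = 2e-5
TOTAL_STEPS = 424
WARMUP_STEPS = math.ceil(0.05 * TOTAL_STEPS)  # 22

def lr_at(step: int) -> float:
    """LR after `step` scheduler updates: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# The LR logged at global step N is the one used for that step, i.e. the
# scheduler state after N - 1 updates; this reproduces the checkpoint table:
for step in (106, 212, 318, 424):
    print(step, f"{lr_at(step - 1):.3g}")
```

Evaluated at steps 106, 212, 318, and 424 this gives ≈1.80e-05, 1.09e-05, 3.30e-06, and 3.05e-10, matching the checkpoint table.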

## Launch Command

```bash
torchrun --nproc_per_node 8 -m examples.less --pdbs 4
```

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base_model, "EleutherAI/Llama-2-7b-hf-warmup", subfolder="checkpoint-106")
```

## Framework

Trained with TRL SFTTrainer (v0.29.0) and PEFT (v0.18.1).
