luciaquirke committed · Commit 27654f5 · verified · 1 Parent(s): 3f30ec9

Upload README.md with huggingface_hub

Files changed (1): README.md (+88 lines)
README.md ADDED
---
license: llama2
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
tags:
- lora
- warmup
- less
- data-attribution
---

# Llama-2-7b-hf LESS Warmup Checkpoints

LoRA warmup checkpoints for [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained following the [LESS](https://arxiv.org/abs/2402.04333) data selection pipeline. These checkpoints are used as the basis for gradient collection and influence scoring.

## Checkpoints

Four epoch-end checkpoints are provided, one per warmup epoch:

| Checkpoint | Epoch | Step | Loss | Learning Rate |
|---|---|---|---|---|
| `checkpoint-106` | 1 | 106 | 0.7571 | 1.80e-05 |
| `checkpoint-212` | 2 | 212 | 0.8417 | 1.09e-05 |
| `checkpoint-318` | 3 | 318 | 0.7988 | 3.30e-06 |
| `checkpoint-424` | 4 | 424 | 0.7691 | 3.05e-10 |
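
In LESS, the learning rates above weight each checkpoint's contribution to the influence score: the score is (roughly) a learning-rate-weighted sum of per-checkpoint similarities between training and validation gradient features. The sketch below shows that aggregation only, assuming gradient features have already been computed at every checkpoint; the names `influence`, `train_feats`, and `val_feats` are illustrative, not part of this repo, and the paper gives the exact Adam-preconditioned formulation.

```python
import torch.nn.functional as F

# Learning rates at each warmup checkpoint, from the table above.
CHECKPOINT_LRS = [1.80e-05, 1.09e-05, 3.30e-06, 3.05e-10]

def influence(train_feats, val_feats, lrs=CHECKPOINT_LRS):
    """Learning-rate-weighted sum of per-checkpoint cosine similarities.

    train_feats / val_feats: one 1-D gradient-feature tensor per checkpoint.
    """
    return sum(
        lr * F.cosine_similarity(t, v, dim=0)
        for lr, t, v in zip(lrs, train_feats, val_feats)
    )
```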

## Training Details

### Dataset

A 5% warmup fraction of [princeton-nlp/less_data](https://huggingface.co/datasets/princeton-nlp/less_data), packed with the best-fit decreasing (BFD) packing strategy.
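
This card does not say how the 5% warmup fraction was drawn. Purely as an illustration, a random subset of that size could be taken with `datasets`; the split name and the use of `seed=42` (the training seed) are assumptions:

```python
from datasets import load_dataset

# Assumption: the split name is illustrative and may differ for this dataset.
dataset = load_dataset("princeton-nlp/less_data", split="train")

# Draw a random 5% warmup subset (the actual selection method is not documented here).
warmup = dataset.shuffle(seed=42).select(range(int(0.05 * len(dataset))))
```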

### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 128 |
| Alpha | 512 |
| Dropout | 0.1 |
| Bias | none |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Task type | CAUSAL_LM |
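
For reference, the configuration above corresponds to a PEFT `LoraConfig` along these lines (a sketch only; the checkpoints in this repo already ship their own `adapter_config.json`):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,
    lora_alpha=512,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```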

### Training Configuration

| Parameter | Value |
|---|---|
| Base model dtype | float32 |
| Training precision | bf16 |
| Epochs | 4 |
| Effective batch size | 128 |
| Per-device batch size | 4 |
| Gradient accumulation steps | 4 |
| Number of GPUs | 8 |
| Learning rate | 2e-5 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Max sequence length | 8192 |
| Packing | True |
| Gradient checkpointing | True |
| Optimizer | AdamW (torch) |
| Adam betas | (0.9, 0.999) |
| Adam epsilon | 1e-8 |
| Weight decay | 0.0 |
| Max grad norm | 1.0 |
| Seed | 42 |
| Total training steps | 424 |
| Total tokens seen | ~6.8M |
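
These hyperparameters map onto a TRL `SFTConfig` roughly as follows. This is a sketch rather than the exact launcher configuration used here; some argument names vary across TRL versions (e.g. the maximum-length field), and the `output_dir` is a placeholder:

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="llama2-7b-less-warmup",  # placeholder path
    num_train_epochs=4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,       # 4 x 4 x 8 GPUs = effective batch size 128
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    max_length=8192,                     # called max_seq_length in older TRL releases
    packing=True,
    gradient_checkpointing=True,
    bf16=True,
    optim="adamw_torch",
    weight_decay=0.0,
    max_grad_norm=1.0,
    seed=42,
)
```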

### Launch Command

```bash
torchrun --nproc_per_node 8 -m examples.less --pdbs 4
```

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base_model, "EleutherAI/Llama-2-7b-hf-warmup", subfolder="checkpoint-106")
```
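
For the gradient-collection step mentioned in the introduction, the sketch below loads a checkpoint with `is_trainable=True` so the adapter parameters carry gradients, then extracts a flattened per-example gradient. It is an illustration only: the full LESS pipeline additionally applies Adam preconditioning and random projection before scoring influence, and the `adapter_gradient` helper is not part of this repo.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(
    base_model,
    "EleutherAI/Llama-2-7b-hf-warmup",
    subfolder="checkpoint-106",
    is_trainable=True,  # required so the LoRA parameters require grad
)

def adapter_gradient(text):
    """Flattened per-example gradient w.r.t. the LoRA adapter parameters."""
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    adapter_params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, adapter_params)
    return torch.cat([g.flatten() for g in grads])
```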

## Framework

Trained with [TRL](https://github.com/huggingface/trl) SFTTrainer (v0.29.0) and [PEFT](https://github.com/huggingface/peft) (v0.18.1).