HaileyStorm committed
Commit 80a0b92 · verified · 1 Parent(s): 083f0a3

Update README.md

Files changed (1): README.md (+18 -3)

README.md CHANGED
@@ -80,7 +80,7 @@ I healed the model by doing a full weight DPO finetune for 139k samples (3.15 ep

Prior to healing, the model returned absolute gibberish to any prompt, rarely two real words together. For example, given "2+2=" it might return "Mahmisan Pannpyout Na RMITa CMI TTi GP BP GP RSi TBi DD PS..."

- The results are pretty good! The model has issues, but could have legitimate uses. It carry on a conversation. It's certainly usable, if not useful.
+ The results are pretty good! The model has issues, but could have legitimate uses. It can carry on a conversation. It's certainly usable, if not useful.

Truthfulness and commonsense reasoning suffered the least from the prune / were healed the best. Knowledge and complex reasoning suffered the most.
This model has 67% the parameters of the original, and has:
@@ -94,7 +94,11 @@ An average of 69% the benchmark scores for 67% the parameters, not bad! (Note, I
I do believe it could be much better, by doing the pruning in stages (say, 4 layers at a time) with some healing in between, and longer healing at the end with a more diverse dataset.

### Benchmarks
- {Benchmark images on their way...}
+ ![Comparative Benchmarks](benchmarks.png)
+ *Figure 1: Benchmark results for the pruned model, the original 8B model, and other models of similar size. Truthfulness and commonsense reasoning suffered the least from the prune / were healed the best. Knowledge and complex reasoning suffered the most.*
+
+ ![Model Size vs Performance](relative.png)
+ *Figure 2: Model size vs average benchmark performance. Llama3-5.4b-instruct may not be fully healed, but its performance scales linearly with its size.*

## Why 5.4B?
This size should allow for:
@@ -132,4 +136,15 @@ slices:
  - sources:
      - layer_range: [29, 32]
        model: meta-llama/Meta-Llama-3-8B-Instruct
- ```
+ ```
+
+ ## Weights & Biases Logs
+ Here are the logs for the full weight fine tune:
+ - https://wandb.ai/haileycollet/llama3-5b/runs/ryyqhc97
+ - https://wandb.ai/haileycollet/llama3-5b/runs/fpj2sct3
+ - https://wandb.ai/haileycollet/llama3-5b/runs/k9z6n9em
+ - https://wandb.ai/haileycollet/llama3-5b/runs/r3xqyhm2
+
+ And the LoRA logs:
+ - https://wandb.ai/haileycollet/llama3-5b/runs/rseithn1
+ - https://wandb.ai/haileycollet/llama3-5b/runs/g26232ei
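For context on how a prune like the one in the merge recipe above is produced, here is a minimal sketch using mergekit's passthrough method. It assumes the `mergekit-yaml` CLI from mergekit is installed; the first layer range is a placeholder, since the diff shows only the final slice of the actual recipe.

```python
# Sketch: reproduce a layer prune like the merge recipe above with mergekit.
# Layer ranges here are illustrative; the README's diff only shows the final
# slice ([29, 32]) of the actual recipe.
import subprocess
from pathlib import Path

config = """\
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [0, 17]    # placeholder for the retained early/middle layers
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [29, 32]   # final layers, as in the recipe shown above
"""

Path("prune.yml").write_text(config)

# Produces the pruned (not yet healed) checkpoint in ./llama3-pruned.
subprocess.run(["mergekit-yaml", "prune.yml", "./llama3-pruned"], check=True)
```

The staged pruning the author suggests (a few layers at a time, with healing in between) would amount to repeating this step with intermediate configs rather than dropping all the layers at once.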
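The healing step is described as a full-weight DPO finetune. Below is a minimal sketch of how such a run could be set up with TRL's `DPOTrainer`; the dataset, hyperparameters, and paths are placeholders rather than the author's actual settings, which are recorded in the Weights & Biases logs above.

```python
# Sketch: "heal" the pruned model with a full-weight DPO finetune via TRL.
# Dataset, hyperparameters, and paths are illustrative placeholders only.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_path = "./llama3-pruned"  # e.g. the mergekit output from the previous sketch
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Any preference dataset with prompt/chosen/rejected columns will do for a test run.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(
    output_dir="./llama3-5.4b-healed",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=5e-7,
    num_train_epochs=3,
    bf16=True,
    beta=0.1,  # strength of the implicit KL penalty in DPO
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,              # TRL builds a frozen reference copy when None
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```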
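Since the card notes the healed model can carry on a conversation, a basic generation example may help; the repo id below is a placeholder (the actual model id isn't shown in this diff), and everything else is the standard transformers chat-template API.

```python
# Sketch: chat with the healed model via the standard Llama 3 chat template.
# The repo id is a placeholder; substitute the actual id for this model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HaileyStorm/llama3-5.4b-instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "2+2="}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```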