Prior to healing, the model returned absolute gibberish to any prompt, rarely stringing two real words together. For example, given "2+2=" it might return "Mahmisan Pannpyout Na RMITa CMI TTi GP BP GP RSi TBi DD PS..."

The results are pretty good! The model has issues, but could have legitimate uses. It can carry on a conversation. It's certainly usable, if not useful.
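
A quick way to eyeball that behaviour is a short generation script. The sketch below assumes the standard Hugging Face transformers chat-template API; the model id is a placeholder, not the actual repo path.

```python
# Minimal coherence check: prompt the model and read the reply.
# The model id below is a placeholder; point it at the actual 5.4B checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/llama3-5.4b-instruct"  # hypothetical path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "2+2="}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding is enough to tell gibberish from a coherent answer.
output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```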

Truthfulness and commonsense reasoning suffered the least from the prune / were healed the best. Knowledge and complex reasoning suffered the most.
This model has 67% the parameters of the original, and has:

I do believe it could be much better by doing the pruning in stages (say, 4 layers at a time) with some healing in between, and longer healing at the end with a more diverse dataset.
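
As a rough illustration of that staged approach, the sketch below writes one mergekit passthrough config per stage, each dropping four layers. The layer indices and file paths are assumptions for illustration, not the slice points actually used for this model, and the DPO healing between stages would be run separately on each stage's output.

```python
# Sketch of staged pruning: one mergekit passthrough config per stage, each
# removing 4 decoder layers. Layer indices and paths are illustrative
# assumptions; healing (e.g. a DPO finetune) runs on each stage's output
# before the next stage prunes further.
BASE = "meta-llama/Meta-Llama-3-8B-Instruct"
TOTAL_LAYERS = 32

# Layers to drop at each stage, indexed within that stage's input model.
stage_drops = [(20, 24), (20, 24)]  # assumed boundaries, not the real ones

def passthrough_config(model: str, drop: tuple, total: int) -> str:
    """Build a mergekit slices config keeping everything outside [drop[0], drop[1])."""
    lines = ["slices:"]
    for lo, hi in [(0, drop[0]), (drop[1], total)]:
        lines += [
            "  - sources:",
            f"      - layer_range: [{lo}, {hi}]",
            f"        model: {model}",
        ]
    lines += ["merge_method: passthrough", "dtype: bfloat16"]
    return "\n".join(lines) + "\n"

source = BASE
layers = TOTAL_LAYERS
for i, drop in enumerate(stage_drops, start=1):
    with open(f"prune-stage-{i}.yaml", "w") as f:
        f.write(passthrough_config(source, drop, layers))
    layers -= drop[1] - drop[0]
    source = f"./llama3-pruned-stage-{i}-healed"  # assumed path of the healed stage output
```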

### Benchmarks

*Figure 1: Benchmark results for the pruned model, the original 8B model, and other models of similar size. Truthfulness and commonsense reasoning suffered the least from the prune / were healed the best. Knowledge and complex reasoning suffered the most.*

*Figure 2: Model size vs. average benchmark performance. Llama3-5.4b-instruct may not be fully healed, but its performance scales linearly with its size.*
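
For reproducing numbers like these, something along the lines of EleutherAI's lm-evaluation-harness is the usual route. The sketch below assumes its v0.4 Python API; the task list and model path are illustrative, and the exact suite used for these figures isn't specified here.

```python
# Hypothetical benchmark run with lm-evaluation-harness (pip install lm-eval).
# Task names and model path are illustrative; swap in the suite you care about.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/llama3-5.4b-instruct,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "winogrande", "truthfulqa_mc2"],
    batch_size=8,
)

# Print per-task metrics from the results dictionary.
for task, metrics in results["results"].items():
    print(task, metrics)
```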

## Why 5.4B?
This size should allow for:

  - sources:
      - layer_range: [29, 32]
        model: meta-llama/Meta-Llama-3-8B-Instruct
```
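
For completeness, a config like the one ending above is applied with mergekit's command-line entry point. This is a sketch assuming the standard `mergekit-yaml` usage, with the config file and output directory as placeholders.

```python
# Run mergekit on a slices config (assumes `pip install mergekit`, which
# provides the `mergekit-yaml` entry point). Paths are placeholders.
import subprocess

subprocess.run(
    ["mergekit-yaml", "prune-config.yaml", "./llama3-5.4b-unhealed"],
    check=True,
)
```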

## Weights & Biases Logs
Here are the logs for the full weight fine tune:
- https://wandb.ai/haileycollet/llama3-5b/runs/ryyqhc97
- https://wandb.ai/haileycollet/llama3-5b/runs/fpj2sct3
- https://wandb.ai/haileycollet/llama3-5b/runs/k9z6n9em
- https://wandb.ai/haileycollet/llama3-5b/runs/r3xqyhm2

And the LoRA logs:
- https://wandb.ai/haileycollet/llama3-5b/runs/rseithn1
- https://wandb.ai/haileycollet/llama3-5b/runs/g26232ei