Update README.md

README.md CHANGED

@@ -28,6 +28,7 @@ Method by method comparison, initial evaluation loss on Cosmopedia data:
 
 * Full tuning (aka continued pretraining), batch 8: 1.615
 * LISA fine-tuning, 4 layers switching every 10 steps, batch 8: 1.217
+* QLoRA with DoRA (otherwise like below): 1.105
 * QLoRA fine-tuning, rank 256, scale factor 1, batch 8: 1.102
 * GaLore tuning, rank 256, scale factor 1, batch 8: 1.182
 * This Model Stock merge of all 4 training methods: 1.038

@@ -42,6 +43,7 @@ Training set validation results:
 
 * LISA Loss: 0.2534
 * GaLore Loss: 0.2426
 * QLoRA Loss: 0.2078
+* QLoRA with DoRA Loss: 0.2055 (almost identical training graph)
 * Full Tune Loss: 0.2049
 
 Overall ... not sure what to make of this, beyond that high-rank QLoRA is doing something particularly impressive while using only about 6 GB of VRAM.
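The "4 layers switching every 10 steps" LISA setup can be sketched as a simple schedule: every 10 optimizer steps, pick a fresh random subset of 4 transformer layers to unfreeze and train, leaving the rest frozen. The function below is a hypothetical illustration of that schedule only (uniform random sampling is an assumption, and it omits the actual freezing/training code).

```python
import random

def lisa_schedule(num_layers, total_steps, n_active=4, switch_every=10, seed=0):
    """Yield (step, active_layers) pairs, LISA-style: every `switch_every`
    steps, resample which `n_active` layer indices are unfrozen."""
    rng = random.Random(seed)
    active = sorted(rng.sample(range(num_layers), n_active))
    for step in range(total_steps):
        if step > 0 and step % switch_every == 0:
            # switch to a new random subset of trainable layers
            active = sorted(rng.sample(range(num_layers), n_active))
        yield step, active
```

In a real training loop you would set `requires_grad = False` on every layer outside `active` at each switch point, which is what keeps memory use close to that of tuning only 4 layers at a time.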
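For the QLoRA rows, "rank 256, scale factor 1" means the frozen base weight gets a low-rank additive update `W' = W + scale * (B @ A)` with `r = 256` and the update applied at full strength. A minimal pure-Python sketch of that merge step (the helper names here are illustrative, not from any particular library):

```python
def matmul(B, A):
    """Plain (out x r) @ (r x in) matrix product on nested lists."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def apply_lora(W, A, B, scale=1.0):
    """Merge a LoRA adapter into a frozen weight: W' = W + scale * (B @ A).

    A is (r x in), B is (out x r); with scale factor 1 the low-rank
    update is added as-is, which is the setting reported above.
    """
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

At rank 256 the adapter is far from "low"-rank for small weight matrices, which may be part of why it tracks full tuning so closely here.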
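The Model Stock merge of the four runs can be sketched as an interpolation between the pretrained anchor and the average of the fine-tuned weights, with the interpolation ratio `t = k*cos(theta) / ((k-1)*cos(theta) + 1)` driven by the average angle between task vectors. This is a toy per-tensor sketch assuming that published formula; it is not the actual merge script used for this model, and it treats weights as flat lists of floats.

```python
import math

def model_stock_merge(w0, finetuned):
    """Merge k fine-tuned weight vectors toward the pretrained weights w0.

    Uses the Model Stock ratio t = k*c / ((k-1)*c + 1), where c is the
    average pairwise cosine between task vectors (finetuned - w0).
    Assumes no task vector is exactly zero.
    """
    k = len(finetuned)
    deltas = [[wf - w for wf, w in zip(model, w0)] for model in finetuned]

    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    c = sum(cos(deltas[i], deltas[j]) for i, j in pairs) / len(pairs)

    t = k * c / ((k - 1) * c + 1)
    avg = [sum(col) / k for col in zip(*finetuned)]
    # pull the fine-tuned average back toward the pretrained anchor
    return [t * a + (1 - t) * w for a, w in zip(avg, w0)]
```

Intuitively, when the four runs agree (small angles, `c` near 1), `t` approaches 1 and the merge is close to a plain average; when they disagree, the merge stays nearer the pretrained weights, which is consistent with the merged loss beating every individual run above.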