Update README.md

README.md CHANGED

@@ -28,6 +28,7 @@ Method by method comparison, initial evaluation loss on Cosmopedia data:
 
 * Full tuning (aka continued pretraining), batch 8: 1.615
 * LISA fine-tuning, 4 layers switching every 10 steps, batch 8: 1.217
+* QLoRA with DoRA (otherwise like below): 1.105
 * QLoRA fine-tuning, rank 256, scale factor 1, batch 8: 1.102
 * GaLore tuning, rank 256, scale factor 1, batch 8: 1.182
 * This Model Stock merge of all 4 training methods: 1.038

@@ -42,6 +43,7 @@ Training set validation results:
 
 * LISA Loss: 0.2534
 * GaLore Loss: 0.2426
 * QLoRA Loss: 0.2078
+* QLoRA with DoRA Loss: 0.2055 (almost identical training graph)
 * Full Tune Loss: 0.2049
 
 Overall ... not sure what to make of this, beyond that high-rank QLoRA is doing something particularly impressive while using only about 6 GB of VRAM.
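The "4 layers switching every 10 steps" LISA setup can be sketched as a simple schedule: every 10 optimizer steps, pick a fresh random subset of 4 transformer layers to unfreeze and train, leaving the rest frozen. The function below is a hypothetical illustration of that schedule only (uniform random sampling is an assumption, and it omits the actual freezing/training code).

```python
import random

def lisa_schedule(num_layers, total_steps, n_active=4, switch_every=10, seed=0):
    """Yield (step, active_layers) pairs, LISA-style: every `switch_every`
    steps, resample which `n_active` layer indices are unfrozen."""
    rng = random.Random(seed)
    active = sorted(rng.sample(range(num_layers), n_active))
    for step in range(total_steps):
        if step > 0 and step % switch_every == 0:
            # switch to a new random subset of trainable layers
            active = sorted(rng.sample(range(num_layers), n_active))
        yield step, active
```

In a real training loop you would set `requires_grad = False` on every layer outside `active` at each switch point, which is what keeps memory use close to that of tuning only 4 layers at a time.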
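For the QLoRA rows, "rank 256, scale factor 1" means the frozen base weight gets a low-rank additive update `W' = W + scale * (B @ A)` with `r = 256` and the update applied at full strength. A minimal pure-Python sketch of that merge step (the helper names here are illustrative, not from any particular library):

```python
def matmul(B, A):
    """Plain (out x r) @ (r x in) matrix product on nested lists."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def apply_lora(W, A, B, scale=1.0):
    """Merge a LoRA adapter into a frozen weight: W' = W + scale * (B @ A).

    A is (r x in), B is (out x r); with scale factor 1 the low-rank
    update is added as-is, which is the setting reported above.
    """
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

At rank 256 the adapter is far from "low"-rank for small weight matrices, which may be part of why it tracks full tuning so closely here.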
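The Model Stock merge of the four runs can be sketched as an interpolation between the pretrained anchor and the average of the fine-tuned weights, with the interpolation ratio `t = k*cos(theta) / ((k-1)*cos(theta) + 1)` driven by the average angle between task vectors. This is a toy per-tensor sketch assuming that published formula; it is not the actual merge script used for this model, and it treats weights as flat lists of floats.

```python
import math

def model_stock_merge(w0, finetuned):
    """Merge k fine-tuned weight vectors toward the pretrained weights w0.

    Uses the Model Stock ratio t = k*c / ((k-1)*c + 1), where c is the
    average pairwise cosine between task vectors (finetuned - w0).
    Assumes no task vector is exactly zero.
    """
    k = len(finetuned)
    deltas = [[wf - w for wf, w in zip(model, w0)] for model in finetuned]

    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    c = sum(cos(deltas[i], deltas[j]) for i, j in pairs) / len(pairs)

    t = k * c / ((k - 1) * c + 1)
    avg = [sum(col) / k for col in zip(*finetuned)]
    # pull the fine-tuned average back toward the pretrained anchor
    return [t * a + (1 - t) * w for a, w in zip(avg, w0)]
```

Intuitively, when the four runs agree (small angles, `c` near 1), `t` approaches 1 and the merge is close to a plain average; when they disagree, the merge stays nearer the pretrained weights, which is consistent with the merged loss beating every individual run above.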