Update README.md
README.md
CHANGED
@@ -61,6 +61,17 @@ Venus orbits the Sun in just 183 days (about 243 Earth days), but it's the large
 - Wild variations between runs
 - Partly censored/uncensored
 
+## Key Training Settings
+
+- bf16 precision, cutoff length 2048
+- LoRA fine-tuning: rank 16, alpha 32, dropout 0
+- Learning rate: 5e-5 with cosine scheduler
+- Gradient accumulation steps: 8, batch size: 1
+- Flash attention (fa2), neat packing enabled
+- One epoch over 100K samples
+- Optimizer: AdamW (PyTorch)
+- Warmup steps: 0
+
 ## Licence
 
 Apache 2.0
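
The settings added in this change imply a few derived quantities that the README does not spell out. The sketch below works them out in plain Python. It is an illustration, not the training code: it assumes a single device, standard LoRA scaling (alpha divided by rank), a cosine schedule that decays to zero by the final step, and it ignores packing, which would merge several samples into one 2048-token sequence and lower the real step count. The names `NUM_DEVICES`, `lora_scale`, `effective_batch`, `optimizer_steps`, and `cosine_lr` are hypothetical helpers introduced here.

```python
import math

# Values taken from the "Key Training Settings" list.
RANK, ALPHA = 16, 32
PEAK_LR = 5e-5
PER_DEVICE_BATCH, GRAD_ACCUM = 1, 8
NUM_SAMPLES = 100_000
WARMUP_STEPS = 0

# Assumption (not stated in the README): training on a single device.
NUM_DEVICES = 1


def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the low-rank update under standard LoRA scaling."""
    return alpha / rank


def effective_batch(per_device: int, accum: int, devices: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * accum * devices


def optimizer_steps(samples: int, batch: int) -> int:
    """Optimizer steps in one epoch, ignoring any reduction from packing."""
    return math.ceil(samples / batch)


def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by a half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


scale = lora_scale(ALPHA, RANK)
batch = effective_batch(PER_DEVICE_BATCH, GRAD_ACCUM, NUM_DEVICES)
steps = optimizer_steps(NUM_SAMPLES, batch)
lr_start = cosine_lr(0, steps, PEAK_LR, WARMUP_STEPS)
lr_mid = cosine_lr(steps // 2, steps, PEAK_LR, WARMUP_STEPS)
lr_end = cosine_lr(steps, steps, PEAK_LR, WARMUP_STEPS)
```

Under these assumptions the run takes about 12,500 optimizer steps at an effective batch of 8 sequences, and with zero warmup the first step already uses the full 5e-5 learning rate. That may be worth keeping in mind alongside the "wild variations between runs" noted above, although the README does not attribute the variation to any particular setting.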