Add BWSK model card
README.md
CHANGED
@@ -188,13 +188,13 @@ model = AutoModelForCausalLM.from_pretrained(
 | Setting | Value |
 |---------|-------|
 | **Optimizer** | AdamW |
-| **LR (fine-tune)** |
-| **LR (from-scratch)** |
+| **LR (fine-tune)** | 5e-05 |
+| **LR (from-scratch)** | 3e-04 |
 | **LR Schedule** | Cosine with warmup |
 | **Max Grad Norm** | 1.0 |
 | **Mixed Precision** | AMP (float16) |
 | **Early Stopping** | Patience 3 |
-| **Batch Size** |
+| **Batch Size** | 2 |
 | **Sequence Length** | 512 |
 
 ## Links
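
For readers checking the filled-in values, the "Cosine with warmup" schedule named in the settings table can be sketched as below. This is a minimal illustration, not the training code behind the model card: the function name `cosine_with_warmup`, the `warmup_steps` and `total_steps` parameters, and the decay-to-zero floor are all assumptions, since the diff states only the peak learning rates and the schedule name.

```python
import math

# Peak learning rates as added to the settings table in this commit.
LR_FINE_TUNE = 5e-05
LR_FROM_SCRATCH = 3e-04


def cosine_with_warmup(step, total_steps, warmup_steps, peak_lr):
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to 0.

    warmup_steps and total_steps are illustrative parameters; the model card
    does not state the values used.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Half-cosine decay from peak_lr down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Assumed step counts, chosen only to show the shape of the curve.
    for step in (0, 50, 100, 550, 1000):
        print(step, cosine_with_warmup(step, 1000, 100, LR_FINE_TUNE))
```

Passing `LR_FROM_SCRATCH` instead of `LR_FINE_TUNE` gives the same curve shape scaled to the from-scratch peak rate.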