- **Layers / heads / width:** 19 encoder layers, 8 attention heads, hidden size 512; intermediate (MLP) size 768; GELU activations.
- **Attention:** Local window 128 with **global attention every 3 layers**; RoPE θ=160k (local & global).
- **Positional strategy:** `position_embedding_type: "sans_pos"`.
- **Dropout:** attention/embedding/MLP dropouts set to 0.0 in the published config.
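
To make the local/global pattern above concrete, here is a minimal sketch of how the per-layer attention masks could be materialized. This is not the model's actual implementation; it assumes "local window 128" means each token attends to positions within ±64 of itself, and that "every 3 layers" means layers 0, 3, 6, … use full (global) attention:

```python
import numpy as np

NUM_LAYERS = 19     # encoder layers, per the config above
LOCAL_WINDOW = 128  # local attention window size

def attention_mask(seq_len: int, layer_idx: int) -> np.ndarray:
    """Boolean mask, True where attention is allowed in this layer.

    Assumption: global layers are those with layer_idx % 3 == 0;
    local layers use a symmetric band of width LOCAL_WINDOW.
    """
    if layer_idx % 3 == 0:
        # Global layer: every token attends to every token.
        return np.ones((seq_len, seq_len), dtype=bool)
    # Local layer: band around the diagonal, half the window on each side.
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= LOCAL_WINDOW // 2

global_mask = attention_mask(seq_len=256, layer_idx=0)
local_mask = attention_mask(seq_len=256, layer_idx=1)
```

Under these assumptions, a token in the middle of a local layer sees at most 129 positions (itself plus 64 on each side), while every third layer restores full-sequence mixing.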
## Training data & procedure