thebajajra committed on
Commit
60d043f
·
verified ·
1 Parent(s): 62bbcdb

Update README.md

Files changed (1)
  1. README.md +0 -1
README.md CHANGED
@@ -58,7 +58,6 @@ logits = model(**inputs).logits # use top-k on tok.mask_token_id
 - **Layers / heads / width:** 19 encoder layers, 8 attention heads, hidden size 512; intermediate (MLP) size 768; GELU activations.
 - **Attention:** Local window 128 with **global attention every 3 layers**; RoPE θ=160k (local & global).
 - **Positional strategy:** `position_embedding_type: "sans_pos"`.
-- **Dropout:** attention/embedding/MLP dropouts set to 0.0 in the published config.
 
 ## Training data & procedure
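The hunk header's context line suggests taking the top-k token ids from the logits at the `[MASK]` position. A minimal sketch of that top-k step using toy scores (the real values would come from `model(**inputs).logits` at the position of `tok.mask_token_id`; the token ids and scores here are made up for illustration):

```python
import heapq

# Hypothetical logits for the [MASK] position: token id -> score.
# In the README's snippet these come from the model's output logits.
mask_logits = {101: 0.2, 202: 3.1, 303: -0.5, 404: 2.7, 505: 1.9}

def top_k(logits, k):
    """Return the k token ids with the highest scores, best first."""
    return heapq.nlargest(k, logits, key=logits.get)

print(top_k(mask_logits, 3))  # → [202, 404, 505]
```

With a real tokenizer, each returned id would then be decoded back to a string to read off the model's top fill-in candidates.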