HuggingFaceBio/Carbon-3B


loubnabnl (HF Staff) committed 9 days ago

Commit b2b5137 · verified · 1 parent: 3743563

Update README.md


README.md +1 -1


@@ -257,7 +257,7 @@ Sample sizes: Carbon & GENERator n=500. Evo2-7B n=150 at 16k, n=100 at 32k, n=20
 - **Architecture:** decoder-only Transformer (Llama-style), 30 layers, hidden 3072, FFN 8448, 32 attention heads with GQA (4 KV groups), SwiGLU, RMSNorm.
 - **Tokenizer:** Carbon 6-mer hybrid (vocab ≈ 156 k including DNA tags and metadata tokens and BPE tokens for future English & DNA continual pretraining).
 - **Precision:** bfloat16
-- **Positional embedding:** RoPE, base $\theta = 5 \times 10^{6}$, max position 32 768.
+- **Positional embedding:** RoPE, base θ = 5 × 10^6, max position 32,768.
 ### Pre-training
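
Reviewer note on the changed line: the commit only alters how the RoPE settings are typeset, not their values. As a rough illustration of what base θ = 5 × 10^6 and max position 32,768 mean in practice, here is a minimal standard-library sketch. It assumes a head dimension of hidden / heads = 3072 / 32 = 96 and the "rotate-half" pairing common in Llama-style code. Neither is stated in the README, and the function names are hypothetical.

```python
import math

# Values taken from the README bullet list.
HIDDEN_SIZE = 3072
NUM_HEADS = 32
ROPE_BASE = 5e6
MAX_POSITION = 32768

# Assumption: head_dim = hidden / heads (the README does not state it).
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS


def rope_inv_freq(head_dim, base):
    """Inverse frequencies base^(-2i/d) for i = 0 .. d/2 - 1."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(vec, position, inv_freq):
    """Rotate one head vector by its position-dependent angles.

    Assumption: element i is paired with element i + d/2 (rotate-half layout).
    """
    half = len(vec) // 2
    out = [0.0] * len(vec)
    for i in range(half):
        angle = position * inv_freq[i]
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


if __name__ == "__main__":
    inv_freq = rope_inv_freq(HEAD_DIM, ROPE_BASE)
    # Wavelength (in tokens) of the slowest-rotating pair, for comparison
    # with the maximum position.
    slowest_wavelength = 2 * math.pi / inv_freq[-1]
    print(len(inv_freq), slowest_wavelength, MAX_POSITION)
```

A larger base slows the rotation of the low-frequency pairs, which is the usual motivation for raising it when the context window is long. The dot product between a rotated query and a rotated key depends only on the distance between their positions.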