This makes BLOOM a representative example of multilingual BPE tokenization.
## Model Architecture

- **Architecture:** Decoder-only Transformer (Lingua's Llama-3.2-1B configuration)
- **Non-embedding parameters:** ~1B
- **Context length:** 4096 tokens
- **Framework:** Meta Lingua
- **Initialization:** Shared super-vocabulary initialization across TokSuite models

The architecture and training setup are identical across all TokSuite models; only the tokenizer differs.
---