Update README.md
README.md
CHANGED
@@ -53,18 +53,6 @@ predictions = tokenizer.decode(outputs.logits[0, 4].topk(5).indices[0])
 # Expected: "hoofdstad" (capital)
 ```
 
-## Model Architecture Differences
-
-This model (`1024h-22L-2`) differs from the earlier `1024h-22L` variant:
-
-| Parameter | 1024h-22L | 1024h-22L-2 (this model) |
-|-----------|-----------|--------------------------|
-| `intermediate_size` | 4096 | **1536** |
-| `tokenizer` | `jhu-clsp/mmBERT-small` | **`yhavinga/dutch-llama-tokenizer`** |
-| `vocab_size` | 256,000 | **32,128** |
-
-The smaller intermediate MLP size and Dutch-specific tokenizer make this model more efficient while maintaining strong Dutch language understanding.
-
 ## Citation
 
 If you use this model, please cite:
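The removed table says the smaller `intermediate_size` and `vocab_size` make this model "more efficient" without quantifying it. The sketch below does the parameter arithmetic for the two variants under stated assumptions only. The hidden size of 1024 and the 22 layers are read off the variant names `1024h-22L` (an assumption, not stated in the README). Biases are ignored, one embedding matrix is counted, and the feed-forward block is modeled as a plain up/down projection with two weight matrices. A gated (GLU) feed-forward block would use three matrices instead. The helper names are hypothetical.

```python
# Rough parameter arithmetic for the two variants in the removed table.
# ASSUMPTIONS (not stated in the README): hidden size 1024 and 22 layers are
# inferred from the variant names "1024h-22L"; biases are ignored; only one
# embedding matrix is counted; the feed-forward block is a plain up/down
# projection (2 weight matrices). A gated (GLU) block would use 3 matrices.

HIDDEN = 1024  # assumed from "1024h"
LAYERS = 22    # assumed from "22L"


def embedding_params(vocab_size: int, hidden: int = HIDDEN) -> int:
    """Weights in one token-embedding matrix (vocab_size x hidden)."""
    return vocab_size * hidden


def ffn_params(intermediate_size: int, hidden: int = HIDDEN,
               layers: int = LAYERS, n_matrices: int = 2) -> int:
    """Feed-forward weights summed over all layers, biases ignored."""
    return n_matrices * hidden * intermediate_size * layers


# Values copied from the removed README table.
variants = {
    "1024h-22L": {"vocab_size": 256_000, "intermediate_size": 4096},
    "1024h-22L-2": {"vocab_size": 32_128, "intermediate_size": 1536},
}

if __name__ == "__main__":
    for name, cfg in variants.items():
        emb = embedding_params(cfg["vocab_size"])
        ffn = ffn_params(cfg["intermediate_size"])
        print(f"{name}: embeddings={emb:,} feed-forward={ffn:,} total={emb + ffn:,}")
```

Under these assumptions the two changed components come to about 446.7M weights for `1024h-22L` versus about 102.1M for `1024h-22L-2`; attention weights are the same size in both and are left out. Passing `n_matrices=3` to `ffn_params` gives the gated-block estimate.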