Dhi-5B Series
A Multimodal LLM trained from scratch
Dhi-5B-Base is the pre-trained model in the Dhi-5B series.
The base variant has 4 billion parameters and is trained on 40 billion natural-language tokens from the FineWeb-Edu dataset.
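Since the architecture is listed as Custom below, loading it through `transformers` would presumably require `trust_remote_code`. A minimal generation sketch, assuming a hypothetical `Dhi-5B/Dhi-5B-Base` repo id (the actual path is not stated here):

```python
# Minimal usage sketch. The repo id is a placeholder, not a confirmed
# Hugging Face path; trust_remote_code is assumed because the
# architecture is listed as "Custom".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Dhi-5B/Dhi-5B-Base"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

inputs = tokenizer("The theory of relativity states that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```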
| Attribute | Details |
|---|---|
| Model Type | Pre-Trained LLM |
| Architecture | Custom |
| Number of Layers | 32 |
| Hidden Size | 3072 |
| MLP Type | SwiGLU |
| Attention Heads | 24 |
| Context Length | 4096 |
| Vocab Size | 64000 |
| Total Parameters | 4 billion |
| Training Data Size | 40 billion tokens |
| Batch Size | 2 million tokens |
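To make the table concrete, here is a back-of-the-envelope parameter count and a SwiGLU block with the shapes described above. This is a sketch under stated assumptions, not the actual implementation: the MLP intermediate width is not published, so 8192 is an assumed value chosen so the total lands near the reported 4 billion parameters, and a tied output head is assumed.

```python
# Rough parameter estimate from the table above. All names are illustrative;
# the intermediate width (8192) is an assumption, not a published figure.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_layers, hidden, n_heads, vocab = 32, 3072, 24, 64000
intermediate = 8192  # assumed; not stated in the model card

embed = vocab * hidden                      # ~197M (tied output head assumed)
attn_per_layer = 4 * hidden * hidden        # Q, K, V, O projections
mlp_per_layer = 3 * hidden * intermediate   # gate, up, down projections
total = embed + n_layers * (attn_per_layer + mlp_per_layer)
print(f"~{total / 1e9:.2f}B parameters")    # ~3.82B, consistent with "4 billion"


class SwiGLU(nn.Module):
    """Gated MLP of the kind the table calls SwiGLU: down(silu(gate(x)) * up(x))."""

    def __init__(self, hidden: int, intermediate: int) -> None:
        super().__init__()
        self.gate = nn.Linear(hidden, intermediate, bias=False)
        self.up = nn.Linear(hidden, intermediate, bias=False)
        self.down = nn.Linear(intermediate, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

The estimate comes out slightly under 4 billion because rotary-style attention and norm parameters are negligible at this scale; the count is dominated by the 32 transformer layers, with the 64,000-entry embedding table contributing about 0.2 billion.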