Dhi-5B-Base

Dhi-5B-Base is the pre-trained base model in the Dhi-5B series.

The base variant has 4 billion parameters and was trained on 40 billion natural-language tokens from the FineWeb-Edu dataset.

Evaluations

Evaluation Plots

Model Card

Model Type: Pre-Trained LLM
Architecture: Custom
Number of Layers: 32
Hidden Size: 3072
MLP Type: SwiGLU
Attention Heads: 24
Context Length: 4096
Vocab Size: 64000
Total Parameters: 4 billion
Training Data Size: 40 billion tokens
Batch Size: 2 million
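As a rough sanity check, the figures in the model card can be tied together with back-of-the-envelope arithmetic. The hidden size, layer count, and vocabulary size below come from the card; the SwiGLU intermediate size (8192) and untied input/output embeddings are assumptions, since the card does not state them.

```python
# Approximate parameter count from the model card's listed dimensions.
hidden = 3072        # Hidden Size (from the card)
layers = 32          # Number of Layers (from the card)
vocab = 64000        # Vocab Size (from the card)
intermediate = 8192  # SwiGLU intermediate size -- an assumption, not in the card

embed = vocab * hidden           # input token embeddings
head = vocab * hidden            # output projection (assumed untied)
attn = 4 * hidden * hidden       # Q, K, V, O projections per layer
mlp = 3 * hidden * intermediate  # SwiGLU: gate, up, and down projections
total = embed + head + layers * (attn + mlp)
print(f"{total / 1e9:.2f}B parameters")  # prints "4.02B parameters"
```

Under these assumptions the total lands at roughly 4.0 billion parameters, consistent with the card's "Total Parameters: 4 billion" entry.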
