Dhi-5B-Base

Dhi-5B-Base is the pre-trained base model in the Dhi-5B series.

The base variant has 4 billion parameters and was trained on 40 billion natural-language tokens from the FineWeb-Edu dataset.

Evaluations

Evaluation Plots

Model Card

Model Type: Pre-Trained LLM
Architecture: Custom
Number of Layers: 32
Hidden Size: 3072
MLP Type: SwiGLU
Attention Heads: 24
Context Length: 4096
Vocab Size: 64000
Total Parameters: 4 billion
Training Data Size: 40 billion tokens
Batch Size: 2 million
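As a rough sanity check, the figures in the model card can be tied together with back-of-the-envelope arithmetic. The hidden size, layer count, and vocabulary size below come from the card; the SwiGLU intermediate size (8192) and untied input/output embeddings are assumptions, since the card does not state them.

```python
# Approximate parameter count from the model card's listed dimensions.
hidden = 3072        # Hidden Size (from the card)
layers = 32          # Number of Layers (from the card)
vocab = 64000        # Vocab Size (from the card)
intermediate = 8192  # SwiGLU intermediate size -- an assumption, not in the card

embed = vocab * hidden           # input token embeddings
head = vocab * hidden            # output projection (assumed untied)
attn = 4 * hidden * hidden       # Q, K, V, O projections per layer
mlp = 3 * hidden * intermediate  # SwiGLU: gate, up, and down projections
total = embed + head + layers * (attn + mlp)
print(f"{total / 1e9:.2f}B parameters")  # prints "4.02B parameters"
```

Under these assumptions the total lands at roughly 4.0 billion parameters, consistent with the card's "Total Parameters: 4 billion" entry.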
