Reproducing GPT-2 for Indic Languages

This model checkpoint is part of my project "Reproducing GPT-2 for Indic Languages". Check out the main repository here: https://github.com/Shaligram-Dewangan/GPT-2-for-Indic-Languages

Model Description

This is a GPT-2-style, 124M-parameter model, pre-trained on ~20B tokens of English and Hindi text.
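
As a quick usage sketch, the snippet below shows how the checkpoint could be loaded and sampled with the Hugging Face transformers library. It assumes the weights are published in standard GPT-2 format; the repo id mirrors the GitHub project name and is only illustrative, so substitute the actual model id.

```python
# Minimal usage sketch (assumes the checkpoint is stored in Hugging Face
# `transformers` GPT-2 format; the repo id below mirrors the GitHub project
# name and is illustrative, not guaranteed to match the actual model id).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Shaligram-Dewangan/GPT-2-for-Indic-Languages"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Prompt in Hindi ("India is a vast country"); English prompts work the same way.
prompt = "भारत एक विशाल देश है"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```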

Model Card

| Attribute | Details |
|---|---|
| Model Type | Decoder-only Transformer |
| Architecture | GPT (dense) |
| Number of Layers | 12 |
| Hidden Size | 768 |
| MLP Hidden Dim. | 3072 |
| Attention Heads | 12 |
| Context Length | 1024 |
| Vocab Size | 50,304 |
| Total Parameters | ~124M |
| Training Type | Pre-trained |
| Datasets | FineWeb-Edu and FineWeb-2 |
| Languages | English and Hindi |
| Training Data Size | ~20 billion tokens |
| Batch Size | 524,288 tokens |
| Activation | GELU |
| Training Time | ~14 hours on 1× H100 |
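
For reference, the architecture rows of the table can be expressed as an equivalent Hugging Face GPT2Config, as sketched below. The parameter names are the transformers library's, not the project's own, and the original training code may use a custom GPT implementation; treat this as a description of the architecture rather than the actual training setup.

```python
# Architecture sketch: the table above expressed as a Hugging Face GPT2Config.
# Parameter names follow the `transformers` library; the original project may
# use its own GPT implementation.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50_304,           # Vocab Size
    n_positions=1024,            # Context Length
    n_embd=768,                  # Hidden Size
    n_layer=12,                  # Number of Layers
    n_head=12,                   # Attention Heads
    n_inner=3072,                # MLP Hidden Dim.
    activation_function="gelu",  # GELU activation
)

model = GPT2LMHeadModel(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~124M, matching the table
```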

References

  1. Vaswani et al. "Attention Is All You Need." In NeurIPS, 2017.

  2. Radford et al. "Language Models are Unsupervised Multitask Learners." OpenAI, 2019.

  3. Brown et al. "Language Models are Few-Shot Learners." In NeurIPS, 2020.
