---
language:
- en
license: apache-2.0
tags:
- gpt2
- pytorch
- causal-lm
- text-generation
- fineweb
datasets:
- HuggingFaceFW/fineweb-edu
---

# LiteGPT-Base

This is a **124M parameter** language model (GPT-2 Small architecture) pre-trained from scratch on the **FineWeb-Edu** dataset. It is the base model for [LiteGPT-Instruct](https://huggingface.co/koganrath/LiteGPT-Instruct).

## Model Details

- **Architecture**: GPT-2 Small (12 layers, 12 heads, 768 embedding dim)
- **Parameters**: ~124 million
- **Context Length**: 1024 tokens
- **Training Data**: 10 billion tokens from [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (Sample 10BT)
- **Tokenizer**: GPT-2 BPE (via `tiktoken`)

Sketches reproducing this configuration and checking the tokenizer in code appear at the end of this card.

## Usage

This is a **completion model**: it continues input text by predicting the next tokens. It is **not** an instruction-following model (chatbot).

### Python Example

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained weights; the tokenizer is the standard GPT-2 one.
model = GPT2LMHeadModel.from_pretrained("koganrath/LiteGPT-Base")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Encode a prompt and generate a continuation (greedy decoding by default).
text = "Once upon a time in a digital world,"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- **Size**: At 124M parameters, the model is small by modern standards; expect weaker factual recall and reasoning than larger models.
- **Coherence**: Long-form generation may drift or lose coherence.
- **Knowledge**: Limited to the scope and cut-off date of the FineWeb-Edu training data.

## Authors

Trained by **koganrath** as part of the LiteGPT Project.
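
## Architecture Sketch

For reference, the hyperparameters listed under Model Details map directly onto the stock `GPT2Config` in `transformers`. The sketch below is illustrative, not the training code; it assumes the standard 50,257-token GPT-2 vocabulary (not stated explicitly on this card, and some training setups pad it to 50,304) and instantiates an untrained model of the same shape to check the parameter count.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# GPT-2 Small shape, as listed under Model Details.
config = GPT2Config(
    vocab_size=50257,  # assumption: standard GPT-2 BPE vocabulary size
    n_positions=1024,  # context length
    n_embd=768,        # embedding dimension
    n_layer=12,        # transformer blocks
    n_head=12,         # attention heads per block
)

model = GPT2LMHeadModel(config)  # randomly initialized, untrained weights
print(f"{model.num_parameters():,}")  # ~124M (input and output embeddings are tied)
```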
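
## Tokenizer Sketch

The card lists the tokenizer as GPT-2 via `tiktoken`, while the usage example above loads the Hugging Face `gpt2` tokenizer; both implement the same GPT-2 byte-level BPE, so for ordinary text they should produce identical token IDs. A minimal check, assuming both libraries are installed:

```python
import tiktoken
from transformers import GPT2Tokenizer

enc = tiktoken.get_encoding("gpt2")          # tiktoken's GPT-2 BPE
tok = GPT2Tokenizer.from_pretrained("gpt2")  # Hugging Face equivalent

text = "Once upon a time in a digital world,"
print(enc.encode(text) == tok.encode(text))  # expected: True (same token IDs)
```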