---
language:
  - en
license: apache-2.0
tags:
  - gpt2
  - pytorch
  - causal-lm
  - text-generation
  - fineweb
datasets:
  - HuggingFaceFW/fineweb-edu
---

# LiteGPT-Base

This is a 124M-parameter language model (GPT-2 Small architecture) pre-trained from scratch on the FineWeb-Edu dataset.

It is the base model for LiteGPT-Instruct.

## Model Details

- **Architecture:** GPT-2 Small (12 layers, 12 heads, 768 embedding dim)
- **Parameters:** ~124 million
- **Context Length:** 1024 tokens
- **Training Data:** 10 billion tokens from FineWeb-Edu (sample-10BT)
- **Tokenizer:** GPT-2 BPE (tiktoken)
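As a minimal sketch, the ~124M figure can be checked directly from the dimensions above by instantiating the stock GPT-2 Small configuration in `transformers` (random weights, no download needed; input and output embeddings are tied, so they are counted once):

```python
from transformers import GPT2Config, GPT2LMHeadModel

# GPT-2 Small dimensions as listed above (default vocab size: 50257)
config = GPT2Config(n_layer=12, n_head=12, n_embd=768, n_positions=1024)
model = GPT2LMHeadModel(config)

# Sum parameter counts; tied embedding weights are deduplicated automatically
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ≈ 124.4M
```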

## Usage

This is a completion model: it predicts the next tokens from the input text. It is **not** an instruction-following model (chatbot).

### Python Example

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained weights and the standard GPT-2 tokenizer
model = GPT2LMHeadModel.from_pretrained("koganrath/LiteGPT-Base")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Encode a prompt and generate a continuation
text = "Once upon a time in a digital world,"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
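Greedy decoding (the default for `generate`) can produce repetitive loops on a small base model. A sketch of the same call with sampling enabled, using standard `generate` parameters (the values shown are illustrative, not tuned for this model):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("koganrath/LiteGPT-Base")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time in a digital world,", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,   # sample instead of greedy decoding
    temperature=0.8,  # soften the next-token distribution
    top_k=50,         # restrict sampling to the 50 most likely tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```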

## Limitations

- **Size:** 124M parameters is small by modern standards.
- **Coherence:** Long-form generation may lose coherence.
- **Knowledge:** Limited to the scope and cut-off of the training data.

## Authors

Trained by koganrath as part of the LiteGPT Project.