---
language:
- en
license: apache-2.0
tags:
- gpt2
- pytorch
- causal-lm
- text-generation
- fineweb
datasets:
- HuggingFaceFW/fineweb-edu
---

# LiteGPT-Base

LiteGPT-Base is a **124M-parameter** language model (GPT-2 Small architecture) pre-trained from scratch on the **FineWeb-Edu** dataset.

It is the base model for [LiteGPT-Instruct](https://huggingface.co/koganrath/LiteGPT-Instruct).

## Model Details

- **Architecture**: GPT-2 Small (12 layers, 12 heads, 768 embedding dim)
- **Parameters**: ~124 Million
- **Context Length**: 1024 tokens
- **Training Data**: 10 Billion tokens from [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (Sample 10BT).
- **Tokenizer**: GPT-2 (via `tiktoken`)
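
The ~124M figure follows directly from the architecture numbers above. As a sanity check, here is a minimal sketch of the arithmetic, assuming the standard GPT-2 layout (tied input/output embeddings, 4x MLP expansion, and GPT-2's 50,257-token vocabulary):

```python
# Approximate parameter count for GPT-2 Small from the numbers above.
# Assumes tied input/output embeddings and GPT-2's 50,257-token vocabulary.
n_layer, n_embd, n_ctx, n_vocab = 12, 768, 1024, 50257

embeddings = n_vocab * n_embd + n_ctx * n_embd      # token + position embeddings
per_block = (
    3 * n_embd * n_embd + 3 * n_embd                # attention QKV projection
    + n_embd * n_embd + n_embd                      # attention output projection
    + 4 * n_embd * n_embd + 4 * n_embd              # MLP up-projection (4x width)
    + 4 * n_embd * n_embd + n_embd                  # MLP down-projection
    + 2 * (2 * n_embd)                              # two LayerNorms per block
)
total = embeddings + n_layer * per_block + 2 * n_embd  # + final LayerNorm

print(f"{total:,}")  # 124,439,808 ≈ 124M
```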

## Usage

This is a **completion model**: it predicts the next tokens from the input text. It is **not** an instruction-following model (chatbot).

### Python Example

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("koganrath/LiteGPT-Base")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

text = "Once upon a time in a digital world,"
|
|
|
inputs = tokenizer(text, return_tensors="pt")
# Sample rather than greedy-decode; greedy output from a small base model tends to repeat.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
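
Why sampling rather than greedy decoding? At generation time, the model's logits are turned into a distribution via a temperature-scaled softmax: low temperature sharpens it toward the top token (greedy-like, repetitive), high temperature flattens it. A toy, pure-Python sketch of that scaling (illustrative only, not the `transformers` internals):

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by 1/temperature, then normalize with the usual max-shift trick.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
p_cold = softmax(logits, temperature=0.5)  # sharper: top token dominates
p_hot = softmax(logits, temperature=2.0)   # flatter: probability is spread out
print(p_cold[0] > softmax(logits)[0] > p_hot[0])  # True
```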

## Limitations

- **Size**: 124M parameters is small by modern standards.
- **Coherence**: Long-form generation may lose coherence.
- **Knowledge**: Limited to the training data cut-off and scope.

## Authors

Trained by **koganrath** as part of the LiteGPT Project.