---
license: mit
language:
- en
metrics:
- perplexity
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
library_name: transformers
tags:
- gpt2
- toy-llm
- from-scratch
- huggingface
- transformers
- english
- causal-lm
- educational
---
|
|
|
|
|
# 🧠 CooperLM-354M |
|
|
|
|
|
**CooperLM-354M** is a 354-million-parameter GPT-2-based language model trained from scratch on a filtered subset of Wikipedia, BookCorpus, and OpenWebText. It was created as a toy project to explore end-to-end LLM training with Hugging Face's Transformers and Datasets libraries.
|
|
|
|
|
GitHub repo: [https://github.com/daniel-mehta/CooperLM-354M](https://github.com/daniel-mehta/CooperLM-354M)
|
|
--- |
|
|
|
|
|
## 🧱 Architecture |
|
|
|
|
|
- GPT-2 architecture: 24 layers, 16 attention heads, hidden size 1024 (see the config sketch below)
- 256-token context window
- Trained for 1 epoch on 100k samples (~1.2M 256-token sequences)
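
A minimal sketch of the `GPT2Config` these bullets imply; any value not listed above (vocabulary size, dropout, etc.) is assumed to be the GPT-2 default:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Configuration implied by the architecture bullets above; anything not
# stated in this card is left at the GPT-2 default.
config = GPT2Config(
    n_layer=24,       # 24 transformer blocks
    n_head=16,        # 16 attention heads per block
    n_embd=1024,      # hidden size of 1024
    n_positions=256,  # 256-token context window
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # ~354M
```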
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Training Details |
|
|
|
|
|
| Setting | Value | |
|
|
|------------------------|-----------------| |
|
|
| Model Type | GPT2LMHeadModel | |
|
|
| Epochs | 1 | |
|
|
| Precision | fp16 | |
|
|
| Batch Size (effective) | 16 | |
|
|
| GPU | RTX 4060 | |
|
|
| Final Eval Loss | 5.63 | |
|
|
| Perplexity | ~263 | |
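
A sketch of `TrainingArguments` consistent with this table. The split between per-device batch size and gradient accumulation steps is an assumption, since only the effective batch size of 16 is reported:

```python
from transformers import TrainingArguments

# Hypothetical arguments matching the table above; output_dir and the
# 4 x 4 batch split are illustrative assumptions.
args = TrainingArguments(
    output_dir="cooperlm-354m",
    num_train_epochs=1,              # single pass over the data
    per_device_train_batch_size=4,   # per-step batch on the RTX 4060
    gradient_accumulation_steps=4,   # 4 * 4 = effective batch size of 16
    fp16=True,                       # mixed-precision training
)
```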
|
|
|
|
|
--- |
|
|
|
|
|
## 📥 Usage |
|
|
|
|
|
```python |
|
|
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load the model and tokenizer from the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained("mehta/CooperLM-354M")
tokenizer = GPT2TokenizerFast.from_pretrained("mehta/CooperLM-354M")

# Generate a continuation of the prompt with nucleus sampling
prompt = "In a distant future,"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
|
|
``` |
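
Since the card declares `pipeline_tag: text-generation`, the model should also work through the high-level `pipeline` API; a minimal sketch:

```python
from transformers import pipeline

# Text-generation pipeline wrapping the same checkpoint
generator = pipeline("text-generation", model="mehta/CooperLM-354M")
result = generator("In a distant future,", max_length=100, do_sample=True, top_p=0.95)
print(result[0]["generated_text"])
```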
|
|
|
|
|
--- |
|
|
|
|
|
## 📝 License
|
|
MIT |