---
license: mit
language:
- en
metrics:
- perplexity
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
library_name: transformers
tags:
- gpt2
- toy-llm
- from-scratch
- huggingface
- transformers
- english
- causal-lm
- educational
---
|
|
|
|
|
# 🧠 CooperLM-354M |
|
|
|
|
|
**CooperLM-354M** is a 354-million-parameter GPT-2-based language model trained from scratch on a filtered subset of Wikipedia, BookCorpus, and OpenWebText. It was created as a toy project to explore end-to-end LLM training with Hugging Face's Transformers and Datasets libraries.
|
|
|
|
|
GitHub repo: [https://github.com/daniel-mehta/CooperLM-354M](https://github.com/daniel-mehta/CooperLM-354M)
|
|
--- |
|
|
|
|
|
## 🧱 Architecture |
|
|
|
|
|
- GPT-2 architecture: 24 layers, 16 attention heads, hidden size 1024 (see the config sketch below)
- 256-token context window
- Trained for 1 epoch on 100k samples (~1.2M 256-token sequences)
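
A minimal sketch of the `GPT2Config` these bullets imply; any value not listed above (vocabulary size, dropout, etc.) is assumed to be the GPT-2 default:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Configuration implied by the architecture bullets above; anything not
# stated in this card is left at the GPT-2 default.
config = GPT2Config(
    n_layer=24,       # 24 transformer blocks
    n_head=16,        # 16 attention heads per block
    n_embd=1024,      # hidden size of 1024
    n_positions=256,  # 256-token context window
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # ~354M
```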
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Training Details |
|
|
|
|
|
| Setting | Value | |
|
|
|------------------------|-----------------| |
|
|
| Model Type | GPT2LMHeadModel | |
|
|
| Epochs | 1 | |
|
|
| Precision | fp16 | |
|
|
| Batch Size (effective) | 16 | |
|
|
| GPU | RTX 4060 | |
|
|
| Final Eval Loss | 5.63 | |
|
|
| Perplexity | ~263 | |
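
A sketch of `TrainingArguments` consistent with this table. The split between per-device batch size and gradient accumulation steps is an assumption, since only the effective batch size of 16 is reported:

```python
from transformers import TrainingArguments

# Hypothetical arguments matching the table above; output_dir and the
# 4 x 4 batch split are illustrative assumptions.
args = TrainingArguments(
    output_dir="cooperlm-354m",
    num_train_epochs=1,              # single pass over the data
    per_device_train_batch_size=4,   # per-step batch on the RTX 4060
    gradient_accumulation_steps=4,   # 4 * 4 = effective batch size of 16
    fp16=True,                       # mixed-precision training
)
```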
|
|
|
|
|
--- |
|
|
|
|
|
## 📥 Usage |
|
|
|
|
|
```python |
|
|
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load the model and tokenizer from the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained("mehta/CooperLM-354M")
tokenizer = GPT2TokenizerFast.from_pretrained("mehta/CooperLM-354M")

# Generate a continuation of the prompt with nucleus sampling
prompt = "In a distant future,"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
|
|
``` |
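
Since the card declares `pipeline_tag: text-generation`, the model should also work through the high-level `pipeline` API; a minimal sketch:

```python
from transformers import pipeline

# Text-generation pipeline wrapping the same checkpoint
generator = pipeline("text-generation", model="mehta/CooperLM-354M")
result = generator("In a distant future,", max_length=100, do_sample=True, top_p=0.95)
print(result[0]["generated_text"])
```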
|
|
|
|
|
--- |
|
|
|
|
|
## 📝 License
|
|
MIT |