---
language: en
license: mit
tags:
- tiny
- language-model
- causal-lm
- pytorch
datasets:
- roneneldan/TinyStories
- Skylion007/openwebtext
pipeline_tag: text-generation
library_name: transformers
---

# TinyLM

A 3.4M-parameter causal language model trained from scratch for experimentation.
## Architecture

| Hyperparameter | Value |
|---|---|
| Parameters | 3,403,968 |
| Layers | 4 |
| Hidden size | 64 |
| Attention heads | 4 |
| FFN dim | 192 |
| Embedding rank | 32 |
| Context length | 256 |
| Tokenizer | GPT-2 (50,257 vocab) |

Uses a **factored (low-rank) embedding** to keep the vocab projection from eating the entire parameter budget, with weight tying on the output head.
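The idea can be sketched as follows. This is a minimal illustration of a factored embedding with a tied output head, not the repo's actual `modeling_tinylm.py`; the class and method names are hypothetical:

```python
import torch
import torch.nn as nn


class FactoredEmbedding(nn.Module):
    """Embed tokens via a low-rank factorization: vocab -> rank -> hidden.

    Parameter cost is vocab*rank + rank*hidden instead of vocab*hidden,
    which matters when the vocab (50,257) dwarfs the hidden size (64).
    """

    def __init__(self, vocab_size: int, rank: int, hidden: int):
        super().__init__()
        self.low_rank = nn.Embedding(vocab_size, rank)       # vocab x rank
        self.up_proj = nn.Linear(rank, hidden, bias=False)   # rank -> hidden

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Look up the rank-32 vector, then project up to the hidden size.
        return self.up_proj(self.low_rank(token_ids))

    def logits(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Tied output head: reuse the same two matrices in reverse to map
        # hidden states back to vocab logits, adding no new parameters.
        down = hidden_states @ self.up_proj.weight    # (..., rank)
        return down @ self.low_rank.weight.t()        # (..., vocab)
```

With vocab 50,257, rank 32, and hidden size 64, this layer costs 50,257×32 + 32×64 = 1,610,272 parameters, versus 50,257×64 ≈ 3.2M for a full embedding matrix alone.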
## Training

| Setting | Value |
|---|---|
| Datasets | Skylion007/openwebtext (10k samples), roneneldan/TinyStories (10k samples) |
| Optimizer | AdamW (lr=3e-3, weight_decay=0.01) |
| Scheduler | Cosine annealing with warm restarts |
| Mixed precision | fp16 (torch.cuda.amp) |
| Hardware | NVIDIA P100 |
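The optimizer/scheduler/AMP wiring above might look roughly like this. This is a sketch, not the actual training script: the stand-in model, loss, and restart period `T_0` are placeholders, only the optimizer and scheduler settings come from the table:

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = nn.Linear(64, 64)  # stand-in for the TinyLM model

# Settings from the table above.
optimizer = AdamW(model.parameters(), lr=3e-3, weight_decay=0.01)
# T_0 (steps until the first warm restart) is a placeholder value.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=1000)
# GradScaler prevents fp16 gradient underflow; disabled on CPU-only machines.
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())


def train_step(batch, target):
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in fp16 where safe under autocast.
    with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
        loss = nn.functional.mse_loss(model(batch), target)  # stand-in loss
    # Scale the loss, backprop, unscale-and-step, then update the scale.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()
    return loss.item()
```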
## Usage

```python
from huggingface_hub import snapshot_download
import importlib.util
import torch

# Download the model files from the Hub
snapshot_download(repo_id="Fu01978/TinyLM", local_dir="./tinylm")

# Load the model code via its standalone script
spec = importlib.util.spec_from_file_location("modeling_tinylm", "./tinylm/modeling_tinylm.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

model, tokenizer, config = module.load_tinylm("./tinylm")
model.eval()

# Generate text
output = module.generate(model, tokenizer, "Once upon a time, ")
print(output)
```