---
language:
- en
license: apache-2.0
tags:
- gpt2
- language-model
- pretraining
- causal-lm
- small-model
datasets:
- roneneldan/TinyStories
- HuggingFaceFW/fineweb
base_model: none
pipeline_tag: text-generation
---

# PicoLM-15M

A 19M-parameter GPT-2-style causal language model pretrained from scratch on a mix of TinyStories and FineWeb web data. Trained in ~45 minutes on a single NVIDIA T4 GPU.

## Model Details

| Property | Value |
|---|---|
| Architecture | GPT-2 (decoder-only transformer) |
| Parameters | ~19M |
| Context length | 512 tokens |
| Vocabulary size | 49,152 |
| Layers | 8 |
| Attention heads | 8 |
| Hidden size | 256 |
| FFN size | 1024 |
| Tokenizer | SmolLM2-135M (HuggingFaceTB) |
| Training steps | 8,000 |
| Final loss | ~3.6–4.2 |

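As a sanity check on the table, the parameter count can be estimated from the architecture numbers alone (a sketch assuming GPT-2-style learned positional embeddings and weight-tied input/output embeddings; the exact breakdown may differ slightly):

```python
# Rough GPT-2 parameter-count estimate from the config in the table above.
vocab, d_model, n_layers, d_ff, ctx = 49_152, 256, 8, 1024, 512

embeddings = vocab * d_model + ctx * d_model   # token + positional embeddings
attn = 4 * d_model * d_model + 4 * d_model     # Q, K, V, output projections (+ biases)
ffn = 2 * d_model * d_ff + d_ff + d_model      # up/down projections (+ biases)
layer_norms = 2 * (2 * d_model)                # two LayerNorms per block (gain + bias)
per_layer = attn + ffn + layer_norms
final_ln = 2 * d_model

total = embeddings + n_layers * per_layer + final_ln
print(f"{total / 1e6:.1f}M parameters")        # ~19M with tied embeddings
```

Note that the token embedding matrix alone accounts for roughly two thirds of the total, which is typical at this scale.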
## Training

**Hardware:** Google Colab, NVIDIA T4 (15GB VRAM)

**Dataset mix:**
- 75% [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) — simple English stories
- 25% [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (`sample-10BT`) — deduplicated Common Crawl web text

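In practice this mix can be built with `datasets.interleave_datasets(..., probabilities=[0.75, 0.25])` over two streaming datasets. The sampling logic it relies on can be sketched self-containedly (the toy records below are placeholders, not the real loader):

```python
import itertools
import random

def interleave(streams, probabilities, seed=0):
    """Yield examples by picking a source stream per step with the
    given probabilities (mirrors probabilistic dataset mixing)."""
    rng = random.Random(seed)
    while True:
        r, cum = rng.random(), 0.0
        for stream, p in zip(streams, probabilities):
            cum += p
            if r < cum:
                yield next(stream)
                break

# Toy stand-ins for the two streamed corpora:
tiny = itertools.repeat({"text": "...", "source": "tinystories"})
web = itertools.repeat({"text": "...", "source": "fineweb"})

mix = interleave([tiny, web], [0.75, 0.25], seed=0)
sample = [next(mix)["source"] for _ in range(10_000)]
print(sample.count("tinystories") / len(sample))  # close to 0.75
```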
**Training config:**
- Optimizer: AdamW (lr=3e-4, weight_decay=0.1)
- LR schedule: cosine with 400 warmup steps
- Batch size: 16 × 2 gradient accumulation = effective batch 32
- Mixed precision: fp16
- Streaming: yes (no full dataset download)

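The warmup-plus-cosine schedule above can be sketched as a small function (the final LR floor of 0 is an assumption; the run's exact minimum isn't stated):

```python
import math

def lr_at(step, peak_lr=3e-4, warmup=400, total=8_000, min_lr=0.0):
    """Cosine decay with linear warmup, matching the config above."""
    if step < warmup:
        return peak_lr * step / warmup                # linear warmup
    progress = (step - warmup) / (total - warmup)     # 0 -> 1 after warmup
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(200))    # mid-warmup: 1.5e-4
print(lr_at(400))    # peak: 3e-4
print(lr_at(8_000))  # end of training: 0.0
```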
## Usage

```python
from transformers import AutoTokenizer, GPT2LMHeadModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("Tralalabs/PicoLM-15M")
model = GPT2LMHeadModel.from_pretrained("Tralalabs/PicoLM-15M")

gen = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = gen("Once upon a time", max_new_tokens=100, do_sample=True, temperature=0.8)
print(output[0]["generated_text"])
```

## Sample Outputs

**Prompt:** `Once upon a time`
> Once upon a time, there was a little girl named Lily. She loved to play outside and play with her ball. One day, she's friend Lily came to play outside...

**Prompt:** `The history of the internet`
> The history of the internet. And the new world we have found in the last year of 110 in the world. The group of the people from the American leaders...

**Prompt:** `Artificial intelligence is`
> Artificial intelligence is not good, but not even not yet in order to bring on the world of the world...

## Limitations

- Small scale (19M params) — outputs are often repetitive or incoherent on complex prompts
- Not instruction-tuned — this is a base pretrained model only
- Undertrained relative to Chinchilla optimal (~300M tokens seen vs ~570M recommended)
- Best suited for simple narrative/story generation due to TinyStories bias

## Intended Use

- Educational — learning how pretraining works
- Baseline for fine-tuning experiments
- Research on small language model behavior

## Future Plans

- PicoLM-15M-v2 with more steps (12,000) and better LR schedule
- Instruction fine-tuning variant

## Citation

```
@misc{picolm2026,
  author = {Tralalabs},
  title = {PicoLM-15M: A Small GPT-style Language Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Tralalabs/PicoLM-15M}
}
```