BEE-spoke-data/smol_llama-101M-GQA
Text Generation • 0.1B params • 1.95k downloads • 32 likes
small-scale pretraining experiments of mine
Note: smol_llama-220M-GQA, continued pretraining (CPT) on fineweb-edu for 10 billion tokens
Note: this is a mid-training checkpoint of what is now smol_llama-220M