TinyLLM 75M OpenWebText Chat
This repository contains an experimental 75,074,112 parameter decoder-only tiny language model trained from scratch/near-scratch and then supervised-finetuned for chat.
Important quality note: This is a successful end-to-end training pipeline artifact and research toy model, not a production assistant. It can load and generate text, but factual accuracy, instruction following, arithmetic, and repetition control are weak.
Model summary
- Model name: `razor5050/tinyllm-75m-openwebtext-chat`
- Architecture: LLaMA/SmolLM-style decoder-only causal LM
- Parameters: 75,074,112
- Context length: 1024 tokens
- Vocabulary: 32,000 ByteLevel BPE tokens
- Tokenizer: custom ByteLevel BPE trained for this run
- Checkpoint format: PyTorch `.pt` checkpoints
- Primary final checkpoint: `final.pt`
- Best checkpoint: `best.pt`
Architecture
The model uses modern tiny-LM components:
- decoder-only causal Transformer
- RoPE positional embeddings
- RMSNorm
- SwiGLU MLP
- grouped-query attention (fewer key/value heads than query heads)
- tied input/output token embeddings
- no attention/MLP bias
- PyTorch SDPA causal attention
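The RMSNorm and SwiGLU blocks above follow the usual LLaMA-style formulation. A minimal PyTorch sketch of those two components (illustrative only; `src/tinyllm/` contains the actual implementation, whose details may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """LLaMA-style RMSNorm: scale by the reciprocal RMS, no mean subtraction, no bias."""
    def __init__(self, hidden_size: int = 576, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLUMLP(nn.Module):
    """SwiGLU MLP: down(silu(gate(x)) * up(x)), all projections bias-free."""
    def __init__(self, hidden_size: int = 576, intermediate_size: int = 1536):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```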
Approximate config:
vocab_size: 32000
hidden_size: 576
num_hidden_layers: 16
num_attention_heads: 9
num_key_value_heads: 3
intermediate_size: 1536
max_position_embeddings: 1024
rope_theta: 10000.0
rms_norm_eps: 1e-5
tie_word_embeddings: true
attention_bias: false
mlp_bias: false
dropout: 0.0
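This config reproduces the stated parameter count. A quick check, assuming standard LLaMA-style projection shapes and tied input/output embeddings (so no separate LM head matrix):

```python
# Recompute the parameter count from the config above.
vocab, d, layers, n_heads, n_kv, ffn = 32000, 576, 16, 9, 3, 1536
head_dim = d // n_heads                   # 64
kv_dim = n_kv * head_dim                  # 192

embed = vocab * d                         # token embeddings (output head is tied, counted once)
attn  = d * d + 2 * d * kv_dim + d * d    # q, k, v, o projections, no bias
mlp   = 3 * d * ffn                       # gate, up, down projections, no bias
norms = 2 * d                             # two RMSNorm weights per layer

total = embed + layers * (attn + mlp + norms) + d  # + final RMSNorm
print(f"{total:,}")  # 75,074,112 -- matches the stated parameter count
```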
Training data
Base pretraining
- Dataset: `Skylion007/openwebtext`
- Rows used: 1,000,000 selected rows
- Final tokenized train tokens: 1,143,301,833
- Final tokenized validation tokens: 34,486,473
- Epochs: 1
- Optimizer steps: 4,361
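The reported token and step counts are mutually consistent. The implied effective batch size below is an inference from those numbers (assuming sequences packed to the full 1024-token context), not something stated in the training config:

```python
# Sanity-check the pretraining numbers against each other.
train_tokens = 1_143_301_833
steps = 4_361
ctx = 1_024

tokens_per_step = train_tokens / steps   # ~262,165 tokens per optimizer step
seqs_per_step = tokens_per_step / ctx    # ~256 packed sequences per step
print(f"{tokens_per_step:,.0f} tokens/step, ~{seqs_per_step:.0f} sequences/step")
```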
Chat/SFT
- Dataset: `HuggingFaceTB/smol-smoltalk`
- Train examples: 100,000
- Validation examples: 3,000
- Epochs: 1
- Optimizer steps: 781
- Loss masking: assistant-response tokens only
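Assistant-only loss masking means the cross-entropy loss is computed only over assistant-response tokens; system and user tokens still condition the model but contribute nothing to the gradient. A minimal sketch of the idea (illustrative; the exact span boundaries depend on how the training code tokenizes the chat template):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # labels with this value are skipped by cross_entropy

def masked_labels(input_ids: torch.Tensor, assistant_mask: torch.Tensor) -> torch.Tensor:
    """Copy input_ids as labels, then ignore every position outside assistant spans."""
    labels = input_ids.clone()
    labels[~assistant_mask] = IGNORE_INDEX
    return labels

def sft_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Next-token objective: shift by one position, skip masked labels."""
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=IGNORE_INDEX,
    )
```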
Training results
Pretraining
- Final/latest train loss near the end of the run: about 4.997
- Latest validation loss: about 5.049 (at step 4000)
SFT
- SFT completed at step 781
- Validation loss trend:
  - step 250: 2.6031
  - step 500: 2.4505
  - step 750: 2.3313
SFT improved chat formatting and response style, but the model remains very small and undertrained by modern assistant standards.
Hardware/run
- Cloud GPU: Vast.ai RTX 5070 Ti, 16GB VRAM
- Precision: CUDA/PyTorch mixed precision during training where supported
- Checkpointing: periodic `latest`, `best`, `final`, and step checkpoints
- Training artifacts were preserved separately outside the instance before teardown.
Files in this repo
- `final.pt`: final SFT checkpoint
- `best.pt`: best SFT checkpoint
- `latest.pt`: latest SFT checkpoint
- `metrics.jsonl`: SFT metrics
- `step_609.pt`: intermediate SFT checkpoint
- `tokenizer/vocab.json` and `tokenizer/merges.txt`: tokenizer files
- `configs/model_75m.yaml`: architecture config
- `src/tinyllm/`: minimal PyTorch model implementation
- `scripts/infer_tinyllm.py`: simple local inference helper
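The tokenizer files can be loaded directly with the `tokenizers` library. A small sketch (this loads the raw ByteLevel BPE files only; any chat special tokens added during training would need to be registered separately):

```python
from tokenizers import ByteLevelBPETokenizer

# Load the repo's ByteLevel BPE vocab/merges files.
tok = ByteLevelBPETokenizer("tokenizer/vocab.json", "tokenizer/merges.txt")

enc = tok.encode("What is the capital of France?")
print(enc.ids)               # token ids fed to the model
print(tok.decode(enc.ids))   # round-trips back to the original text
```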
Quick inference
Clone/download the repo, install dependencies, then run:
pip install torch tokenizers pyyaml huggingface_hub
python scripts/infer_tinyllm.py \
--checkpoint final.pt \
--prompt "What is the capital of France?"
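If you would rather not clone the whole repo, the individual files can also be fetched with `huggingface_hub` (a sketch using the file names listed above):

```python
from huggingface_hub import hf_hub_download

repo_id = "razor5050/tinyllm-75m-openwebtext-chat"

# Pull just the checkpoint and tokenizer files into the local Hub cache.
ckpt   = hf_hub_download(repo_id, "final.pt")
vocab  = hf_hub_download(repo_id, "tokenizer/vocab.json")
merges = hf_hub_download(repo_id, "tokenizer/merges.txt")
print(ckpt, vocab, merges)
```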
The chat prompt format used during SFT is:
<|system|>
You are a helpful, concise assistant.
<|end|>
<|user|>
USER_QUESTION
<|end|>
<|assistant|>
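The model's reply is generated as the continuation after the final `<|assistant|>` tag. A sketch of building a single-turn prompt in this format (assuming the template above is used verbatim, newlines included):

```python
SYSTEM = "You are a helpful, concise assistant."

def build_chat_prompt(user_question: str, system: str = SYSTEM) -> str:
    """Format a single-turn prompt in the SFT chat template shown above."""
    return (
        f"<|system|>\n{system}\n<|end|>\n"
        f"<|user|>\n{user_question}\n<|end|>\n"
        f"<|assistant|>\n"
    )

print(build_chat_prompt("What is the capital of France?"))
```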
Observed sample behavior
In a post-upload local inference test, the model generated text and loaded cleanly, but quality was mixed:
- Correct on: "What is the capital of France?" (answered Paris, though with repetition).
- Weak on: simple science/world facts, often rambling or hallucinating.
- Weak on: arithmetic and short-answer discipline.
- Repetition and generic phrasing are common.
This is expected for a 75M-parameter scratch-trained model with about 1.14B pretraining tokens and one SFT pass.
Limitations
- Not suitable for factual QA or production use.
- Hallucinates frequently.
- Repetition loops occur.
- Arithmetic is unreliable.
- Safety behavior was not evaluated.
- Model is not aligned beyond basic supervised chat finetuning.
- The checkpoint is a custom PyTorch model, not a standard `transformers` model class.
Intended use
- Educational tiny-LLM experiment
- Pipeline validation
- Small-model architecture experimentation
- Baseline for future 150M+ runs
Recommended next steps
To improve quality meaningfully:
- Train a larger ~150M model.
- Use more unique pretraining tokens, e.g. ~5B+.
- Improve preprocessing/tokenization throughput with multiprocessing/sharding.
- Add stronger instruction data and possibly preference tuning.
- Export to a standard Hugging Face `transformers`-compatible format (see the sketch below).
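A hedged sketch of what such an export could start from, assuming the architecture maps onto `LlamaConfig`/`LlamaForCausalLM` in a recent `transformers` release (the state-dict keys in `final.pt` are repo-specific and would still need to be remapped, so this is illustrative rather than a working converter):

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Mirror configs/model_75m.yaml in a standard transformers config.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=576,
    num_hidden_layers=16,
    num_attention_heads=9,
    num_key_value_heads=3,
    intermediate_size=1536,
    max_position_embeddings=1024,
    rope_theta=10000.0,
    rms_norm_eps=1e-5,
    tie_word_embeddings=True,
    attention_bias=False,
    mlp_bias=False,
)

model = LlamaForCausalLM(config)
print(sum(p.numel() for p in model.parameters()))  # should land near 75,074,112

# A real export would remap the repo's state-dict keys onto this module's
# parameter names and then call model.save_pretrained(...).
```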
Citation / attribution
Training datasets:
- `Skylion007/openwebtext`
- `HuggingFaceTB/smol-smoltalk`
This repository is an experimental model artifact from a custom tiny-LLM training pipeline.