| --- |
| license: agpl-3.0 |
| library_name: pytorch |
| tags: |
| - tiny-lm |
| - goldfish |
| - transformer |
| - rope |
| - swiglu |
| pipeline_tag: text-generation |
| base_model: [] |
| --- |
| |
| # GlubLM (36M) |
|
|
| > *the language model that already forgot this sentence* |
|
|
| **GlubLM** is a 36-million-parameter transformer that plays the character of a goldfish with a 10-second memory. Inspired by [GuppyLM](https://github.com/arman-bd/guppylm) by Arman BD and Ted Lasso's meditation on the goldfish as "the happiest animal on earth", GlubLM has a hard 96-token context window - it *physically* cannot remember what was just said. |
|
|
| Try it live: [browser demo](https://den-sec.github.io/glublm/) | [pixel-art desk pet](https://den-sec.github.io/glublm/desk-pet/) |
|
|
| ## Architecture |
|
|
| - **Parameters**: 36,055,680 (36.1M) |
| - **Layers**: 8 decoder-only transformer blocks |
| - **Hidden dim**: 640 |
| - **Attention heads**: 10 (head dim 64) |
| - **FFN dim**: 1280 (SwiGLU, effective intermediate 2560) |
| - **Normalization**: RMSNorm |
| - **Position encoding**: Rotary (RoPE) |
| - **Vocabulary**: 5,120 Byte-Level BPE |
| - **Max context**: 96 tokens (hard cap, the "10-second memory") |
| - **Weight-tied LM head** |
| - **No bias terms** |
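
The figures in the list above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, not the repository's own counting code. It assumes the SwiGLU block uses three 640 x 1280 matrices (gate, up, down), that each block has two RMSNorm gain vectors, and that one final RMSNorm precedes the weight-tied LM head; none of these layout details are stated explicitly above.

```python
# Architecture values copied from the list above.
VOCAB_SIZE = 5_120
HIDDEN_DIM = 640
NUM_LAYERS = 8
FFN_DIM = 1_280


def count_parameters() -> int:
    """Estimate the parameter count under the assumptions stated in the lead-in."""
    # Token embedding; the LM head is weight-tied, so it adds nothing.
    embedding = VOCAB_SIZE * HIDDEN_DIM
    # Q, K, V and output projections, no bias terms.
    attention = 4 * HIDDEN_DIM * HIDDEN_DIM
    # Assumed SwiGLU layout: gate, up and down projections.
    ffn = 3 * HIDDEN_DIM * FFN_DIM
    # Assumed: two RMSNorm gain vectors per block.
    norms = 2 * HIDDEN_DIM
    per_block = attention + ffn + norms
    # Assumed: one final RMSNorm. RoPE has no learned weights.
    final_norm = HIDDEN_DIM
    return embedding + NUM_LAYERS * per_block + final_norm


print(f"{count_parameters():,}")
```

Under these assumptions the total comes to 36,055,680, which matches the stated parameter count and suggests that "FFN dim 1280" is the width of each of the three SwiGLU matrices.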
|
|
| ## Intended use |
|
|
| This model is a toy. It exists to: |
| 1. Explore the design tension between "small + simple" (GuppyLM's thesis) and "small + modern" (GlubLM's hypothesis) |
| 2. Demonstrate an LLM-generated dataset pipeline using a multi-agent Claude team |
| 3. Be a fun browser demo and a pixel-art desk pet companion |
|
|
| **Do not use GlubLM for anything serious.** It literally forgets within a sentence. |
|
|
| ## Training data |
|
|
| Trained on [`DenSec02/glublm-60k-ted`](https://huggingface.co/datasets/DenSec02/glublm-60k-ted), a 60,549-sample dataset of single-turn goldfish conversations generated by a team of four coordinated Claude agents (generator, critic, diversifier, persona-guardian). Composition: a v4 balanced mix (20K poetic + 15K supplement + 5K conversational + 15K forgetful), augmented with the v5.1 empathic/introspective hotfix (1K samples) and the v5.2 multi-anchor self-awareness recovery set (500 samples). |
|
|
| **Explicit exclusions**: no references to football, soccer, coaches, teams, or any Ted Lasso show characters. |
|
|
| ## Training |
|
|
| - **Hardware**: NVIDIA RTX 3060 12GB (local) |
| - **Framework**: PyTorch 2.x, BF16 mixed precision |
| - **Optimizer**: AdamW (b1=0.9, b2=0.95), weight decay 0.1 |
| - **LR schedule**: cosine with 5% warmup, peak 3e-4 |
| - **Batch size**: 64 |
| - **Epochs**: 15 |
| - **Dropout**: 0.1 (residual), 0.0 (attention) |
| - **Gradient clipping**: 1.0 |
| - **Final loss**: 1.1442 |
| - **Wall time**: ~15 minutes |
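
The learning-rate schedule listed above (cosine with 5% warmup, peak 3e-4) can be sketched in plain Python. This is an illustration under assumptions, not the training script itself: the warmup is taken to be linear, the decay floor `min_lr` is assumed to be zero, and the step count assumes all 60,549 samples are used for training with no held-out split. The function name `lr_at` is hypothetical.

```python
import math


def lr_at(step, total_steps, peak_lr=3e-4, warmup_frac=0.05, min_lr=0.0):
    """Learning rate at a given optimizer step (linear warmup, cosine decay)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Assumed linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed step count: every sample seen once per epoch, batch size 64, 15 epochs.
steps_per_epoch = math.ceil(60_549 / 64)
total_steps = 15 * steps_per_epoch

for step in (0, total_steps // 20, total_steps // 2, total_steps):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

If the real run held out a validation split or decayed to a non-zero floor, the step count and the tail of the curve would differ accordingly.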
|
|
| ## Evaluation (v2 cross-model judge) |
|
|
| Dual-judge evaluation using Claude Sonnet 4.6 and Opus 4.7 on a 30-prompt rubric across 4 axes (integer 1-5 scale). Each axis aggregates 30 prompts x 3 seeds x 2 passes = 180 scoring rows per judge. |
|
|
| ### Per-axis score (mean) |
|
|
| | Axis | Sonnet 4.6 | Opus 4.7 | |
| |---|---:|---:| |
| | Conversational Quality | 4.01 | 4.15 | |
| | Goldfish Identity | 3.89 | 3.67 | |
| | Forgetful Trait | 3.80 | 3.81 | |
| | Length Appropriateness | 4.77 | 4.57 | |
|
|
| ### Cross-judge agreement (Cohen's quadratic-weighted kappa) |
|
|
| | Axis | Kappa | Interpretation | |
| |---|---:|---| |
| | Conversational Quality | 0.77 | substantial | |
| | Goldfish Identity | 0.83 | almost perfect | |
| | Forgetful Trait | 0.86 | almost perfect | |
| | Length Appropriateness | 0.59 | moderate | |
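
For readers unfamiliar with the agreement metric in the table above, here is a minimal pure-Python sketch of Cohen's quadratic-weighted kappa for two judges on an integer 1-5 scale. It is not the evaluation code from the repository, which may differ in details. The function name and the sample ratings are made up for illustration.

```python
def quadratic_weighted_kappa(a, b, min_rating=1, max_rating=5):
    """Cohen's kappa with quadratic weights for two equal-length rating lists."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k = max_rating - min_rating + 1
    n = len(a)

    # Confusion matrix: observed[i][j] counts items rated i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - min_rating][y - min_rating] += 1

    # Marginal histograms for each judge.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty: disagreements two points apart cost 4x one point.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Count expected by chance if the two judges were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0.0:
        # Degenerate case: both judges gave one identical constant score.
        # Treated as perfect agreement here; other tools may return NaN.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Made-up ratings from two hypothetical judges on six items.
judge_a = [4, 5, 3, 4, 2, 5]
judge_b = [4, 4, 3, 5, 2, 5]
print(round(quadratic_weighted_kappa(judge_a, judge_b), 3))
```

The quadratic weighting is what makes the metric suitable for ordinal rubric scores: a 4-versus-5 disagreement is penalized far less than a 1-versus-5 disagreement.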
|
|
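| **Interpretation**: Sonnet and Opus reach substantial or better agreement (kappa >= 0.77) on 3/4 axes, and almost perfect agreement on two of them (Goldfish Identity and Forgetful Trait). This supports the claim that the rubric is interpreted consistently across LLM judges, although agreement on Length Appropriateness is only moderate (0.59). Opus is systematically ~0.2 stricter than Sonnet on the Identity axis (read here as stricter rubric application, not judge bias). |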
|
|
| Full methodology + 108-row long-format scores: [`eval/report_crossmodel.md`](https://github.com/Den-Sec/glublm/blob/master/eval/report_crossmodel.md). |
|
|
| ## Limitations & biases |
|
|
| - **Hard context limit**: 96 tokens. Inputs longer than a few short sentences will be truncated. |
| - **Goldfish worldview**: the model genuinely does not understand human abstractions outside the bowl. |
| - **Dataset bias**: the dataset was generated by Claude (Anthropic), so it inherits Claude's language patterns filtered through the goldfish persona. |
| - **Single-turn only**: multi-turn memory is a non-goal. |
| - **English only**. |
| - **Stochastic and occasionally incoherent**: 36M params on 60K samples is small. Do not expect reliability. |
|
|
| ## How to use |
|
|
| ```python |
| from glublm.config import ModelConfig |
| from glublm.model import GlubLM |
| from glublm.tokenizer import GlubTokenizer |
| from glublm.inference import generate |
| from huggingface_hub import hf_hub_download |
| from safetensors.torch import load_model |
| |
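| # Download the tokenizer and weights from the Hugging Face Hub |
| tok_path = hf_hub_download("DenSec02/glublm-36m", "tokenizer.json") |
| weights_path = hf_hub_download("DenSec02/glublm-36m", "model.safetensors") |
| |
| # Build the model with the tokenizer's vocabulary size, then load the weights |
| tok = GlubTokenizer.from_file(tok_path) |
| cfg = ModelConfig(vocab_size=tok.vocab_size) |
| model = GlubLM(cfg) |
| load_model(model, weights_path) |
| model.eval()  # disable dropout for inference |
| |
| # Generate a short reply |
| print(generate(model=model, tokenizer=tok, prompt="hello", max_new_tokens=24)) |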
| ``` |
|
|
| Or try it in-browser with zero setup: |
| - [Chat demo](https://den-sec.github.io/glublm/) (simple web UI) |
| - [Desk pet companion](https://den-sec.github.io/glublm/desk-pet/) (pixel-art PWA) |
| - [Colab notebook](https://colab.research.google.com/github/Den-Sec/glublm/blob/master/notebooks/train_colab.ipynb) (train your own goldfish) |
|
|
| ## License |
|
|
| AGPL-3.0 - see [LICENSE](https://github.com/Den-Sec/glublm/blob/master/LICENSE). |
|
|
| ## Citation |
|
|
| ```bibtex |
| @software{glublm_2026, |
| author = {Sepede, Dennis}, |
| title = {GlubLM: a 36M goldfish language model with a 10-second memory}, |
| year = {2026}, |
| url = {https://github.com/Den-Sec/glublm} |
| } |
| ``` |
|
|