---
license: gpl-3.0
datasets:
- shuyuej/English-Pretraining-Dataset
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- glint
new_version: CompactAI-O/Glint-0.2
---

# Glint-0.1

> Once upon a time, there was a model that could only say `couldcouldoldbloodbloodbodybody`. This is its ancestor.

Glint-0.1 is where the Glint line started. ~1M parameters. Big dreams. Almost no ability to realize them.

We look back on this one fondly, like a blurry photo of a puppy that chewed your shoes.

## What you get

| File | What it is |
| :--- | :--- |
| `tokenizer.json` | Hybrid word/char tokenizer (~2,111 tokens) |
| `pretrain.pt` | Base pretrained checkpoint |
| `model.pt` | Instruction-tuned checkpoint (SFT) |

## Specs

| Thing | Value |
| :--- | :--- |
| **Architecture** | Transformer Decoder |
| **Parameters** | ~1 Million |
| **Context** | 2,048 tokens |
| **d_model** | 160 |
| **Layers** | 6 |
| **Heads** | 4 |
| **FFN hidden size** | 256 |
| **Vocab** | ~2,111 tokens (Hybrid Char + Word) |
| **Norm** | RMSNorm + QK-Norm |
| **Position** | RoPE |
| **Activation** | SwiGLU |

## What made this one special

- **Hybrid tokenizer** -- word-level where it helps, character-level where it gets confused
- **QK-Norm** -- RMSNorm on queries and keys before the attention dot product, so training doesn't blow up
- **Loss boosting** -- yelled at the model extra hard when it ignored multi-character words
- **Response-start weighting** -- made it actually pay attention to the first tokens of its answers
- **Pretrain replay** -- kept mixing pretrain data into SFT so it wouldn't forget how to speak English

## Training curve

![loss curve]({model/loss_curve.png})

It went down. Slowly. Painfully.

## Limitations

- Repeats itself. A lot.
- Knows almost nothing about the world.
- Not useful for anything real. Research only.
- Will embarrass itself if asked a direct question.
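
## How the loss weighting might have worked

Loss boosting and response-start weighting both come down to giving some target tokens more say in the loss. Here is a minimal sketch of that idea as a weighted cross-entropy in plain Python. It is not the actual Glint training code: the boost values (`WORD_BOOST`, `START_BOOST`), the `START_WINDOW` size, masking the prompt out of the loss, and every function name below are hypothetical, picked only to make the mechanics visible.

```python
import math
from typing import Sequence

# Hypothetical knobs; the real Glint-0.1 values are not given in this card.
WORD_BOOST = 2.0    # extra weight on targets that are multi-character word tokens
START_BOOST = 3.0   # extra weight on the first few tokens of a response
START_WINDOW = 4    # how many response tokens count as the "start"


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one row of vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_weights(is_word: Sequence[bool], in_response: Sequence[bool]) -> list[float]:
    """Per-target weight: 0 for prompt tokens, boosted for words and response starts.

    Assumes one response per sequence (the start counter never resets).
    """
    weights = []
    seen = 0  # response tokens seen so far
    for word, resp in zip(is_word, in_response):
        if not resp:
            weights.append(0.0)  # assumption: the prompt is masked out of the SFT loss
            continue
        w = 1.0
        if word:
            w *= WORD_BOOST      # "loss boosting" on multi-character words
        if seen < START_WINDOW:
            w *= START_BOOST     # "response-start weighting"
        weights.append(w)
        seen += 1
    return weights


def weighted_ce(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Weighted mean of per-token negative log-likelihoods."""
    total, norm = 0.0, 0.0
    for row, t, w in zip(logits, targets, weights):
        total += -log_softmax(row)[t] * w
        norm += w
    return total / norm if norm > 0 else 0.0


# Toy example: vocab of 3, two prompt tokens, then three response tokens.
logits = [[0.0, 0.0, 0.0] for _ in range(5)]
targets = [0, 1, 2, 1, 0]
is_word = [False, False, True, False, True]
in_response = [False, False, True, True, True]

w = token_weights(is_word, in_response)
print(w)  # [0.0, 0.0, 6.0, 3.0, 6.0]
print(weighted_ce(logits, targets, w))  # ≈ 1.0986 (ln 3)
```

With uniform logits every token costs the same (ln 3), so the weighted mean is ln 3 as well; the weights only change the result once some tokens are easier to predict than others. Dividing by the sum of weights, rather than the token count, keeps the loss on a similar scale whether a batch is mostly prompt or mostly response.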

---

*Built by [CompactAI](https://huggingface.co/CompactAI-O). We started somewhere.*