---
license: gpl-3.0
datasets:
- shuyuej/English-Pretraining-Dataset
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- glint
new_version: CompactAI-O/Glint-0.2
---
# Glint-0.1
> Once upon a time, there was a model that could only say `couldcouldoldbloodbloodbodybody`. This is its ancestor.
Glint-0.1 is where the Glint line started. 1M parameters. Big dreams. Almost no ability to realize them. We look back on this one fondly, like a blurry photo of a puppy that chewed your shoes.
## What you get
| File | What it is |
| :--- | :--- |
| `tokenizer.json` | Hybrid word/char tokenizer (~2,111 tokens) |
| `pretrain.pt` | Base pretrained checkpoint |
| `model.pt` | Instruction-tuned checkpoint (SFT) |
## Specs
| Thing | Value |
| :--- | :--- |
| **Architecture** | Transformer Decoder |
| **Parameters** | ~1M |
| **Context** | 2,048 tokens |
| **d_model** | 160 |
| **Layers** | 6 |
| **Heads** | 4 |
| **FFN** | 256 |
| **Vocab** | ~2,111 tokens (Hybrid Char + Word) |
| **Norm** | RMSNorm + QK-Norm |
| **Position** | RoPE |
| **Activation** | SwiGLU |
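
For the curious, here is a rough sketch of what the "RMSNorm + QK-Norm" row means for a single attention logit. It is plain Python so it runs anywhere, and it is not the Glint code: `rms_norm` and `attention_logit` are names made up for this example, the learned gain that RMSNorm usually carries is left out, and the 40-wide head assumes `d_model` is split evenly across the 4 heads.

```python
import math

D_MODEL = 160                   # from the specs table
N_HEADS = 4                     # from the specs table
HEAD_DIM = D_MODEL // N_HEADS   # 40, assuming an even split across heads


def rms_norm(x, eps=1e-6):
    """Rescale a vector so its root-mean-square is ~1 (no learned gain in this sketch)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def attention_logit(q, k, qk_norm=True):
    """Scaled dot-product logit for one query/key pair inside one head."""
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


# A deliberately oversized query/key pair, the kind that makes softmax saturate.
q = [100.0 * math.sin(i) for i in range(HEAD_DIM)]
k = [100.0 * math.sin(i) for i in range(HEAD_DIM)]

raw_logit = attention_logit(q, k, qk_norm=False)
normed_logit = attention_logit(q, k, qk_norm=True)

print(raw_logit)     # grows with the square of the activation scale
print(normed_logit)  # bounded by sqrt(HEAD_DIM) in magnitude, whatever the scale
```

With both vectors normalized, the logit cannot exceed `sqrt(HEAD_DIM)` (about 6.3 here) no matter how large the raw activations get, which is the "doesn't blow up" part.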
## What made this one special
- **Hybrid tokenizer** -- word-level where it helps, character-level where it gets confused
- **QK-Norm** -- RMSNorm on queries and keys so training doesn't blow up
- **Loss boosting** -- yelled at the model extra hard when it ignored multi-character words
- **Response-start weighting** -- made it actually pay attention to the first tokens of its answers
- **Pretrain replay** -- kept mixing in pretrain data during SFT so it wouldn't forget how to speak English
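
The loss tricks and the replay trick above can be sketched in a few lines of plain Python. This is a guess at the general shape, not the actual training code: the function names, the boost factors (`word_boost`, `start_boost`), the `start_span` window, and the `replay_ratio` are all hypothetical values picked for illustration.

```python
import random


def token_weights(targets, response_start, word_boost=2.0, start_boost=3.0, start_span=2):
    """Per-token loss weights. Every number here is an illustrative guess."""
    weights = []
    for pos, tok in enumerate(targets):
        w = 1.0
        if len(tok) > 1:  # multi-character (word-level) token: loss boosting
            w *= word_boost
        if response_start <= pos < response_start + start_span:  # response-start weighting
            w *= start_boost
        weights.append(w)
    return weights


def weighted_loss(per_token_losses, weights):
    """Weighted mean of per-token losses (for example, cross-entropy per position)."""
    return sum(l * w for l, w in zip(per_token_losses, weights)) / sum(weights)


def with_replay(sft_examples, pretrain_examples, replay_ratio=0.25, seed=0):
    """Yield every SFT example, sometimes followed by a random pretrain example."""
    rng = random.Random(seed)
    for ex in sft_examples:
        yield ("sft", ex)
        if rng.random() < replay_ratio:
            yield ("pretrain", rng.choice(pretrain_examples))


# Prompt is "what ?", response is "the c a t" (word tokens mixed with char tokens).
targets = ["what", "?", "the", "c", "a", "t"]
weights = token_weights(targets, response_start=2)
print(weights)  # [2.0, 1.0, 6.0, 3.0, 1.0, 1.0]

losses = [0.5, 0.5, 2.0, 0.5, 0.5, 0.5]  # made-up per-token losses
print(weighted_loss(losses, weights))

stream = list(with_replay(["sft_a", "sft_b", "sft_c"], ["pre_x", "pre_y"]))
print(stream)
```

In this toy case the miss on `the` is both a word token and the first token of the answer, so it dominates the weighted loss, which comes out higher than the plain mean of the same per-token losses.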
## Training curve
![loss curve](model/loss_curve.png)
It went down. Slowly. Painfully.
## Limitations
- Repeats itself. A lot.
- Knows almost nothing about the world.
- Not useful for anything real. Research only.
- Will embarrass itself if asked a direct question.
---
*Built by [CompactAI](https://huggingface.co/CompactAI-O). We started somewhere.*