metadata
license: gpl-3.0
datasets:
- shuyuej/English-Pretraining-Dataset
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- glint
new_version: CompactAI-O/Glint-0.2
Glint-0.1
Once upon a time, there was a model that could only say couldcouldoldbloodbloodbodybody. This is its ancestor.
Glint-0.1 is where the Glint line started. 1M parameters. Big dreams. Almost no ability to realize them. We look back on this one fondly, like a blurry photo of a puppy that chewed your shoes.
What you get
| File | What it is |
|---|---|
| tokenizer.json | Hybrid word/char tokenizer (~2,111 tokens) |
| pretrain.pt | Base pretrained checkpoint |
| model.pt | Instruction-tuned checkpoint (SFT) |
Specs
| Thing | Value |
|---|---|
| Architecture | Transformer Decoder |
| Parameters | ~1 Million |
| Context | 2,048 tokens |
| d_model | 160 |
| Layers | 6 |
| Heads | 4 |
| FFN | 256 |
| Vocab | ~2,111 tokens (Hybrid Char + Word) |
| Norm | RMSNorm + QK-Norm |
| Position | RoPE |
| Activation | SwiGLU |
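For anyone wiring these numbers into their own code, here is a minimal sketch of the table as a Python config. The class name `GlintConfig` and the field names are made up for illustration and are not taken from the released code; only the values come from the table. The vocab size is listed as approximate, so treat `2111` as a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlintConfig:  # hypothetical name, not from the released code
    d_model: int = 160
    n_layers: int = 6
    n_heads: int = 4
    ffn_dim: int = 256
    vocab_size: int = 2111   # the table says "~2,111", so this is approximate
    max_context: int = 2048

    def __post_init__(self):
        # Each head gets an equal slice of d_model.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        # Standard RoPE rotates dimensions in pairs, so head_dim must be even.
        if self.head_dim % 2 != 0:
            raise ValueError("head_dim must be even for RoPE")

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads


cfg = GlintConfig()
print(cfg.head_dim)  # 160 // 4 = 40
```

The only derived value here is the per-head width: 160 split across 4 heads gives 40 dimensions per head, which is even and so compatible with RoPE.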
What made this one special
- Hybrid tokenizer -- word-level where it helps, character-level where it gets confused
- QK-Norm -- RMSNorm on queries and keys so training doesn't blow up
- Loss boosting -- yelled at the model extra hard when it ignored multi-character words
- Response-start weighting -- made it actually pay attention to the first tokens of its answers
- Pretrain replay -- kept mixing in pretrain data during SFT so it wouldn't forget how to speak English
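The QK-Norm item is the easiest of these to show in code. What follows is a rough sketch in plain Python, not the training code: the function names are hypothetical, it handles a single query/key pair for one head, and it leaves out the learned gain that RMSNorm layers usually carry. The point is only to show why normalizing queries and keys keeps attention logits bounded.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Rescale a vector so its root-mean-square is about 1 (no learned gain)."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [x * scale for x in vec]


def attention_logit(q, k, use_qk_norm=True):
    """Scaled dot-product logit for one query/key pair of a single head."""
    if use_qk_norm:
        # QK-Norm: normalize both vectors before taking the dot product.
        q, k = rms_norm(q), rms_norm(k)
    head_dim = len(q)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim)


head_dim = 40  # 160 / 4, from the specs table
q = [1000.0] * head_dim  # deliberately huge activations
k = [1000.0] * head_dim

raw = attention_logit(q, k, use_qk_norm=False)
normed = attention_logit(q, k, use_qk_norm=True)
print(raw, normed)
```

Without normalization the logit grows with the square of the activation scale, which pushes softmax into saturation. With it, each vector has length of about `sqrt(head_dim)`, so the logit can never exceed `sqrt(head_dim)` in magnitude no matter how large the raw activations get.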
Training curve
It went down. Slowly. Painfully.
Limitations
- Repeats itself. A lot.
- Knows almost nothing about the world.
- Not useful for anything real. Research only.
- Will embarrass itself if asked a direct question.
Built by CompactAI. We started somewhere.
