---
license: gpl-3.0
datasets:
- shuyuej/English-Pretraining-Dataset
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- glint
new_version: CompactAI-O/Glint-0.2
---

# Glint-0.1

> Once upon a time, there was a model that could only say `couldcouldoldbloodbloodbodybody`. This is its ancestor.

Glint-0.1 is where the Glint line started. 1M parameters. Big dreams. Almost no ability to realize them. We look back on this one fondly, like a blurry photo of a puppy that chewed your shoes.

## What you get

| File | What it is |
| :--- | :--- |
| `tokenizer.json` | Hybrid word/char tokenizer (~2,111 tokens) |
| `pretrain.pt` | Base pretrained checkpoint |
| `model.pt` | Instruction-tuned checkpoint (SFT) |

## Specs

| Thing | Value |
| :--- | :--- |
| **Architecture** | Transformer Decoder |
| **Parameters** | ~1 Million |
| **Context** | 2,048 tokens |
| **d_model** | 160 |
| **Layers** | 6 |
| **Heads** | 4 |
| **FFN** | 256 |
| **Vocab** | ~2,111 tokens (Hybrid Char + Word) |
| **Norm** | RMSNorm + QK-Norm |
| **Position** | RoPE |
| **Activation** | SwiGLU |
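
To make the **Heads** and **Norm** rows a bit more concrete, here is a minimal pure-Python sketch of what QK-Norm does to a single attention logit. It is not the Glint training code: the function names, the unit gains, the `eps` value, and the `1/sqrt(head_dim)` scaling are all assumptions for illustration. Only `d_model = 160` and 4 heads come from the table, which gives 40 dimensions per head.

```python
import math

D_MODEL = 160                   # from the spec table
N_HEADS = 4                     # from the spec table
HEAD_DIM = D_MODEL // N_HEADS   # 40 dimensions per head

def rms_norm(x, weight=None, eps=1e-6):
    """Rescale a vector so its root-mean-square is ~1, then apply a gain.

    The gain defaults to all ones here; a trained model would learn it.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if weight is None:
        weight = [1.0] * len(x)
    return [w * v / rms for w, v in zip(weight, x)]

def qk_norm_logit(q, k):
    """Attention logit for one query/key pair, with RMSNorm applied to both.

    The 1/sqrt(head_dim) scaling is an assumption; some implementations
    use a learned temperature instead.
    """
    qn, kn = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))

# Deliberately huge activations: the raw logit scales with them,
# while the QK-normed logit stays bounded by sqrt(HEAD_DIM).
q = [1000.0 * ((i % 7) - 3) for i in range(HEAD_DIM)]
k = [1000.0 * ((i % 5) - 2) for i in range(HEAD_DIM)]

raw = sum(a * b for a, b in zip(q, k)) / math.sqrt(HEAD_DIM)
normed = qk_norm_logit(q, k)

print(f"head_dim={HEAD_DIM}  raw logit={raw:.1f}  qk-norm logit={normed:.3f}")
```

Because both vectors are rescaled before the dot product, doubling the magnitude of `q` leaves the normed logit essentially unchanged, which is the property that keeps the softmax from saturating early in training.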

## What made this one special

- **Hybrid tokenizer** -- word-level where it helps, character-level where it gets confused
- **QK-Norm** -- RMSNorm on queries and keys so training doesn't blow up
- **Loss boosting** -- yelled at the model extra hard when it ignored multi-character words
- **Response-start weighting** -- made it actually pay attention to the first tokens of its answers
- **Pretrain replay** -- kept mixing in pretrain data during SFT so it wouldn't forget how to speak English
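
The two loss tweaks in the list above can be sketched as a per-token weight on the usual negative log-likelihood. This is a guess at the shape of the idea, not the actual Glint training loop: the function name, the window size, both weight values, and the choice to mask prompt tokens entirely are hypothetical.

```python
import math

def weighted_nll(token_probs, tokens, response_start,
                 start_window=4, start_weight=2.0, word_weight=1.5):
    """Weighted mean negative log-likelihood over the response tokens.

    token_probs[i] is the probability the model assigned to tokens[i].
    Tokens before response_start belong to the prompt and get no loss
    (an assumption; the card does not say how the prompt is handled).
    """
    total, norm = 0.0, 0.0
    for i, (p, tok) in enumerate(zip(token_probs, tokens)):
        if i < response_start:
            continue  # skip the prompt
        w = 1.0
        if i < response_start + start_window:
            w *= start_weight   # response-start weighting
        if len(tok) > 1:
            w *= word_weight    # loss boosting for multi-character tokens
        total += w * -math.log(p)
        norm += w
    return total / norm if norm else 0.0

# Same single mistake (p = 0.1), placed early or late in the response.
tokens = list("abcdefgh")  # all single-character, so only position matters
early = [0.9] * 8
early[2] = 0.1             # first response token is wrong
late = [0.9] * 8
late[7] = 0.1              # last response token is wrong

loss_early = weighted_nll(early, tokens, response_start=2, start_window=2)
loss_late = weighted_nll(late, tokens, response_start=2, start_window=2)
print(f"early mistake: {loss_early:.3f}  late mistake: {loss_late:.3f}")
```

Dividing by the sum of weights keeps the loss on the same scale as an unweighted average, so the weights shift where the gradient goes without quietly raising the effective learning rate.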

## Training curve

![Training curve](https://huggingface.co/CompactAI-O/Glint-0.1/resolve/main/training_curve.png)

It went down. Slowly. Painfully.

## Limitations

- Repeats itself. A lot.
- Knows almost nothing about the world.
- Not useful for anything real. Research only.
- Will embarrass itself if asked a direct question.

---

*Built by [CompactAI](https://huggingface.co/CompactAI-O). We started somewhere.*