---
license: gpl-3.0
datasets:
- shuyuej/English-Pretraining-Dataset
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- glint
new_version: CompactAI-O/Glint-0.2
---

# Glint-0.1

> Once upon a time, there was a model that could only say `couldcouldoldbloodbloodbodybody`. This is its ancestor.

Glint-0.1 is where the Glint line started. 1M parameters. Big dreams. Almost no ability to realize them. We look back on this one fondly, like a blurry photo of a puppy that chewed your shoes.

## What you get

| File | What it is |
| :--- | :--- |
| `tokenizer.json` | Hybrid word/char tokenizer (~2,111 tokens) |
| `pretrain.pt` | Base pretrained checkpoint |
| `model.pt` | Instruction-tuned checkpoint (SFT) |
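
The hybrid word/char idea behind `tokenizer.json` can be illustrated with a toy sketch. This is not the actual Glint tokenizer: the vocabulary, the splitting regex, and the function name `hybrid_tokenize` are all made up for illustration. It only shows the general pattern of emitting a whole-word token when the word is known and falling back to single characters otherwise.

```python
import re

# Hypothetical toy vocabulary; the real tokenizer has ~2,111 entries.
WORD_VOCAB = {"the", "cat", "sat"}

def hybrid_tokenize(text, word_vocab=WORD_VOCAB):
    """Emit a word token for known words, otherwise one token per character."""
    tokens = []
    # Split into runs of letters and single non-letter characters.
    for piece in re.findall(r"[A-Za-z]+|[^A-Za-z]", text):
        if piece in word_vocab:
            tokens.append(piece)   # known word: one token
        else:
            tokens.extend(piece)   # unknown piece: character fallback
    return tokens

print(hybrid_tokenize("the cat zzz"))
# ['the', ' ', 'cat', ' ', 'z', 'z', 'z']
```

Known words cost one token each, while anything unfamiliar is spelled out, so nothing is ever out of vocabulary as long as every character has its own entry.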

## Specs

| Thing | Value |
| :--- | :--- |
| **Architecture** | Transformer Decoder |
| **Parameters** | ~1 Million |
| **Context** | 2,048 tokens |
| **d_model** | 160 |
| **Layers** | 6 |
| **Heads** | 4 |
| **FFN** | 256 |
| **Vocab** | ~2,111 tokens (Hybrid Char + Word) |
| **Norm** | RMSNorm + QK-Norm |
| **Position** | RoPE |
| **Activation** | SwiGLU |
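
To make the **Norm** row a little more concrete, here is a minimal pure-Python sketch of QK-Norm for a single query/key pair. It assumes a plain RMSNorm without the learned gain and the usual `1/sqrt(head_dim)` scaling; the actual Glint implementation may differ on both points. The head size of 40 is simply `d_model / heads` from the table.

```python
import math

def rms_norm(vec, eps=1e-6):
    """Scale a vector so its root-mean-square is about 1 (learned gain omitted)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]

def attention_logit(q, k, use_qk_norm=True):
    """Scaled dot-product logit for one query/key pair."""
    if use_qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))

head_dim = 160 // 4  # d_model / heads = 40

# Deliberately oversized activations, as might appear in an unstable run.
q = [50.0] * head_dim
k = [50.0] * head_dim

raw = attention_logit(q, k, use_qk_norm=False)
normed = attention_logit(q, k, use_qk_norm=True)
print(raw, normed)
```

With normalization, the logit magnitude cannot exceed `sqrt(head_dim)` no matter how large the raw activations get, which is what keeps the softmax from saturating.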

## What made this one special

- **Hybrid tokenizer** -- word-level where it helps, character-level where it gets confused
- **QK-Norm** -- RMSNorm on queries and keys so training doesn't blow up
- **Loss boosting** -- yelled at the model extra hard when it ignored multi-character words
- **Response-start weighting** -- made it actually pay attention to the first tokens of its answers
- **Pretrain replay** -- kept mixing in pretrain data during SFT so it wouldn't forget how to speak English
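
Loss boosting and response-start weighting both come down to per-token loss weights. The sketch below is one plausible way to combine them, not the training code used for Glint-0.1. The boost factors, the window size, the choice to mask prompt tokens with weight 0, and the names `token_weights` and `weighted_loss` are all assumptions for illustration.

```python
def token_weights(response_mask, token_lengths, start_window=3,
                  start_boost=2.0, multichar_boost=1.5):
    """Per-token loss weights (all constants here are hypothetical).

    response_mask: 1 for response tokens, 0 for prompt tokens.
    token_lengths: character length of each target token.
    """
    weights = []
    seen_response = 0
    for in_response, length in zip(response_mask, token_lengths):
        if not in_response:
            weights.append(0.0)       # assumption: prompt tokens are not trained on
            continue
        w = 1.0
        if seen_response < start_window:
            w *= start_boost          # response-start weighting
        if length > 1:
            w *= multichar_boost      # loss boosting for multi-character tokens
        weights.append(w)
        seen_response += 1
    return weights

def weighted_loss(per_token_nll, weights):
    """Weighted mean of per-token negative log-likelihoods."""
    norm = sum(weights)
    if norm == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, per_token_nll)) / norm

mask = [0, 0, 1, 1, 1, 1]
lengths = [3, 1, 5, 1, 1, 4]
w = token_weights(mask, lengths, start_window=2)
print(w)  # [0.0, 0.0, 3.0, 2.0, 1.0, 1.5]
print(weighted_loss([9.0, 9.0, 1.0, 1.0, 1.0, 1.0], w))  # 1.0
```

Dividing by the sum of the weights keeps the loss scale comparable across batches with different mixes of boosted tokens.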

## Training curve

![loss curve](model/loss_curve.png)

It went down. Slowly. Painfully.

## Limitations

- Repeats itself. A lot.
- Knows almost nothing about the world.
- Not useful for anything real. Research only.
- Will embarrass itself if asked a direct question.

---

*Built by [CompactAI](https://huggingface.co/CompactAI-O). We started somewhere.*