---
license: gpl-3.0
datasets:
  - shuyuej/English-Pretraining-Dataset
  - HuggingFaceFW/fineweb-edu
  - mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
  - tatsu-lab/alpaca
  - databricks/databricks-dolly-15k
  - TeichAI/Step-3.5-Flash-2600x
  - TeichAI/convo-v1
language:
  - en
tags:
  - small
  - glint
new_version: CompactAI-O/Glint-0.2
---

Glint-0.1

Once upon a time, there was a model that could only say couldcouldoldbloodbloodbodybody. This is its ancestor.

Glint-0.1 is where the Glint line started. 1M parameters. Big dreams. Almost no ability to realize them. We look back on this one fondly, like a blurry photo of a puppy that chewed your shoes.

What you get

| File | What it is |
|---|---|
| `tokenizer.json` | Hybrid word/char tokenizer (~2,111 tokens) |
| `pretrain.pt` | Base pretrained checkpoint |
| `model.pt` | Instruction-tuned checkpoint (SFT) |

Specs

| Thing | Value |
|---|---|
| Architecture | Transformer decoder |
| Parameters | ~1 million |
| Context | 2,048 tokens |
| d_model | 160 |
| Layers | 6 |
| Heads | 4 |
| FFN | 256 |
| Vocab | ~2,111 tokens (hybrid char + word) |
| Norm | RMSNorm + QK-Norm |
| Position | RoPE |
| Activation | SwiGLU |
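The card doesn't say how the ~1M figure was counted. This is a rough estimator built from the Specs table. Every layout detail the card leaves out is filled in as an assumption: no bias terms, a three-matrix SwiGLU FFN with hidden width 256, two RMSNorms per layer plus a final one, Q/K norm gains shared across heads, an exact vocab of 2,111, and input embeddings tied to the output head. `GlintSpec` and `count_params` are hypothetical names and are not part of the released files.

```python
from dataclasses import dataclass


@dataclass
class GlintSpec:
    # Values from the Specs table; the exact vocab size (2,111) is an assumption.
    d_model: int = 160
    n_layers: int = 6
    n_heads: int = 4
    ffn_hidden: int = 256
    vocab_size: int = 2111


def count_params(spec: GlintSpec, tied_embeddings: bool = True) -> dict:
    """Rough parameter count under the stated assumptions (no biases anywhere)."""
    d, h = spec.d_model, spec.ffn_hidden
    head_dim = d // spec.n_heads
    attn = 4 * d * d              # Q, K, V and output projections
    ffn = 3 * d * h               # SwiGLU: gate, up and down projections
    norms = 2 * d + 2 * head_dim  # two RMSNorm gains per layer + shared Q/K norm gains
    per_layer = attn + ffn + norms
    non_embedding = spec.n_layers * per_layer + d  # + final RMSNorm gain
    embedding = spec.vocab_size * d
    output_head = 0 if tied_embeddings else spec.vocab_size * d
    return {
        "per_layer": per_layer,
        "non_embedding": non_embedding,
        "total": non_embedding + embedding + output_head,
    }


if __name__ == "__main__":
    for tied in (True, False):
        print(f"tied={tied}:", count_params(GlintSpec(), tied_embeddings=tied))
```

Under these assumptions the total comes out well above 1M: roughly 1.7M with tied embeddings and 2.0M untied. Either the card rounds loosely or the real model differs in at least one assumption. For example, a SwiGLU hidden width of 128 instead of 256 would put the non-embedding count just under 1M. Treat this as a sanity check, not the official count.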
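The Norm row pairs RMSNorm with QK-Norm. Below is a minimal pure-Python sketch of what QK-Norm does to a single attention logit. It assumes a per-head RMSNorm with learned gains on the query and key, plus the usual 1/sqrt(head_dim) scaling. Some implementations drop that scale or learn it instead. The normalized query and key have bounded length, so the logit stays bounded however large the raw projections grow. That bound is the "training doesn't blow up" argument. With unit gains, |logit| can never exceed sqrt(head_dim), which is about 6.3 for Glint's 160 / 4 = 40-dim heads. `rms_norm` and `attention_logit` are illustrative helpers, not Glint's code.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], gain: Sequence[float], eps: float = 1e-6) -> List[float]:
    # Divide by the root-mean-square of x, then apply a learned per-channel gain.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


def attention_logit(
    q: Sequence[float],
    k: Sequence[float],
    q_gain: Sequence[float],
    k_gain: Sequence[float],
) -> float:
    # QK-Norm: normalize one head's query and key *before* the dot product,
    # so the logit can't explode just because the raw projections grew.
    qn = rms_norm(q, q_gain)
    kn = rms_norm(k, k_gain)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))


if __name__ == "__main__":
    head_dim = 40  # 160 / 4 heads in Glint-0.1
    ones = [1.0] * head_dim
    q = [0.1 * i for i in range(head_dim)]
    k = [0.05 * (head_dim - i) for i in range(head_dim)]
    print(attention_logit(q, k, ones, ones))
    # Scaling the raw query up 1000x leaves the logit (essentially) unchanged.
    print(attention_logit([1000 * v for v in q], k, ones, ones))
```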

What made this one special

- Hybrid tokenizer -- word-level where it helps, character-level where it gets confused
- QK-Norm -- RMSNorm on queries and keys so training doesn't blow up
- Loss boosting -- yelled at the model extra hard (extra loss weight) when it ignored multi-character words
- Response-start weighting -- made it actually pay attention to the first tokens of its answers
- Pretrain replay -- kept mixing pretrain data into SFT so it wouldn't forget how to speak English
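The card doesn't document the format of `tokenizer.json`, so this only sketches the hybrid idea, under several assumptions. The vocabulary holds whole words plus single characters. Text is split into word runs and single non-word characters. Any word missing from the vocabulary is spelled out character by character, and unseen characters map to an `<unk>` id. `HybridTokenizer` is a hypothetical class, not a loader for the released file.

```python
import re
from typing import Dict, List


class HybridTokenizer:
    """Hypothetical word/char tokenizer: known words get one id, others fall back to characters."""

    def __init__(self, words: List[str], chars: str, unk: str = "<unk>"):
        self.vocab: Dict[str, int] = {}
        # Assign ids in order: unk, single characters, then whole words (duplicates skipped).
        for tok in [unk, *chars, *words]:
            self.vocab.setdefault(tok, len(self.vocab))
        self.inv = {i: t for t, i in self.vocab.items()}
        self.unk_id = self.vocab[unk]

    def encode(self, text: str) -> List[int]:
        ids: List[int] = []
        # Split into word runs and single non-word characters (spaces, punctuation).
        for piece in re.findall(r"\w+|\W", text):
            if piece in self.vocab:
                ids.append(self.vocab[piece])  # word-level hit (or a known single char)
            else:
                # Character-level fallback for words the vocabulary doesn't know.
                ids.extend(self.vocab.get(c, self.unk_id) for c in piece)
        return ids

    def decode(self, ids: List[int]) -> str:
        return "".join(self.inv[i] for i in ids)


if __name__ == "__main__":
    tok = HybridTokenizer(words=["the", "cat"], chars="abcdefghijklmnopqrstuvwxyz .")
    ids = tok.encode("the cat sat.")
    print(ids)  # "the" and "cat" are one id each; "sat" falls back to s, a, t
    print(tok.decode(ids))
```

That trade-off is presumably how a ~2,111-token vocabulary stays so small. Common words get their own id, and everything else costs a few extra tokens.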
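"Loss boosting" and "response-start weighting" both read like per-token loss weights. This sketch shows one way to combine them. It assumes prompt tokens are masked out (weight 0), multi-character tokens get one multiplier, and the first few response tokens get another. When both apply, the two multiply, and the loss is normalized by the total weight. The multipliers, the window size and the function names are all hypothetical, since the card does not give Glint's actual values.

```python
from typing import List, Sequence


def token_weights(
    token_strings: Sequence[str],
    response_start: int,
    multichar_boost: float = 2.0,  # hypothetical value
    start_boost: float = 3.0,      # hypothetical value
    start_window: int = 4,         # hypothetical: how many response tokens count as "the start"
) -> List[float]:
    weights: List[float] = []
    for i, tok in enumerate(token_strings):
        if i < response_start:
            weights.append(0.0)  # prompt tokens: no loss (common SFT masking, assumed here)
            continue
        w = 1.0
        if len(tok) > 1:
            w *= multichar_boost  # "loss boosting" for multi-character word tokens
        if i - response_start < start_window:
            w *= start_boost      # "response-start weighting"
        weights.append(w)
    return weights


def weighted_loss(per_token_loss: Sequence[float], weights: Sequence[float]) -> float:
    # Weighted mean of per-token losses (e.g. cross-entropy), normalized by total weight.
    total_w = sum(weights)
    if total_w == 0:
        return 0.0
    return sum(l * w for l, w in zip(per_token_loss, weights)) / total_w


if __name__ == "__main__":
    toks = ["User", ":", "hi", "\n", "Hello", "!", " ", "there"]
    print(token_weights(toks, response_start=4, start_window=2))
```

Normalizing by the total weight keeps the loss scale stable when a batch happens to contain many boosted tokens. Dividing by the plain token count would instead make boosted batches count for more overall.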
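Pretrain replay can be as simple as drawing some fraction of the SFT-stage samples from the pretraining corpus. This minimal sketch mixes per sample with a fixed, hypothetical `replay_ratio`. The card doesn't say what ratio Glint used or whether it mixed per sample or per batch, and `replay_mixer` is an illustrative name.

```python
import random
from typing import Iterator, Sequence, Tuple


def replay_mixer(
    sft_examples: Sequence[str],
    pretrain_examples: Sequence[str],
    replay_ratio: float = 0.2,  # hypothetical: fraction of samples drawn from pretraining data
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source, example) pairs forever, mixing pretrain text into the SFT stream."""
    rng = random.Random(seed)  # seeded so the mix is reproducible across runs
    while True:
        if rng.random() < replay_ratio:
            yield "pretrain", rng.choice(pretrain_examples)
        else:
            yield "sft", rng.choice(sft_examples)


if __name__ == "__main__":
    stream = replay_mixer(["Q: hi\nA: hello"], ["Once upon a time..."], replay_ratio=0.25)
    for _ in range(5):
        print(next(stream))
```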

Training curve

It went down. Slowly. Painfully.

Limitations

- Repeats itself. A lot.
- Knows almost nothing about the world.
- Not useful for anything real. Research only.
- Will embarrass itself if asked a direct question.

Built by CompactAI. We started somewhere.