	---
	license: gpl-3.0
	datasets:
	- shuyuej/English-Pretraining-Dataset
	- HuggingFaceFW/fineweb-edu
	- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
	- tatsu-lab/alpaca
	- databricks/databricks-dolly-15k
	- TeichAI/Step-3.5-Flash-2600x
	- TeichAI/convo-v1
	language:
	- en
	tags:
	- small
	- glint
	new_version: CompactAI-O/Glint-0.2
	---

	# Glint-0.1

	> Once upon a time, there was a model that could only say `couldcouldoldbloodbloodbodybody`. This is its ancestor.

	Glint-0.1 is where the Glint line started. 1M parameters. Big dreams. Almost no ability to realize them. We look back on this one fondly, like a blurry photo of a puppy that chewed your shoes.

	## What you get

	| File | What it is |
	| :--- | :--- |
	| `tokenizer.json` | Hybrid word/char tokenizer (~2,111 tokens) |
	| `pretrain.pt` | Base pretrained checkpoint |
	| `model.pt` | Instruction-tuned checkpoint (SFT) |

	## Specs

	| Thing | Value |
	| :--- | :--- |
	| Architecture | Transformer Decoder |
	| Parameters | ~1 Million |
	| Context | 2,048 tokens |
	| d_model | 160 |
	| Layers | 6 |
	| Heads | 4 |
	| FFN | 256 |
	| Vocab | ~2,111 tokens (Hybrid Char + Word) |
	| Norm | RMSNorm + QK-Norm |
	| Position | RoPE |
	| Activation | SwiGLU |
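
For a feel of where the parameters sit, here is a back-of-the-envelope count built from the table values. It is only a sketch: `GlintConfig` and `estimate_params` are hypothetical names, and the layout it assumes (tied input/output embeddings, no biases, four full `d_model x d_model` attention projections, a three-matrix SwiGLU FFN, one gain vector per norm) is not stated anywhere in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlintConfig:
    """Hypothetical config holder; the values come from the Specs table."""
    vocab_size: int = 2111  # the table says "~2,111", so this is approximate
    d_model: int = 160
    n_layers: int = 6
    n_heads: int = 4
    d_ffn: int = 256


def estimate_params(cfg: GlintConfig) -> dict:
    """Rough parameter count under the assumptions listed above.

    ASSUMPTIONS (not confirmed by the card): tied embeddings, no bias
    terms, Q/K/V/O projections of size d_model x d_model, SwiGLU with
    gate/up/down matrices, RMSNorm gains only, RoPE with no parameters.
    """
    head_dim = cfg.d_model // cfg.n_heads
    embedding = cfg.vocab_size * cfg.d_model   # shared with the output head
    attention = 4 * cfg.d_model * cfg.d_model  # Q, K, V, O
    ffn = 3 * cfg.d_model * cfg.d_ffn          # gate, up, down
    norms = 2 * cfg.d_model + 2 * head_dim     # two block norms + QK-Norm gains
    per_layer = attention + ffn + norms
    final_norm = cfg.d_model
    total = embedding + cfg.n_layers * per_layer + final_norm
    return {
        "embedding": embedding,
        "per_layer": per_layer,
        "all_layers": cfg.n_layers * per_layer,
        "total": total,
    }


for name, count in estimate_params(GlintConfig()).items():
    print(f"{name:>10}: {count:,}")
```

Under these assumptions the sketch lands at roughly 1.7M parameters, the same order of magnitude as the "~1 Million" in the table. The real checkpoint may count or share weights differently, so treat this as a sanity check, not a spec.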

	## What made this one special

	- Hybrid tokenizer -- word-level where it helps, character-level where it gets confused
	- QK-Norm -- RMSNorm on queries and keys so training doesn't blow up
	- Loss boosting -- yelled at the model extra hard when it ignored multi-character words
	- Response-start weighting -- made it actually pay attention to the first tokens of its answers
	- Pretrain replay -- kept mixing in pretrain data during SFT so it wouldn't forget how to speak English
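
The tricks in the list above can be sketched in plain Python. This is an illustration of the general ideas, not the actual training code: every function name, the boost factors, the start window, the replay ratio, and the choice to mask prompt tokens are assumptions made up for the example.

```python
import math
import random


def rms_norm(vec, eps=1e-6):
    """RMSNorm without a learned gain: rescale vec to unit root-mean-square."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def qk_norm_score(q, k):
    """Attention logit with QK-Norm: normalize q and k, then scaled dot product.

    After normalization the logit is bounded by sqrt(len(q)), however large
    the raw activations are, which is what keeps training from blowing up.
    """
    qn, kn = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))


def token_loss_weights(tokens, response_start, start_window=4,
                       start_boost=2.0, word_boost=1.5):
    """Per-token loss weights for one SFT example (all values hypothetical).

    tokens: token strings from a hybrid word/char tokenizer.
    response_start: index of the first response token.
    """
    weights = []
    for i, tok in enumerate(tokens):
        if i < response_start:
            weights.append(0.0)  # ASSUMPTION: prompt tokens are masked out
            continue
        w = 1.0
        if i < response_start + start_window:
            w *= start_boost     # response-start weighting
        if len(tok) > 1:
            w *= word_boost      # loss boosting for multi-character words
        weights.append(w)
    return weights


def weighted_loss(token_losses, weights):
    """Weighted mean of per-token losses; 0.0 if every weight is zero."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(l * w for l, w in zip(token_losses, weights)) / total


def mix_with_replay(sft_examples, pretrain_examples, replay_ratio=0.25, seed=0):
    """Pretrain replay: add sampled pretrain examples to the SFT set, then shuffle."""
    rng = random.Random(seed)
    n_replay = int(len(sft_examples) * replay_ratio)
    replay = [rng.choice(pretrain_examples) for _ in range(n_replay)]
    mixed = list(sft_examples) + replay
    rng.shuffle(mixed)
    return mixed
```

For example, `token_loss_weights(["h", "i", " ", "hello", "!", "world", "a", "b"], response_start=3)` masks the three prompt tokens, boosts the first four response tokens, and boosts `hello` and `world` again for being multi-character.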

	## Training curve

	![loss curve](model/loss_curve.png)

	It went down. Slowly. Painfully.

	## Limitations

	- Repeats itself. A lot.
	- Knows almost nothing about the world.
	- Not useful for anything real. Research only.
	- Will embarrass itself if asked a direct question.

	---

	Built by [CompactAI](https://huggingface.co/CompactAI-O). We started somewhere.