Glint-0.1

Once upon a time, there was a model that could only say couldcouldoldbloodbloodbodybody. This is its ancestor.

Glint-0.1 is where the Glint line started. 1M parameters. Big dreams. Almost no ability to realize them. We look back on this one fondly, like a blurry photo of a puppy that chewed your shoes.

What you get

File            What it is
tokenizer.json  Hybrid word/char tokenizer (~2,111 tokens)
pretrain.pt     Base pretrained checkpoint
model.pt        Instruction-tuned checkpoint (SFT)
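For a feel of what a hybrid word/char tokenizer does, here is a minimal sketch. It is not the code behind tokenizer.json: the function name `hybrid_tokenize`, the lowercasing, and the tiny example vocabulary are all assumptions made up for illustration. Known words become one token; everything else is spelled out character by character.

```python
import re


def hybrid_tokenize(text, word_vocab):
    """Emit known words as single tokens; spell everything else out per character.

    Hypothetical illustration only; the real tokenizer.json may work differently.
    """
    tokens = []
    # Split into runs of word characters, runs of whitespace, and single symbols.
    for piece in re.findall(r"\w+|\s+|[^\w\s]", text):
        if piece.lower() in word_vocab:
            tokens.append(piece.lower())   # word-level token
        else:
            tokens.extend(piece)           # character-level fallback
    return tokens


# Made-up vocabulary, not taken from the released tokenizer.
example_vocab = {"could", "blood", "body"}
example_tokens = hybrid_tokenize("could zap", example_vocab)
```

With this toy vocabulary, "could" stays whole while "zap" falls back to single characters, which is the trade-off the hybrid design is after: short sequences for common words, no out-of-vocabulary failures for the rest.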

Specs

Thing         Value
Architecture  Transformer Decoder
Parameters    ~1 Million
Context       2,048 tokens
d_model       160
Layers        6
Heads         4
FFN           256
Vocab         ~2,111 tokens (Hybrid Char + Word)
Norm          RMSNorm + QK-Norm
Position      RoPE
Activation    SwiGLU
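The numbers in the table can be turned into a rough parameter estimate. The sketch below is back-of-the-envelope arithmetic, not the real model definition: `GlintSpec` and `estimate_params` are hypothetical names, and it assumes no bias terms, a SwiGLU block with three projections of width 256, QK-Norm gains sized per head, and tied input/output embeddings. The card confirms none of these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlintSpec:
    # Values copied from the spec table; the vocab size is approximate per the card.
    d_model: int = 160
    n_layers: int = 6
    n_heads: int = 4
    d_ffn: int = 256
    vocab_size: int = 2111


def head_dim(spec: GlintSpec) -> int:
    """Width of a single attention head."""
    assert spec.d_model % spec.n_heads == 0
    return spec.d_model // spec.n_heads


def estimate_params(spec: GlintSpec, tied_embeddings: bool = True) -> int:
    """Rough weight count under the assumptions stated in the lead-in."""
    attn = 4 * spec.d_model * spec.d_model         # Q, K, V, output projections
    ffn = 3 * spec.d_model * spec.d_ffn            # SwiGLU: gate, up, down
    norms = 2 * spec.d_model + 2 * head_dim(spec)  # two RMSNorms plus QK-Norm gains
    per_layer = attn + ffn + norms
    embed = spec.vocab_size * spec.d_model
    lm_head = 0 if tied_embeddings else embed      # assumed tied by default
    final_norm = spec.d_model
    return spec.n_layers * per_layer + embed + lm_head + final_norm


spec = GlintSpec()
estimate = estimate_params(spec)
```

The total swings with those assumptions (untying the embeddings alone adds another vocab-by-d_model matrix), so read it as an order-of-magnitude check, not a reproduction of the "~1 Million" figure.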

What made this one special

  • Hybrid tokenizer -- word-level where it helps, character-level where it gets confused
  • QK-Norm -- RMSNorm on queries and keys so training doesn't blow up
  • Loss boosting -- yelled at the model extra hard when it ignored multi-character words
  • Response-start weighting -- made it actually pay attention to the first tokens of its answers
  • Pretrain replay -- kept mixing in pretrain data during SFT so it wouldn't forget how to speak English
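Loss boosting and response-start weighting both come down to per-token loss weights. Here is one way that could look, as a sketch under assumptions: the function names, the boost factors (2.0 and 3.0), the four-token window, and the choice to mask prompt tokens are all placeholders, not values from the Glint training code.

```python
import math


def token_weights(targets, response_start, multi_char_boost=2.0,
                  start_boost=3.0, start_window=4):
    """Per-token loss weights. Every numeric default here is a made-up placeholder."""
    weights = []
    for i, tok in enumerate(targets):
        if i < response_start:
            weights.append(0.0)      # prompt tokens: assumed to be masked out
            continue
        w = 1.0
        if len(tok) > 1:
            w *= multi_char_boost    # loss boosting for multi-character (word) tokens
        if i < response_start + start_window:
            w *= start_boost         # response-start weighting
        weights.append(w)
    return weights


def weighted_nll(target_probs, weights):
    """Weighted mean negative log-likelihood of the target tokens.

    target_probs[i] is the model's probability for the i-th target token (> 0).
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    loss = 0.0
    for p, w in zip(target_probs, weights):
        if w > 0:
            loss += -math.log(p) * w
    return loss / total


# Toy sequence: two prompt tokens, then a four-token response.
toy_targets = ["hi", " ", "the", "c", "a", "t"]
toy_weights = token_weights(toy_targets, response_start=2, start_window=2)
```

A word token at the start of the response picks up both boosts, so a mistake there costs the most. That matches the intent described in the list above, even if the real factors differ.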

Training curve

[Figure: loss curve]

It went down. Slowly. Painfully.

Limitations

  • Repeats itself. A lot.
  • Knows almost nothing about the world.
  • Not useful for anything real. Research only.
  • Will embarrass itself if asked a direct question.

Built by CompactAI. We started somewhere.
