---
license: gpl-3.0
datasets:
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- glint
new_version: CompactAI-O/Glint-0.4
---
# Glint-0.3
> A 1M-parameter language model that speaks English, technically.
```
WARNING: This model was trained on a shoestring budget and a prayer.
It does not answer questions correctly. It does not follow instructions well.
It does, however, occasionally produce output that sounds profound
until you read it twice.
```
---
## Overview
Glint-0.3 is a tiny autoregressive language model with about one million parameters. That is not a typo. While everyone else is measuring models in billions, we went the other direction. Mostly because we could.
It was trained on 100 tokens per parameter. The math: 1M params x 100 tokens/param = ~100M tokens total. Two thirds went to pretraining (~66.7M tokens), and the remaining ~33.3M went to fine-tuning on instruction data.
This does not make the model smart. It just makes it slightly less confused than it would have been otherwise.
Funny thing: the pretrained checkpoint sounds more fluent than the instruction-tuned one. Turns out teaching a model to follow directions can make it worse at saying things. Who knew.
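The token budget described above can be checked with a few lines of arithmetic. This is only a sketch of the stated numbers; the variable names are ours and do not come from any training script.

```python
# Token budget from the figures in this card; names are illustrative only.
PARAMS = 1_000_000          # approximate parameter count
TOKENS_PER_PARAM = 100      # training tokens per parameter
PRETRAIN_FRACTION = 2 / 3   # share of the budget spent on pretraining

total_tokens = PARAMS * TOKENS_PER_PARAM
pretrain_tokens = round(total_tokens * PRETRAIN_FRACTION)
instruct_tokens = total_tokens - pretrain_tokens

print(f"total:       {total_tokens / 1e6:.1f}M")     # 100.0M
print(f"pretraining: {pretrain_tokens / 1e6:.1f}M")  # 66.7M
print(f"instruction: {instruct_tokens / 1e6:.1f}M")  # 33.3M
```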
---
## The one claim we are allowed to make
At 100 tokens per parameter, Glint-0.3 is one of the most parameter-efficient models at this scale. Against another 1M model trained on fewer tokens, these weights would likely win.
We are not saying state of the art. We are saying we got more out of the data than most would. Modest victory. We will take it.
---
## Sample outputs
### Instruction-tuned
```
User: Is food safe to eat?
Model: As an Age, in relation (a rush-rict), a decisios is the **exactly* estimate**
(each labeling at * *Crise*).
When you cannot find all faces (doinins one way tea t become flawed) are no
longer rartints like awarded, n but wheret dogs control crashed ttega y
```
### Pretrained
```
User: A cat, commonly known as the
Model: primary current burden of transportation wasnt impossible. A recence includes
a personal crisis for participated populatably, to make sure technology
systems change from an illustrious phase orbit oncold, rock areas, along
with localizators outside again
```
If you find meaning in these, that says more about you than the model.
---
## Specs
| Thing | Value |
|-------|-------|
| Parameters | ~1,000,000 |
| Language | English |
| Tokenization | Word-level |
| Architecture | Lightweight Transformer |
| Total Tokens | ~100M (100 tokens/param) |
| Pretraining Tokens | ~66.7M |
| Instruction Tokens | ~33.3M |
| Throughput | ~1M tokens/sec |
---
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and weights from the Hugging Face Hub
model_name = "CompactAI-O/Glint-0.3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode the prompt as PyTorch tensors
prompt = "A cat, commonly known as the"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 50 new tokens; temperature only applies when do_sample=True
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Try temperature between 0.8 and 1.2 for peak confusion.
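For intuition on what that range does, here is a minimal sketch of temperature scaling, assuming the usual approach of dividing logits by the temperature before the softmax. The function name and the three logits are made up for illustration; this is not Glint's own sampling code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens
logits = [2.0, 1.0, 0.0]
for t in (0.8, 1.0, 1.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token; higher ones flatten the distribution, so unlikely tokens get sampled more often. With a model this small, that mostly means more creative nonsense.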
---
## What is this actually for?
- Generating writing prompts nobody asked for
- Studying how small models fail in charming ways
- Populating game worlds with NPCs that speak in riddles
- Teaching that bigger is not always better
- Entertaining yourself during long training runs
## What is it not for?
- Facts. Any facts.
- Customer support
- Medical, legal, or financial advice (oh hell no)
- Replacing a search engine
- Expecting it to know what it is talking about
---
## Why does this exist?
We wondered what would happen if you trained a very small model on a very large dataset and then asked it to talk. The answer, as you can see, is complicated.
We put two thirds of the token budget into pretraining and used the rest to nudge it toward instruction following. This does not produce a capable assistant. It produces a model that learned as much as it could, given the constraints.
This is part of CompactAI, an ongoing exploration of language modeling at the edge of feasibility. Interesting things happen when you remove the safety net of scale. Sometimes those things are useful. Sometimes they are just funny.
---
## Contributing
We welcome:
- Bug reports, especially if the failure case is entertaining
- Prompts that coax unexpectedly poetic output from this thing
- Research collaborations on ultra-small model dynamics
- Ideas for making a 1M parameter model slightly less confused
Please do not submit PRs that add more parameters. That defeats the purpose.
---
## Citation
```
@misc{glint03,
title={Glint-0.3: A 1M-Parameter English Language Model for Experimental Use},
author={CompactAI},
year={2026},
howpublished={\url{https://huggingface.co/CompactAI-O/Glint-0.3}},
note={Trained with hope. Deploy with caution.}
}
```
---
> The model generates text. Whether that text means anything is a question for philosophers.
Train small. Expect less. Laugh anyway.
---
*Built by [CompactAI](https://huggingface.co/CompactAI-O).*