Commit cf5d8e7 (verified) by CompactAI · 1 Parent(s): 40f9af5

Upload README.md

  1. README.md +85 -74
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- license: apache-2.0
3
  datasets:
4
  - HuggingFaceFW/fineweb-edu
5
  - mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
@@ -11,152 +11,163 @@ language:
11
  - en
12
  tags:
13
  - small
14
- - haiku
15
- new_version: CompactAI-O/TMLM-Haiku-2.3
16
  ---
17
- # TMLM-Haiku-2
 
18
 
19
  > A 1M-parameter language model that speaks English, technically.
20
 
21
- WARNING: This model was trained on a shoestring budget and a prayer.
22
- It does not answer questions correctly. It does not follow instructions well.
23
- It does, however, occasionally produce output that sounds profound until you read it twice.
 
 
 
24
 
25
  ---
26
 
27
  ## Overview
28
 
29
- TMLM-Haiku-2 is a tiny autoregressive language model with approximately one million parameters. That is not a typo. In an era where models are measured in billions, we went the other direction, mostly because we could.
30
 
31
- It was trained on English text at a ratio of 100 tokens per parameter. The math is simple: 1M parameters × 100 tokens = \~100M total tokens. We split this budget deliberately: roughly two-thirds (\~66.7M tokens) went into general pretraining, after which the resulting checkpoint was fine-tuned on instruction data using the remaining one-third (~33.3M tokens).
32
 
33
- This approach lets us squeeze more signal out of every parameter. It does not make the model smart. It just makes it slightly less confused than it would have been otherwise.
34
 
35
- Curiously, the pretrained checkpoint tends to produce more fluent output than the instruction-tuned version. It turns out that teaching a model to follow directions might actually make it worse at saying things. Who knew.
36
 
37
  ---
38
 
39
- ## Efficiency Claim (The One We Are Allowed to Make)
40
 
41
- At 100 tokens per parameter, TMLM-Haiku-2 is **one of the most efficient models out there** at this scale. If you were to compare it against another 1M parameter model trained on fewer tokens, these weights would likely outperform them.
42
 
43
- We are not claiming state of the art. We are claiming that, for a model this small, we got more out of the data than most would. That is a modest victory, but we will take it.
44
 
45
  ---
46
 
47
- ## Sample Outputs
48
 
49
- ### Instruction-Tuned
50
 
51
- User: Is food safe to eat?
52
- Model: As an Age, in relation (a rush-rict), a decisios is the **exactly* estimate**
53
- (each labeling at * *Crise*).
 
54
 
55
- When you cannot find all faces (doinins one way tea t become flawed) are no
56
- longer rartints like awarded, n but wheret dogs control crashed ttega y
 
57
 
58
  ### Pretrained
59
 
60
- User: A cat, commonly known as the
61
- Model: primary current burden of transportation wasnt impossible. A recence includes
62
- a personal crisis for participated populatably, to make sure technology
63
- systems change from an illustrious phase orbit oncold, rock areas, along
64
- with localizators outside again
 
 
65
 
66
- We recommend treating these outputs as experimental prose. If you find meaning in them, that is probably more about you than the model.
67
 
68
  ---
69
 
70
- ## Technical Specifications
71
 
72
- | Attribute | Value |
73
- |-----------|-------|
74
  | Parameters | ~1,000,000 |
75
  | Language | English |
76
  | Tokenization | Word-level |
77
  | Architecture | Lightweight Transformer |
78
  | Total Tokens | ~100M (100 tokens/param) |
79
- | Pretraining Tokens | ~66.7M (2/3 of budget) |
80
- | Instruction Tokens | ~33.3M (1/3 of budget) |
81
- | Target Throughput | ~1M tokens/sec |
82
- | License | MIT |
83
- | Repository | https://huggingface.co/CompactAI/TMLM-Haiku-2 |
84
 
85
  ---
86
 
87
- ## Getting Started
88
 
89
- from transformers import AutoModelForCausalLM, AutoTokenizer
 
90
 
91
- model_name = "CompactAI/TMLM-Haiku-2"
92
- tokenizer = AutoTokenizer.from_pretrained(model_name)
93
- model = AutoModelForCausalLM.from_pretrained(model_name)
94
 
95
- prompt = "A cat, commonly known as the"
96
- inputs = tokenizer(prompt, return_tensors="pt")
97
- outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.9)
98
 
99
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
100
 
101
- Pro tip: Adjust temperature between 0.8 and 1.2 for optimal levels of confusion.
102
 
103
  ---
104
 
105
- ## Reasonable Use Cases
106
 
107
- - Generating creative writing prompts that nobody asked for
108
  - Studying how small models fail in charming ways
109
  - Populating game worlds with NPCs that speak in riddles
110
- - Teaching students why bigger is not always better (Pun intended)
111
- - Amusing yourself during long training runs
112
-
113
- ---
114
 
115
- ## Unreasonable Use Cases
116
 
117
- - Anything requiring factual accuracy
118
- - Customer support automation
119
  - Medical, legal, or financial advice (oh hell no)
120
  - Replacing a search engine
121
- - Expecting the model to know what it is talking about
122
 
123
  ---
124
 
125
- ## Philosophy
126
 
127
- TMLM-Haiku-2 exists because we wondered what would happen if we trained a very small model on a very large dataset and then asked it to talk. The answer, as you have seen, is complicated.
128
 
129
- The training strategy was simple: allocate two-thirds of the token budget to broad pretraining, then use the remainder to nudge the model toward instruction following. This does not produce a capable assistant. It does produce a model that learned as much as it could, given the constraints.
130
 
131
- This project is part of CompactAI, an ongoing effort to explore language modeling at the edge of feasibility. We believe that interesting things can happen when you remove the safety net of scale. Sometimes those things are useful. Sometimes they are just funny.
132
 
133
  ---
134
 
135
  ## Contributing
136
 
137
  We welcome:
138
- - Bug reports, especially those accompanied by entertaining failure cases
139
- - Prompts that coax unexpectedly poetic output from the model
140
- - Research collaborations focused on ultra-small model dynamics
141
- - Feedback on how to make a 1M parameter model slightly less confused
142
- Note:
143
- If its a bug we will include fixes in later stages of TMLM-Haiku & other varients if present
144
- Please do not submit pull requests that add more parameters. That defeats the purpose. Please.
145
 
146
  ---
147
 
148
  ## Citation
149
 
150
- @misc{tmlm-haiku-2-2026,
151
- title={TMLM-Haiku-2: A 1M-Parameter English Language Model for Experimental Use},
152
- author={CompactAI},
153
- year={2026},
154
- howpublished={\url{https://huggingface.co/CompactAI/TMLM-Haiku-2}},
155
- note={Trained with hope. Deploy with caution.}
156
- }
 
 
157
 
158
  ---
159
 
160
  > The model generates text. Whether that text means anything is a question for philosophers.
161
 
162
- Train small. Expect less. Laugh anyway.
 
 
 
 
 
1
  ---
2
+ license: gpl-3.0
3
  datasets:
4
  - HuggingFaceFW/fineweb-edu
5
  - mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
 
11
  - en
12
  tags:
13
  - small
14
+ - glint
15
+ new_version: CompactAI-O/Glint-0.4
16
  ---
17
+
18
+ # Glint-0.3
19
 
20
  > A 1M-parameter language model that speaks English, technically.
21
 
22
+ ```
23
+ WARNING: This model was trained on a shoestring budget and a prayer.
24
+ It does not answer questions correctly. It does not follow instructions well.
25
+ It does, however, occasionally produce output that sounds profound
26
+ until you read it twice.
27
+ ```
28
 
29
  ---
30
 
31
  ## Overview
32
 
33
+ Glint-0.3 is a tiny autoregressive language model with about one million parameters. That is not a typo. While everyone else is measuring models in billions, we went the other direction. Mostly because we could.
34
 
35
+ It was trained on 100 tokens per parameter. The math: 1M params x 100 tokens = ~100M total. Two thirds went to pretraining (~66.7M tokens), then we used the rest to fine-tune on instruction data.
36
 
37
+ This does not make the model smart. It just makes it slightly less confused than it would have been otherwise.
38
 
39
+ Funny thing: the pretrained checkpoint sounds more fluent than the instruction-tuned one. Turns out teaching a model to follow directions can make it worse at saying things. Who knew.
40
 
41
  ---
42
 
43
+ ## The one claim we are allowed to make
44
 
45
+ At 100 tokens per parameter, Glint-0.3 is one of the most efficient models at this scale. Against another 1M model trained on fewer tokens, these weights would likely win.
46
 
47
+ We are not saying state of the art. We are saying we got more out of the data than most would. Modest victory. We will take it.
48
 
49
  ---
50
 
51
+ ## Sample outputs
52
 
53
+ ### Instruction-tuned
54
 
55
+ ```
56
+ User: Is food safe to eat?
57
+ Model: As an Age, in relation (a rush-rict), a decisios is the **exactly* estimate**
58
+ (each labeling at * *Crise*).
59
 
60
+ When you cannot find all faces (doinins one way tea t become flawed) are no
61
+ longer rartints like awarded, n but wheret dogs control crashed ttega y
62
+ ```
63
 
64
  ### Pretrained
65
 
66
+ ```
67
+ User: A cat, commonly known as the
68
+ Model: primary current burden of transportation wasnt impossible. A recence includes
69
+ a personal crisis for participated populatably, to make sure technology
70
+ systems change from an illustrious phase orbit oncold, rock areas, along
71
+ with localizators outside again
72
+ ```
73
 
74
+ If you find meaning in these, that says more about you than the model.
75
 
76
  ---
77
 
78
+ ## Specs
79
 
80
+ | Thing | Value |
81
+ |-------|-------|
82
  | Parameters | ~1,000,000 |
83
  | Language | English |
84
  | Tokenization | Word-level |
85
  | Architecture | Lightweight Transformer |
86
  | Total Tokens | ~100M (100 tokens/param) |
87
+ | Pretraining Tokens | ~66.7M |
88
+ | Instruction Tokens | ~33.3M |
89
+ | Throughput | ~1M tokens/sec |
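
The token figures in the table above follow from two numbers the README states: roughly one million parameters and a ratio of 100 tokens per parameter, split two thirds to pretraining and the rest to instruction tuning. A minimal sketch of that arithmetic is below; the function and variable names are illustrative and do not come from the repository.

```python
# Sketch of the token budget arithmetic described in the README.
# PARAMS and TOKENS_PER_PARAM are the approximate figures from the specs table;
# token_budget() is a hypothetical helper, not part of the model's code.
PARAMS = 1_000_000
TOKENS_PER_PARAM = 100


def token_budget(params: int, tokens_per_param: int) -> dict:
    """Split the total token budget 2/3 pretraining, 1/3 instruction tuning."""
    total = params * tokens_per_param
    pretrain = total * 2 // 3      # two thirds go to general pretraining
    instruct = total - pretrain    # the remainder goes to instruction data
    return {"total": total, "pretrain": pretrain, "instruct": instruct}


budget = token_budget(PARAMS, TOKENS_PER_PARAM)
for stage, tokens in budget.items():
    print(f"{stage}: ~{tokens / 1e6:.1f}M tokens")
```

Integer division keeps the two stages summing exactly to the total, which is why the instruction share is computed as a remainder rather than as a separate one-third.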
 
 
90
 
91
  ---
92
 
93
+ ## Usage
94
 
95
+ ```python
96
+ from transformers import AutoModelForCausalLM, AutoTokenizer
97
 
98
+ model_name = "CompactAI-O/Glint-0.3"
99
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
100
+ model = AutoModelForCausalLM.from_pretrained(model_name)
101
 
102
+ prompt = "A cat, commonly known as the"
103
+ inputs = tokenizer(prompt, return_tensors="pt")
104
+ outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.9)
105
 
106
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
107
+ ```
108
 
109
+ Try temperature between 0.8 and 1.2 for peak confusion.
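
To see why the temperature setting changes how scattered the output is, it can help to look at what temperature does to a next-token distribution. In standard temperature sampling the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The sketch below uses made-up logits for a three-word vocabulary; it is not the model's real output, and `softmax_with_temperature` is a hypothetical helper written only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax probabilities after dividing the logits by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy logits for three candidate tokens (assumed values, for illustration only).
toy_logits = [2.0, 1.0, 0.0]

for temperature in (0.8, 1.0, 1.2):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Across the suggested 0.8 to 1.2 range, the ranking of tokens stays the same while the most likely token loses probability mass to the others as temperature rises, which is where the extra confusion comes from.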
110
 
111
  ---
112
 
113
+ ## What is this actually for?
114
 
115
+ - Generating writing prompts nobody asked for
116
  - Studying how small models fail in charming ways
117
  - Populating game worlds with NPCs that speak in riddles
118
+ - Teaching that bigger is not always better
119
+ - Entertaining yourself during long training runs
 
 
120
 
121
+ ## What is it not for
122
 
123
+ - Facts. Any facts.
124
+ - Customer support
125
  - Medical, legal, or financial advice (oh hell no)
126
  - Replacing a search engine
127
+ - Expecting it to know what it is talking about
128
 
129
  ---
130
 
131
+ ## Why does this exist?
132
 
133
+ We wondered what would happen if you trained a very small model on a very large dataset and then asked it to talk. The answer, as you can see, is complicated.
134
 
135
+ We put two thirds of the token budget into pretraining and used the rest to nudge it toward instruction following. This does not produce a capable assistant. It produces a model that learned as much as it could, given the constraints.
136
 
137
+ This is part of CompactAI, an ongoing exploration of language modeling at the edge of feasibility. Interesting things happen when you remove the safety net of scale. Sometimes those things are useful. Sometimes they are just funny.
138
 
139
  ---
140
 
141
  ## Contributing
142
 
143
  We welcome:
144
+ - Bug reports, especially if the failure case is entertaining
145
+ - Prompts that coax unexpectedly poetic output from this thing
146
+ - Research collaborations on ultra-small model dynamics
147
+ - Ideas for making a 1M parameter model slightly less confused
148
+
149
+ Please do not submit PRs that add more parameters. That defeats the purpose.
 
150
 
151
  ---
152
 
153
  ## Citation
154
 
155
+ ```
156
+ @misc{glint03,
157
+ title={Glint-0.3: A 1M-Parameter English Language Model for Experimental Use},
158
+ author={CompactAI},
159
+ year={2026},
160
+ howpublished={\url{https://huggingface.co/CompactAI-O/Glint-0.3}},
161
+ note={Trained with hope. Deploy with caution.}
162
+ }
163
+ ```
164
 
165
  ---
166
 
167
  > The model generates text. Whether that text means anything is a question for philosophers.
168
 
169
+ Train small. Expect less. Laugh anyway.
170
+
171
+ ---
172
+
173
+ *Built by [CompactAI](https://huggingface.co/CompactAI-O).*