Harley-ml committed on
Commit
e8f674a
·
verified ·
1 Parent(s): 913f3bd

Update README.md

  1. README.md +231 -3
README.md CHANGED
@@ -1,3 +1,231 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ tags:
+ - tiny
+ - slm
+ - tlm
+ - llm
+ - small
+ - question-generator
+ - harley-ml
+ - small-language-model
+ - experiment
+ - experimental
+ - text-generation
+ - question-generation
+ - questions
+ - question
+ ---
+
+ # StopAskingQuestionsMini-656k
+ This model is small. Well, that's an understatement. But welcome to the world of tiny language models.
+ StopAskingQuestionsMini is a 656-thousand-parameter language model trained on roughly 23 million tokens of questions without answers. That may sound counterintuitive:
+ > What is the point of generating questions with no answer?
+
+ There is no practical reason for doing so. However, this model wasn't built for practical use; it was built to explore an ongoing question of mine:
+ > How much intellect can you stuff into a tiny model before it collapses?
+
+ Neither this project nor any of our other projects truly answers this, because new advancements arrive all the time. For example, DeepSeek created [Engram](https://arxiv.org/pdf/2601.07372), a novel architecture component that increases knowledge storage at very low compute cost.
+
+ Furthermore,
+
+ > What can this model even do?
+
+ Not much. It can generate partially coherent questions, and that's pretty much it.
+
+ ## Architecture
+
+ StopAskingQuestionsMini uses a scaled-down version of the [Qwen3](https://arxiv.org/abs/2505.09388) architecture.
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Hidden Layers | 2 |
+ | Hidden Size | 128 |
+ | Attention Heads | 2 |
+ | KV Heads | 2 |
+ | Intermediate Size | 512 |
+ | RoPE Theta | 10000.0 |
+ | Max Position Embeddings | 96 |
+ | Tie Word Embeddings | True |
+ | Vocab Size | 1024 |
+
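As a rough sanity check on the parameter count, the table above can be turned into arithmetic. The sketch below assumes the usual Qwen3 layer layout (bias-free attention and MLP projections, RMSNorm weights, per-head query/key norms) and a head dimension equal to hidden size divided by attention heads (64). Neither assumption is stated in this README, and the function name `count_qwen3_params` is made up for illustration.

```python
def count_qwen3_params(
    hidden_size: int = 128,
    num_layers: int = 2,
    num_heads: int = 2,
    num_kv_heads: int = 2,
    intermediate_size: int = 512,
    vocab_size: int = 1024,
    tie_word_embeddings: bool = True,
) -> int:
    """Count weights for a Qwen3-style decoder under the stated assumptions."""
    # ASSUMPTION: head_dim = hidden_size / num_heads (not given in the README).
    head_dim = hidden_size // num_heads

    # Token embeddings; the output head reuses them when embeddings are tied.
    embed = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size

    # Attention: q, k, v, o projections without biases.
    q_proj = hidden_size * num_heads * head_dim
    kv_proj = 2 * hidden_size * num_kv_heads * head_dim
    o_proj = num_heads * head_dim * hidden_size
    # ASSUMPTION: RMSNorm on queries and keys, one weight vector of size head_dim each.
    qk_norm = 2 * head_dim

    # Gated MLP: gate, up, and down projections without biases.
    mlp = 3 * hidden_size * intermediate_size

    # Two RMSNorm layers per block (pre-attention and pre-MLP).
    block_norms = 2 * hidden_size

    per_layer = q_proj + kv_proj + o_proj + qk_norm + mlp + block_norms
    final_norm = hidden_size
    return embed + lm_head + num_layers * per_layer + final_norm


total = count_qwen3_params()
print(f"{total:,}")  # 656,256
```

Under these assumptions the total comes to 656,256, which is consistent with the 656k in the model name. About a fifth of that (131,072 weights) sits in the tied embedding matrix.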
+ ## Benchmarks
+
+ We benchmarked our model against GPT-2, SmolLM2-135M, and Qwen3-0.6B-Base on a question generation task:
+
+ | Model | Params | Avg Score | Coherent | Mostly Coherent | Partially Coherent | Incoherent |
+ |-------|--------|-----------|----------|-----------------|--------------------|------------|
+ | **StopAskingQuestionsMini** (this) | 656K | 0.4395 | 42 | 60 | 37 | 161 |
+ | GPT-2 | 117M | 0.3874 | 16 | 50 | 49 | 185 |
+ | SmolLM2-135M | 135M | 0.5193 | 36 | 98 | 40 | 111 |
+ | Qwen3-0.6B-Base | 600M | 0.7359 | 165 | 79 | 16 | 40 |
+
+ Each model generated three hundred continuations of the prefix `Question:`. [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) scored each one on a decimal scale from 0.0 to 1.0.
+ Our model generated the second-highest number of coherent questions, with fewer parameters than most character-level RNNs.
+
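To compare models on equal footing, the counts in the table can be converted into shares. The sketch below copies the four count columns from the table and divides by each row's own total, so it does not assume how many outputs were graded per model. The names `counts`, `coherent_share`, and `shares` are illustrative and not part of the benchmark code.

```python
# Counts copied from the benchmark table above, in column order:
# (coherent, mostly coherent, partially coherent, incoherent)
counts = {
    "StopAskingQuestionsMini": (42, 60, 37, 161),
    "GPT-2": (16, 50, 49, 185),
    "SmolLM2-135M": (36, 98, 40, 111),
    "Qwen3-0.6B-Base": (165, 79, 16, 40),
}


def coherent_share(row):
    """Fraction of a model's graded outputs that landed in the 'Coherent' bucket."""
    return row[0] / sum(row)


shares = {name: coherent_share(row) for name, row in counts.items()}

# Print models from the highest to the lowest share of coherent outputs.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<26} {share:.1%}")
```

Ranking by share instead of raw count keeps the comparison fair if a row's buckets do not add up to the same total as the others.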
+ ## Use Cases
+
+ As stated earlier, there is unfortunately no practical use case, but here are some interesting ideas:
+
+ 1. A test model for pipelines, code, and training
+ 2. Educational research on language models
+ 3. Experimentation on constrained hardware
+ 4. Or, more simply, for fun.
+
+ ## Limitations
+
+ Everything.
+ But more specifically,
+
+ 1. Cannot generate sentences, paragraphs, code, or anything other than questions
+ 2. Cannot reason
+ 3. Short context (96 positions)
+ 4. Incoherent
+
+ ## Inference
+
+ ```python
+ # =============================================================================
+ # Inference
+ # =============================================================================
+
+ MODEL_DIR = "harley-ml/StopAskingQuestionsMini-656k"  # Hub repo ID or local directory
+ TOKENIZER_PATH = "harley-ml/StopAskingQuestionsMini-656k"  # repo ID, directory, or tokenizer.json
+
+ # --- Generation settings ---
+ PROMPT = "Question:"  # prefix the model continues
+ MAX_NEW_TOKENS = 96  # note: the context window is also 96 positions, prompt included
+ TEMPERATURE = 1.0
+ TOP_P = 0.95
+ TOP_K = 50
+ REPETITION_PENALTY = 1.1
+ DO_SAMPLE = True
+
+ # =============================================================================
+
+ import torch
+ from pathlib import Path
+ from transformers import (
+     AutoModelForCausalLM,
+     PreTrainedTokenizerFast,
+     AddedToken,
+ )
+
+ # ---------------------------------------------------------------------------
+ # Device
+ # ---------------------------------------------------------------------------
+
+ device = (
+     "cuda" if torch.cuda.is_available() else
+     "mps" if torch.backends.mps.is_available() else
+     "cpu"
+ )
+ print(f"Device : {device}")
+
+ # ---------------------------------------------------------------------------
+ # Tokenizer (mirrors training setup)
+ # ---------------------------------------------------------------------------
+
+ def load_tokenizer(path: str):
+     p = Path(path)
+     if p.is_file():
+         # Local tokenizer.json file
+         tok = PreTrainedTokenizerFast(tokenizer_file=str(p.resolve()))
+     else:
+         # Hub repo ID or local directory (assumes it ships the tokenizer files)
+         tok = PreTrainedTokenizerFast.from_pretrained(path)
+
+     # Register any special tokens the tokenizer file does not define
+     specials = {}
+     if tok.bos_token is None:
+         specials["bos_token"] = AddedToken("<|bos|>", special=True)
+     if tok.eos_token is None:
+         specials["eos_token"] = AddedToken("<|eos|>", special=True)
+     if tok.unk_token is None:
+         specials["unk_token"] = AddedToken("<|unk|>", special=True)
+     if tok.pad_token is None:
+         if tok.eos_token is not None:
+             tok.pad_token = tok.eos_token  # reuse EOS as PAD
+         else:
+             specials["pad_token"] = AddedToken("<|pad|>", special=True)
+     if specials:
+         tok.add_special_tokens(specials)
+     tok.padding_side = "left"  # left-pad for batched generation
+     return tok
+
+ print("Loading tokenizer...")
+ tokenizer = load_tokenizer(TOKENIZER_PATH)
+ print(f" Vocab size : {tokenizer.vocab_size}")
+ print(f" BOS : {tokenizer.bos_token!r}")
+ print(f" EOS : {tokenizer.eos_token!r}")
+ print(f" PAD : {tokenizer.pad_token!r} (id={tokenizer.pad_token_id})")
+
+ # ---------------------------------------------------------------------------
+ # Model
+ # ---------------------------------------------------------------------------
+
+ print(f"\nLoading model from {MODEL_DIR} ...")
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_DIR,
+     # Older transformers releases name this argument torch_dtype
+     dtype=torch.float16 if device == "cuda" else torch.float32,
+     low_cpu_mem_usage=True,
+ )
+ model.eval()
+ model.to(device)
+
+ total_params = sum(p.numel() for p in model.parameters())
+ print(f" Parameters : {total_params:,}")
+
+ # ---------------------------------------------------------------------------
+ # Generation helper
+ # ---------------------------------------------------------------------------
+
+ def generate(
+     prompt: str = PROMPT,
+     max_new_tokens: int = MAX_NEW_TOKENS,
+     temperature: float = TEMPERATURE,
+     top_p: float = TOP_P,
+     top_k: int = TOP_K,
+     repetition_penalty: float = REPETITION_PENALTY,
+     do_sample: bool = DO_SAMPLE,
+ ) -> str:
+     # Prepend BOS manually, so special tokens are not added a second time below
+     bos = tokenizer.bos_token or ""
+     full_prompt = bos + prompt
+
+     inputs = tokenizer(
+         full_prompt,
+         return_tensors="pt",
+         add_special_tokens=False,
+     ).to(device)
+     inputs.pop("token_type_ids", None)  # Qwen3 doesn't use this
+
+     gen_kwargs = dict(
+         max_new_tokens=max_new_tokens,
+         do_sample=do_sample,
+         repetition_penalty=repetition_penalty,
+         eos_token_id=tokenizer.eos_token_id,
+         pad_token_id=tokenizer.pad_token_id,
+     )
+     # Sampling parameters only apply when sampling is enabled
+     if do_sample:
+         gen_kwargs["temperature"] = temperature
+         gen_kwargs["top_p"] = top_p
+         gen_kwargs["top_k"] = top_k
+
+     with torch.inference_mode():
+         output_ids = model.generate(**inputs, **gen_kwargs)
+
+     # Strip the prompt tokens so we only return what was generated
+     prompt_len = inputs["input_ids"].shape[-1]
+     new_ids = output_ids[0][prompt_len:]
+     return tokenizer.decode(new_ids, skip_special_tokens=True)
+
+
+ # ---------------------------------------------------------------------------
+ # Run
+ # ---------------------------------------------------------------------------
+
+ if __name__ == "__main__":
+     print(f"\nPrompt : {PROMPT!r}")
+     print("-" * 60)
+
+     output = generate(PROMPT)
+
+     print("Generated:")
+     print(output)
+ ```