---
license: mit
language:
- en
tags:
- tiny
- slm
- tlm
- llm
- small
- question-generator
- harley-ml
- small-language-model
- experiment
- experimental
- text-generation
- question-generation
- questions
- question
---

# StopAskingQuestionsMini-656k

This model is small. Well, that's an understatement. But welcome to the world of tiny language models.

StopAskingQuestionsMini is a 656-thousand-parameter language model trained on roughly 23 million tokens of questions without answers. That may sound counterintuitive:

> What is the point of generating questions with no answer?

There is no practical reason for doing so. However, this model wasn't built for practical use; it was built to explore a question I keep trying to answer:

> How much intellect can you stuff into a tiny model before it collapses?

Neither this project nor any of our others truly answers it, because new advancements keep arriving. For example, DeepSeek created [Engram](https://arxiv.org/pdf/2601.07372), a novel architecture component that increases knowledge storage at very low compute cost.

Furthermore,

> What can this model even do?

Not much. It can generate partially coherent questions, and that's pretty much it.

## Architecture

StopAskingQuestionsMini uses a scaled-down version of the [Qwen3](https://arxiv.org/abs/2505.09388) architecture.

| Parameter | Value |
|-----------|-------|
| Hidden Layers | 2 |
| Hidden Size | 128 |
| Attention Heads | 2 |
| KV Heads | 2 |
| Intermediate Size | 512 |
| RoPE Theta | 10000.0 |
| Max Position Embeddings | 96 |
| Tie Word Embeddings | True |
| Vocab Size | 1024 |

## Training

StopAskingQuestionsMini was trained on roughly 23 million tokens of questions for two epochs with a batch size of 16.
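As a sanity check on the 656k figure, the architecture table above can be turned into a parameter count. This is a rough sketch, not the author's code: it assumes the usual Qwen3 layer layout (bias-free q/k/v/o projections, a gated MLP with three matrices, RMSNorm weights, and per-head q/k norms) and a head dimension of hidden size divided by attention heads (64), which the table does not state. The function name `count_params` is hypothetical.

```python
def count_params(
    hidden: int = 128,
    layers: int = 2,
    heads: int = 2,
    kv_heads: int = 2,
    intermediate: int = 512,
    vocab: int = 1024,
    tie_embeddings: bool = True,
) -> int:
    """Estimate the parameter count of a Qwen3-style decoder (assumed layout)."""
    head_dim = hidden // heads  # assumption: head_dim is not listed in the table

    # Attention: q and o span all heads, k and v span the KV heads; no biases.
    attn = 2 * hidden * heads * head_dim + 2 * hidden * kv_heads * head_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # RMSNorm weights: two per layer over hidden, plus q/k norms over head_dim.
    norms = 2 * hidden + 2 * head_dim

    per_layer = attn + mlp + norms
    embeddings = vocab * hidden
    lm_head = 0 if tie_embeddings else vocab * hidden  # tied head adds nothing
    final_norm = hidden

    return layers * per_layer + embeddings + lm_head + final_norm


total = count_params()
print(f"{total:,}")  # 656,256
```

Under these assumptions the count comes to 656,256, which is consistent with the model's name. The token embedding alone (131,072 weights) accounts for about a fifth of the total, which is why tying the output head to it matters at this scale.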
### Training Results

| Epoch | Train Loss | Eval Loss | Train PPL | Eval PPL |
|-------|------------|-----------|-----------|----------|
| 0.07 | 4.0797 | 3.0011 | 59.05 | 20.11 |
| 0.22 | 2.6331 | 2.5703 | 13.92 | 13.07 |
| 0.37 | 2.4906 | 2.4586 | 12.07 | 11.68 |
| 0.52 | 2.4213 | 2.3989 | 11.26 | 11.01 |
| 0.66 | 2.3700 | 2.3552 | 10.70 | 10.54 |
| 0.81 | 2.3375 | 2.3242 | 10.35 | 10.22 |
| 0.96 | 2.3094 | 2.2949 | 10.07 | 9.92 |
| 1.11 | 2.2720 | 2.2746 | 9.70 | 9.72 |
| 1.26 | 2.2527 | 2.2533 | 9.51 | 9.52 |
| 1.40 | 2.2345 | 2.2367 | 9.34 | 9.36 |
| 1.55 | 2.2239 | 2.2212 | 9.24 | 9.22 |
| 1.70 | 2.2043 | 2.2044 | 9.06 | 9.06 |
| 1.85 | 2.1885 | 2.1930 | 8.92 | 8.96 |
| 1.99 | 2.1843 | 2.1854 | 8.88 | 8.90 |

## Benchmarks

We benchmarked our model against GPT-2, SmolLM2-135M, and Qwen3-0.6B-Base on a question generation task:

| Model | Params | Avg Score | Coherent | Mostly Coherent | Partially Coherent | Incoherent |
|-------|--------|-----------|----------|-----------------|--------------------|------------|
| **StopAskingQuestionsMini** (this) | 656K | 0.4395 | 42 | 60 | 37 | 161 |
| GPT-2 | 117M | 0.3874 | 16 | 50 | 49 | 185 |
| SmolLM2-135M | 135M | 0.5193 | 36 | 98 | 40 | 111 |
| Qwen3-0.6B-Base | 600M | 0.7359 | 165 | 79 | 16 | 40 |

Each model generated between two and three hundred continuations of the prefix `Question:`. [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) scored each one on a decimal scale from 0.0 to 1.0. Our model generated the second-highest number of coherent questions, with fewer parameters than most character-level RNNs.

## Generations

Prompt: **`Question:`**

Generation 1:
```text
what legal reforms faced rafer leadership during ww1?
```

Generation 2:
```text
How many emissions should a frather?
```

Generation 3:
```text
What do foreigners do?
```

Generation 4:
```text
What is the best appropriate way to learn Japanese?
```

Generation 5:
```text
How much is the MDU and JavaScript to the new UK?
```

## Use Cases

As stated earlier, there is no practical use case, but here are some interesting ideas:

1. A test model for pipelines, code, and training
2. Educational research on language models
3. Experimentation on constrained hardware
4. Or, more simply, for fun.

## Limitations

Everything. But more specifically:

1. Cannot generate sentences, paragraphs, code, or anything other than questions
2. Cannot reason
3. Short context (96 positions)
4. Often incoherent

## Inference

```python
# =============================================================================
# Inference
# =============================================================================
MODEL_DIR = "harley-ml/StopAskingQuestionsMini-656k"       # Hub repo id or local path
TOKENIZER_PATH = "harley-ml/StopAskingQuestionsMini-656k"  # repo id, local dir, or tokenizer.json

# --- Generation settings ---
PROMPT = "Question:"       # prompt
MAX_NEW_TOKENS = 96        # note: the model's context is 96 positions, prompt included
TEMPERATURE = 1.0
TOP_P = 0.95
TOP_K = 50
REPETITION_PENALTY = 1.1
DO_SAMPLE = True
# =============================================================================

import torch
from pathlib import Path
from transformers import (
    AutoModelForCausalLM,
    PreTrainedTokenizerFast,
    AddedToken,
)

# ---------------------------------------------------------------------------
# Device
# ---------------------------------------------------------------------------
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Device : {device}")

# ---------------------------------------------------------------------------
# Tokenizer (mirrors training setup)
# ---------------------------------------------------------------------------
def load_tokenizer(path: str):
    p = Path(path)
    if p.is_file():
        # A local tokenizer.json file
        tok = PreTrainedTokenizerFast(tokenizer_file=str(p.resolve()))
    else:
        # A Hub repo id or a local directory; from_pretrained resolves both
        # and raises an OSError if no tokenizer can be found.
        tok = PreTrainedTokenizerFast.from_pretrained(path)

    # Add any special tokens the tokenizer file does not define.
    specials = {}
    if tok.bos_token is None:
        specials["bos_token"] = AddedToken("<|bos|>", special=True)
    if tok.eos_token is None:
        specials["eos_token"] = AddedToken("<|eos|>", special=True)
    if tok.unk_token is None:
        specials["unk_token"] = AddedToken("<|unk|>", special=True)
    if tok.pad_token is None:
        if tok.eos_token is not None:
            tok.pad_token = tok.eos_token
        else:
            specials["pad_token"] = AddedToken("<|pad|>", special=True)
    if specials:
        tok.add_special_tokens(specials)

    tok.padding_side = "left"  # left-pad for batched generation
    return tok

print("Loading tokenizer...")
tokenizer = load_tokenizer(TOKENIZER_PATH)
print(f"  Vocab size : {tokenizer.vocab_size}")
print(f"  BOS        : {tokenizer.bos_token!r}")
print(f"  EOS        : {tokenizer.eos_token!r}")
print(f"  PAD        : {tokenizer.pad_token!r} (id={tokenizer.pad_token_id})")

# ---------------------------------------------------------------------------
# Model
# ---------------------------------------------------------------------------
print(f"\nLoading model from {MODEL_DIR} ...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    dtype=torch.float16 if device == "cuda" else torch.float32,
    low_cpu_mem_usage=True,
)
model.eval()
model.to(device)

total_params = sum(p.numel() for p in model.parameters())
print(f"  Parameters : {total_params:,}")

# ---------------------------------------------------------------------------
# Generation helper
# ---------------------------------------------------------------------------
def generate(
    prompt: str = PROMPT,
    max_new_tokens: int = MAX_NEW_TOKENS,
    temperature: float = TEMPERATURE,
    top_p: float = TOP_P,
    top_k: int = TOP_K,
    repetition_penalty: float = REPETITION_PENALTY,
    do_sample: bool = DO_SAMPLE,
) -> str:
    # Prepend BOS manually, since special tokens are not added by the tokenizer call.
    bos = tokenizer.bos_token or ""
    full_prompt = bos + prompt

    inputs = tokenizer(
        full_prompt,
        return_tensors="pt",
        add_special_tokens=False,
    ).to(device)
    inputs.pop("token_type_ids", None)  # Qwen3 doesn't use this

    gen_kwargs = dict(
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        repetition_penalty=repetition_penalty,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    # Sampling parameters only apply when do_sample is True.
    if do_sample:
        gen_kwargs["temperature"] = temperature
        gen_kwargs["top_p"] = top_p
        gen_kwargs["top_k"] = top_k

    with torch.inference_mode():
        output_ids = model.generate(**inputs, **gen_kwargs)

    # Strip the prompt tokens so we only return what was generated
    prompt_len = inputs["input_ids"].shape[-1]
    new_ids = output_ids[0][prompt_len:]
    return tokenizer.decode(new_ids, skip_special_tokens=True)

# ---------------------------------------------------------------------------
# Run
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    print(f"\nPrompt : {PROMPT!r}")
    print("-" * 60)
    output = generate(PROMPT)
    print("Generated:")
    print(output)
```

## Citation

```bibtex
@misc{stopaskingquestionsmini-656k,
  title  = {StopAskingQuestionsMini-656k: Questions with No Answers},
  author = {Harley-ml},
  year   = {2026},
  url    = {https://huggingface.co/Harley-ml/StopAskingQuestionsMini-656k}
}
```