---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- keylm
- small-language-model
- base
- pretrained
- gqa
- rope
- swiglu
- qk-norm
- custom_code
datasets:
- HuggingFaceFW/fineweb-edu-score-2
- wikimedia/wikipedia
- HuggingFaceGECLM/REDDIT_comments
- marin-community/stackexchange-markdown
- allenai/WildChat-1M
- HuggingFaceH4/ultrachat_200k
- lmsys/lmsys-chat-1m
- OpenAssistant/oasst2
- HuggingFaceTB/cosmopedia-100k
---

# KeyLM-75M

KeyLM-75M is a 75M parameter base language model trained from scratch on approximately 18 billion tokens. That training budget is a small fraction of what comparable small models use (SmolLM-135M was trained on roughly 600B tokens, SmolLM2-135M on roughly 2T).

This is the **base** model: a text-completion model, not instruction-tuned. For chat and instruction following, use [KeyLM-75M-Instruct](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct).

## Table of Contents

1. [Model Summary](#model-summary)
2. [How to Use](#how-to-use)
3. [Evaluation](#evaluation)
4. [Training](#training)
5. [Limitations](#limitations)
6. [License](#license)
7. [Citation](#citation)

## Model Summary

KeyLM is a compact decoder-only transformer built on the standard small-model recipe used by Llama and Qwen3: grouped-query attention, rotary position embeddings (RoPE), SwiGLU feed-forward layers, and per-head QK-RMSNorm.
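
As a rough sanity check on this architecture, the sketch below tallies the parameters of a decoder with the dimensions listed in this section's table. The feed-forward width (1280), untied input and output embeddings, bias-free linear layers, and a QK-norm weight shared across heads are not stated in this card. They are assumptions chosen because they reproduce the reported total, so treat them as a plausible guess rather than the confirmed configuration.

```python
def count_parameters(
    vocab=12_020,
    hidden=512,
    layers=24,
    heads=8,
    kv_heads=2,
    ffn=1280,               # ASSUMPTION: not stated in the model card
    tied_embeddings=False,  # ASSUMPTION: separate input and output embeddings
):
    """Tally weights for a GQA + SwiGLU + QK-RMSNorm decoder without biases."""
    head_dim = hidden // heads

    # Attention: full-width Q and O projections, narrower K and V (GQA).
    q_proj = hidden * heads * head_dim
    kv_proj = 2 * hidden * kv_heads * head_dim
    o_proj = heads * head_dim * hidden
    # ASSUMPTION: one RMSNorm weight of size head_dim each for Q and K.
    qk_norm = 2 * head_dim

    # SwiGLU uses three matrices: gate, up, and down.
    mlp = 3 * hidden * ffn

    # Two RMSNorm weights per block (before attention and before the MLP).
    block_norms = 2 * hidden

    per_layer = q_proj + kv_proj + o_proj + qk_norm + mlp + block_norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    return embeddings + layers * per_layer + final_norm


if __name__ == "__main__":
    print(f"{count_parameters():,}")  # 75,251,200
```

Under these assumptions the embeddings account for about 12.3M of the total, which is why a small vocabulary matters at this scale.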

| Field | Value |
|---|---|
| Parameters | 75,251,200 |
| Layers | 24 |
| Hidden size | 512 |
| Attention heads | 8 (2 KV heads, GQA) |
| Context length | 2048 |
| Vocabulary | 12,020 (ByteLevel BPE) |
| Precision | bfloat16 |
| Training tokens | ~18B |

## How to Use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Eclipse-Senpai/KeyLM-75M"

# The repository ships custom modeling code (see the custom_code tag),
# so trust_remote_code=True is needed when loading.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
)

# Base model: give it a prefix to complete, not an instruction.
inputs = tokenizer("The three primary colors are", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Evaluation

On zero-shot benchmarks (`lm_eval`; accuracy, with length-normalized accuracy for ARC and HellaSwag), KeyLM is modest: above random on basic commonsense tasks such as PIQA and HellaSwag, and at chance on knowledge-heavy tasks such as MMLU and OpenBookQA.

| Benchmark | KeyLM-75M (base) | KeyLM-75M-Instruct | Random |
|---|---|---|---|
| IFEval (4-metric avg) | n/a | **17.85** | n/a |
| MMLU | 23.0 | **24.0** | 25.0 |
| ARC (avg) | 29.9 | **30.8** | 25.0 |
| HellaSwag | 29.7 | **31.0** | 25.0 |
| PIQA | 60.0 | **61.3** | 50.0 |
| WinoGrande | **48.4** | 48.3 | 50.0 |
| OpenBookQA | 25.0 | 25.0 | 25.0 |

Instruction tuning leaves knowledge and reasoning scores roughly unchanged; its main effect is the instruction-following ability (IFEval) that the base model lacks.

## Training

KeyLM-75M was pretrained from random initialization on approximately 18B tokens, drawn from a weighted mixture of public datasets streamed through a deterministic curriculum.
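
A weighted mixture of this kind can be sketched as a seeded sampler over source categories. The weights use the approximate shares from the data-mix table in this section. The category keys, the seed, and the `sample_categories` helper are hypothetical, and the actual curriculum used for training is not described in this card.

```python
import random
from collections import Counter

# Approximate category shares from the data-mix table (they sum to 1.0).
# The key names are illustrative, not identifiers from the training code.
MIXTURE = {
    "formal_quality": 0.30,
    "casual_social": 0.30,
    "conversational": 0.25,
    "structured_knowledge": 0.05,
    "typo_augmentation": 0.10,
}


def sample_categories(n, seed=0):
    """Return n category names drawn in proportion to MIXTURE.

    A fixed seed makes the draw order reproducible, which is one simple
    way to get a deterministic stream (ASSUMPTION: not the real curriculum).
    """
    rng = random.Random(seed)
    names = list(MIXTURE)
    weights = list(MIXTURE.values())
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    draws = sample_categories(10_000)
    counts = Counter(draws)
    for name, share in MIXTURE.items():
        observed = counts[name] / len(draws)
        print(f"{name:22s} target={share:.2f} observed={observed:.3f}")
```

In a real pipeline each drawn category would select which dataset stream supplies the next document or batch.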

| Category | Share | Sources |
|---|---|---|
| Formal / quality | ~30% | FineWeb-Edu, Wikipedia |
| Casual / social | ~30% | Reddit comments, StackExchange |
| Conversational | ~25% | WildChat, UltraChat, LMSYS-Chat, OASST2 |
| Structured knowledge | ~5% | Cosmopedia |
| Typo augmentation | ~10% | Synthetic (contrastive) |

The instruction-tuned model built on this base is available at [KeyLM-75M-Instruct](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct).

## Limitations

- Minimal world knowledge. Not suitable for factual question answering, reasoning, math, or code.
- Base model: it completes text and does not follow instructions or hold a conversation. Use the Instruct version for chat.
- English only.
- No safety alignment. Apply your own filtering before any user-facing use.

## License

Apache 2.0. The weights are trained from scratch and free to use, modify, and redistribute.

## Citation

```bibtex
@misc{keylm75m2026,
  title = {KeyLM-75M: a from-scratch small language model},
  author = {Eclipse-Senpai},
  year = {2026},
  howpublished = {\url{https://huggingface.co/Eclipse-Senpai/KeyLM-75M}}
}
```