Eclipse-Senpai committed on
Commit a7de577 · verified · 1 Parent(s): 97f644f

main commit
README.md CHANGED
@@ -1,3 +1,185 @@
  ---
  license: apache-2.0
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - keylm
+ - small-language-model
+ - instruct
+ - gqa
+ - rope
+ - swiglu
+ - qk-norm
+ - custom_code
+ model-index:
+ - name: KeyLM-75M-Instruct
+   results:
+   - task:
+       type: text-generation
+       name: Instruction Following
+     dataset:
+       name: IFEval
+       type: google/IFEval
+     metrics:
+     - type: acc
+       name: IFEval (4-metric average)
+       value: 17.85
+     - type: acc
+       name: IFEval instruction-level (strict)
+       value: 22.42
+     - type: acc
+       name: IFEval prompt-level (strict)
+       value: 12.75
+   - task:
+       type: text-generation
+       name: Multiple Choice
+     dataset:
+       name: MMLU
+       type: cais/mmlu
+     metrics:
+     - type: acc
+       name: MMLU (0-shot)
+       value: 23.0
+   - task:
+       type: text-generation
+       name: Multiple Choice
+     dataset:
+       name: ARC-Challenge
+       type: allenai/ai2_arc
+     metrics:
+     - type: acc_norm
+       name: ARC-Challenge (0-shot)
+       value: 25.5
+   - task:
+       type: text-generation
+       name: Multiple Choice
+     dataset:
+       name: ARC-Easy
+       type: allenai/ai2_arc
+     metrics:
+     - type: acc_norm
+       name: ARC-Easy (0-shot)
+       value: 26.6
+   - task:
+       type: text-generation
+       name: Multiple Choice
+     dataset:
+       name: HellaSwag
+       type: Rowan/hellaswag
+     metrics:
+     - type: acc_norm
+       name: HellaSwag (0-shot)
+       value: 26.7
+   - task:
+       type: text-generation
+       name: Multiple Choice
+     dataset:
+       name: PIQA
+       type: ybisk/piqa
+     metrics:
+     - type: acc
+       name: PIQA (0-shot)
+       value: 53.1
+   - task:
+       type: text-generation
+       name: Multiple Choice
+     dataset:
+       name: WinoGrande
+       type: allenai/winogrande
+     metrics:
+     - type: acc
+       name: WinoGrande (0-shot)
+       value: 48.9
+   - task:
+       type: text-generation
+       name: Multiple Choice
+     dataset:
+       name: OpenBookQA
+       type: allenai/openbookqa
+     metrics:
+     - type: acc_norm
+       name: OpenBookQA (0-shot)
+       value: 18.4
  ---
+
+ # KeyLM-75M-Instruct
+
+ KeyLM-75M-Instruct is a 75M-parameter instruction-tuned language model trained from scratch on approximately 18 billion tokens. That training budget is a small fraction of what comparable small models use (SmolLM-135M was trained on roughly 600B tokens, SmolLM2-135M on roughly 2T). Despite this, it is competitive on instruction following: it slightly outperforms SmolLM-135M-Instruct on IFEval (17.85 vs. 17.15) with just over half the parameters and about 3% of the training tokens, though it still trails SmolLM2-135M-Instruct (26.98).
+
+ ## Results
+
+ IFEval scores, evaluated with `lm_eval` (541 prompts, greedy decoding):
+
+ | Model | Params | Train tokens | IFEval (4-metric avg) |
+ |---|---|---|---|
+ | **KeyLM-75M-Instruct** | **75M** | **~18B** | **17.85** |
+ | SmolLM-135M-Instruct | 135M | ~600B | 17.15 |
+ | SmolLM2-135M-Instruct | 135M | ~2T | 26.98 |
+
+ Full zero-shot results for MMLU, ARC, HellaSwag, PIQA, WinoGrande, and OpenBookQA appear in the evaluation panel above. On these multiple-choice knowledge and reasoning tasks the model scores near random chance (25% for the four-option tasks, 50% for the two-option PIQA and WinoGrande), which is expected at this parameter and token budget. Its usable behavior comes from instruction tuning rather than parametric knowledge.
+
+ ## Usage
+
+ KeyLM ships its own modeling code (thin subclasses of the Qwen3 classes in `transformers`), so load it with `trust_remote_code=True` (requires `transformers>=4.51`).
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "Eclipse-Senpai/KeyLM-75M-Instruct"
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, trust_remote_code=True, torch_dtype=torch.float16
+ )
+
+ messages = [{"role": "user", "content": "What is the capital of France?"}]
+ # return_dict=True also returns the attention mask; passing it explicitly avoids
+ # generate() warning that the pad and EOS tokens share the same id (2).
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
+ )
+ outputs = model.generate(
+     **inputs, max_new_tokens=128, do_sample=True,
+     temperature=0.7, top_p=0.9, repetition_penalty=1.1,
+ )
+ # Decode only the newly generated tokens, not the prompt.
+ prompt_length = inputs["input_ids"].shape[1]
+ print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
+ ```
+
+ GGUF builds for `llama.cpp`, LM Studio, and Ollama are available at [KeyLM-75M-Instruct-GGUF](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct-GGUF).
+
+ ## Model details
+
+ | Field | Value |
+ |---|---|
+ | Parameters | 75,251,200 |
+ | Architecture | Grouped-query attention, RoPE, SwiGLU, QK-RMSNorm |
+ | Hidden size | 512 |
+ | Layers | 24 |
+ | Attention heads | 8 query heads, 2 KV heads (head dim 64) |
+ | Context length | 2048 |
+ | Vocabulary | 12,020 (ByteLevel BPE) |
+ | Precision | float16 |
+ | Chat format | `User:` / `Assistant:`, assistant turns end with `</s>` |
+
+ The decoder block follows the Qwen3 layout: the grouped-query attention, RoPE, and SwiGLU recipe shared with Llama, plus per-head QK-RMSNorm. The bundled modeling code reuses the upstream Qwen3 classes from `transformers`, but the weights are trained from random initialization rather than derived from any Qwen3 checkpoint. Instruction tuning uses `smol-smoltalk`, `ultrachat_200k`, and several `smoltalk2` splits with assistant-only loss masking, followed by a personality tuning pass.
+
+ ## Limitations
+
+ - Minimal world knowledge. Not suitable for factual question answering, reasoning, math, or code.
+ - English only.
+ - No dedicated safety alignment was performed.
+
+ ## License
+
+ Apache 2.0. Weights are trained from scratch and free to use, modify, and redistribute.
+
+ ## Citation
+
+ ```bibtex
+ @misc{keylm75m2026,
+   title = {KeyLM-75M: a from-scratch small language model},
+   author = {Eclipse-Senpai},
+   year = {2026},
+   howpublished = {\url{https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct}}
+ }
+ ```
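The Model details table lists 24 layers, 8 attention heads sharing 2 KV heads, and a 2048-token context. To make the grouped-query attention saving concrete, here is a rough estimate of the key/value cache for one full-length sequence. It assumes a float16 cache, the `head_dim` of 64 set in `config.json`, and no framework overhead; the 8-KV-head comparison is a hypothetical full multi-head layout, not something KeyLM ships.

```python
# Rough KV-cache size estimate for KeyLM-75M, using values from config.json.
# Assumes a float16 cache (2 bytes per element) and ignores allocator overhead.
NUM_LAYERS = 24
NUM_QUERY_HEADS = 8
NUM_KV_HEADS = 2
HEAD_DIM = 64
BYTES_PER_ELEMENT = 2  # float16
CONTEXT_LENGTH = 2048


def kv_cache_bytes(num_tokens: int, kv_heads: int) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens."""
    # Factor 2: one cached tensor for keys and one for values, in every layer.
    return 2 * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_ELEMENT * num_tokens


gqa = kv_cache_bytes(CONTEXT_LENGTH, NUM_KV_HEADS)
# Hypothetical comparison: one KV head per query head (plain multi-head attention).
mha = kv_cache_bytes(CONTEXT_LENGTH, NUM_QUERY_HEADS)
print(f"GQA (2 KV heads): {gqa / 2**20:.0f} MiB")  # 24 MiB
print(f"MHA (8 KV heads): {mha / 2**20:.0f} MiB")  # 96 MiB
print(f"query heads per KV head: {NUM_QUERY_HEADS // NUM_KV_HEADS}")  # 4
```

Each KV head is shared by 4 query heads, so the cache is a quarter of what full multi-head attention would need at the same width, about 12 KiB per token instead of 48 KiB.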
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "architectures": [
+     "KeyLM75M"
+   ],
+   "model_type": "keylm75m",
+   "auto_map": {
+     "AutoConfig": "configuration_keylm.KeyLM75MConfig",
+     "AutoModelForCausalLM": "modeling_keylm.KeyLM75M"
+   },
+   "vocab_size": 12020,
+   "hidden_size": 512,
+   "intermediate_size": 1280,
+   "num_hidden_layers": 24,
+   "num_attention_heads": 8,
+   "num_key_value_heads": 2,
+   "head_dim": 64,
+   "max_position_embeddings": 2048,
+   "rope_theta": 10000.0,
+   "rms_norm_eps": 1e-06,
+   "hidden_act": "silu",
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "use_sliding_window": false,
+   "tie_word_embeddings": false,
+   "initializer_range": 0.02,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "pad_token_id": 2,
+   "torch_dtype": "float16"
+ }
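As a consistency check, the README's parameter count of 75,251,200 can be reproduced from the values in this config. The sketch assumes the standard Qwen3 module layout: bias-free q/k/v/o projections (`attention_bias: false`), one `head_dim`-sized RMSNorm weight each for queries and keys, a gated SwiGLU MLP with gate, up, and down projections, two RMSNorms per layer plus a final norm, and separate input and output embeddings (`tie_word_embeddings: false`).

```python
# Reproduce KeyLM-75M's parameter count from config.json values.
# Assumes the Qwen3 layout: bias-free projections, per-head q/k RMSNorm,
# SwiGLU MLP (gate, up, down), and untied input/output embeddings.
config = {
    "vocab_size": 12020,
    "hidden_size": 512,
    "intermediate_size": 1280,
    "num_hidden_layers": 24,
    "num_attention_heads": 8,
    "num_key_value_heads": 2,
    "head_dim": 64,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    d = cfg["hidden_size"]
    hd = cfg["head_dim"]
    q_dim = cfg["num_attention_heads"] * hd
    kv_dim = cfg["num_key_value_heads"] * hd

    # Attention: q, k, v, o projections (no bias) plus q_norm and k_norm weights.
    attention = d * q_dim + 2 * d * kv_dim + q_dim * d + 2 * hd
    # SwiGLU MLP: gate_proj and up_proj (d -> ffn), down_proj (ffn -> d).
    mlp = 3 * d * cfg["intermediate_size"]
    # input_layernorm and post_attention_layernorm weights.
    norms = 2 * d
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * d
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * d
    final_norm = d
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


total = count_parameters(config)
print(f"{total:,}")  # 75,251,200
```

The untied LM head alone accounts for about 6.2M parameters (12,020 × 512), roughly 8% of the model; tying it to the input embeddings would be the most direct way to shrink this configuration.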
configuration_keylm.py ADDED
@@ -0,0 +1,13 @@
+ """KeyLM model configuration.
+
+ KeyLM-75M is a from-scratch small language model. Its decoder block is a
+ Qwen3-style layout (grouped-query attention, RoPE, SwiGLU, and per-head
+ QK-RMSNorm), so the configuration inherits Qwen3Config and only overrides the
+ ``model_type`` so the model carries its own identity on the Hub.
+ """
+
+ from transformers.models.qwen3.configuration_qwen3 import Qwen3Config
+
+
+ class KeyLM75MConfig(Qwen3Config):
+     model_type = "keylm75m"
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "pad_token_id": 2,
+   "do_sample": true,
+   "temperature": 0.7,
+   "top_p": 0.9,
+   "repetition_penalty": 1.1
+ }
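With these defaults, `generate()` samples rather than decoding greedily. The plain-Python sketch below walks one decoding step through the same three adjustments: a CTRL-style repetition penalty, temperature scaling, and nucleus (top-p) filtering. It is a simplified illustration of the technique, not the `transformers` implementation, and the toy logits and token ids are made up.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty):
    """CTRL-style penalty: make already-seen tokens less likely."""
    out = list(logits)
    for i in set(seen_ids):
        # Dividing a positive logit (or multiplying a negative one) lowers it.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    norm = sum(probs[i] for i in kept)  # renormalize over the kept tokens
    return {i: probs[i] / norm for i in kept}


def sample_next(logits, seen_ids, temperature=0.7, top_p=0.9,
                repetition_penalty=1.1, rng=random):
    logits = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    probs = softmax([x / temperature for x in logits])
    candidates = top_p_filter(probs, top_p)
    ids, weights = zip(*candidates.items())
    return rng.choices(ids, weights=weights, k=1)[0]


# Toy 5-token vocabulary; token 0 was already generated.
toy_logits = [2.0, 1.5, 0.3, -1.0, -2.0]
print(sample_next(toy_logits, seen_ids=[0], rng=random.Random(0)))
```

With these toy logits only tokens 0 and 1 survive the 0.9 nucleus, so the step picks one of those two. In `transformers` the repetition penalty is applied to every token already in the sequence, prompt included.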
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:62a2f1202bf8c44f7839d0f402c81977d8516936a6a7aa70bc8cebd210791b4b
+ size 150531664
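The LFS pointer records a 150,531,664-byte weights file. At 2 bytes per float16 value, 75,251,200 parameters need 150,502,400 bytes, leaving about 29 KB, which is consistent with the file's JSON header. A safetensors file starts with an 8-byte little-endian header length, followed by a JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets` (plus an optional `__metadata__` entry), then the raw tensor data. The sketch below reads only that header with the standard library; the two-tensor file it writes is a made-up toy for demonstration.

```python
import json
import math
import os
import struct
import tempfile
from collections import Counter


def read_safetensors_header(path):
    """Return the tensor entries from a .safetensors header, skipping tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string-to-string map, not a tensor
    return header


def summarize(header):
    """Total element count and the number of tensors per dtype."""
    total = sum(math.prod(entry["shape"]) for entry in header.values())
    dtypes = Counter(entry["dtype"] for entry in header.values())
    return total, dict(dtypes)


def write_tiny_safetensors(path):
    """Write a toy file with two float16 tensors (10 values, 20 data bytes)."""
    header = {
        "a": {"dtype": "F16", "shape": [2, 3], "data_offsets": [0, 12]},
        "b": {"dtype": "F16", "shape": [4], "data_offsets": [12, 20]},
    }
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))
        f.write(blob)
        f.write(bytes(20))  # zero-filled tensor data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.safetensors")
    write_tiny_safetensors(path)
    result = summarize(read_safetensors_header(path))
print(result)  # (10, {'F16': 2})
```

Pointed at the downloaded `model.safetensors`, `summarize` should report only `F16` tensors and, if the file holds nothing but model weights, a total of 75,251,200.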
modeling_keylm.py ADDED
@@ -0,0 +1,25 @@
+ """KeyLM model implementation.
+
+ KeyLM-75M uses a Qwen3-style decoder (GQA + RoPE + SwiGLU + per-head
+ QK-RMSNorm). Rather than vendor a full copy of the transformer, the classes
+ below specialise the upstream Qwen3 implementation and bind it to KeyLM75MConfig
+ so the model loads under its own name via `trust_remote_code=True`.
+ """
+
+ try:
+     from transformers.models.qwen3.modeling_qwen3 import Qwen3ForCausalLM, Qwen3Model
+ except ImportError as exc:  # pragma: no cover - guidance for old transformers
+     raise ImportError(
+         "KeyLM requires a transformers version that ships the Qwen3 model "
+         "(transformers>=4.51). Please upgrade transformers."
+     ) from exc
+
+ from .configuration_keylm import KeyLM75MConfig
+
+
+ class KeyLM75MModel(Qwen3Model):
+     config_class = KeyLM75MConfig
+
+
+ class KeyLM75M(Qwen3ForCausalLM):
+     config_class = KeyLM75MConfig
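Both docstrings mention per-head QK-RMSNorm. In the upstream Qwen3 attention, `q_norm` and `k_norm` are RMSNorm layers of size `head_dim`, applied to each head's query and key vectors before RoPE; this is commonly used to keep query/key magnitudes, and hence attention logits, from growing during training. The pure-Python sketch below illustrates the idea; the toy head sizes and unit weights are illustrative assumptions, and only `rms_norm_eps` comes from `config.json`.

```python
import math

RMS_NORM_EPS = 1e-6  # rms_norm_eps in config.json


def rms_norm(x, weight, eps=RMS_NORM_EPS):
    """Divide x by its root-mean-square, then scale elementwise by weight."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def per_head_rms_norm(vector, num_heads, head_dim, weight):
    """Normalize each head_dim-sized slice independently with one shared weight."""
    if len(vector) != num_heads * head_dim:
        raise ValueError("vector length must equal num_heads * head_dim")
    out = []
    for h in range(num_heads):
        head = vector[h * head_dim:(h + 1) * head_dim]
        out.extend(rms_norm(head, weight))
    return out


# Toy example: 2 heads of size 2 (KeyLM itself uses 8 query heads of size 64).
q = [3.0, 4.0, 1.0, 1.0]
normed = per_head_rms_norm(q, num_heads=2, head_dim=2, weight=[1.0, 1.0])
print([round(v, 4) for v in normed])  # [0.8485, 1.1314, 1.0, 1.0]
```

Because the weight vector has length `head_dim` and is shared across heads, each of `q_norm` and `k_norm` adds only 64 parameters per layer.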
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "bos_token": "<s>",
+   "eos_token": "</s>",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "bos_token": "<s>",
+   "eos_token": "</s>",
+   "lowercase": false,
+   "model_max_length": 2048,
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "[UNK]",
+   "vocab_size": 12020,
+   "chat_template": "{% for message in messages %}{% if message['role'] == 'user' %}{% if loop.index0 > 0 %}\n{% endif %}User: {{ message['content'] }}\n{% elif message['role'] == 'assistant' %}Assistant: {{ message['content'] }}</s>{% endif %}{% endfor %}{% if add_generation_prompt %}Assistant: {% endif %}",
+   "add_bos_token": false,
+   "add_eos_token": false,
+   "clean_up_tokenization_spaces": false
+ }
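The `chat_template` above renders only `user` and `assistant` messages (any other role, such as `system`, is skipped). It writes a newline before every user message that is not the first list entry, and closes each assistant turn with `</s>`. The plain-Python function below mirrors the template text literally so the prompt string is easy to inspect; it is an illustrative re-implementation, not a replacement for `tokenizer.apply_chat_template`.

```python
def render_keylm_chat(messages, add_generation_prompt=False):
    """Mirror KeyLM's Jinja chat_template text in plain Python (illustrative only)."""
    parts = []
    for index, message in enumerate(messages):
        role, content = message["role"], message["content"]
        if role == "user":
            # The template adds a newline unless this is the first list entry.
            if index > 0:
                parts.append("\n")
            parts.append(f"User: {content}\n")
        elif role == "assistant":
            # Assistant turns end with the EOS token </s>.
            parts.append(f"Assistant: {content}</s>")
        # Any other role (e.g. "system") is silently dropped by the template.
    if add_generation_prompt:
        parts.append("Assistant: ")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What is 2 + 2?"},
]
print(repr(render_keylm_chat(conversation, add_generation_prompt=True)))
# 'User: Hi\nAssistant: Hello!</s>\nUser: What is 2 + 2?\nAssistant: '
```

`transformers` compiles chat templates with Jinja's `trim_blocks` option, which drops a newline that directly follows a block tag such as `{% if ... %}`. If that applies here, the newline before later user turns would not appear in the tokenizer's output, so compare against `tokenizer.apply_chat_template(..., tokenize=False)` before relying on the exact multi-turn format.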