Eclipse-Senpai committed on
Commit 9394626 · verified · 1 parent: 6f9aecf

Switch weights to bf16, re-exported from the fp32 checkpoint

Files changed (4)
  1. README.md +7 -3
  2. config.json +4 -4
  3. model.safetensors +2 -2
  4. tokenizer_config.json +2 -2
README.md CHANGED
@@ -29,7 +29,7 @@ datasets:
 
  # KeyLM-75M-Instruct
 
- KeyLM-75M-Instruct is a 75M parameter instruction-tuned language model trained from scratch on approximately 18 billion tokens. That training budget is a small fraction of what comparable small models use (SmolLM-135M was trained on 600B tokens and SmolLM2-135M on 2T). Despite this, it is competitive on instruction following, outperforming SmolLM-135M-Instruct on IFEval while using about half the parameters and a fraction of the data.
+ KeyLM-75M-Instruct is a 75M parameter instruction-tuned language model trained from scratch on approximately 18 billion tokens. That training budget is a small fraction of what comparable small models use (SmolLM-135M was trained on roughly 600B tokens, SmolLM2-135M on roughly 2T). Despite this, it is competitive on instruction following, outperforming SmolLM-135M-Instruct on IFEval while using about half the parameters and a fraction of the data.
 
  ## Table of Contents
 
@@ -53,13 +53,15 @@ KeyLM is a compact decoder-only transformer built on the standard small-model re
  | Attention heads | 8 (2 KV heads, GQA) |
  | Context length | 2048 |
  | Vocabulary | 12,020 (ByteLevel BPE) |
- | Precision | float16 |
+ | Precision | bfloat16 |
  | Training tokens | ~18B |
 
  GGUF builds for `llama.cpp`, LM Studio, and Ollama are available at [KeyLM-75M-Instruct-GGUF](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct-GGUF).
 
  ## How to Use
 
+ KeyLM ships its own modeling code, so load it with `trust_remote_code=True`. It requires `transformers>=4.51`.
+
  ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -67,7 +69,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
  model_id = "Eclipse-Senpai/KeyLM-75M-Instruct"
  tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
- model_id, trust_remote_code=True, torch_dtype=torch.float16
+ model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
  )
 
  messages = [{"role": "user", "content": "What is the capital of France?"}]
@@ -81,6 +83,8 @@ outputs = model.generate(
  print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
  ```
 
+ The model uses a plain `User:` / `Assistant:` chat format, applied automatically by `apply_chat_template`. Assistant turns end with `</s>`.
+
  ## Evaluation
 
  ### Instruction following (IFEval)
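The README describes the chat format only in prose. The sketch below is a hypothetical, hand-translated Python mirror of the `chat_template` string in `tokenizer_config.json`, meant only to make the exact prompt text visible; in practice `apply_chat_template` renders the Jinja template itself. One assumption to flag: transformers builds its Jinja environment with `trim_blocks=True`, and that option removes the newline written directly after `{% if loop.index0 > 0 %}`, so in that renderer a later `User:` turn follows the previous `</s>` with no newline in between. The `trim_blocks` flag here models that option; setting it to `False` shows the template text taken literally. The function name `render_keylm_chat` is illustrative, not part of the repository.

```python
from typing import Dict, List


def render_keylm_chat(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = False,
    trim_blocks: bool = True,
) -> str:
    """Hypothetical hand-translation of KeyLM's Jinja chat_template into Python.

    trim_blocks models Jinja's option of that name (ASSUMED on, as in the
    transformers renderer): it removes the newline written directly after
    `{% if loop.index0 > 0 %}`, so that newline is emitted only when False.
    """
    parts: List[str] = []
    for index, message in enumerate(messages):  # index plays the role of loop.index0
        if message["role"] == "user":
            if index > 0 and not trim_blocks:
                parts.append("\n")
            parts.append(f"User: {message['content']}\n")
        elif message["role"] == "assistant":
            parts.append(f"Assistant: {message['content']}</s>")
        # Other roles (e.g. "system") have no branch in the template and emit nothing.
    if add_generation_prompt:
        parts.append("Assistant: ")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What is the capital of France?"},
]
print(repr(render_keylm_chat(conversation, add_generation_prompt=True)))
print(repr(render_keylm_chat(conversation, add_generation_prompt=True, trim_blocks=False)))
```

Note that `index` counts every message, not only user turns, which matches `loop.index0` in the template, and that a `system` message produces no text at all because the template has no branch for it.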
config.json CHANGED
@@ -9,11 +9,11 @@
  },
  "vocab_size": 12020,
  "hidden_size": 512,
- "intermediate_size": 1280,
- "num_hidden_layers": 24,
+ "head_dim": 64,
  "num_attention_heads": 8,
  "num_key_value_heads": 2,
- "head_dim": 64,
+ "intermediate_size": 1280,
+ "num_hidden_layers": 24,
  "max_position_embeddings": 2048,
  "rope_theta": 10000.0,
  "rms_norm_eps": 1e-06,
@@ -26,5 +26,5 @@
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2,
- "torch_dtype": "float16"
+ "torch_dtype": "bfloat16"
  }
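As a sanity check on the shape values in this config, the sketch below estimates the parameter count from them. The layer layout is an assumption, not something the config states: a Llama-style decoder with bias-free projections, a three-matrix gated MLP (gate, up, down), two RMSNorm weight vectors per layer plus a final norm, and an output head that is not tied to the input embeddings. Under those assumptions the total comes to about 75.2M, in line with the model name, and at 2 bytes per weight to roughly 150.5 MB, close to the size recorded in the `model.safetensors` pointer. The function `estimate_params` is illustrative only.

```python
# Values copied from config.json in this commit.
config = {
    "vocab_size": 12020,
    "hidden_size": 512,
    "head_dim": 64,
    "num_attention_heads": 8,
    "num_key_value_heads": 2,
    "intermediate_size": 1280,
    "num_hidden_layers": 24,
}


def estimate_params(cfg: dict, tied_embeddings: bool = False) -> int:
    """Rough parameter count for a Llama-style decoder (ASSUMED layout).

    Assumes bias-free projections, a gated MLP with gate/up/down matrices,
    two norm weight vectors per layer plus a final norm, and a separate
    output head unless tied_embeddings is True.
    """
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]    # 8 * 64 = 512
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]   # GQA: 2 * 64 = 128
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h      # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]                  # gate, up, down
    norms = 2 * h                                           # pre-attention and pre-MLP norms
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    head = 0 if tied_embeddings else embeddings
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + head + final_norm


total = estimate_params(config)
print(f"{total:,} parameters, ~{total * 2 / 1e6:.1f} MB of 2-byte weights")
print(f"{estimate_params(config, tied_embeddings=True):,} parameters if the head were tied")
```

The tied variant is included only to show how much the output head contributes (about 6.2M parameters at this vocabulary size); which variant KeyLM actually uses is not stated in this diff.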
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:62a2f1202bf8c44f7839d0f402c81977d8516936a6a7aa70bc8cebd210791b4b
- size 150531664
+ oid sha256:58ff80ee933def1960d75556c3770a016f56cb3f8f3891f01ca9501ca6cf17dc
+ size 150531928
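Both fp16 and bf16 spend 2 bytes per weight, so with the same set of tensors the weight data is the same length and the pointer size only moves from 150,531,664 to 150,531,928 bytes (the difference would then sit in the safetensors header). What changes is how the 16 bits are split: fp16 has 5 exponent and 10 mantissa bits, bf16 keeps fp32's 8 exponent bits and only 7 mantissa bits. The pure-Python sketch below casts fp32 values both ways to make that trade visible. It assumes round-to-nearest-even for the bf16 cast; the rounding mode used by the actual export is not stated in this commit. `to_bf16` and `to_fp16` are illustrative helpers.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to bfloat16 and return the result as a Python float.

    ASSUMPTION: round-to-nearest-even on the fp32 bit pattern. bfloat16 is the
    top 16 bits of fp32 (1 sign, 8 exponent, 7 mantissa bits). NaN and values
    outside the fp32 range are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add just under half of the dropped 16 bits, plus 1 when the kept lowest
    # bit is odd, so exact ties round to the even neighbour.
    bias = 0x7FFF + ((bits >> 16) & 1)
    top = ((bits + bias) >> 16) & 0xFFFF
    # Widen back to fp32 by zero-filling the low 16 bits.
    return struct.unpack("<f", struct.pack("<I", top << 16))[0]


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (5 exponent, 10 mantissa bits).

    struct's "e" format raises OverflowError when x is too large for fp16.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


# 70000 exceeds fp16's largest finite value (65504) but fits in bf16's range.
print(to_bf16(70000.0))
try:
    to_fp16(70000.0)
except OverflowError as err:
    print("fp16:", err)

# Fine detail goes the other way: 1 + 2**-10 is exact in fp16 but not in bf16.
print(to_fp16(1 + 2**-10), to_bf16(1 + 2**-10))
```

Exporting straight from the fp32 checkpoint, as the commit message describes, means each bf16 weight is rounded once from full precision instead of inheriting whatever an fp16 copy had already lost, such as values beyond ±65504 or very small values that fall into fp16's subnormal range.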
tokenizer_config.json CHANGED
@@ -6,8 +6,8 @@
  "tokenizer_class": "PreTrainedTokenizerFast",
  "unk_token": "[UNK]",
  "vocab_size": 12020,
- "chat_template": "{% for message in messages %}{% if message['role'] == 'user' %}{% if loop.index0 > 0 %}\n{% endif %}User: {{ message['content'] }}\n{% elif message['role'] == 'assistant' %}Assistant: {{ message['content'] }}</s>{% endif %}{% endfor %}{% if add_generation_prompt %}Assistant: {% endif %}",
  "add_bos_token": false,
  "add_eos_token": false,
- "clean_up_tokenization_spaces": false
+ "clean_up_tokenization_spaces": false,
+ "chat_template": "{% for message in messages %}{% if message['role'] == 'user' %}{% if loop.index0 > 0 %}\n{% endif %}User: {{ message['content'] }}\n{% elif message['role'] == 'assistant' %}Assistant: {{ message['content'] }}</s>{% endif %}{% endfor %}{% if add_generation_prompt %}Assistant: {% endif %}"
  }
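The `config.json` and `tokenizer_config.json` hunks mostly move keys around. Because `json.loads` returns dicts that compare equal regardless of key order, the only value that actually changes across both files is `torch_dtype`; the `chat_template` string and `clean_up_tokenization_spaces` are identical before and after, apart from the trailing comma. A small sketch for checking that kind of claim mechanically is below. The fragments are trimmed to keys visible in the diff, not the full files, and `changed_keys` is an illustrative helper.

```python
import json


def changed_keys(old_text: str, new_text: str) -> dict:
    """Compare two JSON objects key by key, ignoring key order and whitespace.

    Returns {key: (old_value, new_value)} for every key whose value differs.
    A key missing on one side is reported with None on that side.
    """
    old, new = json.loads(old_text), json.loads(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


# Trimmed fragments holding only keys visible in the config.json hunks.
old_config = '{"hidden_size": 512, "intermediate_size": 1280, "num_hidden_layers": 24, "head_dim": 64, "torch_dtype": "float16"}'
new_config = '{"hidden_size": 512, "head_dim": 64, "intermediate_size": 1280, "num_hidden_layers": 24, "torch_dtype": "bfloat16"}'
print(changed_keys(old_config, new_config))

# tokenizer_config.json: chat_template moves to the end. A placeholder stands in
# for the template string, which is identical on both sides.
TEMPLATE = "<chat_template string, identical on both sides>"
old_tok = json.dumps({"vocab_size": 12020, "chat_template": TEMPLATE, "add_bos_token": False,
                      "add_eos_token": False, "clean_up_tokenization_spaces": False})
new_tok = json.dumps({"vocab_size": 12020, "add_bos_token": False, "add_eos_token": False,
                      "clean_up_tokenization_spaces": False, "chat_template": TEMPLATE})
print(changed_keys(old_tok, new_tok))
```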