Text Generation
Transformers
Safetensors
English
keylm75m
keylm
small-language-model
instruct
gqa
rope
swiglu
qk-norm
custom_code
conversational
Instructions to use Eclipse-Senpai/KeyLM-75M-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Eclipse-Senpai/KeyLM-75M-Instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Eclipse-Senpai/KeyLM-75M-Instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Eclipse-Senpai/KeyLM-75M-Instruct", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Eclipse-Senpai/KeyLM-75M-Instruct with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Eclipse-Senpai/KeyLM-75M-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Eclipse-Senpai/KeyLM-75M-Instruct",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker

```bash
docker model run hf.co/Eclipse-Senpai/KeyLM-75M-Instruct
```
- SGLang
How to use Eclipse-Senpai/KeyLM-75M-Instruct with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Eclipse-Senpai/KeyLM-75M-Instruct" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Eclipse-Senpai/KeyLM-75M-Instruct",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Eclipse-Senpai/KeyLM-75M-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Eclipse-Senpai/KeyLM-75M-Instruct",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use Eclipse-Senpai/KeyLM-75M-Instruct with Docker Model Runner:
```bash
docker model run hf.co/Eclipse-Senpai/KeyLM-75M-Instruct
```
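The vLLM and SGLang curl calls above both target the same OpenAI-compatible `/v1/chat/completions` route, so a client can be written once and pointed at either port. The sketch below uses only the Python standard library. It assumes a server is already running locally and that the reply follows the OpenAI chat-completion shape (`choices[0].message.content`); the helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical and not part of vLLM or SGLang.

```python
import json
import urllib.request

# Assumption: vLLM on port 8000 (as in its curl example). For the SGLang
# server started with --port 30000, pass base_url="http://localhost:30000".
DEFAULT_BASE_URL = "http://localhost:8000"
MODEL_ID = "Eclipse-Senpai/KeyLM-75M-Instruct"


# Hypothetical helper: builds (but does not send) the POST request.
def build_chat_request(prompt: str, base_url: str = DEFAULT_BASE_URL,
                       model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical helper: assumes the OpenAI chat-completion response layout.
def extract_reply(response_body: bytes) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    body = json.loads(response_body)
    return body["choices"][0]["message"]["content"]


# Hypothetical helper: sends one user turn and returns the reply text.
def chat(prompt: str, base_url: str = DEFAULT_BASE_URL, timeout: float = 60.0) -> str:
    """Send one user turn to the running server and return the reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())
```

With a server up, `chat("What is the capital of France?")` sends the same request as the vLLM curl example, and `chat("...", base_url="http://localhost:30000")` targets the SGLang server. Keeping request building separate from sending makes the payload easy to inspect without a live server.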
Replace variants table with SmolLM2-style base-vs-instruct benchmark table
README.md
CHANGED
```diff
@@ -58,13 +58,6 @@ KeyLM is a compact decoder-only transformer built on the standard small-model re
 | Precision | bfloat16 |
 | Training tokens | ~18B |
 
-### Model variants
-
-| Variant | Type | Chat template | IFEval (4-metric avg) | Use for |
-|---|---|---|---|---|
-| [KeyLM-75M](https://huggingface.co/Eclipse-Senpai/KeyLM-75M) | Base (pretrained) | No | — | Fine-tuning, text completion |
-| [KeyLM-75M-Instruct](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct) | Instruction-tuned | Yes | 17.85 | Chat, instruction following |
-
 GGUF builds for `llama.cpp`, LM Studio, and Ollama are available at [KeyLM-75M-Instruct-GGUF](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct-GGUF).
 
 ## How to Use
```
```diff
@@ -108,17 +101,21 @@ This is where KeyLM is competitive. All rows are evaluated with `lm_eval` (`ifev
 
 KeyLM beats the original SmolLM-135M-Instruct at roughly half the size and a fraction of the training data. SmolLM2-135M-Instruct, a far more heavily trained model, remains ahead.
 
-###
+### Base vs Instruct
 
+The base and instruction-tuned checkpoints across all benchmarks. Commonsense and knowledge tasks are zero-shot via `lm_eval` (accuracy; ARC and HellaSwag length-normalized); IFEval is the 4-metric average. Bold marks the stronger version per row.
 
-|---|---|---|---|
+| Benchmark | KeyLM-75M (base) | KeyLM-75M-Instruct | Random |
+|---|---|---|---|
+| IFEval (4-metric avg) | — | **17.85** | — |
+| MMLU | 23.0 | **24.0** | 25.0 |
+| ARC (avg) | 29.9 | **30.8** | 25.0 |
+| HellaSwag | 29.7 | **31.0** | 25.0 |
+| PIQA | 60.0 | **61.3** | 50.0 |
+| WinoGrande | **48.4** | 48.3 | 50.0 |
+| OpenBookQA | 25.0 | 25.0 | 25.0 |
 
+Instruction tuning leaves knowledge and reasoning roughly unchanged; its real effect is the instruction-following ability IFEval captures. Both versions sit modestly above random on basic commonsense and at chance on MMLU.
 
 ## Training
 
```
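The new benchmark paragraph says ARC and HellaSwag are scored with length-normalized accuracy. In zero-shot multiple choice, the model's log-likelihood of each candidate answer is compared; dividing by answer length keeps longer answers from losing simply because they contain more tokens. The sketch below is an illustration of that idea, not `lm_eval`'s code: it normalizes by the character length of each answer string (check the harness documentation for its exact rule), and the function names and log-likelihood values are hypothetical.

```python
from typing import Sequence


# Illustrative sketch, not lm_eval's implementation.
def pick_answer(loglikelihoods: Sequence[float], choices: Sequence[str],
                normalize: bool) -> int:
    """Return the index of the predicted choice.

    loglikelihoods[i] is the model's summed log-probability of choices[i]
    given the question. With normalize=True each score is divided by the
    character length of its choice (a sketch of length normalization).
    """
    if not choices or len(loglikelihoods) != len(choices) or any(not c for c in choices):
        raise ValueError("need one log-likelihood per non-empty choice")
    scores = [
        ll / len(choice) if normalize else ll
        for ll, choice in zip(loglikelihoods, choices)
    ]
    # Highest score wins; ties go to the earliest choice.
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of items whose predicted index matches the gold index."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

With made-up scores `[-3.0, -6.0]` for the choices `"yes"` and `"it depends on the temperature"`, raw scoring picks index 0, while length-normalized scoring picks index 1, because -6.0/29 is closer to zero than -3.0/3.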
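The closing paragraph's reading of the Base vs Instruct table above can be checked with a little arithmetic. The sketch copies the table's accuracy figures (IFEval is left out because the base checkpoint has no score) and computes, per benchmark, the instruct-minus-base change and each checkpoint's margin over the random baseline. The `SCORES` layout and the `compare` function are illustrative only.

```python
# Accuracy figures copied from the Base vs Instruct table:
# benchmark -> (base, instruct, random baseline).
# IFEval is omitted because the base checkpoint has no score for it.
SCORES = {
    "MMLU": (23.0, 24.0, 25.0),
    "ARC (avg)": (29.9, 30.8, 25.0),
    "HellaSwag": (29.7, 31.0, 25.0),
    "PIQA": (60.0, 61.3, 50.0),
    "WinoGrande": (48.4, 48.3, 50.0),
    "OpenBookQA": (25.0, 25.0, 25.0),
}


def compare(scores):
    """Return (benchmark, instruct - base, base - random, instruct - random) rows.

    Differences are rounded to one decimal to match the table's precision.
    """
    rows = []
    for name, (base, instruct, random_baseline) in scores.items():
        rows.append((
            name,
            round(instruct - base, 1),
            round(base - random_baseline, 1),
            round(instruct - random_baseline, 1),
        ))
    return rows


for name, delta, base_margin, inst_margin in compare(SCORES):
    print(f"{name:<12} instruct-base {delta:+.1f}  "
          f"base-random {base_margin:+.1f}  instruct-random {inst_margin:+.1f}")
```

Every instruct-minus-base change stays within 1.3 points, consistent with "roughly unchanged". MMLU and WinoGrande fall below their random baselines for both checkpoints, and OpenBookQA sits exactly at chance.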