---
license: apache-2.0
language:
- en
base_model: Eclipse-Senpai/KeyLM-75M-Instruct
base_model_relation: quantized
pipeline_tag: text-generation
library_name: gguf
tags:
- keylm
- gguf
- llama.cpp
- small-language-model
- instruct
---

# KeyLM-75M-Instruct-GGUF

GGUF builds of [**KeyLM-75M-Instruct**](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct) for `llama.cpp`, LM Studio, Ollama, and other GGUF runtimes.

KeyLM is a 75M-parameter instruction-tuned language model trained from scratch on approximately 18 billion tokens. See the [main model card](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct) for benchmarks, training details, limitations, and the `transformers` (safetensors) version.

## Files

| File | Quant | Size | Notes |
|---|---|---|---|
| `KeyLM-75M-Instruct.F16.gguf` | F16 | ~144 MB | Full precision; recommended. The model is already tiny, so there is little reason to quantize further. |

## Run with llama.cpp

```bash
# straight from the Hub
llama-cli -hf Eclipse-Senpai/KeyLM-75M-Instruct-GGUF -cnv

# or a local file
llama-cli -m KeyLM-75M-Instruct.F16.gguf -cnv
```

The chat template (`User:` / `Assistant:`, assistant turns ending with ``) is embedded in the GGUF, so conversation mode (`-cnv`) applies it automatically.

## LM Studio / Ollama

- **LM Studio:** load the `.gguf`; the embedded chat template is detected automatically.
- **Ollama:** `ollama run hf.co/Eclipse-Senpai/KeyLM-75M-Instruct-GGUF`

## Notes & limitations

KeyLM is a tiny model. It handles simple instruction following and short chat, but it scores near random chance on knowledge and reasoning benchmarks. It is not a factual assistant. Full numbers and caveats are on the [main model card](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct).

## License

Apache 2.0.