How to use from
Docker Model Runner
docker model run hf.co/leok7v/KeyLM-75M-Instruct-GGUF:Q4_0
Quick Links

KeyLM-75M-Instruct-GGUF

GGUF builds of KeyLM-75M-Instruct for llama.cpp, LM Studio, Ollama, and other GGUF runtimes.

KeyLM is a 75M-parameter instruction-tuned language model trained from scratch on approximately 18 billion tokens. See the main model card for benchmarks, training details, limitations, and the transformers (safetensors) version.

Files

File Quant Size Notes
KeyLM-75M-Instruct.F16.gguf F16 ~144 MB Full precision and recommended. The model is already tiny, so there is little reason to quantize further.

Run with llama.cpp

# straight from the Hub
llama-cli -hf Eclipse-Senpai/KeyLM-75M-Instruct-GGUF -cnv

# or a local file
llama-cli -m KeyLM-75M-Instruct.F16.gguf -cnv

The chat template (User: / Assistant:, assistant turns ending with </s>) is embedded in the GGUF, so conversation mode (-cnv) applies it automatically.

LM Studio / Ollama

  • LM Studio: load the .gguf; the embedded chat template is detected automatically.
  • Ollama: ollama run hf.co/Eclipse-Senpai/KeyLM-75M-Instruct-GGUF

Notes & limitations

KeyLM is a tiny model: good at simple instruction following and short chat, near random chance on knowledge/reasoning benchmarks. It is not a factual assistant. Full numbers and caveats are on the main model card.

License

Apache 2.0.

Downloads last month
-
GGUF
Model size
75.3M params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for leok7v/KeyLM-75M-Instruct-GGUF

Quantized
(2)
this model