QMD Query Expansion: LFM2-1.2B GGUF

GGUF quantization of a fine-tuned LiquidAI/LFM2-1.2B for structured query expansion in qmd, a local-first document search engine.

Fine-tuned adapter: OrcsRise/qmd-query-expansion-lfm2-sft

Quantizations

| File | Quantization | Size | Use Case |
| --- | --- | --- | --- |
| qmd-query-expansion-lfm2-q8_0.gguf | Q8_0 | 1.19 GB | Recommended: near-original quality |

Quick Start with qmd

```bash
# Set as your qmd query expansion model
export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"

# Add to ~/.zshrc or ~/.bashrc for persistence
echo 'export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"' >> ~/.zshrc

# qmd auto-downloads the GGUF on first use
qmd query "your search query"
```

The model is automatically downloaded to ~/.cache/qmd/models/ on first run.
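
qmd handles the download for you, but if you want to fetch the file manually (say, to inspect it or pre-seed the cache), the standard huggingface-cli download command works. The destination directory below is just an example:

```bash
# Download the Q8_0 file from this repo (destination directory is arbitrary)
huggingface-cli download OrcsRise/qmd-query-expansion-lfm2-gguf \
  qmd-query-expansion-lfm2-q8_0.gguf \
  --local-dir ~/.cache/qmd/models/
```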

What This Model Does

Given a short search query, the model generates structured expansions in three formats for hybrid search:

| Prefix | Purpose | Example |
| --- | --- | --- |
| lex: | Lexical keywords for BM25/FTS5 search | lex: docker container timeout settings |
| vec: | Natural language for vector similarity search | vec: how to configure docker container timeout |
| hyde: | Hypothetical document for HyDE retrieval | hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag... |
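
Taken together, a single completion stacks all three expansions for one query. The block below is illustrative, assembled from the examples above; real output varies with sampling:

```
lex: docker container timeout settings
vec: how to configure docker container timeout
hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag...
```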

Why LFM2 over Qwen3?

This is an alternative to qmd's default Qwen3-1.7B query expansion model, added in qmd v1.0.7.

|  | LFM2-1.2B (this) | Qwen3-1.7B (default) |
| --- | --- | --- |
| Parameters | 1.2B | 1.7B |
| Architecture | Hybrid (convolutions + attention) | Standard transformer |
| Decode/prefill speed | ~2x faster | Baseline |
| Q8_0 size | 1.19 GB | ~1.7 GB |
| Best for | On-device, latency-sensitive | Maximum quality |

LFM2's hybrid architecture makes it ideal for on-device inference where latency and memory matter more than marginal quality differences.

Training

  • Method: SFT with LoRA (rank 16, alpha 32)
  • Dataset: tobil/qmd-query-expansion-train (5,157 examples)
  • LoRA targets: q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3
  • Epochs: 5
  • Hardware: NVIDIA Tesla T4 (Google Colab, free tier)
  • Training time: ~2.5 hours

See the SFT adapter card for full training details.

Recommended Generation Parameters

| Parameter | Value |
| --- | --- |
| Temperature | 0.3 |
| min_p | 0.15 |
| Repetition penalty | 1.05 |
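
If you run the model outside qmd, these settings map directly onto llama-cli's sampling flags. A minimal sketch; the raw prompt below is an assumption, since qmd applies its own prompt template, which this example does not reproduce:

```bash
# Run the GGUF with the recommended sampling settings (llama.cpp b5921+)
llama-cli -m qmd-query-expansion-lfm2-q8_0.gguf \
  --temp 0.3 \
  --min-p 0.15 \
  --repeat-penalty 1.05 \
  -p "docker container timeout"
```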

Compatibility

This GGUF works with any inference engine that supports the LFM2 architecture:

  • qmd (via node-llama-cpp), the primary use case
  • llama.cpp (b5921+)
  • Ollama
  • LM Studio
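
For Ollama, one approach is a minimal Modelfile over a locally downloaded copy of the GGUF, with the recommended sampling settings baked in (the model name qmd-expand is arbitrary):

```bash
# Create a local Ollama model from the downloaded GGUF
cat > Modelfile <<'EOF'
FROM ./qmd-query-expansion-lfm2-q8_0.gguf
PARAMETER temperature 0.3
PARAMETER min_p 0.15
PARAMETER repeat_penalty 1.05
EOF

ollama create qmd-expand -f Modelfile
ollama run qmd-expand "docker container timeout"
```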

Related

  • Fine-tuned adapter: OrcsRise/qmd-query-expansion-lfm2-sft
  • Training dataset: tobil/qmd-query-expansion-train
  • Base model: LiquidAI/LFM2-1.2B

License

Apache 2.0, the same license as the base LFM2-1.2B model.
