# QMD Query Expansion – LFM2-1.2B GGUF
GGUF quantization of a fine-tuned [LiquidAI/LFM2-1.2B](https://huggingface.co/LiquidAI/LFM2-1.2B) for structured query expansion in qmd, a local-first document search engine.

Fine-tuned adapter: [OrcsRise/qmd-query-expansion-lfm2-sft](https://huggingface.co/OrcsRise/qmd-query-expansion-lfm2-sft)
## Quantizations
| File | Quantization | Size | Use Case |
|---|---|---|---|
| `qmd-query-expansion-lfm2-q8_0.gguf` | Q8_0 | 1.19 GB | Recommended – near-original quality |
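qmd fetches this file automatically (see Quick Start below), but if you want the GGUF on its own, here is a minimal sketch using the Hugging Face CLI (assumes `huggingface-cli` is installed):

```bash
# Download the Q8_0 GGUF into the current directory
huggingface-cli download OrcsRise/qmd-query-expansion-lfm2-gguf \
  qmd-query-expansion-lfm2-q8_0.gguf --local-dir .
```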
## Quick Start with qmd
```bash
# Set as your qmd query expansion model
export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"

# Add to ~/.zshrc or ~/.bashrc for persistence
echo 'export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"' >> ~/.zshrc

# qmd auto-downloads the GGUF on first use
qmd query "your search query"
```
The model is automatically downloaded to `~/.cache/qmd/models/` on first run.
## What This Model Does
Given a short search query, the model generates structured expansions in three formats for hybrid search:
| Prefix | Purpose | Example |
|---|---|---|
| `lex:` | Lexical keywords for BM25/FTS5 search | `lex: docker container timeout settings` |
| `vec:` | Natural language for vector similarity search | `vec: how to configure docker container timeout` |
| `hyde:` | Hypothetical document for HyDE retrieval | `hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag...` |
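A single generation produces all three lines together. For a query like `docker container timeout`, the output would take roughly this shape (illustrative, not a captured generation):

```text
lex: docker container timeout settings stop-timeout
vec: how to configure a timeout for a docker container
hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag...
```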
## Why LFM2 over Qwen3?
This is an alternative to qmd's default Qwen3-1.7B query expansion model, added in qmd v1.0.7.
| | LFM2-1.2B (this) | Qwen3-1.7B (default) |
|---|---|---|
| Parameters | 1.2B | 1.7B |
| Architecture | Hybrid (convolutions + attention) | Standard transformer |
| Decode/prefill speed | ~2x faster | Baseline |
| Q8_0 size | 1.19 GB | ~1.7 GB |
| Best for | On-device, latency-sensitive | Maximum quality |
LFM2's hybrid architecture makes it ideal for on-device inference where latency and memory matter more than marginal quality differences.
## Training
- Method: SFT with LoRA (rank 16, alpha 32)
- Dataset: [tobil/qmd-query-expansion-train](https://huggingface.co/datasets/tobil/qmd-query-expansion-train) – 5,157 examples
- LoRA targets: `q_proj`, `k_proj`, `v_proj`, `out_proj`, `in_proj`, `w1`, `w2`, `w3`
- Epochs: 5
- Hardware: NVIDIA Tesla T4 (Google Colab, free tier)
- Training time: ~2.5 hours
See the SFT adapter card for full training details.
## Recommended Generation Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.3 |
| min_p | 0.15 |
| Repetition penalty | 1.05 |
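As a sketch, these settings map onto llama.cpp's CLI flags as follows (the bare prompt is an assumption; qmd applies its own prompt template when it calls the model):

```bash
# Run the GGUF with the recommended sampling settings
llama-cli -m qmd-query-expansion-lfm2-q8_0.gguf \
  --temp 0.3 \
  --min-p 0.15 \
  --repeat-penalty 1.05 \
  -p "docker container timeout"
```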
## Compatibility
This GGUF works with any inference engine that supports the LFM2 architecture:
- qmd (via node-llama-cpp) – primary use case
- llama.cpp (b5921+)
- Ollama
- LM Studio
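For example, under Ollama the model can be wrapped in a Modelfile; a minimal sketch, assuming the GGUF has already been downloaded locally:

```bash
# Create an Ollama model from the local GGUF with the recommended sampling settings
cat > Modelfile <<'EOF'
FROM ./qmd-query-expansion-lfm2-q8_0.gguf
PARAMETER temperature 0.3
PARAMETER min_p 0.15
PARAMETER repeat_penalty 1.05
EOF

ollama create qmd-query-expansion -f Modelfile
ollama run qmd-query-expansion "docker container timeout"
```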
## Related
- qmd – Local-first document search with BM25 + vector + LLM reranking
- [LiquidAI/LFM2-1.2B](https://huggingface.co/LiquidAI/LFM2-1.2B) – Base model
- [LiquidAI/LFM2-1.2B-GGUF](https://huggingface.co/LiquidAI/LFM2-1.2B-GGUF) – Official base model GGUFs (not fine-tuned)
- [OrcsRise/qmd-query-expansion-lfm2-sft](https://huggingface.co/OrcsRise/qmd-query-expansion-lfm2-sft) – SFT LoRA adapter
- [tobil/qmd-query-expansion-1.7B-gguf](https://huggingface.co/tobil/qmd-query-expansion-1.7B-gguf) – Default qmd model (Qwen3-1.7B)
- [tobil/qmd-query-expansion-train](https://huggingface.co/datasets/tobil/qmd-query-expansion-train) – Training dataset
## License
Apache 2.0 – same as the base LFM2-1.2B model.