# QMD Query Expansion – LFM2-1.2B SFT Adapter
Fine-tuned LiquidAI/LFM2-1.2B for structured query expansion in qmd, a local-first document search engine.
For ready-to-use GGUF quantizations, see OrcsRise/qmd-query-expansion-lfm2-gguf.
## What This Model Does
Given a short search query, the model generates structured expansions in three formats that qmd uses for hybrid search:
| Prefix | Purpose | Example |
|---|---|---|
| `lex:` | Lexical keywords for BM25/FTS5 search | `lex: docker container timeout settings` |
| `vec:` | Natural language for vector similarity search | `vec: how to configure docker container timeout` |
| `hyde:` | Hypothetical document for HyDE retrieval | `hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag...` |
## Example

Input:

```
/no_think Expand this search query:

docker timeout
```

Output:

```
lex: docker container timeout
lex: docker stop timeout configuration
vec: how to configure docker container timeout settings
vec: docker container restart timeout policy
hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag. The default timeout is 10 seconds before SIGKILL is sent.
```
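The line-oriented output is straightforward to post-process. A minimal sketch of grouping expansions by prefix (a hypothetical helper for illustration, not qmd's actual parser):

```python
from collections import defaultdict

def parse_expansions(text: str) -> dict:
    """Group expansion lines by prefix (lex/vec/hyde),
    splitting each line on its leading 'prefix:' marker."""
    out = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        for prefix in ("lex:", "vec:", "hyde:"):
            if line.startswith(prefix):
                out[prefix[:-1]].append(line[len(prefix):].strip())
                break
    return dict(out)

sample = """\
lex: docker container timeout
lex: docker stop timeout configuration
vec: how to configure docker container timeout settings
hyde: Docker containers can be configured with timeout settings."""

parsed = parse_expansions(sample)
# parsed["lex"] holds two keyword strings, parsed["vec"] and
# parsed["hyde"] one entry each.
```

Lines that don't match any known prefix are silently skipped, which tolerates occasional stray output from the model.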
## Why LFM2?

LFM2's hybrid architecture (convolutions + attention) delivers roughly 2x faster decode and prefill than standard transformers of the same size, making it well suited to on-device query expansion where latency matters. It was added as an alternative to the default Qwen3-1.7B model in qmd v1.0.7.
## Training Details
| Parameter | Value |
|---|---|
| Base model | LiquidAI/LFM2-1.2B |
| Dataset | tobil/qmd-query-expansion-train (5,157 examples) |
| Method | SFT with LoRA (rank 16, alpha 32) |
| LoRA targets | q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3 |
| Trainable params | 11.1M / 1.18B total (~0.9%) |
| Epochs | 5 |
| Batch size | 4 (x4 gradient accumulation = 16 effective) |
| Learning rate | 2e-4 (cosine schedule) |
| Max sequence length | 512 |
| Precision | bf16 |
| Hardware | NVIDIA Tesla T4 (Google Colab) |
| Training time | ~2.5 hours |
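The LoRA setup in the table above can be expressed with `peft`'s `LoraConfig`. This is a sketch of the configuration only; the card does not show the author's full trainer wiring:

```python
from peft import LoraConfig

# Hyperparameters taken from the training table above. The target
# module names cover LFM2's attention projections (q/k/v/out_proj)
# and its conv/MLP blocks (in_proj, w1-w3).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "out_proj",
        "in_proj", "w1", "w2", "w3",
    ],
    task_type="CAUSAL_LM",
)
```

With rank 16 over these eight module types, only ~0.9% of the 1.18B parameters are trainable, which keeps the fine-tune feasible on a single T4.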
### Training Loss
| Step | Training Loss | Validation Loss |
|---|---|---|
| 200 | 0.528 | 0.545 |
| 400 | 0.484 | 0.520 |
| 600 | 0.384 | 0.521 |
## Recommended Generation Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.3 |
| min_p | 0.15 |
| Repetition penalty | 1.05 |
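For context, min_p sampling keeps only tokens whose probability is at least `min_p` times that of the most likely token, so the candidate pool shrinks when the model is confident. A minimal sketch of the filtering rule (not llama.cpp's or Transformers' actual implementation):

```python
def min_p_filter(probs: list[float], min_p: float) -> list[int]:
    """Return indices of tokens kept by min_p filtering:
    a token survives if prob >= min_p * max(probs)."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# With min_p = 0.15 the threshold is 0.15 * 0.5 = 0.075,
# so the two least likely tokens are dropped.
probs = [0.5, 0.3, 0.1, 0.06, 0.04]
print(min_p_filter(probs, 0.15))  # -> [0, 1, 2]
```

Combined with the low temperature of 0.3, this keeps the expansions focused while still allowing some lexical variety.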
## Usage with qmd

```bash
# Use the pre-built GGUF:
export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"
qmd query "docker timeout"
```
## Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("OrcsRise/qmd-query-expansion-lfm2-sft")
tokenizer = AutoTokenizer.from_pretrained("OrcsRise/qmd-query-expansion-lfm2-sft")

messages = [{"role": "user", "content": "/no_think Expand this search query:\n\ndocker timeout"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

# do_sample=True is required; without it decoding is greedy and
# temperature/min_p are ignored.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True,
                         temperature=0.3, min_p=0.15, repetition_penalty=1.05)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Framework Versions
- TRL: 0.28.0
- Transformers: 5.0.0
- PyTorch: 2.9.0+cu128
- Datasets: 4.0.0
- PEFT: LoRA via TRL SFTTrainer
## Related

- qmd – Local-first document search with BM25 + vector + LLM reranking
- LiquidAI/LFM2-1.2B – Base model
- tobil/qmd-query-expansion-train – Training dataset
- OrcsRise/qmd-query-expansion-lfm2-gguf – GGUF quantizations
- tobil/qmd-query-expansion-1.7B-gguf – Default qmd model (Qwen3-1.7B)
## License

Apache 2.0, same as the base LFM2-1.2B model.