# QMD Query Expansion – LFM2-1.2B SFT Adapter
Fine-tuned LiquidAI/LFM2-1.2B for structured query expansion in qmd, a local-first document search engine.
For ready-to-use GGUF quantizations, see OrcsRise/qmd-query-expansion-lfm2-gguf.
## What This Model Does
Given a short search query, the model generates structured expansions in three formats that qmd uses for hybrid search:
| Prefix | Purpose | Example |
|---|---|---|
| `lex:` | Lexical keywords for BM25/FTS5 search | `lex: docker container timeout settings` |
| `vec:` | Natural language for vector similarity search | `vec: how to configure docker container timeout` |
| `hyde:` | Hypothetical document for HyDE retrieval | `hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag...` |
## Example

Input:

```
/no_think Expand this search query:

docker timeout
```

Output:

```
lex: docker container timeout
lex: docker stop timeout configuration
vec: how to configure docker container timeout settings
vec: docker container restart timeout policy
hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag. The default timeout is 10 seconds before SIGKILL is sent.
```
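The line-oriented output is straightforward to post-process. A minimal sketch of grouping expansions by prefix (a hypothetical helper for illustration, not qmd's actual parser):

```python
from collections import defaultdict

def parse_expansions(text: str) -> dict:
    """Group expansion lines by prefix (lex/vec/hyde),
    splitting each line on its leading 'prefix:' marker."""
    out = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        for prefix in ("lex:", "vec:", "hyde:"):
            if line.startswith(prefix):
                out[prefix[:-1]].append(line[len(prefix):].strip())
                break
    return dict(out)

sample = """\
lex: docker container timeout
lex: docker stop timeout configuration
vec: how to configure docker container timeout settings
hyde: Docker containers can be configured with timeout settings."""

parsed = parse_expansions(sample)
# parsed["lex"] holds two keyword strings, parsed["vec"] and
# parsed["hyde"] one entry each.
```

Lines that don't match any known prefix are silently skipped, which tolerates occasional stray output from the model.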
## Why LFM2?

LFM2's hybrid architecture (convolutions + attention) delivers roughly 2x faster decode and prefill than standard transformers of the same size, making it well suited to on-device query expansion where latency matters. It was added as an alternative to the default Qwen3-1.7B model in qmd v1.0.7.
## Training Details
| Parameter | Value |
|---|---|
| Base model | LiquidAI/LFM2-1.2B |
| Dataset | tobil/qmd-query-expansion-train (5,157 examples) |
| Method | SFT with LoRA (rank 16, alpha 32) |
| LoRA targets | q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3 |
| Trainable params | 11.1M / 1.18B total (~0.9%) |
| Epochs | 5 |
| Batch size | 4 (x4 gradient accumulation = 16 effective) |
| Learning rate | 2e-4 (cosine schedule) |
| Max sequence length | 512 |
| Precision | bf16 |
| Hardware | NVIDIA Tesla T4 (Google Colab) |
| Training time | ~2.5 hours |
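The LoRA setup in the table above can be expressed with `peft`'s `LoraConfig`. This is a sketch of the configuration only; the card does not show the author's full trainer wiring:

```python
from peft import LoraConfig

# Hyperparameters taken from the training table above. The target
# module names cover LFM2's attention projections (q/k/v/out_proj)
# and its conv/MLP blocks (in_proj, w1-w3).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "out_proj",
        "in_proj", "w1", "w2", "w3",
    ],
    task_type="CAUSAL_LM",
)
```

With rank 16 over these eight module types, only ~0.9% of the 1.18B parameters are trainable, which keeps the fine-tune feasible on a single T4.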
### Training Loss
| Step | Training Loss | Validation Loss |
|---|---|---|
| 200 | 0.528 | 0.545 |
| 400 | 0.484 | 0.520 |
| 600 | 0.384 | 0.521 |
## Recommended Generation Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.3 |
| min_p | 0.15 |
| Repetition penalty | 1.05 |
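For context, min_p sampling keeps only tokens whose probability is at least `min_p` times that of the most likely token, so the candidate pool shrinks when the model is confident. A minimal sketch of the filtering rule (not llama.cpp's or Transformers' actual implementation):

```python
def min_p_filter(probs: list[float], min_p: float) -> list[int]:
    """Return indices of tokens kept by min_p filtering:
    a token survives if prob >= min_p * max(probs)."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# With min_p = 0.15 the threshold is 0.15 * 0.5 = 0.075,
# so the two least likely tokens are dropped.
probs = [0.5, 0.3, 0.1, 0.06, 0.04]
print(min_p_filter(probs, 0.15))  # -> [0, 1, 2]
```

Combined with the low temperature of 0.3, this keeps the expansions focused while still allowing some lexical variety.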
## Usage with qmd

```bash
# Use the pre-built GGUF:
export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"
qmd query "docker timeout"
```
## Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("OrcsRise/qmd-query-expansion-lfm2-sft")
tokenizer = AutoTokenizer.from_pretrained("OrcsRise/qmd-query-expansion-lfm2-sft")

messages = [{"role": "user", "content": "/no_think Expand this search query:\n\ndocker timeout"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

# do_sample=True is required; without it decoding is greedy and
# temperature/min_p are ignored.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True,
                         temperature=0.3, min_p=0.15, repetition_penalty=1.05)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Framework Versions
- TRL: 0.28.0
- Transformers: 5.0.0
- PyTorch: 2.9.0+cu128
- Datasets: 4.0.0
- PEFT: LoRA via TRL SFTTrainer
## Related

- qmd – Local-first document search with BM25 + vector + LLM reranking
- LiquidAI/LFM2-1.2B – Base model
- tobil/qmd-query-expansion-train – Training dataset
- OrcsRise/qmd-query-expansion-lfm2-gguf – GGUF quantizations
- tobil/qmd-query-expansion-1.7B-gguf – Default qmd model (Qwen3-1.7B)
## License

Apache 2.0, same as the base LFM2-1.2B model.