QMD Query Expansion β€” LFM2-1.2B SFT Adapter

Fine-tuned LiquidAI/LFM2-1.2B for structured query expansion in qmd, a local-first document search engine.

For ready-to-use GGUF quantizations, see OrcsRise/qmd-query-expansion-lfm2-gguf.

What This Model Does

Given a short search query, the model generates structured expansions in three formats that qmd uses for hybrid search:

| Prefix | Purpose | Example |
|---|---|---|
| `lex:` | Lexical keywords for BM25/FTS5 search | `lex: docker container timeout settings` |
| `vec:` | Natural language for vector similarity search | `vec: how to configure docker container timeout` |
| `hyde:` | Hypothetical document for HyDE retrieval | `hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag...` |

Example

Input:

```
/no_think Expand this search query:

docker timeout
```

Output:

```
lex: docker container timeout
lex: docker stop timeout configuration
vec: how to configure docker container timeout settings
vec: docker container restart timeout policy
hyde: Docker containers can be configured with timeout settings using the --stop-timeout flag. The default timeout is 10 seconds before SIGKILL is sent.
```
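The line-oriented output format is straightforward to consume downstream. A minimal sketch of grouping the generated lines by prefix (the function name and return shape are illustrative, not part of qmd's actual API):

```python
def parse_expansions(text: str) -> dict[str, list[str]]:
    """Group model output lines by their lex:/vec:/hyde: prefix."""
    expansions = {"lex": [], "vec": [], "hyde": []}
    for line in text.splitlines():
        line = line.strip()
        for prefix in expansions:
            tag = prefix + ":"
            if line.startswith(tag):
                expansions[prefix].append(line[len(tag):].strip())
                break
    return expansions

output = """lex: docker container timeout
lex: docker stop timeout configuration
vec: how to configure docker container timeout settings
hyde: Docker containers can be configured with timeout settings."""

parsed = parse_expansions(output)
```

Lines without a recognized prefix are simply dropped, which also discards any stray preamble the model might emit.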

Why LFM2?

LFM2's hybrid architecture (convolutions + attention) delivers roughly 2x faster prefill and decode than similarly sized standard transformers, which makes it well suited to on-device query expansion where latency matters. It was added as an alternative to the default Qwen3-1.7B model in qmd v1.0.7.

Training Details

| Parameter | Value |
|---|---|
| Base model | LiquidAI/LFM2-1.2B |
| Dataset | tobil/qmd-query-expansion-train (5,157 examples) |
| Method | SFT with LoRA (rank 16, alpha 32) |
| LoRA targets | `q_proj`, `k_proj`, `v_proj`, `out_proj`, `in_proj`, `w1`, `w2`, `w3` |
| Trainable params | 11.1M of 1.18B total (~0.9%) |
| Epochs | 5 |
| Batch size | 4 (x4 gradient accumulation = 16 effective) |
| Learning rate | 2e-4 (cosine schedule) |
| Max sequence length | 512 |
| Precision | bf16 |
| Hardware | NVIDIA Tesla T4 (Google Colab) |
| Training time | ~2.5 hours |
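The table above maps fairly directly onto a TRL + PEFT training setup. A sketch of how the run could be reproduced, filled in from those hyperparameters; exact trainer argument names vary between TRL versions, so treat this as an outline rather than the original training script:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("tobil/qmd-query-expansion-train")

# LoRA rank/alpha and target modules from the table above.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj",
                    "in_proj", "w1", "w2", "w3"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    num_train_epochs=5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # 4 x 4 = 16 effective batch
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    max_length=512,
    bf16=True,
    output_dir="qmd-query-expansion-lfm2-sft",
)

trainer = SFTTrainer(
    model="LiquidAI/LFM2-1.2B",
    args=training_args,
    train_dataset=dataset["train"],
    peft_config=peft_config,
)
trainer.train()
```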

Training Loss

| Step | Training Loss | Validation Loss |
|---|---|---|
| 200 | 0.528 | 0.545 |
| 400 | 0.484 | 0.520 |
| 600 | 0.384 | 0.521 |

Recommended Generation Parameters

| Parameter | Value |
|---|---|
| Temperature | 0.3 |
| min_p | 0.15 |
| Repetition penalty | 1.05 |
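For intuition on the `min_p` setting: tokens whose probability falls below `min_p` times the most likely token's probability are discarded before sampling, which keeps low-temperature output focused without a hard top-k cutoff. A pure-Python sketch of the idea (real inference stacks like transformers and llama.cpp apply this on the logits):

```python
def min_p_filter(probs: list[float], min_p: float) -> list[float]:
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# With min_p = 0.15, the threshold is 0.15 * 0.5 = 0.075,
# so the 0.05 tail token is dropped.
probs = [0.5, 0.3, 0.15, 0.05]
filtered = min_p_filter(probs, min_p=0.15)
```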

Usage with qmd

```bash
# Use the pre-built GGUF:
export QMD_GEN_MODEL="hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf"
qmd query "docker timeout"
```

Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("OrcsRise/qmd-query-expansion-lfm2-sft")
tokenizer = AutoTokenizer.from_pretrained("OrcsRise/qmd-query-expansion-lfm2-sft")

messages = [{"role": "user", "content": "/no_think Expand this search query:\n\ndocker timeout"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

# Sampling must be enabled for temperature/min_p to take effect.
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Framework Versions

  • TRL: 0.28.0
  • Transformers: 5.0.0
  • PyTorch: 2.9.0+cu128
  • Datasets: 4.0.0
  • PEFT: LoRA via TRL SFTTrainer

License

Apache 2.0 β€” same as the base LFM2-1.2B model.
