---
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
  - qwen2.5
  - fine-tuned
  - qlora
  - query-optimization
  - enterprise-search
  - text2text-generation
language:
  - en
pipeline_tag: text-generation
---

# Qwen2.5-1.5B Query Optimizer

A fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct trained to rewrite loose, conversational user queries into clear, retrieval-focused enterprise document search queries.

## Model Details

| Property | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-1.5B-Instruct` |
| Fine-tuning method | QLoRA (4-bit NF4 quantization + LoRA) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Training examples | 481 (90% of 535 total) |
| Eval examples | 54 (10% of 535 total) |
| Training epochs | 3 |
| Effective batch size | 16 (per-device batch 4 × 4 gradient accumulation steps) |
| Learning rate | 2e-4 (cosine schedule) |
| Max sequence length | 256 |
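The adapter and quantization settings in the table can be expressed with `peft` and `bitsandbytes` along these lines. This is a hedged sketch rather than the exact training configuration: the compute dtype, double quantization, and LoRA dropout values are assumptions not stated in the card.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: not stated in the card
    bnb_4bit_use_double_quant=True,         # assumption
)

# LoRA adapters matching the hyperparameters in the table above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,  # assumption: a common default, not stated in the card
    task_type="CAUSAL_LM",
)
```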

## Intended Use

This model is designed for enterprise AI search pipelines where raw user queries need to be normalized before being passed to a retrieval system (e.g., vector search, BM25, or hybrid search).
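The pipeline described above can be sketched as follows. Here `optimize_query` is a pure-Python placeholder for a call to this model (the real implementation is shown in the Example section), and `search` is a toy keyword-overlap retriever standing in for vector search, BM25, or a hybrid system; the document corpus and IDs are illustrative.

```python
# Toy corpus: document ID -> indexed text.
DOCS = {
    "leave-policy": "employee leave request procedure and time-off policy",
    "refund-policy": "refund policy terms and conditions for customer returns",
}

def optimize_query(user_query: str) -> str:
    # Placeholder: in production this calls the fine-tuned model.
    return user_query.lower().strip("?")

def search(query: str, docs: dict) -> list:
    # Toy retriever: rank documents by keyword overlap with the query.
    terms = set(query.split())
    scored = [(len(terms & set(text.split())), doc_id)
              for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]

# The optimizer sits between the raw user query and the retriever.
results = search(optimize_query("refund policy?"), DOCS)
print(results)  # → ['refund-policy', 'leave-policy']
```

The key design point is that the retriever never sees the raw conversational query; it only sees the normalized form, which keeps the retrieval index and the optimizer independently replaceable.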

- **Input:** a natural, conversational user query
- **Output:** a concise, retrieval-optimized search query

## Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "abi-commits/qwen-query-optimizer"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

SYSTEM_PROMPT = (
    "You are a query optimization agent. Rewrite user queries into clear, "
    "retrieval-focused enterprise document search queries. "
    "Do not add new information. Do not hallucinate."
)

def optimize_query(user_query: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",   "content": user_query},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=80,
            do_sample=False,
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
    generated = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

# Examples
print(optimize_query("how do i request time off?"))
# → "employee leave request procedure and time-off policy"

print(optimize_query("what's the refund policy?"))
# → "refund policy terms and conditions for customer returns"