# qwen3-1.7b-promql
A fine-tuned version of Qwen3-1.7B for generating PromQL queries from natural language descriptions.
## Model Details
- Base model: Qwen/Qwen3-1.7B
- Fine-tuning method: QLoRA (4-bit) via Unsloth
- Training data: ~6,400 curated PromQL instruction examples covering Kubernetes, node metrics, application metrics, and alerting patterns
- Training time: ~24 minutes on A100
- Formats available: LoRA adapter weights + GGUF (Q4_K_M)
## Evaluation
Evaluated against the base Qwen3-1.7B on 100 held-out examples using PromQL parser validation and LLM-as-judge scoring (1-5):
| Model | Valid PromQL | Correct% | Avg Score |
|---|---|---|---|
| qwen3-1.7b-promql (this model) | 90% | 35% | 3.55 |
| qwen3:1.7b (base) | 6% | 4% | — |
Per-category breakdown:
| Category | Valid% | Correct% |
|---|---|---|
| General metrics | 90% | 45% |
| Hard / multi-step | 93% | 48% |
| Expert / subqueries | 87% | 12% |
The model performs well on common Kubernetes and infrastructure monitoring queries. Complex nested subqueries (e.g. `min_over_time(rate(...)[6h:5m])`) are the current weak spot.
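The parser-validation step in the evaluation can be approximated with a lightweight syntactic sanity check. The sketch below is illustrative only (the `looks_like_promql` helper is hypothetical and not part of the actual eval harness, which used a full PromQL parser), but it captures the necessary condition that brackets nest correctly:

```python
def balanced(query: str) -> bool:
    """Check that (), [], {} nest correctly - a necessary
    (but not sufficient) condition for valid PromQL."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in query:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack

def looks_like_promql(query: str) -> bool:
    """Hypothetical pre-filter: non-empty and bracket-balanced."""
    q = query.strip()
    return bool(q) and balanced(q)

print(looks_like_promql('sum(rate(http_requests_total{code=~"5.."}[5m]))'))  # True
print(looks_like_promql('rate(http_requests_total[5m)'))                      # False
```

A real pipeline would hand candidates that pass this filter to an actual PromQL parser; this check alone accepts many invalid queries.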
## Usage
### Ollama (recommended)
```shell
# Download the GGUF file from this repo, then:
cat > Modelfile << 'EOF'
FROM ./qwen3-1.7b.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
You are a PromQL expert. Given a monitoring request and context, return only the PromQL query with no explanation.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.1
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
EOF

ollama create promql -f Modelfile
ollama run promql "Request: Show HTTP error rate over 5 minutes
Context: Metric http_requests_total with labels code, method"
```
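If you prefer calling Ollama from Python rather than the CLI, a minimal client against its HTTP API might look like the sketch below. It assumes a local Ollama server with the model created as `promql` per the steps above; the function names are illustrative:

```python
import json
import urllib.request

def ollama_payload(prompt: str, model: str = "promql") -> dict:
    """Build a request body for Ollama's /api/generate endpoint.
    Temperature matches the Modelfile setting above."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.1},
    }

def query_ollama(prompt: str, host: str = "http://localhost:11434") -> str:
    """POST the prompt to a running Ollama server and return the response text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(ollama_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```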
### Transformers + LoRA adapter
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
model = PeftModel.from_pretrained(base, "AsyncBuilds/qwen3-1.7b-promql")

SYSTEM = "You are a PromQL expert. Given a monitoring request and context, return only the PromQL query with no explanation."

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Request: Show CPU usage per node\nContext: Metric node_cpu_seconds_total"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response.strip())
```
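Small instruction-tuned models sometimes wrap their answer in a markdown code fence or stray whitespace even when asked for the bare query. A hedged cleanup helper (the `extract_query` name is hypothetical, not part of this repo) can normalize the decoded text before use:

```python
def extract_query(text: str) -> str:
    """Strip an optional markdown code fence and surrounding
    whitespace from a generated answer."""
    text = text.strip()
    fence = "`" * 3  # built programmatically to avoid a literal fence
    if text.startswith(fence) and text.endswith(fence) and len(text) > 2 * len(fence):
        inner = text[len(fence):-len(fence)]
        # Drop a language tag such as "promql" on the first line, if present.
        first_newline = inner.find("\n")
        if first_newline != -1 and inner[:first_newline].strip().isalnum():
            inner = inner[first_newline + 1:]
        return inner.strip()
    return text
```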
## Input Format
The model expects input in this format:
```
Request: <natural language description of what you want to measure>
Context: <relevant metric names and labels>
```
The Context field is optional but improves accuracy — include the metric name(s) you want to query when known.
## Training Data
Trained on a curated dataset of ~6,400 PromQL instruction examples covering:
- Kubernetes cluster metrics (kube-state-metrics, cAdvisor)
- Node/infrastructure metrics (node_exporter)
- Application metrics (HTTP, gRPC, database)
- Alerting patterns (absent, rate thresholds)
- Hard negatives (common mistakes and their corrections)
The dataset was validated with a combination of PromQL parser checks and LLM-as-judge scoring before training.
## Limitations
- Complex nested subqueries with multiple aggregation levels may be inaccurate
- Non-standard or custom metric names require explicit context
- Not a substitute for understanding PromQL — always validate generated queries before use in production alerting
## License
Apache 2.0 — same as the base Qwen3 model.