qwen3-1.7b-promql

A fine-tuned version of Qwen3-1.7B for generating PromQL queries from natural language descriptions.

Model Details

  • Base model: Qwen/Qwen3-1.7B
  • Fine-tuning method: QLoRA (4-bit) via Unsloth
  • Training data: ~6,400 curated PromQL instruction examples covering Kubernetes, node metrics, application metrics, and alerting patterns
  • Training time: ~24 minutes on A100
  • Formats available: LoRA adapter weights + GGUF (Q4_K_M)

Evaluation

Evaluated against the base Qwen3-1.7B on 100 held-out examples using PromQL parser validation and LLM-as-judge scoring (1-5):

Model                            Valid PromQL   Correct%   Avg Score
qwen3-1.7b-promql (this model)   90%            35%        3.55
qwen3:1.7b (base)                 6%             4%        —

Per-category breakdown:

Category              Valid%   Correct%
General metrics       90%      45%
Hard / multi-step     93%      48%
Expert / subqueries   87%      12%

The model performs well on common Kubernetes and infrastructure monitoring queries. Complex nested subqueries (e.g. min_over_time(rate(...)[6h:5m])) are the current weak spot.

Usage

Ollama (recommended)

# Download the GGUF file from this repo, then:
cat > Modelfile << 'EOF'
FROM ./qwen3-1.7b.Q4_K_M.gguf

TEMPLATE """<|im_start|>system
You are a PromQL expert. Given a monitoring request and context, return only the PromQL query with no explanation.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER temperature 0.1
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
EOF

ollama create promql -f Modelfile
ollama run promql "Request: Show HTTP error rate over 5 minutes
Context: Metric http_requests_total with labels code, method"

Transformers + LoRA adapter

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
model = PeftModel.from_pretrained(base, "AsyncBuilds/qwen3-1.7b-promql")

SYSTEM = "You are a PromQL expert. Given a monitoring request and context, return only the PromQL query with no explanation."

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user",   "content": "Request: Show CPU usage per node\nContext: Metric node_cpu_seconds_total"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response.strip())

Input Format

The model expects input in this format:

Request: <natural language description of what you want to measure>
Context: <relevant metric names and labels>

The Context field is optional but improves accuracy — include the metric name(s) you want to query when known.
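The two-field format above is easy to assemble programmatically. The sketch below is illustrative only; the `build_prompt` helper is not part of this repo:

```python
def build_prompt(request: str, context: str = "") -> str:
    """Build the Request/Context input string described above.

    The Context line is omitted when no metric context is supplied.
    """
    prompt = f"Request: {request}"
    if context:
        prompt += f"\nContext: {context}"
    return prompt


print(build_prompt(
    "Show HTTP error rate over 5 minutes",
    "Metric http_requests_total with labels code, method",
))
```

The resulting string can be passed as the user message in the Transformers example above, or directly to `ollama run`.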

Training Data

Trained on a curated dataset of ~6,400 PromQL instruction examples covering:

  • Kubernetes cluster metrics (kube-state-metrics, cAdvisor)
  • Node/infrastructure metrics (node_exporter)
  • Application metrics (HTTP, gRPC, database)
  • Alerting patterns (absent, rate thresholds)
  • Hard negatives (common mistakes and their corrections)

The dataset was validated before training using a combination of PromQL parser checks and LLM-as-judge scoring.

Limitations

  • Complex nested subqueries with multiple aggregation levels may be inaccurate
  • Non-standard or custom metric names require explicit context
  • Not a substitute for understanding PromQL — always validate generated queries before use in production alerting
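One cheap pre-check before a generated query ever reaches real tooling is a balanced-delimiter scan. This is a heuristic sketch only (the `looks_balanced` helper is hypothetical, not part of this repo) and is no substitute for a real PromQL parser or a dry run against a Prometheus server:

```python
def looks_balanced(query: str) -> bool:
    """Heuristic sanity check for a generated PromQL string.

    Returns True only if the query is non-empty and every (), [], {}
    pair is balanced and properly nested. Catches obviously truncated
    or malformed outputs, nothing more.
    """
    if not query.strip():
        return False
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in query:
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


print(looks_balanced('rate(http_requests_total{code=~"5.."}[5m])'))  # True
print(looks_balanced('sum(rate(http_requests_total[5m])'))           # False: unclosed (
```

Queries that pass this check should still be validated against a real parser or a test Prometheus instance before being used in alerting.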

License

Apache 2.0 — same as the base Qwen3 model.
