sift
Fine-tune of Qwen2.5 0.5B for a single job: pull the signal out of noisy error output.
You give it a Python traceback, a Kubernetes log, or a Rust build failure, and it gives back either a clean 2-3 line summary of what actually broke or a search query you can paste into Google. All the absolute paths, UUIDs, timestamps, internal framework frames, and other garbage that makes errors unreadable gets dropped. The actual error message, error codes, and causal chain stay.
This runs on CPU in 1-2 seconds at Q4_K_M quantization. No GPU needed.
Example
Input:
Traceback (most recent call last):
File "/home/sid/projects/api/src/routes.py", line 42, in handler
result = db.execute(query)
File "/home/sid/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2308, in execute
return self._execute_internal(
File "/home/sid/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
self.dialect.do_execute(
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "10.0.1.55", port 5432 failed: Connection timed out
Is the server running on that host and accepting TCP/IP connections?
Clean output:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server failed: Connection timed out
Is the server running on that host and accepting TCP/IP connections?
Search output:
sqlalchemy psycopg2 OperationalError connection timed out PostgreSQL
Two modes, two system prompts
The model was trained on these exact prompts. Use them verbatim.
Clean mode strips noise and keeps the actual error:
You clean raw error output so an LLM debugging agent can diagnose the root cause. Given a raw error, return ONLY the cleaned version.
REMOVED:
- Timestamps, dates, uptime counters
- UUIDs, request IDs, correlation IDs, trace IDs, span IDs
- Absolute file paths (keep just filename + line number)
- IP addresses, hostnames, port numbers (UNLESS the error is about connectivity/DNS/TLS)
- Subscription IDs, account IDs, project IDs, tenant IDs
- Container IDs, pod name suffixes (keep the deployment/service name)
- User-specific resource names (replace with <resource_name>, <bucket_name>, <db_name>, etc.)
- Redundant/repeated lines (but note the count, e.g., "... repeated 47 times")
- Framework-internal stack frames deep in the call stack (keep top 3-5 user code frames + the deepest frame that THROWS the error)
- ANSI color codes, spinner characters, progress bar artifacts
- Auth tokens, keys, passwords, connection strings (replace with <redacted>)
KEPT:
- Error codes and error types (e.g., E0382, ORA-1234, ENOENT, SQLSTATE)
- The full error message text exactly as written
- The framework/tool name and version if mentioned
- Relevant stack frames showing user code (filename:line_number + function name)
- The DEEPEST causal frame (the frame that actually threw/originated the error)
- "Caused by" / "caused by" / chained exception chains IN FULL
- Configuration keys and values that caused the error
- HTTP status codes AND the response body/message if present
- Environment/runtime info (language version, OS, platform) if mentioned
- State or phase info (e.g., "during migration", "at startup", "while compiling")
- Variable names, types, and values mentioned in the error
- Expected vs. actual values (e.g., "expected string, got null")
- Permission/role names in auth errors
- Package/module/crate names and versions in dependency errors
- Exit codes and signal names
STRUCTURE:
- If there are chained/nested errors ("Caused by"), preserve the full chain in order
- If there are multiple distinct errors, separate them with a blank line
- Preserve the hierarchy: primary error first, then causes, then relevant stack frames
- If a repeated message was collapsed, append: [repeated N times]
Return ONLY the cleaned error text. No explanations, no markdown formatting, no prefixes, no commentary.
Search mode returns a keyword query:
You are sift in search mode. Given raw error output, return ONLY a short search query (5-15 words) optimized for Google/StackOverflow. Just keywords, no quotes, no operators. Nothing else.
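The repeated-line collapse rule from the clean-mode prompt ("note the count", append `[repeated N times]`) can also be applied as a deterministic pre-pass before the model sees the input. A minimal sketch (function name and behavior are illustrative, not part of the model):

```python
def collapse_repeats(lines: list[str]) -> list[str]:
    """Collapse runs of consecutive identical log lines into one copy
    followed by a '[repeated N times]' marker, mirroring the
    clean-mode prompt's STRUCTURE rule."""
    out = []
    i = 0
    while i < len(lines):
        j = i
        while j < len(lines) and lines[j] == lines[i]:
            j += 1
        count = j - i
        out.append(lines[i])
        if count > 1:
            out.append(f"[repeated {count} times]")
        i = j
    return out
```

Running this first shortens the input and leaves the model less to do.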
Chat template
Standard Qwen2.5 ChatML:
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_input}<|im_end|>
<|im_start|>assistant
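If you are driving the model with raw text completion rather than a chat API, the template above can be assembled like this (a sketch; the system prompts are the verbatim ones from the previous section):

```python
def build_prompt(system_prompt: str, user_input: str) -> str:
    """Assemble the Qwen ChatML prompt shown above. The trailing
    '<|im_start|>assistant\n' cues the model to start its reply."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_input}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```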
Inference parameters
These were used during training and testing:
| Parameter | Value |
|---|---|
| temperature | 0.1 |
| top_p | 0.9 |
| repeat_penalty | 1.1 |
| n_predict | 512 |
| n_ctx | 2048 |
| n_gpu_layers | 0 (CPU only) |
Running with llama.cpp
./llama-cli \
-m sift-qwen3.5-0.8b-error-extractor.Q4_K_M.gguf \
--temp 0.1 \
--top-p 0.9 \
--repeat-penalty 1.1 \
-n 512 \
-c 2048 \
--no-display-prompt \
-p "<|im_start|>system
You clean raw error output...<|im_end|>
<|im_start|>user
{your_error_here}<|im_end|>
<|im_start|>assistant
"
Input length
Keep inputs under ~2000 characters. The model was trained on pre-filtered error excerpts, not full log files. If you're feeding it a 10,000-line log, extract the relevant section first: signal lines (lines with Error:, Exception, Caused by:, etc.) plus a few lines of surrounding context.
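A rough pre-filter along those lines (the marker list, context width, and character cap are illustrative choices, not part of the model):

```python
SIGNAL_MARKERS = ("Error:", "Exception", "Caused by:", "Traceback", "panic", "FATAL")

def extract_signal(log: str, context: int = 2, max_chars: int = 2000) -> str:
    """Keep lines containing a signal marker plus `context` lines around
    each hit, then truncate to the model's comfortable input size."""
    lines = log.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if any(m in line for m in SIGNAL_MARKERS):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    excerpt = "\n".join(lines[i] for i in sorted(keep))
    return excerpt[:max_chars]
```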
What it was trained on
The fine-tuning dataset was ~13,000 error messages from Stack Overflow, pulled via BigQuery and cleaned up.
Quantization
This repo hosts the Q4_K_M GGUF (~500MB). It runs on CPU and is fast enough for interactive use.
Related
- sift CLI: the Rust tool this model ships in
Base model: Qwen/Qwen2.5-0.5B