How to use from llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Sid77449/sift
# Run inference directly in the terminal:
llama-cli -hf Sid77449/sift
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Sid77449/sift
# Run inference directly in the terminal:
llama-cli -hf Sid77449/sift
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Sid77449/sift
# Run inference directly in the terminal:
./llama-cli -hf Sid77449/sift
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Sid77449/sift
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Sid77449/sift
Use Docker
docker model run hf.co/Sid77449/sift
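However you launch it, llama-server exposes an OpenAI-compatible HTTP API (port 8080 by default). Below is a minimal sketch of calling the chat completions endpoint in clean mode; the port, the abbreviated system prompt, and the sample error line are illustrative assumptions. The full prompts are listed under "Two modes, two system prompts" below, and the temperature/top_p values match the inference parameters further down.

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You clean raw error output... (paste the full clean-mode prompt here)"},
      {"role": "user", "content": "psycopg2.OperationalError: connection to server at 10.0.1.55, port 5432 failed: Connection timed out"}
    ],
    "temperature": 0.1,
    "top_p": 0.9
  }'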
sift

A fine-tune of Qwen3.5 0.8B for a single job: pulling the signal out of noisy error output.

You give it a Python traceback, a Kubernetes log, or a Rust build failure, and it gives back either a clean 2-3 line summary of what actually broke or a search query you can paste into Google. All the absolute paths, UUIDs, timestamps, internal framework frames, and other garbage that makes errors unreadable get dropped. The actual error message, error codes, and causal chain stay.

This runs on CPU in 1-2 seconds at Q4_K_M quantization. No GPU needed.

Example

Input:

Traceback (most recent call last):
  File "/home/sid/projects/api/src/routes.py", line 42, in handler
    result = db.execute(query)
  File "/home/sid/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2308, in execute
    return self._execute_internal(
  File "/home/sid/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "10.0.1.55", port 5432 failed: Connection timed out
    Is the server running on that host and accepting TCP/IP connections?

Clean output:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server failed: Connection timed out
Is the server running on that host and accepting TCP/IP connections?

Search output:

sqlalchemy psycopg2 OperationalError connection timed out PostgreSQL

Two modes, two system prompts

The model was trained on these exact prompts. Use them verbatim.

Clean mode strips noise and keeps the actual error:

You clean raw error output so an LLM debugging agent can diagnose the root cause. Given a raw error, return ONLY the cleaned version.
REMOVED:
- Timestamps, dates, uptime counters
- UUIDs, request IDs, correlation IDs, trace IDs, span IDs
- Absolute file paths (keep just filename + line number)
- IP addresses, hostnames, port numbers (UNLESS the error is about connectivity/DNS/TLS)
- Subscription IDs, account IDs, project IDs, tenant IDs
- Container IDs, pod name suffixes (keep the deployment/service name)
- User-specific resource names (replace with <resource_name>, <bucket_name>, <db_name>, etc.)
- Redundant/repeated lines (but note the count, e.g., "... repeated 47 times")
- Framework-internal stack frames deep in the call stack (keep top 3-5 user code frames + the deepest frame that THROWS the error)
- ANSI color codes, spinner characters, progress bar artifacts
- Auth tokens, keys, passwords, connection strings (replace with <redacted>)
KEPT:
- Error codes and error types (e.g., E0382, ORA-1234, ENOENT, SQLSTATE)
- The full error message text exactly as written
- The framework/tool name and version if mentioned
- Relevant stack frames showing user code (filename:line_number + function name)
- The DEEPEST causal frame (the frame that actually threw/originated the error)
- "Caused by" / "caused by" / chained exception chains IN FULL
- Configuration keys and values that caused the error
- HTTP status codes AND the response body/message if present
- Environment/runtime info (language version, OS, platform) if mentioned
- State or phase info (e.g., "during migration", "at startup", "while compiling")
- Variable names, types, and values mentioned in the error
- Expected vs. actual values (e.g., "expected string, got null")
- Permission/role names in auth errors
- Package/module/crate names and versions in dependency errors
- Exit codes and signal names
STRUCTURE:
- If there are chained/nested errors ("Caused by"), preserve the full chain in order
- If there are multiple distinct errors, separate them with a blank line
- Preserve the hierarchy: primary error first, then causes, then relevant stack frames
- If a repeated message was collapsed, append: [repeated N times]
Return ONLY the cleaned error text. No explanations, no markdown formatting, no prefixes, no commentary.

Search mode returns a keyword query:

You are sift in search mode. Given raw error output, return ONLY a short search query (5-15 words) optimized for Google/StackOverflow. Just keywords, no quotes, no operators. Nothing else.

Chat template

Standard Qwen3.5 ChatML:

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_input}<|im_end|>
<|im_start|>assistant
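For example, a search-mode request rendered through this template looks like the following; the Rust error line is only an illustration:

<|im_start|>system
You are sift in search mode. Given raw error output, return ONLY a short search query (5-15 words) optimized for Google/StackOverflow. Just keywords, no quotes, no operators. Nothing else.<|im_end|>
<|im_start|>user
error[E0382]: borrow of moved value: `config`<|im_end|>
<|im_start|>assistant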

Inference parameters

These were used during training and testing:

Parameter        Value
temperature      0.1
top_p            0.9
repeat_penalty   1.1
n_predict        512
n_ctx            2048
n_gpu_layers     0 (CPU only)
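
These values map directly onto llama.cpp command-line flags. A sketch of launching llama-server with them, assuming the common flag set of a recent llama.cpp build (check llama-server --help on yours):

./llama-server \
  -m sift-qwen3.5-0.8b-error-extractor.Q4_K_M.gguf \
  --temp 0.1 \
  --top-p 0.9 \
  --repeat-penalty 1.1 \
  -n 512 \
  -c 2048 \
  -ngl 0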

Running with llama.cpp

./llama-cli \
  -m sift-qwen3.5-0.8b-error-extractor.Q4_K_M.gguf \
  --temp 0.1 \
  --top-p 0.9 \
  --repeat-penalty 1.1 \
  -n 512 \
  -c 2048 \
  --no-display-prompt \
  -p "<|im_start|>system
You clean raw error output...<|im_end|>
<|im_start|>user
{your_error_here}<|im_end|>
<|im_start|>assistant
"

Input length

Keep inputs under ~2000 characters. The model was trained on pre-filtered error excerpts, not full log files. If you're feeding it a 10,000-line log, extract the relevant section first: signal lines (lines containing Error:, Exception, Caused by:, etc.) plus a few lines of surrounding context, as in the sketch below.
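
One quick way to do that pre-filtering from the shell; a sketch, assuming the log lives in app.log (the filename, the match patterns, and the context window are all adjustable):

grep -E -B 2 -A 4 'Error:|Exception|Caused by:|panicked at|FATAL' app.log | head -c 2000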

What it was trained on

The fine-tuning dataset was roughly 13,000 error messages pulled from Stack Overflow via BigQuery and then cleaned up.

Quantization

This repo hosts the Q4_K_M GGUF (~500MB). It runs on CPU and is fast enough for interactive use.

Related

  • sift CLI: the Rust tool this model ships in