How to use from llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Sid77449/sift
# Run inference directly in the terminal:
llama-cli -hf Sid77449/sift
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Sid77449/sift
# Run inference directly in the terminal:
llama-cli -hf Sid77449/sift
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Sid77449/sift
# Run inference directly in the terminal:
./llama-cli -hf Sid77449/sift
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Sid77449/sift
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Sid77449/sift
Use Docker
docker model run hf.co/Sid77449/sift
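However you launch it, llama-server exposes an OpenAI-compatible HTTP API (port 8080 by default). Below is a minimal sketch of calling the chat completions endpoint in clean mode; the port, the abbreviated system prompt, and the sample error line are illustrative assumptions. The full prompts are listed under "Two modes, two system prompts" below, and the temperature/top_p values match the inference parameters further down.

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You clean raw error output... (paste the full clean-mode prompt here)"},
      {"role": "user", "content": "psycopg2.OperationalError: connection to server at 10.0.1.55, port 5432 failed: Connection timed out"}
    ],
    "temperature": 0.1,
    "top_p": 0.9
  }'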
sift

A fine-tune of Qwen3.5 0.8B for a single job: pulling the signal out of noisy error output.

You give it a Python traceback, a Kubernetes log, or a Rust build failure, and it gives back either a clean 2-3 line summary of what actually broke or a search query you can paste into Google. All the absolute paths, UUIDs, timestamps, internal framework frames, and other garbage that makes errors unreadable get dropped. The actual error message, error codes, and causal chain stay.

This runs on CPU in 1-2 seconds at Q4_K_M quantization. No GPU needed.

Example

Input:

Traceback (most recent call last):
  File "/home/sid/projects/api/src/routes.py", line 42, in handler
    result = db.execute(query)
  File "/home/sid/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2308, in execute
    return self._execute_internal(
  File "/home/sid/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "10.0.1.55", port 5432 failed: Connection timed out
    Is the server running on that host and accepting TCP/IP connections?

Clean output:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server failed: Connection timed out
Is the server running on that host and accepting TCP/IP connections?

Search output:

sqlalchemy psycopg2 OperationalError connection timed out PostgreSQL

Two modes, two system prompts

The model was trained on these exact prompts. Use them verbatim.

Clean mode strips noise and keeps the actual error:

You clean raw error output so an LLM debugging agent can diagnose the root cause. Given a raw error, return ONLY the cleaned version.
REMOVED:
- Timestamps, dates, uptime counters
- UUIDs, request IDs, correlation IDs, trace IDs, span IDs
- Absolute file paths (keep just filename + line number)
- IP addresses, hostnames, port numbers (UNLESS the error is about connectivity/DNS/TLS)
- Subscription IDs, account IDs, project IDs, tenant IDs
- Container IDs, pod name suffixes (keep the deployment/service name)
- User-specific resource names (replace with <resource_name>, <bucket_name>, <db_name>, etc.)
- Redundant/repeated lines (but note the count, e.g., "... repeated 47 times")
- Framework-internal stack frames deep in the call stack (keep top 3-5 user code frames + the deepest frame that THROWS the error)
- ANSI color codes, spinner characters, progress bar artifacts
- Auth tokens, keys, passwords, connection strings (replace with <redacted>)
KEPT:
- Error codes and error types (e.g., E0382, ORA-1234, ENOENT, SQLSTATE)
- The full error message text exactly as written
- The framework/tool name and version if mentioned
- Relevant stack frames showing user code (filename:line_number + function name)
- The DEEPEST causal frame (the frame that actually threw/originated the error)
- "Caused by" / "caused by" / chained exception chains IN FULL
- Configuration keys and values that caused the error
- HTTP status codes AND the response body/message if present
- Environment/runtime info (language version, OS, platform) if mentioned
- State or phase info (e.g., "during migration", "at startup", "while compiling")
- Variable names, types, and values mentioned in the error
- Expected vs. actual values (e.g., "expected string, got null")
- Permission/role names in auth errors
- Package/module/crate names and versions in dependency errors
- Exit codes and signal names
STRUCTURE:
- If there are chained/nested errors ("Caused by"), preserve the full chain in order
- If there are multiple distinct errors, separate them with a blank line
- Preserve the hierarchy: primary error first, then causes, then relevant stack frames
- If a repeated message was collapsed, append: [repeated N times]
Return ONLY the cleaned error text. No explanations, no markdown formatting, no prefixes, no commentary.

Search mode returns a keyword query:

You are sift in search mode. Given raw error output, return ONLY a short search query (5-15 words) optimized for Google/StackOverflow. Just keywords, no quotes, no operators. Nothing else.

Chat template

Standard Qwen3.5 ChatML:

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_input}<|im_end|>
<|im_start|>assistant
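For example, a search-mode request rendered through this template looks like the following; the Rust error line is only an illustration:

<|im_start|>system
You are sift in search mode. Given raw error output, return ONLY a short search query (5-15 words) optimized for Google/StackOverflow. Just keywords, no quotes, no operators. Nothing else.<|im_end|>
<|im_start|>user
error[E0382]: borrow of moved value: `config`<|im_end|>
<|im_start|>assistant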

Inference parameters

These were used during training and testing:

Parameter        Value
temperature      0.1
top_p            0.9
repeat_penalty   1.1
n_predict        512
n_ctx            2048
n_gpu_layers     0 (CPU only)
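
These values map directly onto llama.cpp command-line flags. A sketch of launching llama-server with them, assuming the common flag set of a recent llama.cpp build (check llama-server --help on yours):

./llama-server \
  -m sift-qwen3.5-0.8b-error-extractor.Q4_K_M.gguf \
  --temp 0.1 \
  --top-p 0.9 \
  --repeat-penalty 1.1 \
  -n 512 \
  -c 2048 \
  -ngl 0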

Running with llama.cpp

./llama-cli \
  -m sift-qwen3.5-0.8b-error-extractor.Q4_K_M.gguf \
  --temp 0.1 \
  --top-p 0.9 \
  --repeat-penalty 1.1 \
  -n 512 \
  -c 2048 \
  --no-display-prompt \
  -p "<|im_start|>system
You clean raw error output...<|im_end|>
<|im_start|>user
{your_error_here}<|im_end|>
<|im_start|>assistant
"

Input length

Keep inputs under ~2000 characters. The model was trained on pre-filtered error excerpts, not full log files. If you're feeding it a 10,000-line log, extract the relevant section first: signal lines (lines containing Error:, Exception, Caused by:, etc.) plus a few lines of surrounding context, as in the sketch below.
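
One quick way to do that pre-filtering from the shell; a sketch, assuming the log lives in app.log (the filename, the match patterns, and the context window are all adjustable):

grep -E -B 2 -A 4 'Error:|Exception|Caused by:|panicked at|FATAL' app.log | head -c 2000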

What it was trained on

The fine-tuning dataset was roughly 13,000 error messages pulled from Stack Overflow via BigQuery and then cleaned up.

Quantization

This repo hosts the Q4_K_M GGUF (~500MB). It runs on CPU and is fast enough for interactive use.

Related

  • sift CLI: the Rust tool this model ships in