How to use from llama.cpp
Install with Homebrew (macOS/Linux)
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
# The :Q4_K_M tag selects the recommended quant (see Recommended Deployment Variants below):
llama-server -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M
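Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch with curl, assuming the default bind address of http://127.0.0.1:8080 (adjustable with --host/--port); the prompt is illustrative:
# Query the server's OpenAI-compatible chat endpoint:
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "List all files larger than 100MB under /var/log"}]}'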
Install with WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M
Use a pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M
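llama-cli can also take a one-shot prompt instead of dropping into interactive mode. A sketch with an illustrative prompt (-p sets the prompt, -n caps the number of generated tokens):
./llama-cli -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M -p "Find all files modified in the last 24 hours" -n 128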
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M
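The default build above is CPU-only. A hedged sketch for a GPU build, assuming the CUDA toolkit is installed (GGML_CUDA is the current llama.cpp CMake option):
# Rebuild with CUDA offload enabled:
cmake -B build -DGGML_CUDA=ON
cmake --build build -j --target llama-server llama-cli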
Use Docker
docker model run hf.co/louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M
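docker model run opens an interactive chat by default; passing a prompt as a trailing argument runs a single completion instead. A sketch, assuming Docker Model Runner is enabled in your Docker install (the prompt is illustrative):
docker model run hf.co/louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_M "Show disk usage per directory, sorted by size"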

Qwen3.5-2B ShellCommand-Linux GGUF

This repository contains GGUF exports of the current best Qwen3.5-2B ShellCommand-Linux LoRA, merged into the base model before conversion.

Source

  • adapter source: https://huggingface.co/louisguthmann/qwen3.5-2b-shellcommand-linux-lora
  • GitHub repo: https://github.com/GuthL/bitnet-nl2sh

Files

  • Qwen3.5-2B-shellcommand-linux-F16.gguf
  • Qwen3.5-2B-shellcommand-linux-Q4_K_M.gguf
  • Qwen3.5-2B-shellcommand-linux-Q4_K_S.gguf
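To fetch a single file instead of relying on -hf auto-download, the huggingface-cli download command works. A sketch (the filename must match one of the files listed above):
# Requires: pip install huggingface_hub
huggingface-cli download louisguthmann/qwen3.5-2b-shellcommand-linux-gguf Qwen3.5-2B-shellcommand-linux-Q4_K_M.gguf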

Inherited Eval Snapshot

These metrics were measured on the source LoRA adapter before GGUF export, so the quantized variants may deviate slightly.

  • score: 276.5033
  • verifier ok rate: 0.7750
  • verifier command rate: 0.7604
  • verifier ask rate: 0.7500
  • verifier cannot rate: 1.0000
  • exact any-exact rate: 0.2500
  • exact parse-ok rate: 0.9800

Recommended Deployment Variants

  • Q4_K_M: safer default if you want more quality headroom
  • Q4_K_S: leaner option if memory or latency is tighter
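Either variant can be selected at load time through the tag on the -hf reference, assuming the Hub's usual convention that the tag matches the quantization suffix:
llama-server -hf louisguthmann/qwen3.5-2b-shellcommand-linux-gguf:Q4_K_S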

CX23 Benchmarking

See the GitHub docs for the exact benchmark commands used for llama.cpp on Hetzner CX23.
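For local comparisons, llama.cpp ships a llama-bench tool that measures prompt-processing and token-generation throughput. An illustrative invocation (not the exact CX23 command, which is in the GitHub docs):
./build/bin/llama-bench -m Qwen3.5-2B-shellcommand-linux-Q4_K_M.gguf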

Model Details

  • Format: GGUF
  • Model size: 2B params
  • Architecture: qwen35
  • Base model: Qwen/Qwen3.5-2B (fine-tuned, then quantized into this repo)