Instructions to use dreeseaw/cleo with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use dreeseaw/cleo with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dreeseaw/cleo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForMultimodalLM

tokenizer = AutoTokenizer.from_pretrained("dreeseaw/cleo")
model = AutoModelForMultimodalLM.from_pretrained("dreeseaw/cleo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- llama-cpp-python
How to use dreeseaw/cleo with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="dreeseaw/cleo",
    filename="cleo-Q8_0.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use dreeseaw/cleo with llama.cpp:
Install from brew
```sh
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dreeseaw/cleo:Q8_0
# Run inference directly in the terminal:
llama-cli -hf dreeseaw/cleo:Q8_0
```
Install from WinGet (Windows)
```sh
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dreeseaw/cleo:Q8_0
# Run inference directly in the terminal:
llama-cli -hf dreeseaw/cleo:Q8_0
```
Use pre-built binary
```sh
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf dreeseaw/cleo:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf dreeseaw/cleo:Q8_0
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf dreeseaw/cleo:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf dreeseaw/cleo:Q8_0
```
Use Docker
docker model run hf.co/dreeseaw/cleo:Q8_0
- LM Studio
- Jan
- vLLM
How to use dreeseaw/cleo with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dreeseaw/cleo"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "dreeseaw/cleo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/dreeseaw/cleo:Q8_0
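The vLLM server above exposes the OpenAI-compatible chat completions endpoint that the curl example calls, so it can also be queried from Python with only the standard library. The sketch below assumes the default vLLM address http://localhost:8000 and the OpenAI response shape (choices[0].message.content); the helper names build_chat_request and chat are hypothetical. The same code should also work against the llama.cpp and SGLang servers on this page if base_url points at their ports.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send one user message and return the first choice's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the OpenAI chat completions response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, chat("http://localhost:8000", "dreeseaw/cleo", "What is the capital of France?") returns the assistant's reply as a string.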
- SGLang
How to use dreeseaw/cleo with SGLang:
Install from pip and serve model
```sh
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dreeseaw/cleo" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "dreeseaw/cleo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```sh
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dreeseaw/cleo" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "dreeseaw/cleo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Ollama
How to use dreeseaw/cleo with Ollama:
ollama run hf.co/dreeseaw/cleo:Q8_0
- Unsloth Studio
How to use dreeseaw/cleo with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dreeseaw/cleo to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dreeseaw/cleo to start chatting
```
Using Hugging Face Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for dreeseaw/cleo to start chatting.
- Pi
How to use dreeseaw/cleo with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf dreeseaw/cleo:Q8_0
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "dreeseaw/cleo:Q8_0" }
      ]
    }
  }
}
```
Run Pi
```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use dreeseaw/cleo with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf dreeseaw/cleo:Q8_0
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default dreeseaw/cleo:Q8_0
```
Run Hermes
hermes
- Atomic Chat
- Docker Model Runner
How to use dreeseaw/cleo with Docker Model Runner:
docker model run hf.co/dreeseaw/cleo:Q8_0
- Lemonade
How to use dreeseaw/cleo with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull dreeseaw/cleo:Q8_0
```
Run and chat with the model
lemonade run user.cleo-Q8_0
List all available models
lemonade list
Cleo
Cleo is a small SQL analyst "hardel": a Qwen3.5-2B fine-tune paired with a read-only SQL harness and runtime for live database connections. The model is trained to inspect schemas, gather real values from the database, repair SQL from execution feedback, and return analyst-ready read-only queries.
The recommended entry point is the Python package and MCP server at github.com/Dreeseaw/cleo.
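```sh
pip install "cleo-sql[hf] @ git+https://github.com/Dreeseaw/cleo.git@master"
```

```python
import sqlite3

from cleo import Cleo

# sqlite3 is only an example DB-API connection; "company.db" is a placeholder path.
conn = sqlite3.connect("company.db")

cleo = Cleo.from_hf("dreeseaw/cleo")
ans = cleo("How many employees are currently in each department?", conn)
print(ans.sql)
print(ans.rows)
print(ans.clarification)
```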
cleo(...) and cleo.ask(...) use the hardel runtime by default: a greedy candidate plus sampled candidates are executed through the same read-only harness, then selected with product-visible execution evidence. Use cleo.ask_once(...) when you explicitly want a single-candidate, lower-latency path.
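The package's evidence selector is not specified in detail here, so the following is only a minimal sketch of execution-based selection using a simple stand-in rule: run every candidate through the same connection, drop candidates that fail, and keep the SQL whose order-insensitive result set is shared by the most candidates, with the greedy candidate winning ties. select_candidate, fingerprint, and the tie-breaking rule are hypothetical, and the connection is assumed to already be read-only, as in Cleo's harness.

```python
import sqlite3
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def fingerprint(rows) -> frozenset:
    """Order-insensitive summary of a result set (duplicate rows collapse)."""
    return frozenset(tuple(row) for row in rows)


def select_candidate(conn: sqlite3.Connection, candidates: Sequence[str]) -> Optional[str]:
    """Return the candidate SQL whose result set the most candidates agree on.

    candidates[0] is treated as the greedy candidate and wins ties;
    candidates that fail to execute are discarded.
    """
    executed: List[Tuple[int, str, frozenset]] = []
    for index, sql in enumerate(candidates):
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # an execution error counts against the candidate
        executed.append((index, sql, fingerprint(rows)))
    if not executed:
        return None  # no candidate executed successfully
    votes = Counter(fp for _, _, fp in executed)
    # Most votes first; among equals, the lowest index (greedy first) wins.
    _, best_sql, _ = min(executed, key=lambda item: (-votes[item[2]], item[0]))
    return best_sql
```

Majority agreement on executed results is only one possible form of execution evidence; a production selector could also weigh errors, empty results, or repair history.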
Files
| file | purpose |
|---|---|
| root model files | Current hardel Transformers weights in bf16-compatible safetensors format. |
| v1.4-hardel-v3/ | Archived copy of the current hardel checkpoint. |
| cleo-Q8_0.gguf | Legacy llama-cpp-python GGUF alias from a prior release. |
| cleo_v1_2_bird-no_mtp-Q8_0.gguf | Prior versioned Q8_0 GGUF artifact. |
| v1.0/ | Earlier archived tool-use checkpoint. |
For the current release, use the Hugging Face Transformers backend through Cleo.from_hf("dreeseaw/cleo"). The GGUF files are retained for compatibility with earlier runtime paths.
Public Benchmark
All rows below use denotation scoring on BIRD minidev SQLite: predicted and gold SQL are executed, normalized row sets are compared, and the formula_1 database is excluded because of training overlap. BIRD-434 is reported as a broad analytical SQL benchmark, not as the only measure of the product runtime.
| model / runtime | BIRD-434 execution accuracy |
|---|---|
| Cleo v1.4 hardel K=4 | 143/434 = 32.9% |
| Gemini 2.5 Flash | 55.5% |
| DeepSeek Chat | 50.5% |
Cleo was evaluated through the public package runtime with k=4, temperature=0.7, max_gather=3, and max_repair=2.
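Denotation scoring as described above can be sketched with the standard-library sqlite3 module. This assumes that "normalized row sets" means an order- and duplicate-insensitive set comparison; the normalization used in Cleo's evaluation may differ (for example, in how floats are handled). rows_match and execution_accuracy are hypothetical helper names, and connect stands for whatever maps a BIRD db_id to an open SQLite connection.

```python
import sqlite3
from typing import Callable, Iterable, Tuple


def rows_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Denotation match: the predicted SQL runs and returns the same row set as gold."""
    try:
        predicted = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute is scored as wrong
    gold = conn.execute(gold_sql).fetchall()
    return set(map(tuple, predicted)) == set(map(tuple, gold))


def execution_accuracy(
    examples: Iterable[Tuple[str, str, str]],
    connect: Callable[[str], sqlite3.Connection],
    excluded: Tuple[str, ...] = ("formula_1",),
) -> Tuple[int, int]:
    """Count (correct, scored) over (db_id, pred_sql, gold_sql) triples."""
    correct = scored = 0
    for db_id, pred_sql, gold_sql in examples:
        if db_id in excluded:
            continue  # excluded databases are not scored at all
        scored += 1
        correct += rows_match(connect(db_id), pred_sql, gold_sql)
    return correct, scored
```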
Model Lineage
Cleo starts from Qwen/Qwen3.5-2B-Base, then adds SQL analyst behavior in stages:
- Analyst SQL contract SFT: strict JSON outputs, read-only SQL, clarification behavior, and schema-grounded query writing across schema-diverse tasks.
- Real-schema teacher distillation: on-policy trajectories from larger SQL-capable models teach the first analyst checkpoint to work against realistic database shapes.
- Tool-use continuation: ECHO-format traces teach gather -> observation -> final behavior, so the model can discover stored values, codes, sentinels, and naming conventions before producing final SQL.
- Repair and runtime continuation: the steadier full-fine-tuned ECHO branch is continued with train-safe observed-value corrections and replay, emphasizing reliable harness behavior over one-off repair memorization.
- Hardel runtime selection: the shipped package combines the model with live execution, candidate search, repair, and an evidence selector, making the model and harness one product surface rather than two separate demos.
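The gather -> observation -> final pattern, with repair from execution feedback, can be illustrated as a small control loop. This is a sketch under stated assumptions, not Cleo's runtime or the ECHO trace format: the policy callable stands in for the model, the {"action": ..., "sql": ...} dictionaries are a hypothetical action format, and the default budgets simply reuse max_gather=3 and max_repair=2 from the evaluation settings above.

```python
import sqlite3
from typing import Callable, Dict, List, Optional

# Hypothetical action format: {"action": "gather" | "final", "sql": "..."}
Action = Dict[str, str]
Policy = Callable[[List[str]], Action]


def run_episode(
    conn: sqlite3.Connection,
    policy: Policy,
    max_gather: int = 3,
    max_repair: int = 2,
) -> Optional[str]:
    """Run gather -> observation -> final, repairing failed finals from error text.

    Returns the first final SQL that executes, or None once the repair budget is spent.
    """
    transcript: List[str] = []
    gathers = repairs = 0
    # Each iteration spends a gather, spends a repair, or ends the episode.
    for _ in range(max_gather + max_repair + 1):
        step = policy(transcript)
        sql = step["sql"]
        if step["action"] == "gather" and gathers < max_gather:
            gathers += 1
            try:
                rows = conn.execute(sql).fetchmany(20)  # keep observations small
                transcript.append(f"observation: {rows!r}")
            except sqlite3.Error as exc:
                transcript.append(f"observation error: {exc}")
            continue
        # Any other step (including an over-budget gather) is treated as final.
        try:
            conn.execute(sql).fetchall()
            return sql
        except sqlite3.Error as exc:
            if repairs >= max_repair:
                return None
            repairs += 1
            transcript.append(f"execution error: {exc}")
    return None
```

A scripted policy that first gathers the distinct values of a status column, then issues a final query against a misspelled table, and then corrects it gets back the corrected SQL after one repair; the error text from the failed attempt is what the policy sees before repairing.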
Runtime Notes
The root model files are current bf16 Transformers weights. They are not stored as int8 weights. For CUDA machines that need lower VRAM, install the optional extra and load with runtime quantization:
```sh
pip install "cleo-sql[hf,int8] @ git+https://github.com/Dreeseaw/cleo.git@master"
```

```python
from cleo import Cleo

cleo = Cleo.from_hf("dreeseaw/cleo", quantization="int8")
```
By default, Cleo.from_hf() queries PyTorch for available backends and uses the first one in the order CUDA, XPU, MPS, CPU. CUDA is the tested fast path for this release; CPU works with compatible Transformers installs but is slower.
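The fallback order can be written as a small helper. This is an illustrative sketch, not Cleo's code: pick_device and detect_device are hypothetical names, and the hasattr guards assume that older PyTorch builds may lack torch.xpu or the MPS backend.

```python
def pick_device(cuda: bool, xpu: bool, mps: bool) -> str:
    """Return the first available backend in the order CUDA, XPU, MPS, CPU."""
    for name, available in (("cuda", cuda), ("xpu", xpu), ("mps", mps)):
        if available:
            return name
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which accelerators exist; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda = torch.cuda.is_available()
    # torch.xpu and the MPS backend are absent in some builds, so guard the lookups.
    xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
    return pick_device(cuda, xpu, mps)
```

Keeping the ordering in a pure function makes the policy easy to test without any GPU present.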
Limitations
- Cleo is designed for analyst SQL workflows over live, read-only database connections.
- Use least-privilege read-only credentials, query timeouts, and normal application-level review for production data access.
- The package supports common DB-API connections and dialect transpilation across SQLite, Postgres, MySQL, and DuckDB-style workflows; validate outputs for production-critical reporting.
- Very large schemas should be scoped with tables= or a provided schema= string so the runtime sees the relevant part of the database.
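For SQLite, both recommendations can be approximated with the standard library: open the database through a read-only URI so any write fails, and build a schema string that covers only the tables a question needs. The sketch assumes that a plain list of CREATE TABLE statements is an acceptable value for schema=; the exact format Cleo expects is not specified here, and open_read_only and scoped_schema are hypothetical helpers. Server databases such as Postgres or MySQL would instead rely on a read-only role and a server-side statement timeout.

```python
import sqlite3
from pathlib import Path
from typing import Iterable


def open_read_only(path: str) -> sqlite3.Connection:
    """Open a SQLite file via a mode=ro URI; any write raises OperationalError."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def scoped_schema(conn: sqlite3.Connection, tables: Iterable[str]) -> str:
    """Return the CREATE TABLE statements for only the requested tables."""
    wanted = list(tables)
    if not wanted:
        return ""
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        f"WHERE type = 'table' AND name IN ({placeholders}) ORDER BY name",
        wanted,
    ).fetchall()
    return "\n\n".join(sql + ";" for (sql,) in rows)
```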
Links
- Package and MCP server: Dreeseaw/cleo
- PyPI: cleo-sql
- Base model: Qwen/Qwen3.5-2B-Base
- Value-discovery benchmark: dreeseaw/cleo-value-discovery
- Process analytics dataset: dreeseaw/cleo-process-analytics-v1