Cleo

Cleo is a small SQL analyst hardel: a Qwen3.5-2B fine-tune paired with a read-only SQL harness/runtime for live database connections. The model is trained to inspect schemas, gather real values from the database, repair SQL from execution feedback, and return analyst-ready read-only queries.

The recommended entry point is the Python package and MCP server: github.com/Dreeseaw/cleo.

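pip install "cleo-sql[hf] @ git+https://github.com/Dreeseaw/cleo.git@master"

import sqlite3

from cleo import Cleo

# Cleo takes a DB-API connection; "company.db" is a hypothetical SQLite file,
# opened read-only here.
conn = sqlite3.connect("file:company.db?mode=ro", uri=True)

cleo = Cleo.from_hf("dreeseaw/cleo")
ans = cleo("How many employees are currently in each department?", conn)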

print(ans.sql)
print(ans.rows)
print(ans.clarification)

By default, cleo(...) and cleo.ask(...) use the hardel runtime. A greedy candidate and several sampled candidates are each executed through the same read-only harness, and one is selected using execution evidence that the runtime itself can observe. Use cleo.ask_once(...) when you explicitly want a single-candidate, lower-latency path.
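The card does not spell out how the evidence selector works. As a rough illustration only, the sketch below executes several candidate queries against SQLite and drops any that fail. It then picks the candidate whose result set the most candidates agree on, a simple self-consistency vote, with ties going to the greedy candidate. The names `fingerprint` and `select_candidate` and the voting rule are assumptions. They are not Cleo's API or its actual selector.

```python
import sqlite3
from collections import Counter


def fingerprint(rows):
    """Order-insensitive key for a result set (duplicate rows are kept)."""
    return tuple(sorted(rows, key=repr))


def select_candidate(conn, candidates):
    """Return the candidate SQL whose result set most candidates agree on.

    `candidates` is ordered greedy-first. Candidates that fail to execute
    contribute no evidence. Ties go to the earliest candidate.
    """
    executed = []
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # failed execution: drop this candidate
        executed.append((sql, fingerprint(rows)))
    if not executed:
        return None
    votes = Counter(fp for _, fp in executed)
    # max() keeps the first maximal item, so the greedy candidate wins ties.
    return max(executed, key=lambda item: votes[item[1]])[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE emp(name TEXT, dept TEXT);"
        "INSERT INTO emp VALUES ('a', 'eng'), ('b', 'eng'), ('c', 'ops');"
    )
    candidates = [
        "SELECT dept, COUNT(*) FROM emp GROUP BY dept",     # greedy
        "SELECT dept, COUNT(name) FROM emp GROUP BY dept",  # sampled, same rows
        "SELECT dept FROM emp",                             # sampled, different rows
        "SELECT * FROM no_such_table",                      # sampled, fails
    ]
    print(select_candidate(conn, candidates))  # prints the greedy candidate's SQL
```

A fuller selector could weigh additional execution signals, such as empty results, alongside agreement. Voting is used here only because it is easy to show.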

Files

File | Purpose
root model files | Current hardel Transformers weights (bf16, safetensors format).
v1.4-hardel-v3/ | Archived copy of the current hardel checkpoint.
cleo-Q8_0.gguf | Legacy llama-cpp-python GGUF alias from a prior release.
cleo_v1_2_bird-no_mtp-Q8_0.gguf | Prior versioned Q8_0 GGUF artifact.
v1.0/ | Earlier archived tool-use checkpoint.

For the current release, use the Hugging Face Transformers backend through Cleo.from_hf("dreeseaw/cleo"). The GGUF files are retained for compatibility with earlier runtime paths.

Public Benchmark

All rows below use denotation scoring on BIRD minidev SQLite: predicted and gold SQL are executed, normalized row sets are compared, and the formula_1 database is excluded because of training overlap. BIRD-434 is reported as a broad analytical SQL benchmark, not as the only measure of the product runtime.
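The denotation check described above can be sketched as follows. The sketch runs both queries, turns each result into a set of row tuples, and counts a match only when the sets are equal. A prediction that fails to execute scores zero. The helper names and the float-rounding rule are assumptions for illustration, not the exact evaluation script behind these numbers.

```python
import sqlite3


def _normalize(rows, digits=6):
    """Rows as a set of tuples; floats are rounded (an assumed rule)."""
    return {
        tuple(round(v, digits) if isinstance(v, float) else v for v in row)
        for row in rows
    }


def denotation_match(conn, predicted_sql, gold_sql):
    """True if both queries execute and return the same normalized row set."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute scores zero
    gold = conn.execute(gold_sql).fetchall()
    return _normalize(predicted) == _normalize(gold)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose denotations match."""
    hits = sum(denotation_match(conn, p, g) for p, g in pairs)
    return hits / len(pairs)
```

Because the comparison is on sets, row order and duplicate rows do not affect the score in this sketch.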

BIRD-434 SQL execution accuracy

Model / runtime | BIRD-434 execution accuracy
Cleo v1.4 hardel, k=4 | 143/434 = 32.9%
Gemini 2.5 Flash | 55.5%
DeepSeek Chat | 50.5%

Cleo was evaluated through the public package runtime with k=4, temperature=0.7, max_gather=3, and max_repair=2.

Model Lineage

Cleo starts from Qwen/Qwen3.5-2B-Base, then adds SQL analyst behavior in stages:

  1. Analyst SQL contract SFT: strict JSON outputs, read-only SQL, clarification behavior, and schema-grounded query writing across schema-diverse tasks.
  2. Real-schema teacher distillation: on-policy trajectories from larger SQL-capable models teach the first analyst checkpoint to work against realistic database shapes.
  3. Tool-use continuation: ECHO-format traces teach gather -> observation -> final behavior, so the model can discover stored values, codes, sentinels, and naming conventions before producing final SQL.
  4. Repair and runtime continuation: the steadier fully fine-tuned ECHO branch is continued with train-safe observed-value corrections and replay, emphasizing reliable harness behavior over one-off repair memorization.
  5. Hardel runtime selection: the shipped package combines the model with live execution, candidate search, repair, and an evidence selector, making the model and harness one product surface rather than two separate demos.
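The gather -> observation -> final and repair behaviors from stages 3 and 4 can be pictured as a bounded loop. This sketch is hypothetical throughout, and none of its pieces is the ECHO trace format or Cleo's API. Those pieces are:
- the JSON action format (`{"action": "gather" | "final", "sql": ...}`),
- the `policy` callable standing in for the model,
- the `run_episode` helper.

The `max_gather` and `max_repair` limits reuse the names from the benchmark settings. The connection passed in should be read-only.

```python
import json
import sqlite3


def run_episode(conn, question, policy, max_gather=3, max_repair=2):
    """Run a hypothetical gather -> observation -> final loop with repair.

    `policy(history)` stands in for the model. It receives the transcript so
    far (a list of (role, content) pairs) and returns a JSON string such as
    {"action": "gather", "sql": "..."} or {"action": "final", "sql": "..."}.
    Returns (final_sql, rows), or (None, None) if no final query succeeded.
    """
    history = [("user", question)]
    gathers = repairs = 0
    # Hard cap on model calls so a misbehaving policy cannot loop forever.
    for _ in range(max_gather + max_repair + 1):
        step = json.loads(policy(history))  # strict JSON: malformed output raises
        action, sql = step["action"], step["sql"]
        history.append(("assistant", step))
        if action == "gather":
            if gathers >= max_gather:
                history.append(("observation", "gather budget exhausted; give a final query"))
                continue
            gathers += 1
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            if action == "final":
                repairs += 1
                if repairs > max_repair:
                    break  # repair budget exhausted
            # Feed the error back so the next step can repair the query.
            history.append(("observation", f"error: {exc}"))
            continue
        if action == "final":
            return sql, rows
        # Show the model a bounded sample of real stored values.
        history.append(("observation", rows[:20]))
    return None, None
```

Here is an example trace. A gather query such as `SELECT DISTINCT status FROM emp` reveals the stored codes (say 'A' and 'T') before the final query filters on them. If that final query fails, the error is fed back and the next step gets a chance to repair it.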

Runtime Notes

The root model files are current bf16 Transformers weights. They are not stored as int8 weights. For CUDA machines that need lower VRAM, install the optional extra and load with runtime quantization:

pip install "cleo-sql[hf,int8] @ git+https://github.com/Dreeseaw/cleo.git@master"

from cleo import Cleo

cleo = Cleo.from_hf("dreeseaw/cleo", quantization="int8")

By default, Cleo.from_hf() asks PyTorch what is available and chooses CUDA, then XPU, then MPS, then CPU. CUDA is the tested fast path for this release; CPU is supported by compatible Transformers installs but is slower.
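The device preference order above can be approximated with standard PyTorch availability checks. This sketch follows the order the card describes, not Cleo's actual code. It falls back to CPU when PyTorch is not installed. It uses `getattr` because `torch.xpu` exists only in newer PyTorch releases.

```python
def pick_device() -> str:
    """Return the first available device in the order CUDA, XPU, MPS, CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: nothing to accelerate with
    if torch.cuda.is_available():
        return "cuda"
    xpu = getattr(torch, "xpu", None)  # Intel GPUs; absent in older releases
    if xpu is not None and xpu.is_available():
        return "xpu"
    mps = getattr(torch.backends, "mps", None)  # Apple silicon
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```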

Limitations

  • Cleo is designed for analyst SQL workflows over live, read-only database connections.
  • Use least-privilege read-only credentials, query timeouts, and normal application-level review for production data access.
  • The package supports common DB-API connections and dialect transpilation across SQLite, Postgres, MySQL, and DuckDB-style workflows; validate outputs for production-critical reporting.
  • Very large schemas should be scoped with tables= or a provided schema= string so the runtime sees the relevant part of the database.
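For SQLite, these limitations map onto standard-library features. A read-only URI connection rejects writes, and a progress handler aborts long-running queries. A schema string can be built from only the tables you care about. This is a minimal sketch under those assumptions, with hypothetical helper names. The card does not specify the exact string format Cleo expects for `schema=`. For Postgres or MySQL, a read-only database role and a server-side statement timeout serve the same purposes.

```python
import pathlib
import sqlite3
import time


def open_read_only(path):
    """Open an existing SQLite file read-only; any write raises OperationalError."""
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def run_query(conn, sql, timeout_s=5.0):
    """Execute `sql`, interrupting it if it runs longer than `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    # SQLite calls the handler every 10,000 VM instructions; a non-zero return
    # aborts the statement with OperationalError("interrupted").
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 10_000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again


def scoped_schema(conn, tables):
    """Return the CREATE TABLE statements for only the named tables."""
    placeholders = ", ".join("?" for _ in tables)
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' "
        f"AND name IN ({placeholders}) ORDER BY name",
        list(tables),
    ).fetchall()
    return "\n\n".join(sql for (sql,) in rows)
```

The table list passed to `scoped_schema` plays the role of `tables=`. Treating the returned text as a suitable `schema=` value is an assumption.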

Links
