Phi-3-mini Text-to-SQL — GGUF (quantized for CPU)

Quantized GGUF builds of the fine-tuned Phi-3-mini Text-to-SQL model (LoRA already merged into the base weights), for fast CPU inference with llama.cpp.

| File | Size | Effective bits/weight | Size vs f16 |
|---|---|---|---|
| phi3-text-to-sql-Q4_K_M.gguf ⭐ recommended | 2.40 GB | 5.01 | −68.6% (3.2× smaller) |
| phi3-text-to-sql-Q5_K_M.gguf | 2.76 GB | 5.76 | −64.0% (2.8× smaller) |

Note: "Q4" K-quants average about 5 effective bits/weight rather than 4. The K-quant blocks carry extra scale data, and embeddings and some other tensors stay at higher precision, so the file is larger than a literal 4 bits × parameter-count estimate.
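The size table can be sanity-checked from file size and parameter count alone. This is a sketch that assumes roughly 3.821e9 parameters for Phi-3-mini (the card itself only says "4B") and treats the listed sizes as decimal GB; `quant_stats` is a hypothetical helper, not part of llama.cpp.

```bash
#!/usr/bin/env bash
# Recompute the size table from file size and parameter count.
# ASSUMPTION: ~3.821e9 parameters (Phi-3-mini); sizes in decimal GB as listed.
N_PARAMS=3.821e9

# quant_stats SIZE_GB -> "<bits/weight> <% reduction vs f16> <times smaller>"
quant_stats() {
  awk -v gb="$1" -v n="$N_PARAMS" 'BEGIN {
    bpw    = gb * 1e9 * 8 / n     # total bits in the file / number of weights
    f16_gb = n * 2 / 1e9          # f16 stores every weight in 2 bytes
    printf "%.2f %.1f %.1f\n", bpw, 100 * (1 - gb / f16_gb), f16_gb / gb
  }'
}

while read -r name size; do
  read -r bpw red ratio <<< "$(quant_stats "$size")"
  echo "$name: $bpw bits/weight, -$red% vs f16 (${ratio}x smaller)"
done <<'EOF'
Q4_K_M 2.40
Q5_K_M 2.76
EOF
```

With the assumed parameter count this gives 5.02 and 5.78 bits/weight and −68.6% / −63.9% vs f16, within rounding of the table; the small gaps likely come from the rounded file sizes and the parameter-count assumption.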

Which one?

Use Q4_K_M. On this task it matched Q5_K_M on both execution match and SQL validity while being smaller and faster.

Benchmarks (measured)

Measured with llama-bench (llama.cpp build 9637) on an Intel i7-13650HX CPU using 14 threads:

| Model | Prompt processing, pp256 (tok/s) | Token generation, tg64 (tok/s) |
|---|---|---|
| Q4_K_M | 91.4 | 20.1 |
| Q5_K_M | 59.6 | 18.5 |
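To put "faster" in numbers, the throughput table can be turned into ratios with a tiny awk helper. This is a sketch: `speedup` is a hypothetical name, and the commented llama-bench line only mirrors the pp256/tg64/14-thread settings above, assuming llama-bench is on PATH.

```bash
#!/usr/bin/env bash
# Relative speed of Q4_K_M vs Q5_K_M from the measured tok/s above.
# A comparable measurement (assumption: llama-bench from llama.cpp on PATH):
#   llama-bench -m phi3-text-to-sql-Q4_K_M.gguf -p 256 -n 64 -t 14

# speedup NEW OLD -> NEW/OLD rounded to two decimals
speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "prompt processing: $(speedup 91.4 59.6)x"
echo "token generation:  $(speedup 20.1 18.5)x"
```

On these numbers Q4_K_M is about 1.5× faster at prompt processing and about 9% faster at generation.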

Task quality on 12 held-out questions, scored by execution match against a live SQLite database:

| Model | Execution match | Valid SQL |
|---|---|---|
| Q4_K_M | 75.0% | 100% |
| Q5_K_M | 75.0% | 100% |

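Both builds scored 9/12, so 4-bit quantization cost no measurable task accuracy vs 5-bit here. With only 12 questions, each one moves the score by about 8.3 points, so only large differences would show up.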
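The card doesn't include its evaluation script. As a rough illustration of what execution match usually means, here is a hypothetical bash helper: a prediction counts when it runs without error and returns the same rows as the reference query on the same SQLite database. It assumes the `sqlite3` CLI is installed and compares rows order-insensitively, which may differ from how the numbers above were actually scored.

```bash
#!/usr/bin/env bash
# Hypothetical execution-match helper (not the card's evaluation script).
# Assumes the sqlite3 CLI is on PATH. Row order is ignored by sorting both
# result sets; that is an assumption about how matches are scored.

# exec_match DB GOLD_SQL PRED_SQL
# Returns 0 on a match, 1 if the prediction fails or differs, 2 if the gold query fails.
exec_match() {
  local db=$1 gold=$2 pred=$3 gold_out pred_out
  gold_out=$(sqlite3 "$db" "$gold") || return 2
  # Invalid SQL (sqlite3 exits non-zero) counts as a miss.
  pred_out=$(sqlite3 "$db" "$pred" 2>/dev/null) || return 1
  [[ "$(sort <<< "$gold_out")" == "$(sort <<< "$pred_out")" ]] || return 1
}

# match_rate MATCHES TOTAL -> percentage with one decimal (e.g. 9 of 12)
match_rate() {
  awk -v m="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * m / n }'
}
```

Sorting makes `ORDER BY` differences harmless, which is lenient for questions where ordering is part of the answer; a stricter checker would skip the sort for those.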

Run it

```bash
# CLI
llama-cli -m phi3-text-to-sql-Q4_K_M.gguf -p "<|user|>\n<schema + question><|end|>\n<|assistant|>\n" -n 150 --temp 0

# Server (OpenAI-compatible)
llama-server -m phi3-text-to-sql-Q4_K_M.gguf -c 2048 -t 14 --port 8080
```

The model expects Phi-3 chat formatting, with the database schema included in the user turn (see the adapter card for the exact prompt). It outputs raw SQLite SQL.
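When the schema lives in a file, it can be easier to build the prompt in a shell variable with real newlines than to rely on llama-cli interpreting the literal `\n` sequences in the one-liner above. This is a sketch: `build_phi3_prompt` is a hypothetical helper, and the layout inside the user turn (schema, blank line, question) is an assumption; the exact wording used in training is in the adapter card.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap a schema and a question in the Phi-3 chat format.
# ASSUMPTION: schema first, blank line, then the question.
build_phi3_prompt() {  # build_phi3_prompt SCHEMA QUESTION -> sets $PROMPT
  # printf -v keeps the trailing newline after <|assistant|>; $(...) would strip it.
  printf -v PROMPT '<|user|>\n%s\n\n%s<|end|>\n<|assistant|>\n' "$1" "$2"
}

# Example usage (requires llama-cli and the GGUF file; schema.sql is a placeholder):
#   build_phi3_prompt "$(cat schema.sql)" "How many orders were placed in 2023?"
#   llama-cli -m phi3-text-to-sql-Q4_K_M.gguf -p "$PROMPT" -n 150 --temp 0
```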

License: MIT.

Format: GGUF · Model size: 4B params · Architecture: phi3

Repository: Bhuvandesai/phi3-text-to-sql-gguf
