duckdb-nsql-7b-mlx-8bit

This repository contains an MLX-optimized 8-bit quantized variant of motherduckdb/DuckDB-NSQL-7B-v0.1, intended for fast, memory-efficient inference on Apple Silicon (M1/M2/M3/M4).

Model description

DuckDB-NSQL-7B is a 7B-parameter language model fine-tuned to translate natural-language questions into DuckDB SQL. The 8-bit MLX conversion roughly halves weight memory relative to FP16 while typically preserving near-FP16 quality on most NL→SQL workloads.

Conversion details

  • Base model: motherduckdb/DuckDB-NSQL-7B-v0.1 (fine-tuned from Llama 2 7B)
  • Format: MLX
  • Precision: 8-bit quantized
  • Typical memory footprint: ~7–8 GB (varies by MLX quantization / runtime)
  • Recommended for: production and development use on 16–32 GB Macs (best quality/speed balance)
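The ~7–8 GB figure can be sanity-checked with back-of-envelope arithmetic: 8-bit quantization stores roughly one byte per weight, and runtime overhead (quantization scales, activations, KV cache) accounts for the rest. A quick illustrative calculation:

```python
# Back-of-envelope check for the ~7-8 GB memory estimate (illustrative only).
# 8-bit quantization stores roughly 1 byte per parameter; scales/biases,
# activations, and the KV cache add overhead on top at runtime.
params = 7e9                    # ~7B parameters
weight_bytes = params * 8 / 8   # 1 byte per parameter at 8-bit
gib = weight_bytes / 2**30
print(f"weights alone: ~{gib:.1f} GiB")  # ~6.5 GiB before runtime overhead
```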

Installation

pip install mlx-lm

Usage

Python

from mlx_lm import load, generate

model, tokenizer = load("Nuxera/duckdb-nsql-7b-mlx-8bit")

schema = """
CREATE TABLE hospitals (
  hospital_id BIGINT,
  hospital_name VARCHAR,
  region VARCHAR,
  bed_capacity INTEGER
);

CREATE TABLE encounters (
  encounter_id BIGINT,
  hospital_id BIGINT,
  encounter_datetime TIMESTAMP,
  encounter_type VARCHAR
);
"""

question = "For each hospital region, how many encounters happened this month?"

prompt = f"""You are an assistant that writes valid DuckDB SQL queries.

### Schema:
{schema}

### Question:
{question}

### Response (DuckDB SQL only):"""

out = generate(model, tokenizer, prompt=prompt, max_tokens=256, temp=0.0)
print(out)

Run as a local server

mlx_lm.server --model Nuxera/duckdb-nsql-7b-mlx-8bit --port 8080

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "### Schema:\nCREATE TABLE patients(...);\n\n### Question:\nCount patients by region\n\n### Response (DuckDB SQL only):",
    "max_tokens": 200,
    "temperature": 0
  }'
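The same request can be made from Python. This is a minimal sketch using only the standard library; it assumes the server is already running on localhost:8080 and that the endpoint returns an OpenAI-compatible completions response (`choices[0].text`), as the curl example above suggests. `build_payload` and `complete` are illustrative helper names, not part of mlx-lm.

```python
import json
import urllib.request

def build_payload(schema: str, question: str, max_tokens: int = 200) -> dict:
    """Assemble the same prompt structure used in the curl example."""
    prompt = (
        f"### Schema:\n{schema}\n\n"
        f"### Question:\n{question}\n\n"
        "### Response (DuckDB SQL only):"
    )
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0}

def complete(payload: dict,
             url: str = "http://localhost:8080/v1/completions") -> str:
    """POST the payload to the local server and return the completion text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]

payload = build_payload("CREATE TABLE patients(region VARCHAR);",
                        "Count patients by region")
# sql = complete(payload)  # uncomment with the server running
```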

Prompt format

This model works best when you provide:

  1. Clear schema (tables + columns)
  2. One question
  3. Explicit instruction to output SQL only

Example:

You are an assistant that writes valid DuckDB SQL queries.

### Schema:
CREATE TABLE ...

### Question:
...

### Response (DuckDB SQL only):
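The template above is easy to keep consistent with a small helper. This is an illustrative sketch (`build_prompt` is not part of mlx-lm) that assembles the three recommended elements in order:

```python
# Assemble the recommended prompt: schema, one question, and an explicit
# instruction to output SQL only.
def build_prompt(schema: str, question: str) -> str:
    return (
        "You are an assistant that writes valid DuckDB SQL queries.\n\n"
        f"### Schema:\n{schema.strip()}\n\n"
        f"### Question:\n{question.strip()}\n\n"
        "### Response (DuckDB SQL only):"
    )

prompt = build_prompt(
    "CREATE TABLE patients(patient_id BIGINT, region VARCHAR);",
    "Count patients by region",
)
```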

Known limitations

  • Optimized for DuckDB SQL; other dialects may require edits.
  • Very complex joins / nested queries may occasionally need post-processing.
  • Like most text-to-SQL models, ambiguous questions can yield multiple “valid” SQL interpretations.
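The light post-processing mentioned above can be as simple as extracting one clean statement from the raw completion, since models occasionally wrap output in markdown fences or append commentary. A minimal sketch (`extract_sql` is a hypothetical helper, not part of this model or mlx-lm):

```python
import re

def extract_sql(completion: str) -> str:
    """Pull a single SQL statement out of a raw model completion."""
    text = completion.strip()
    # Drop a surrounding markdown code fence if present.
    fence = re.match(r"```(?:sql)?\s*(.*?)```", text, re.DOTALL)
    if fence:
        text = fence.group(1).strip()
    # Keep only the first statement; discard trailing commentary.
    return text.split(";")[0].strip() + ";"

raw = "```sql\nSELECT region, COUNT(*) FROM patients GROUP BY region;\n```\nExplanation: ..."
print(extract_sql(raw))
```

For stricter checking, the extracted statement can also be passed to DuckDB's `EXPLAIN` to verify it parses before execution.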

License

This model inherits the Llama 2 Community License from the base model.

Citation

@misc{nuxera_duckdb_nsql_mlx_8bit,
  title={DuckDB-NSQL-7B MLX 8-bit Quantized Conversion},
  author={Nuxera AI},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Nuxera/duckdb-nsql-7b-mlx-8bit}}
}

Base model:

@misc{duckdb_nsql,
  title={DuckDB-NSQL-7B: Natural Language to SQL for DuckDB},
  author={MotherDuck},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/motherduckdb/DuckDB-NSQL-7B-v0.1}}
}
