How to use with llama.cpp

Install from brew

brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tharun66/mistral-sql-gguf

# Run inference directly in the terminal:
llama-cli -hf tharun66/mistral-sql-gguf

Install from WinGet (Windows)

winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tharun66/mistral-sql-gguf

# Run inference directly in the terminal:
llama-cli -hf tharun66/mistral-sql-gguf

Use a pre-built binary

# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf tharun66/mistral-sql-gguf

# Run inference directly in the terminal:
./llama-cli -hf tharun66/mistral-sql-gguf

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf tharun66/mistral-sql-gguf

# Run inference directly in the terminal:
./build/bin/llama-cli -hf tharun66/mistral-sql-gguf

Use Docker

docker model run hf.co/tharun66/mistral-sql-gguf
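Once llama-server is running, it exposes an OpenAI-compatible HTTP API (by default on localhost:8080). A minimal sketch of querying it from Python with only the standard library; the endpoint path and payload follow the OpenAI completions convention, and the prompt template mirrors the Usage section below. The helper names here are ours, not part of the model card:

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8080"  # default llama-server address (assumption)

def build_completion_request(question: str):
    """Build the URL and JSON payload for an OpenAI-style completion call."""
    payload = {
        "prompt": f"### Task: Convert to SQL\n### Question: {question}\n### SQL:",
        "max_tokens": 128,
        "temperature": 0.7,
        "stop": ["###"],
    }
    return f"{SERVER_URL}/v1/completions", payload

def generate_sql(question: str) -> str:
    """POST the request to the running server and return the generated SQL."""
    url, payload = build_completion_request(question)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"].strip()

# With a server running, e.g.:
# print(generate_sql("Show all active users"))
```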
Mistral-7B SQL GGUF
A GGUF-quantized version of Mistral-7B fine-tuned for SQL query generation. Optimized for CPU inference with clean SQL outputs.
Model Details
- Base Model: Mistral-7B-Instruct-v0.3
- Quantization: Q8_0
- Context Length: 32768 tokens (default from base model)
- Format: GGUF (V3 latest)
- Size: 7.17 GB
- Parameters: 7.25B
- Architecture: Llama
- Use Case: Text to SQL conversion
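A GGUF file begins with a small fixed header: the magic bytes "GGUF", a uint32 format version (3 for the "V3 latest" noted above), then uint64 tensor and metadata key/value counts. A small sketch that parses this header, handy for sanity-checking a downloaded file before loading it (the helper name is ours, not part of the model card):

```python
import struct

def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed 24-byte GGUF header: magic, version, tensor/kv counts."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# To check a real file, read its first 24 bytes:
# with open("mistral_sql_q4.gguf", "rb") as f:
#     print(parse_gguf_header(f.read(24)))
```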
Usage
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized model from the Hub
model_path = hf_hub_download(
    repo_id="tharun66/mistral-sql-gguf",
    filename="mistral_sql_q4.gguf"
)

# Initialize the model
llm = Llama(
    model_path=model_path,
    n_ctx=512,
    n_threads=4,
    verbose=False
)

def generate_sql(question):
    prompt = f"""### Task: Convert to SQL
### Question: {question}
### SQL:"""
    response = llm(
        prompt,
        max_tokens=128,
        temperature=0.7,
        stop=["system", "user", "assistant", "###"],
        echo=False
    )
    return response['choices'][0]['text'].strip()

# Example
question = "Show all active users"
sql = generate_sql(question)
print(sql)
# Output: SELECT * FROM users WHERE status = 'active'
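Despite the stop tokens, small quantized models occasionally append stray text after the query. A minimal post-processing guard (a hypothetical helper, not part of the model) that keeps only the first statement and accepts only read-only SELECT queries before anything touches a database:

```python
def clean_sql(raw: str) -> str:
    """Keep only the first statement and require it to be a SELECT."""
    # Take everything up to the first semicolon (or the whole string).
    statement = raw.split(";")[0].strip()
    if not statement.upper().startswith("SELECT"):
        raise ValueError(f"refusing non-SELECT output: {statement!r}")
    return statement + ";"

# e.g. clean_sql(generate_sql("Show all active users"))
```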