Segmentation Fault on SqlCoder2 | ERROR: byte not found in vocab: '

by mvalente - opened Oct 12, 2023

Discussion

mvalente

Oct 12, 2023

Works on:
sqlcoder.Q5_K_M.gguf
sqlcoder.Q5_K_S.gguf

Segmentation fault on:
sqlcoder2.Q5_K_M.gguf
sqlcoder2.Q5_K_S.gguf

See screenshot. Let me know what kind of information you might need to debug this issue.

mvalente changed discussion title from Segmentation Fault on SqlCoder2 to Segmentation Fault on SqlCoder2 | RROR: byte not found in vocab: ' Oct 12, 2023

mvalente changed discussion title from Segmentation Fault on SqlCoder2 | RROR: byte not found in vocab: ' to Segmentation Fault on SqlCoder2 | ERROR: byte not found in vocab: ' Oct 12, 2023

atwoodjw

Oct 19, 2023

I'm seeing the same error when loading sqlcoder2.Q4_K_M.gguf in text-generation-webui via the llama.cpp model loader.

ERROR: byte not found in vocab: '
'
Segmentation fault (core dumped)
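
For anyone gathering details to debug this, one low-risk data point is the GGUF file header, which can be read without loading the model through llama.cpp. A GGUF file starts with the 4-byte magic `GGUF`, followed by a 32-bit version and then the tensor count and metadata key/value count. The following is a minimal sketch, assuming a little-endian file; `read_gguf_header` is a hypothetical helper name for illustration, not part of llama.cpp or any library.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumption: the file is little-endian. Only the fixed-size header is
    read, so the tokenizer and tensors are never parsed.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        # GGUF v2 and later store both counts as 64-bit unsigned integers;
        # v1 used 32-bit counts.
        fmt = "<QQ" if version >= 2 else "<II"
        tensor_count, kv_count = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
    return version, tensor_count, kv_count
```

Calling `read_gguf_header("sqlcoder.Q5_K_M.gguf")` and `read_gguf_header("sqlcoder2.Q5_K_M.gguf")` and comparing the results may show whether the two files were written with different GGUF versions, though that alone would not explain the vocab error.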

AayushShah

Oct 25, 2023 • edited Oct 25, 2023

Exactly!!! I'm getting the same error on SqlCoder2.Q5_K_M.gguf and also on Q5_0. I think we should just keep using the original SQLCoder for now :)
Any hope for this, @TheBloke?

Thanks!

mvalente

Oct 26, 2023

@AayushShah What models have you been using for SQLGen? Do you know of any benchmarks, blog posts, or discussions on the efficiency of LLMs for SQLGen? I've been trying Code Llama with a moderate level of success.

Charlie33

Oct 26, 2023

What is the reason for this error? Failed to create LLM 'starcoder' from '/root/.cache/huggingface/hub/models--TheBloke--sqlcoder2-GGUF/blobs/b5e26875dc981af3ef803aef36a7f6da08d75e9ea5484a95d1bf2aa622ac3cb0'.

AayushShah

Oct 26, 2023 • edited Oct 26, 2023

@mvalente
Yeah, I actually had very high hopes for SQLCoder-2. Since it wasn't working, I tried running it on an A5000 GPU, but it still wasn't as good as I expected.
Like you, I've had more success with CodeLlama.
CodeLlama really understands the instructions and gives good results, with proper grammar (valid SQL) almost every time.

So for now, I think codellama-7b is a promising model for me.
Other models I have tried:

Zephyr: This is an amazing model. It can handle impressive queries, but it is not commercially usable and is general purpose, so it can't beat CodeLlama as of now.
Wizard-Coder: It is good for small and simple queries, but not as efficient as CodeLlama.
NumbersStation's 2B model for SQL: It seemed great at the start, but it doesn't have GGUF support. I need to test it more for my use case, though it is still only a 2B model. They have a Llama-7B version too, which you may want to check out as well (the model probably isn't capable of understanding the instructions, but it's worth a look).

I am planning to test more models, such as:

Mistral
Llama-instruct (by together)

Let me know if you have any success with these or any other model; I am still figuring it out.
Thanks.
