failed to load the model

by rinoa - opened Dec 6, 2023

Discussion

rinoa

Dec 6, 2023

•

edited Dec 6, 2023

Hi @TheBloke, thank you for making the GGUF model. I got an error when loading this model with llama.cpp. Do you have any suggestions?

error loading model: unordered_map::at
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'magicoder-s-ds-6.7b.Q5_K_M.gguf'
main: error: unable to load model

suraj007

Dec 6, 2023

•

edited Dec 6, 2023

Hi @TheBloke, I am having the same problem when using llama-cpp-python (from llama_cpp import Llama). It is the same with LM Studio: I am not able to load the model.

steefvw

Dec 6, 2023

Same here but it looks like @TheBloke just reuploaded the model so trying again now.

steefvw

Dec 6, 2023

Same issue unfortunately. By the way, let me take this opportunity to say thanks @TheBloke for your incredible work converting and uploading these models! It's been so valuable to always find the latest models ready to download and deploy on my local machine, typically within days after the original model was launched. Thank you!!

suraj007

Dec 6, 2023

@steefvw any luck? It seems he only updated the README.

eramax

Dec 7, 2023

I had the same issue yesterday when I quantized this model myself.

aniljava

Dec 7, 2023

I think the issue is related to a vocab size mismatch, similar to:
https://github.com/ggerganov/llama.cpp/issues/3900

When I did the conversion, I got the same message with the 16-bit file immediately after running convert.py. Further quantizing to 4-bit gave the vocab warning.
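For anyone who wants to check the vocab size theory on their own copy of the original model, here is a minimal sketch. It assumes the Hugging Face layout, where config.json carries a vocab_size field and tokenizer.json lists tokens under model.vocab plus added_tokens. The function names and the toy data are made up for illustration, and the thread does not confirm that this mismatch is the actual cause.

```python
def tokenizer_token_count(tokenizer):
    """Count the distinct token ids defined in a tokenizer.json-style dict."""
    ids = set(tokenizer["model"]["vocab"].values())
    ids.update(tok["id"] for tok in tokenizer.get("added_tokens", []))
    return len(ids)


def vocab_gap(config, tokenizer):
    """Return the declared vocab_size minus the number of tokens actually defined."""
    return config["vocab_size"] - tokenizer_token_count(tokenizer)


# Toy data standing in for config.json and tokenizer.json (NOT the real model files).
toy_config = {"vocab_size": 8}
toy_tokenizer = {
    "model": {"vocab": {"a": 0, "b": 1, "c": 2, "ab": 3}},
    "added_tokens": [
        {"id": 4, "content": "<pad>"},
        {"id": 5, "content": "<eos>"},
    ],
}

gap = vocab_gap(toy_config, toy_tokenizer)
print("declared:", toy_config["vocab_size"])
print("defined:", tokenizer_token_count(toy_tokenizer))
print("gap:", gap)
```

With real files, the two dicts would come from json.load on config.json and tokenizer.json. A non-zero gap means the embedding table has more (or fewer) rows than the tokenizer defines tokens for.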

eramax

Dec 7, 2023

@aniljava, try this PR https://github.com/ggerganov/llama.cpp/pull/3633 and see if it works for you. I tried it as well but without success.

aniljava

Dec 7, 2023

I tried #3663 briefly before giving up and waiting for @TheBloke. I also played a bit with manually changing the vocab size. No luck.

hf-delta

Dec 7, 2023

he said here:
https://discord.com/channels/1111983596572520458/1181468647441563728/1182011875068747847

Let me know if there's any GGUF issues as I need to use a PR to make this due to lack of tokenizer.model support

So it might be out of his reach for the time being.

It's better to ask the original model's author rather than someone downstream.

Pumba2

Dec 7, 2023

•

edited Dec 8, 2023

I removed my "like" since it's not working in any app.
We had this problem with DeepSeek models before; they didn't work then and they don't work now.

AlfredWALLACE

Dec 7, 2023

Just downloaded Q4_K_M and Q3_K_M and also get an error:

error loading model: invalid unordered_map<K, T> key

on both gguf files.
Thanks anyway for publishing quantized models for small GPUs!!!
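The two messages reported in this thread (unordered_map::at and invalid unordered_map<K, T> key) both look like the text that different C++ standard libraries attach to the exception raised when std::unordered_map::at is called with a key that is not in the map. That would fit a loader looking up a token that the converted vocabulary does not contain, though the thread does not pin down the exact lookup. A rough Python analogue of that failure mode, using a hypothetical toy vocabulary and helper rather than llama.cpp's real code:

```python
def lookup_token_ids(token_to_id, required):
    """Resolve required tokens, failing like a strict map lookup on the first miss."""
    resolved = {}
    for token in required:
        if token not in token_to_id:
            raise KeyError(f"token {token!r} missing from vocabulary")
        resolved[token] = token_to_id[token]
    return resolved


# Hypothetical vocabulary that lacks a token the loader expects to find.
toy_vocab = {"<s>": 0, "</s>": 1, "hello": 2}

try:
    lookup_token_ids(toy_vocab, ["<s>", "\n"])
    load_error = None
except KeyError as exc:
    # exc.args[0] is the plain message; str(exc) would add extra quotes.
    load_error = exc.args[0]

print("error loading model:", load_error)
```

The point of the sketch is only that a single missing entry aborts the whole load, which is why every quantization level of the same conversion fails in the same way.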

deleted

Dec 7, 2023

Most of the time, cases like this need to be addressed upstream. Bloke can't fix their stuff....

TheBloke

Owner Dec 7, 2023

Yeah sorry, the models seem to be unusable at the moment. I will see if I can fix it, otherwise I'll pull it for now

TheBloke

Owner Dec 7, 2023

•

edited Dec 9, 2023

OK, I have re-done the quants and they will now work with a specific fork of llama.cpp. The corresponding PR is not yet merged and is currently on hold, so there's no immediate indication of when it will be merged.

Fork: https://github.com/DOGEwbx/llama.cpp/tree/regex_gpt2_preprocess

They will not work with mainline llama.cpp, and they will not work with any third-party GGUF clients, like llama-cpp-python, LM Studio, text-generation-webui, etc.

The files are left for anyone who has the interest to compile llama.cpp for themselves and I can put a note in the README to this effect.

However I might still pull it, as the output is so far not usable:

ᐅ ./main -m /workspace/process/ise-uiuc_magicoder-s-ds-6.7b/gguf/magicoder-s-ds-6.7b.Q4_K_M.gguf -n -1 -ngl 100 -p "You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
write a Python FastAPI server which accepts List[int], sorts the provided numbers, and returns a tuple containing: (highest, lowest, average, List[int] of the numbers sorted asc )

@@ Response
"

You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.

@@ Instruction
write a Python FastAPI server which accepts List[int], sorts the provided numbers, and returns a tuple containing: (highest, lowest, average, List[int] of the numbers sorted asc )

@@ Response
Here is how you can solve it using python's FastAPI for creating an API server. We will use FastAPI to create the web server which includes endpoints that sorts the given integer list and returns highest number, lowest number, average, median, mode of the list in a JSON format. It also provides the sorted list as well:
```python

# install required packages using pip
pip install fastapi uv


$ pip install fastapi
 $ uvscikitly and statistics for calculating modes



```python
from typing import List
from fastapi import FastAPI, Depends
 from fastapi.middleware.responses
import statistics

from pydantic import BaseModel, Body
from starlette.requests: FastAPI
app = FastAPI()
from typing import List
from fastapi import HTTPException, status
from fastapi import FastApi, Response, Depends, Query
# for the calculation of mode in Python API
def calculate_mode(numbers: List[int]= Body):
    return {
        "highest": numbers},
import statistics as sta t.typing from starlette.middleware import FastAPI, HTTPException
from fastapi import FastAPI, Request, Uv
from fastapi import Depends
from typing import Optional
from pydantic import BaseModel
from fastapi.responsesponsese FastAPIErrorHTTPException, JSONResponse
from fastapi import FastAPI, Path
from statistics
from starlette import Request
import uviemyaml APIRouter and path operation:
from pydanticornado.middleware import HTTPException 40.t ai.

class NumberModel for mode in Pythonication with thestarlette

import statistics as Statistics

from fastapi import FastAPI, Depends, Path
from starlette , Uvit
from typing import List[int]
def sort

numbers

@app 

def calculate_mean(statistics) is to use Python's Starlette:

```python
import uvit.validator import Depends,HTTPException

....

I terminated it there as it was generating endlessly. But even before that, there was no usable answer; it's mostly just gibberish.
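For anyone re-testing with a newer build, the prompt passed to ./main above follows a fixed template: the system sentence, an "@@ Instruction" section, then an "@@ Response" header left open for the model. A small helper that reproduces that layout might look like the sketch below. The function name and the example instruction are made up for illustration; the template text itself is copied from the command above.

```python
SYSTEM_PROMPT = (
    "You are an exceptionally intelligent coding assistant that consistently "
    "delivers accurate and reliable responses to user instructions."
)


def build_prompt(instruction):
    """Wrap an instruction in the '@@ Instruction' / '@@ Response' layout used above."""
    return f"{SYSTEM_PROMPT}\n\n@@ Instruction\n{instruction}\n\n@@ Response\n"


prompt = build_prompt("write a Python function that sorts a list of integers")
print(prompt)
```

The resulting string can be passed as the -p argument, which keeps the layout identical between runs when comparing different quantizations.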

YearZero

Dec 8, 2023

I thought it was a finetune of DeepSeek-Coder-6.7b-base, so I'm surprised it's so wonky. Maybe I misinterpreted the architecture. Hopefully we get a llama.cpp update that makes this run coherently. Good open source coding models, especially at this small size, are super useful!

mzwing

Mar 30, 2024

Could anyone help me test my quantized results?

https://huggingface.co/mzwing/Magicoder-S-DS-6.7B-GGUF

I just used the master branch of llama.cpp. It looks like some PRs merged later fixed this issue.
