failed to load the model
Hi @TheBloke, thank you for making the GGUF model. I got an error when loading this model with llama.cpp. Do you have any suggestions?
error loading model: unordered_map::at
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'magicoder-s-ds-6.7b.Q5_K_M.gguf'
main: error: unable to load model
Hi @TheBloke, I am also having the same problem when loading it via `from llama_cpp import Llama` (llama-cpp-python), and the same happens with LM Studio.
I'm not able to load the model.
Same issue unfortunately. By the way, let me take this opportunity to say thanks @TheBloke for your incredible work converting and uploading these models! It's been so valuable to always find the latest models ready to download and deploy on my local machine, typically within days after the original model was launched. Thank you!!
Yesterday, when I quantized this model myself, I had the same issue.
I think the issue is related to a vocab size mismatch, similar to this one:
https://github.com/ggerganov/llama.cpp/issues/3900
When I did the conversion, I got the same message with the 16-bit model immediately after running convert.py. Further quantizing to 4-bit gave the vocab warning.
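To make the idea of a vocab mismatch concrete, here is a minimal Python sketch. It is not llama.cpp's loader code: the function `check_vocab`, its parameters, and the sample data are all hypothetical and only illustrate the kind of check that would flag the problem. In C++, `std::unordered_map::at` throws when the requested key is absent, which is consistent with the error text above if the loader looks up a token that the converted vocabulary does not contain.

```python
from typing import Dict, List


def check_vocab(vocab: Dict[str, int], n_embedding_rows: int,
                required_tokens: List[str]) -> List[str]:
    """Return a list of problems; an empty list means no mismatch was found.

    All names here are hypothetical and for illustration only.
    """
    problems = []

    # The tokenizer vocabulary and the embedding matrix should agree in size.
    if len(vocab) != n_embedding_rows:
        problems.append(
            f"vocab has {len(vocab)} entries, "
            f"embedding has {n_embedding_rows} rows"
        )

    # A strict lookup (like unordered_map::at) fails on any missing token.
    for token in required_tokens:
        if token not in vocab:
            problems.append(f"missing token {token!r}")

    return problems


if __name__ == "__main__":
    toy_vocab = {"a": 0, "b": 1}
    for problem in check_vocab(toy_vocab, 3, ["a", "\n"]):
        print(problem)
```

With the toy data, the sketch reports both a size disagreement and a missing newline token; a real diagnosis would need the actual tokenizer files and model tensors.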
@aniljava, try this PR: https://github.com/ggerganov/llama.cpp/pull/3633. Maybe it will work for you. I tried it as well but didn't succeed.
he said here:
https://discord.com/channels/1111983596572520458/1181468647441563728/1182011875068747847
Let me know if there's any GGUF issues as I need to use a PR to make this due to lack of tokenizer.model support
So it might be out of his reach for the time being.
It's better to ask the original model's author instead of downstream.
I removed my "like" since it's not working in any app.
We had this problem with DeepSeek models before; they didn't work then and they don't work now.
Just downloaded Q4_K_M and Q3_K_M and also get an error:
error loading model: invalid unordered_map<K, T> key
on both gguf files.
Thanks anyway for publishing quantized models for small GPUs!!!
Yeah sorry, the models seem to be unusable at the moment. I will see if I can fix it, otherwise I'll pull it for now
OK, I have re-done the quants and they will now work with a specific fork of llama.cpp. The PR behind that fork is not yet merged and is currently on hold, so there's no immediate indication of when it will be merged.
Fork: https://github.com/DOGEwbx/llama.cpp/tree/regex_gpt2_preprocess
They will not work with mainline llama.cpp, and they will not work with any third-party GGUF clients, like llama-cpp-python, LM Studio, text-generation-webui, etc.
I'm leaving the files up for anyone interested in compiling llama.cpp themselves, and I can put a note in the README to this effect.
However I might still pull it, as the output is so far not usable:
ᐅ ./main -m /workspace/process/ise-uiuc_magicoder-s-ds-6.7b/gguf/magicoder-s-ds-6.7b.Q4_K_M.gguf -n -1 -ngl 100 -p "You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.
@@ Instruction
write a Python FastAPI server which accepts List[int], sorts the provided numbers, and returns a tuple containing: (highest, lowest, average, List[int] of the numbers sorted asc )
@@ Response
"
You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.
@@ Instruction
write a Python FastAPI server which accepts List[int], sorts the provided numbers, and returns a tuple containing: (highest, lowest, average, List[int] of the numbers sorted asc )
@@ Response
Here is how you can solve it using python's FastAPI for creating an API server. We will use FastAPI to create the web server which includes endpoints that sorts the given integer list and returns highest number, lowest number, average, median, mode of the list in a JSON format. It also provides the sorted list as well:
```python
# install required packages using pip
pip install fastapi uv
$ pip install fastapi
$ uvscikitly and statistics for calculating modes
```python
from typing import List
from fastapi import FastAPI, Depends
from fastapi.middleware.responses
import statistics
from pydantic import BaseModel, Body
from starlette.requests: FastAPI
app = FastAPI()
from typing import List
from fastapi import HTTPException, status
from fastapi import FastApi, Response, Depends, Query
# for the calculation of mode in Python API
def calculate_mode(numbers: List[int]= Body):
return {
"highest": numbers},
import statistics as sta t.typing from starlette.middleware import FastAPI, HTTPException
from fastapi import FastAPI, Request, Uv
from fastapi import Depends
from typing import Optional
from pydantic import BaseModel
from fastapi.responsesponsese FastAPIErrorHTTPException, JSONResponse
from fastapi import FastAPI, Path
from statistics
from starlette import Request
import uviemyaml APIRouter and path operation:
from pydanticornado.middleware import HTTPException 40.t ai.
class NumberModel for mode in Pythonication with thestarlette
import statistics as Statistics
from fastapi import FastAPI, Depends, Path
from starlette , Uvit
from typing import List[int]
def sort
numbers
@app
def calculate_mean(statistics) is to use Python's Starlette:
```python
import uvit.validator import Depends,HTTPException
....
I terminated it there as it was generating endlessly. But even before that, there was no usable answer; it's mostly just gibberish.
I thought it was a finetune of DeepSeek-Coder-6.7b-base, so I'm surprised it's so wonky. Maybe I misinterpreted the architecture. Hopefully we get a llama.cpp update that makes this run coherently. Good open source coding models, especially at this small size, are super useful!
Could anyone help me test my quantized results?
https://huggingface.co/mzwing/Magicoder-S-DS-6.7B-GGUF
I just used the master branch of llama.cpp. It looks like some PRs merged later fixed this issue.