Instructions to use Devbora29/Qwen_customQuant_gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Devbora29/Qwen_customQuant_gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="Devbora29/Qwen_customQuant_gguf", filename="qwen2.5_3b_dpo_mix_alpha_05.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use Devbora29/Qwen_customQuant_gguf with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh # Start a local OpenAI-compatible server with a web UI: llama serve -hf Devbora29/Qwen_customQuant_gguf # Run inference directly in the terminal: llama cli -hf Devbora29/Qwen_customQuant_gguf
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama serve -hf Devbora29/Qwen_customQuant_gguf # Run inference directly in the terminal: llama cli -hf Devbora29/Qwen_customQuant_gguf
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf Devbora29/Qwen_customQuant_gguf # Run inference directly in the terminal: ./llama-cli -hf Devbora29/Qwen_customQuant_gguf
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf Devbora29/Qwen_customQuant_gguf # Run inference directly in the terminal: ./build/bin/llama-cli -hf Devbora29/Qwen_customQuant_gguf
Use Docker
docker model run hf.co/Devbora29/Qwen_customQuant_gguf
- LM Studio
- Jan
- Ollama
How to use Devbora29/Qwen_customQuant_gguf with Ollama:
ollama run hf.co/Devbora29/Qwen_customQuant_gguf
- Unsloth Studio
How to use Devbora29/Qwen_customQuant_gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Devbora29/Qwen_customQuant_gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Devbora29/Qwen_customQuant_gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Devbora29/Qwen_customQuant_gguf to start chatting
- Pi
How to use Devbora29/Qwen_customQuant_gguf with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf Devbora29/Qwen_customQuant_gguf
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "Devbora29/Qwen_customQuant_gguf" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use Devbora29/Qwen_customQuant_gguf with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf Devbora29/Qwen_customQuant_gguf
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default Devbora29/Qwen_customQuant_gguf
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use Devbora29/Qwen_customQuant_gguf with Docker Model Runner:
docker model run hf.co/Devbora29/Qwen_customQuant_gguf
- Lemonade
How to use Devbora29/Qwen_customQuant_gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull Devbora29/Qwen_customQuant_gguf
Run and chat with the model
lemonade run user.Qwen_customQuant_gguf-{{QUANT_TAG}}List all available models
lemonade list
| { | |
| "alpha": 0.5, | |
| "prompt": "Explain the theory of relativity", | |
| "layers": { | |
| "0": { | |
| "layer": 0, | |
| "bits": 16, | |
| "label": "INT16", | |
| "confidence": { | |
| "INT16": 0.9942, | |
| "INT8": 0.0016, | |
| "INT4": 0.0042 | |
| } | |
| }, | |
| "1": { | |
| "layer": 1, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.006, | |
| "INT8": 0.9939, | |
| "INT4": 0.0001 | |
| } | |
| }, | |
| "2": { | |
| "layer": 2, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0098, | |
| "INT8": 0.99, | |
| "INT4": 0.0002 | |
| } | |
| }, | |
| "3": { | |
| "layer": 3, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0012, | |
| "INT8": 0.9956, | |
| "INT4": 0.0033 | |
| } | |
| }, | |
| "4": { | |
| "layer": 4, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0009, | |
| "INT8": 0.4189, | |
| "INT4": 0.5802 | |
| } | |
| }, | |
| "5": { | |
| "layer": 5, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0044, | |
| "INT8": 0.7659, | |
| "INT4": 0.2297 | |
| } | |
| }, | |
| "6": { | |
| "layer": 6, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0079, | |
| "INT8": 0.9242, | |
| "INT4": 0.0679 | |
| } | |
| }, | |
| "7": { | |
| "layer": 7, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.004, | |
| "INT8": 0.9418, | |
| "INT4": 0.0541 | |
| } | |
| }, | |
| "8": { | |
| "layer": 8, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0056, | |
| "INT8": 0.9655, | |
| "INT4": 0.0289 | |
| } | |
| }, | |
| "9": { | |
| "layer": 9, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0054, | |
| "INT8": 0.9746, | |
| "INT4": 0.02 | |
| } | |
| }, | |
| "10": { | |
| "layer": 10, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0045, | |
| "INT8": 0.995, | |
| "INT4": 0.0005 | |
| } | |
| }, | |
| "11": { | |
| "layer": 11, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0039, | |
| "INT8": 0.9934, | |
| "INT4": 0.0027 | |
| } | |
| }, | |
| "12": { | |
| "layer": 12, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0076, | |
| "INT8": 0.9906, | |
| "INT4": 0.0018 | |
| } | |
| }, | |
| "13": { | |
| "layer": 13, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0051, | |
| "INT8": 0.937, | |
| "INT4": 0.058 | |
| } | |
| }, | |
| "14": { | |
| "layer": 14, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0138, | |
| "INT8": 0.9818, | |
| "INT4": 0.0044 | |
| } | |
| }, | |
| "15": { | |
| "layer": 15, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0031, | |
| "INT8": 0.988, | |
| "INT4": 0.0088 | |
| } | |
| }, | |
| "16": { | |
| "layer": 16, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0053, | |
| "INT8": 0.9618, | |
| "INT4": 0.0329 | |
| } | |
| }, | |
| "17": { | |
| "layer": 17, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0049, | |
| "INT8": 0.9833, | |
| "INT4": 0.0118 | |
| } | |
| }, | |
| "18": { | |
| "layer": 18, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0047, | |
| "INT8": 0.9641, | |
| "INT4": 0.0313 | |
| } | |
| }, | |
| "19": { | |
| "layer": 19, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0024, | |
| "INT8": 0.9709, | |
| "INT4": 0.0267 | |
| } | |
| }, | |
| "20": { | |
| "layer": 20, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0087, | |
| "INT8": 0.8718, | |
| "INT4": 0.1194 | |
| } | |
| }, | |
| "21": { | |
| "layer": 21, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0018, | |
| "INT8": 0.6375, | |
| "INT4": 0.3607 | |
| } | |
| }, | |
| "22": { | |
| "layer": 22, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0019, | |
| "INT8": 0.265, | |
| "INT4": 0.7331 | |
| } | |
| }, | |
| "23": { | |
| "layer": 23, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0024, | |
| "INT8": 0.6425, | |
| "INT4": 0.3551 | |
| } | |
| }, | |
| "24": { | |
| "layer": 24, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0003, | |
| "INT8": 0.0899, | |
| "INT4": 0.9098 | |
| } | |
| }, | |
| "25": { | |
| "layer": 25, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0007, | |
| "INT8": 0.667, | |
| "INT4": 0.3323 | |
| } | |
| }, | |
| "26": { | |
| "layer": 26, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0001, | |
| "INT8": 0.0951, | |
| "INT4": 0.9047 | |
| } | |
| }, | |
| "27": { | |
| "layer": 27, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0008, | |
| "INT8": 0.3785, | |
| "INT4": 0.6207 | |
| } | |
| }, | |
| "28": { | |
| "layer": 28, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0002, | |
| "INT8": 0.1475, | |
| "INT4": 0.8523 | |
| } | |
| }, | |
| "29": { | |
| "layer": 29, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0001, | |
| "INT8": 0.0329, | |
| "INT4": 0.9669 | |
| } | |
| }, | |
| "30": { | |
| "layer": 30, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0071, | |
| "INT8": 0.9925, | |
| "INT4": 0.0004 | |
| } | |
| }, | |
| "31": { | |
| "layer": 31, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0005, | |
| "INT8": 0.2283, | |
| "INT4": 0.7712 | |
| } | |
| }, | |
| "32": { | |
| "layer": 32, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0005, | |
| "INT8": 0.1044, | |
| "INT4": 0.8951 | |
| } | |
| }, | |
| "33": { | |
| "layer": 33, | |
| "bits": 8, | |
| "label": "INT8", | |
| "confidence": { | |
| "INT16": 0.0008, | |
| "INT8": 0.6213, | |
| "INT4": 0.3779 | |
| } | |
| }, | |
| "34": { | |
| "layer": 34, | |
| "bits": 4, | |
| "label": "INT4", | |
| "confidence": { | |
| "INT16": 0.0002, | |
| "INT8": 0.0204, | |
| "INT4": 0.9794 | |
| } | |
| }, | |
| "35": { | |
| "layer": 35, | |
| "bits": 16, | |
| "label": "INT16", | |
| "confidence": { | |
| "INT16": 0.9932, | |
| "INT8": 0.0045, | |
| "INT4": 0.0023 | |
| } | |
| } | |
| } | |
| } |