Instructions to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Downloads the GGUF from the Hugging Face Hub (requires huggingface-hub)
llm = Llama.from_pretrained(
    repo_id="magiccodingman/Qwen3.6-27B-MagicQuant-GGUF",
    filename="Qwen3.6-27B-LM-IQ2_M.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response["choices"][0]["message"]["content"])
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with llama.cpp:
Install from brew
```bash
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```

Install from WinGet (Windows)

```bash
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```

Use pre-built binary

```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
# Run inference directly in the terminal:
./llama-cli -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```

Build from source code

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```

Use Docker

```bash
docker model run hf.co/magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```
- LM Studio
- Jan
- vLLM
How to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "magiccodingman/Qwen3.6-27B-MagicQuant-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "magiccodingman/Qwen3.6-27B-MagicQuant-GGUF",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker

```bash
docker model run hf.co/magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```
- Ollama
How to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with Ollama:
```bash
ollama run hf.co/magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```
- Unsloth Studio
How to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Qwen3.6-27B-MagicQuant-GGUF to start chatting
```

Install Unsloth Studio (Windows)

```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Qwen3.6-27B-MagicQuant-GGUF to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for magiccodingman/Qwen3.6-27B-MagicQuant-GGUF to start chatting.
- Pi
How to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with Pi:
Start the llama.cpp server

```bash
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```

Configure the model in Pi

```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M" }
      ]
    }
  }
}
```

Run Pi

```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with Hermes Agent:
Start the llama.cpp server

```bash
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```

Configure Hermes

```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```

Run Hermes

```bash
hermes
```
- Docker Model Runner
How to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with Docker Model Runner:

```bash
docker model run hf.co/magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```

- Lemonade
How to use magiccodingman/Qwen3.6-27B-MagicQuant-GGUF with Lemonade:
Pull the model

```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M
```

Run and chat with the model

```bash
lemonade run user.Qwen3.6-27B-MagicQuant-GGUF-IQ2_M
```

List all available models

```bash
lemonade list
```
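Several of the setups above expose an OpenAI-compatible chat completions endpoint: llama-server, vLLM, and the Pi and Hermes setups that sit on top of llama-server. Any HTTP client can therefore talk to them.

The sketch below uses only the Python standard library. Some details are assumptions:

- The helper names `build_chat_request` and `chat` are hypothetical.
- `BASE_URL` assumes llama-server's default port 8080, matching the Pi and Hermes configs above.
- vLLM listens on port 8000 instead, as in its curl example.

```python
import json
import urllib.request

# Assumption: llama-server running locally on its default port (8080).
# For the vLLM server above, use "http://localhost:8000/v1" instead.
BASE_URL = "http://localhost:8080/v1"
MODEL = "magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:IQ2_M"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL, model: str = MODEL) -> str:
    """Send one user message to the running server and return the reply text."""
    request = build_chat_request(base_url, model, prompt)
    # A 27B model can be slow on modest hardware, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, with a server already running:
#     print(chat("What is the capital of France?"))
```

Splitting request construction from sending keeps the payload easy to inspect. It also lets you swap in another HTTP library without touching the message format.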
```yaml
license: apache-2.0
tags:
  - gguf
  - text-generation
  - magicquant
  - conversational
base_model:
  - Qwen/Qwen3.6-27B
```
# MagicQuant Hybrids (v2.0) - Qwen3.6-27B
MagicQuant is a benchmark-driven system for discovering and validating GGUF hybrids, focused on finding real, practical GGUF quants for each specific architecture.
The table below can contain three kinds of quant: a pure baseline built with llama.cpp, a learned tensor configuration from Unsloth, or a custom-built MagicQuant hybrid. Every quant shown has won dominance checks, survived spacing-collapse checks, and/or been found to be nonlinearly better. Instead of dumping every possible quant type, MagicQuant tests, validates, and brutally murders anything deemed unworthy.
<details>
<summary>Support MagicQuant</summary>
I’m a solo developer working full time for myself to achieve my dream, and I build open-source code on the side. If you like any of my work, buying me a coffee is always appreciated. Otherwise, I hope you enjoy it; maybe give me a star or something, or just send me good vibes. Either way, thank you!
[Click here to see ways to support](https://sayou.biz/support): BTC, PayPal, GitHub Sponsors.
</details>
**I will update the model with MTP (multi-token prediction) once it is supported in llama.cpp main.**
---
## Final survivors
| Name | Provider | KLD | Size (GB) | Download |
| -------------------------------------------------------------- | ---------- | -----------: | --------: | ----------------------------------------------------------------------- |
| ~~LM-Q8_0~~ | ~~llama.cpp~~ | ~~0.003768~~ | ~~28.60~~ | |
| [<u>MQ-Q6_K_1</u>](#winner-notes "Replaced: MQ-Q8_0, LM-Q8_0") | MagicQuant | 0.002845 | 27.25 | [Link](./../../resolve/main/Qwen3.6-27B-MQ-Q6_K_1.gguf?download=true) |
| MQ-Q6_K_2 | MagicQuant | 0.003884 | 25.23 | [Link](./../../resolve/main/Qwen3.6-27B-MQ-Q6_K_2.gguf?download=true) |
| MQ-Q6_K_3 | MagicQuant | 0.004914 | 23.66 | [Link](./../../resolve/main/Qwen3.6-27B-MQ-Q6_K_3.gguf?download=true) |
| ~~LM-Q6_K~~ | ~~llama.cpp~~ | ~~0.007249~~ | ~~22.08~~ | |
| [<u>MQ-Q5_K_S_1</u>](#winner-notes "Replaced: LM-Q6_K") | MagicQuant | 0.006477 | 21.90 | [Link](./../../resolve/main/Qwen3.6-27B-MQ-Q5_K_S_1.gguf?download=true) |
| MQ-Q5_K_S_2 | MagicQuant | 0.007617 | 20.86 | [Link](./../../resolve/main/Qwen3.6-27B-MQ-Q5_K_S_2.gguf?download=true) |
| LM-Q5_K_S | llama.cpp | 0.010790 | 18.68 | [Link](./../../resolve/main/Qwen3.6-27B-LM-Q5_K_S.gguf?download=true) |
| ~~UD-Q4_K_XL~~ | ~~Unsloth~~ | ~~0.023521~~ | ~~17.61~~ | |
| [<u>MQ-IQ4_NL_1</u>](#winner-notes "Replaced: UD-Q4_K_XL") | MagicQuant | 0.019687 | 17.59 | [Link](./../../resolve/main/Qwen3.6-27B-MQ-IQ4_NL_1.gguf?download=true) |
| LM-IQ4_NL | llama.cpp | 0.025714 | 15.80 | [Link](./../../resolve/main/Qwen3.6-27B-LM-IQ4_NL.gguf?download=true) |
| LM-IQ4_XS | llama.cpp | 0.027015 | 15.08 | [Link](./../../resolve/main/Qwen3.6-27B-LM-IQ4_XS.gguf?download=true) |
| [<u>MQ-IQ3_M_1</u>](#winner-notes "Replaced: UD-Q3_K_XL") | MagicQuant | 0.043802 | 14.49 | [Link](./../../resolve/main/Qwen3.6-27B-MQ-IQ3_M_1.gguf?download=true) |
| [<u>LM-IQ3_S</u>](#winner-notes "Replaced: LM-IQ3_XS") | llama.cpp | 0.064393 | 12.42 | [Link](./../../resolve/main/Qwen3.6-27B-LM-IQ3_S.gguf?download=true) |
| [<u>LM-IQ3_XXS</u>](#winner-notes "Replaced: UD-IQ2_M") | llama.cpp | 0.093578 | 11.19 | [Link](./../../resolve/main/Qwen3.6-27B-LM-IQ3_XXS.gguf?download=true) |
| LM-IQ2_M | llama.cpp | 0.163117 | 10.00 | [Link](./../../resolve/main/Qwen3.6-27B-LM-IQ2_M.gguf?download=true) |
| [<u>LM-IQ2_S</u>](#winner-notes "Replaced: LM-IQ2_XS") | llama.cpp | 0.210251 | 9.36 | [Link](./../../resolve/main/Qwen3.6-27B-LM-IQ2_S.gguf?download=true) |
| LM-IQ2_XXS | llama.cpp | 0.302597 | 8.43 | [Link](./../../resolve/main/Qwen3.6-27B-LM-IQ2_XXS.gguf?download=true) |
* Crossed-out models are shown for reference only.
* This model architecture triggered unusual quant-anomaly detections. The MagicQuant pipeline exploited these anomalies to produce better quants than are normally achievable. Please read the wiki to understand what a quant anomaly is and how it is used.
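The table can also be read as a lookup: given how much room you have for the model file, take the lowest-KLD survivor that fits. Below is a minimal sketch using the non-crossed-out rows copied from the table. A few caveats:

- `best_quant_for_budget` and `gguf_filename` are hypothetical helpers, not part of MagicQuant.
- The budget covers only the GGUF file itself. It does not include the extra memory that the KV cache and runtime need during inference.

```python
# Final survivors copied from the table above: (name, KLD, size in GB).
# Crossed-out reference rows are omitted because they have no download link here.
SURVIVORS = [
    ("MQ-Q6_K_1", 0.002845, 27.25),
    ("MQ-Q6_K_2", 0.003884, 25.23),
    ("MQ-Q6_K_3", 0.004914, 23.66),
    ("MQ-Q5_K_S_1", 0.006477, 21.90),
    ("MQ-Q5_K_S_2", 0.007617, 20.86),
    ("LM-Q5_K_S", 0.010790, 18.68),
    ("MQ-IQ4_NL_1", 0.019687, 17.59),
    ("LM-IQ4_NL", 0.025714, 15.80),
    ("LM-IQ4_XS", 0.027015, 15.08),
    ("MQ-IQ3_M_1", 0.043802, 14.49),
    ("LM-IQ3_S", 0.064393, 12.42),
    ("LM-IQ3_XXS", 0.093578, 11.19),
    ("LM-IQ2_M", 0.163117, 10.00),
    ("LM-IQ2_S", 0.210251, 9.36),
    ("LM-IQ2_XXS", 0.302597, 8.43),
]


def best_quant_for_budget(budget_gb, survivors=SURVIVORS):
    """Return the (name, kld, size_gb) row with the lowest KLD that fits, or None."""
    fitting = [row for row in survivors if row[2] <= budget_gb]
    if not fitting:
        return None
    return min(fitting, key=lambda row: row[1])


def gguf_filename(name):
    """File name pattern used by the download links in the table."""
    return f"Qwen3.6-27B-{name}.gguf"


# Example: pick a quant for a ~16 GB file budget and print its file name.
choice = best_quant_for_budget(16.0)
if choice is not None:
    print(choice[0], gguf_filename(choice[0]))
```

In this table, KLD rises steadily as size falls, so the rule simply picks the largest file that fits. Taking the minimum over KLD keeps the rule correct even if a table is not ordered that way.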
<details>
<summary>Provider credits</summary>
- [llama.cpp](https://github.com/ggml-org/llama.cpp): baseline quantization formats and llama.cpp tooling.
- [Unsloth](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF): external learned baseline source (UD).
</details>
<details>
<summary>Warning: Is MagicQuant better? (Hint: how you frame the question matters.)</summary>
External and custom baselines are normalized into MagicQuant's controlled comparison flow. MagicQuant rebuilds a learned baseline under native-source / MagicQuant-controlled conditions, including its own imatrix handling, so that hybrids and external baselines (like Unsloth's) can be judged on more equal footing. That does **not** mean MagicQuant has proven that the original upstream artifact or upstream imatrix is worse. These comparisons exist for internal hybrid-search consistency and level-playing-field comparisons. They are not a universal judgment of the original creator's exact release artifact.
**Easier-to-digest explanation:**
MagicQuant compares and benchmarks each model's quant-to-tensor configuration, not the original artifact. MagicQuant also promotes a winning quant for different reasons, so not every winner is purely "better"; it depends heavily on a variety of factors. Every choice is documented in the repo's manifest folder, so you can always see what the automated system decided and why.
So MagicQuant can confidently tell you: "Under the same quantization-to-tensor configuration and an identical imatrix, on this benchmark, I deemed this a winner."
</details>
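KLD in the table is a Kullback-Leibler divergence between the native/reference model's next-token distribution and the quant's. Lower means the quant behaves more like the original.

Below is a minimal sketch of the metric. It makes three assumptions:

- You already have aligned per-token logits from both models over the same text (llama.cpp's perplexity tool can produce such measurements).
- The reported figure is a mean over token positions.
- The helper names are hypothetical.

This illustrates the metric only; it is not MagicQuant's or llama.cpp's implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kld(ref_logits, quant_logits):
    """KL(P_ref || P_quant) at a single token position, in nats."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref_rows, quant_rows):
    """Average per-token KLD over aligned positions (same text, same tokenizer)."""
    if not ref_rows or len(ref_rows) != len(quant_rows):
        raise ValueError("need the same, non-zero number of token positions for both models")
    return sum(token_kld(r, q) for r, q in zip(ref_rows, quant_rows)) / len(ref_rows)
```

Working in log space through `log_softmax` avoids overflow on large logits. Because of the softmax, adding a constant to all of a row's logits leaves the KLD unchanged.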
<details>
<summary>Re-Uploading External Provider Baselines</summary>
By default, if an external provider like Unsloth is deemed the winner, the repo should generally link directly to the original provider instead of re-hosting the quant. External GGUFs are normally re-uploaded only when a specific winning variant does not already exist (e.g. Heretic models or similar).
</details>
---
## Release metadata
- [Final survivor metrics](./../../resolve/main/magicquant-manifest/magicquant.final-survivors.json): full file names, KLD, PPL, PPL delta %, byte sizes, download targets, and replacement lineage. PPL delta % is measured against the native/reference PPL when available; negative is better, and larger positive values are worse.
- [Hybrid tensor map](./../../resolve/main/magicquant-manifest/magicquant.hybrid-map.json): tensor-group assignments and effective-state details for MagicQuant hybrid GGUFs.
- [Clone tensor configs](./../../resolve/main/magicquant-manifest/magicquant.clone-configs.json): exact per-GGUF tensor quantization maps for reproducing this final output list in repository clone mode.
- [Isolation samples](./../../resolve/main/magicquant-manifest/magicquant.isolation-samples.json): isolated base/group probe samples with KLD, PPL, PPL delta %, and size truth.
- [Bad trade details](./../../resolve/main/magicquant-manifest/magicquant.bad-trades.json): structured bad-trade pruning decisions from the isolation optimizer.
- [Replacement details](./../../resolve/main/magicquant-manifest/magicquant.replacements.json): structured details for baselines or anchors removed from the final download table, including reason codes, KLD deltas, PPL delta %, and size deltas.
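The manifest reports PPL delta % against the native/reference perplexity, but it does not spell out the formula. The sketch below assumes two conventional definitions:

- PPL delta % is the relative difference in percent.
- Perplexity is the exponential of the mean per-token negative log-likelihood.

The manifest's exact computation may differ, and the function names are hypothetical.

```python
import math


def perplexity(token_nlls):
    """exp of the mean negative log-likelihood per token (NLLs in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppl_delta_pct(quant_ppl, reference_ppl):
    """Percent change in perplexity relative to the reference model.

    Negative: the quant scored a lower (better) perplexity than the reference.
    Larger positive values: a worse perplexity.
    """
    if reference_ppl <= 0:
        raise ValueError("reference perplexity must be positive")
    return 100.0 * (quant_ppl - reference_ppl) / reference_ppl
```

With this definition, a quant at PPL 7.7 against a reference at 7.0 reports +10%. A quant that lands slightly below the reference reports a small negative value.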
<details>
<summary>Replacement reason codes</summary>
- `STRICT_DOMINANCE`: the winner was no larger and had lower real KLD than the removed anchor.
- `NEAR_BASELINE_PREMIUM`: the winner used only the configured near-baseline size premium and beat the real linear KLD trade line.
- `INTERIOR_DISCOVERY`: the winner was selected as a useful interior point inside a size/KLD gap between anchors.
- `SPACING_COLLAPSE`: two candidates were too close in practical output space, so the stronger one was kept.
- `FINAL_DOMINANCE`: a later validated survivor dominated this artifact in the final real benchmark comparison.
<a id="winner-notes"></a>
Underlined names in the table replaced, or ultimately inherited the replacement of, another artifact. Hover over the name for a short replacement summary, or inspect `magicquant-manifest/magicquant.replacements.json` for exact KLD/PPL/size deltas.
</details>
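The first two reason codes reduce to simple comparisons on (size, KLD) pairs. Below is a minimal sketch under one plausible reading of the wording above:

- `strictly_dominates` follows the `STRICT_DOMINANCE` definition literally.
- `beats_trade_line` reads "beat the real linear KLD trade line" as lying below the straight line between two neighboring anchors.

These helpers are hypothetical, not MagicQuant's code. The configured size premium is not modeled.

```python
from typing import NamedTuple


class Quant(NamedTuple):
    name: str
    kld: float      # lower is better
    size_gb: float  # file size in GB


def strictly_dominates(winner: Quant, anchor: Quant) -> bool:
    """STRICT_DOMINANCE as worded above: no larger and strictly lower KLD."""
    return winner.size_gb <= anchor.size_gb and winner.kld < anchor.kld


def linear_trade_kld(small: Quant, large: Quant, size_gb: float) -> float:
    """KLD on the straight line through two anchors, evaluated at size_gb."""
    if large.size_gb == small.size_gb:
        raise ValueError("anchors must differ in size")
    t = (size_gb - small.size_gb) / (large.size_gb - small.size_gb)
    return small.kld + t * (large.kld - small.kld)


def beats_trade_line(candidate: Quant, small: Quant, large: Quant) -> bool:
    """True if the candidate's KLD lies below the linear trade line at its size."""
    return candidate.kld < linear_trade_kld(small, large, candidate.size_gb)


# Values from the survivors table:
lm_q8 = Quant("LM-Q8_0", 0.003768, 28.60)
mq_q6_1 = Quant("MQ-Q6_K_1", 0.002845, 27.25)
print(strictly_dominates(mq_q6_1, lm_q8))
```

With the table's values, MQ-Q6_K_1 (27.25 GB, KLD 0.002845) is both smaller and lower in KLD than LM-Q8_0 (28.60 GB, KLD 0.003768). That is consistent with it replacing LM-Q8_0 in the table.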
---