Instructions to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with llama-cpp-python:

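# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
	repo_id="magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF",
	filename="Qwen3.6-27B-LM-IQ2_M.gguf",
)

# Run a single chat turn; the result is an OpenAI-style completion dict.
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])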

Local Apps

llama.cpp

How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M
# Run inference directly in the terminal:
llama cli -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M
# Run inference directly in the terminal:
llama cli -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M
# Run inference directly in the terminal:
./llama-cli -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Use Docker

docker model run hf.co/magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

vLLM

How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
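
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server from the example above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumed defaults taken from the curl example above; adjust to your setup.
BASE_URL = "http://localhost:8000/v1"
MODEL = "magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt to the server and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the reply. For a llama.cpp server, passing `base_url="http://localhost:8080/v1"` (the port used in the agent configurations on this page) is the likely adjustment.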

Use Docker

docker model run hf.co/magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Ollama
How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with Ollama:
```
ollama run hf.co/magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M
```

Unsloth Studio

How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF to start chatting

Pi

How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Run Hermes

hermes

OpenClaw

How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with Docker Model Runner:
```
docker model run hf.co/magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M
```

Lemonade

How to use magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF:IQ2_M

Run and chat with the model

lemonade run user.Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF-IQ2_M

List all available models

lemonade list

MagicQuant Hybrids (v2.0) - Qwen3.6-27B Uncensored (By llmfan46)

MagicQuant is a benchmark-driven GGUF hybrid discovery and validation system focused on finding real, practical GGUF quants specific to each architecture.

Whether it's a pure baseline model built by llama.cpp, learned tensor configurations from Unsloth, or a custom-built MagicQuant hybrid, the model table below shows quants that have won dominance checks, survived collapse spaces, and/or were found to be nonlinearly better. Instead of dumping every quant type possible, MagicQuant tests, validates, and brutally murders anything deemed unworthy.

MagicQuant Info & Wiki

MagicQuant is a project intended to become open source. Currently the full methodology is documented at the MagicQuant Wiki.

That wiki will eventually have the code uploaded, and the repo will be renamed from "MagicQuant-Wiki" to just "MagicQuant". The code is still too much of an early prototype. It's beginning to mature, but it requires a heavy hand throughout the process.

Once I'm confident in the code and believe the methodology and protocols are mature enough that I won't be changing them weekly and bricking every setup after every update, I'm excited to share not just the entire methodology, but the entire code base to reproduce MagicQuant :)

Support MagicQuant

I’m a solo developer working full time for myself to achieve my dream. I build open source code on the side. If you like any of my work, buying me a coffee is always appreciated. Otherwise, I hope you enjoy, maybe give me a star or something. Or just send me good vibes. Either way, thank you!

Click here to see ways to support - BTC, Paypal, GitHub sponsors.

Clone Notice

This repository did not run through the full MagicQuant evolution/search pipeline. It is a clone of the final survivor tensor configurations from magiccodingman/Qwen3.6-27B-MagicQuant-GGUF, rebuilt and benchmarked locally for this model.

The archived MagicQuant JSON files in magicquant-manifest/ are copied from the source release for durability. The clone benchmark JSON and the table below are from this clone run, so those metrics reflect the rebuilt outputs in this repository.

Final survivors

Name	Provider	KLD	Size (GB)	Download
MQ-Q6_K_1	MagicQuant	0.003838	27.70	Link
MQ-Q6_K_3	MagicQuant	0.005859	24.10	Link
MQ-Q5_K_S_1	MagicQuant	0.006975	22.34	Link
MQ-Q5_K_S_2	MagicQuant	0.007902	21.31	Link
LM-Q5_K_S	llama.cpp	0.015510	19.16	Link
MQ-IQ4_NL_1	MagicQuant	0.017155	18.03	Link
LM-IQ4_NL	llama.cpp	0.025073	16.29	Link
LM-IQ4_XS	llama.cpp	0.028951	15.57	Link
MQ-IQ3_M_1	MagicQuant	0.044483	14.94	Link
LM-IQ3_S	llama.cpp	0.065563	12.91	Link
LM-IQ3_XXS	llama.cpp	0.098076	11.67	Link
LM-IQ2_M	llama.cpp	0.165493	10.49	Link
LM-IQ2_S	llama.cpp	0.212110	9.85	Link
LM-IQ2_XXS	llama.cpp	0.302324	8.92	Link

This model architecture produced an unusual anomaly detection occurrence. The MagicQuant pipeline used this anomaly to achieve better quants than are normally achievable. Please read the wiki to understand what a quant anomaly is and how it's used.
MQ-Q6_K_2 was removed even though it is referenced in the manifest files. Q6_K_2 strictly lost to Q6_K_1, meaning the cloning process didn't translate perfectly for that specific quantization pattern. That's not a big deal, but it's why I removed it: there's no reason to keep a quant that is both worse in KLD and larger in size than another.
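
As a rough aid for choosing from the table above, the sketch below picks the lowest-KLD survivor whose file fits a given size budget. The rows are copied from the table; the `pick_quant` helper and the idea of a single size budget are assumptions for illustration, and real memory use will be higher than file size once context and KV cache are included.

```python
# (name, provider, KLD, size in GB), copied from the "Final survivors" table.
SURVIVORS = [
    ("MQ-Q6_K_1", "MagicQuant", 0.003838, 27.70),
    ("MQ-Q6_K_3", "MagicQuant", 0.005859, 24.10),
    ("MQ-Q5_K_S_1", "MagicQuant", 0.006975, 22.34),
    ("MQ-Q5_K_S_2", "MagicQuant", 0.007902, 21.31),
    ("LM-Q5_K_S", "llama.cpp", 0.015510, 19.16),
    ("MQ-IQ4_NL_1", "MagicQuant", 0.017155, 18.03),
    ("LM-IQ4_NL", "llama.cpp", 0.025073, 16.29),
    ("LM-IQ4_XS", "llama.cpp", 0.028951, 15.57),
    ("MQ-IQ3_M_1", "MagicQuant", 0.044483, 14.94),
    ("LM-IQ3_S", "llama.cpp", 0.065563, 12.91),
    ("LM-IQ3_XXS", "llama.cpp", 0.098076, 11.67),
    ("LM-IQ2_M", "llama.cpp", 0.165493, 10.49),
    ("LM-IQ2_S", "llama.cpp", 0.212110, 9.85),
    ("LM-IQ2_XXS", "llama.cpp", 0.302324, 8.92),
]


def pick_quant(budget_gb, survivors=SURVIVORS):
    """Return the lowest-KLD row whose file size fits the budget, or None."""
    fitting = [row for row in survivors if row[3] <= budget_gb]
    if not fitting:
        return None
    return min(fitting, key=lambda row: row[2])


if __name__ == "__main__":
    for budget in (24.0, 16.0, 8.0):
        choice = pick_quant(budget)
        print(budget, "->", choice[0] if choice else "nothing fits")
```

Because KLD rises as size falls throughout this table, the choice is always the largest file that fits; the helper only makes that lookup explicit.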

MTP Support Notes

Please note that Q8_0 was used for the MTP tensors because, at the time they were added, llama.cpp's imatrix and MTP tensor support wasn't fully working. The manifest files therefore won't perfectly represent the newly added tensors. Not that it's a huge deal tbh.

This is a somewhat unusual situation because the MTP tensors were stripped previously and re-added later.

Provider credits

llama.cpp — Baseline quantization formats and llama.cpp tooling.

Warning - Is MagicQuant Better? (hint: how you frame the question matters)

External/custom baselines are normalized into MagicQuant's controlled comparison flow. MagicQuant rebuilds a learned baseline under native-source / MagicQuant-controlled conditions, including its own imatrix handling, so hybrids or external baselines (like Unsloth) can be judged on a more equal footing. That does not mean MagicQuant proved the original upstream artifact or upstream imatrix was worse. These comparisons exist for internal hybrid-search consistency and equal playing field comparisons, not as a universal judgment of the original creator's exact release artifact.

Easier to digest explanation:

MagicQuant compares and benchmarks the model's quant-to-tensor configurations, not the original artifact. There are also different reasons MagicQuant chooses to lift up a winning quant; not all winners are purely "better". It depends heavily on a variety of factors, though the choices are always documented in the repo under the manifest folder. You can always view what decisions the automated system made and why.

So, MagicQuant can confidently tell you, "under the same quantization to tensor configurations and identical imatrix, with this benchmark, I deemed this a winner".

Re-Uploading External Provider Baselines

By default, if an external provider like Unsloth is deemed the winner, the repo should generally link directly to the original provider instead of re-hosting the quant. External GGUFs are normally only re-uploaded when a specific winning variant does not already exist (e.g. Heretic models or similar).

Release metadata

Final survivor metrics — full file names, KLD, PPL, PPL delta %, byte sizes, download targets, and replacement lineage. PPL delta % is measured against the native/reference PPL when available; negative is better and larger positive values are worse.
Hybrid tensor map — tensor-group assignments and effective-state details for MagicQuant hybrid GGUFs.
Clone tensor configs — exact per-GGUF tensor quantization maps for reproducing this final output list in repository clone mode.
Isolation samples — isolated base/group probe samples with KLD, PPL, PPL delta %, and size truth.
Bad trade details — structured bad-trade pruning decisions from the isolation optimizer.
Clone benchmark summary — fresh benchmark results from this clone run.
Replacement details — structured details for baselines or anchors removed from the final download table, including reason codes, KLD deltas, PPL delta %, and size deltas.
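
The sign convention for PPL delta % can be illustrated with a short sketch. The formula below is the ordinary relative-change calculation and is an assumption; the manifest files may compute or round the value differently, and `ppl_delta_percent` is a hypothetical helper name.

```python
def ppl_delta_percent(ppl, reference_ppl):
    """Relative PPL change versus the reference, in percent.

    Assumed formula: (ppl - reference) / reference * 100.
    Negative means the quant scored a lower (better) perplexity than the reference.
    """
    if reference_ppl <= 0:
        raise ValueError("reference_ppl must be positive")
    return (ppl - reference_ppl) / reference_ppl * 100.0
```

Under this reading, a quant with a higher perplexity than the reference gets a positive delta, which matches "larger positive values are worse".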

Replacement reason codes

STRICT_DOMINANCE — the winner was no larger and had lower real KLD than the removed anchor.
NEAR_BASELINE_PREMIUM — the winner used only the configured near-baseline size premium and beat the real linear KLD trade line.
INTERIOR_DISCOVERY — the winner was selected as a useful interior point inside a size/KLD gap between anchors.
SPACING_COLLAPSE — two candidates were too close in practical output space, so the stronger one was kept.
FINAL_DOMINANCE — a later validated survivor dominated this artifact in final real benchmark comparison.
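
Two of the codes above lend themselves to a small sketch. This is one plausible reading of STRICT_DOMINANCE and of the "linear KLD trade line", not MagicQuant's actual implementation; the `Quant` type and both function names are hypothetical, while the three rows are taken from the final survivors table.

```python
from typing import NamedTuple


class Quant(NamedTuple):
    name: str
    kld: float
    size_gb: float


def strictly_dominates(winner, loser):
    """STRICT_DOMINANCE as worded above: no larger, and lower KLD."""
    return winner.size_gb <= loser.size_gb and winner.kld < loser.kld


def beats_linear_trade_line(candidate, larger_anchor, smaller_anchor):
    """True if the candidate's KLD is below the straight line between two anchors.

    Assumption: the trade line is a linear interpolation of KLD over file size
    between the two neighbouring anchors.
    """
    span = larger_anchor.size_gb - smaller_anchor.size_gb
    if span <= 0:
        raise ValueError("larger_anchor must be bigger than smaller_anchor")
    # Fraction of the way from the larger anchor down to the smaller one.
    t = (larger_anchor.size_gb - candidate.size_gb) / span
    line_kld = larger_anchor.kld + t * (smaller_anchor.kld - larger_anchor.kld)
    return candidate.kld < line_kld


# Rows copied from the final survivors table.
LM_Q5_K_S = Quant("LM-Q5_K_S", 0.015510, 19.16)
MQ_IQ4_NL_1 = Quant("MQ-IQ4_NL_1", 0.017155, 18.03)
LM_IQ4_NL = Quant("LM-IQ4_NL", 0.025073, 16.29)

if __name__ == "__main__":
    print(strictly_dominates(MQ_IQ4_NL_1, LM_Q5_K_S))
    print(beats_linear_trade_line(MQ_IQ4_NL_1, LM_Q5_K_S, LM_IQ4_NL))
```

With these rows, the hybrid sits between the two llama.cpp baselines in size and neither strictly dominates the other, which is the kind of gap the interior and trade-line codes describe.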

Underlined names in the table replaced or ultimately inherited the replacement of another artifact. Hover the name for the short replacement summary, or inspect magicquant-manifest/magicquant.replacements.json for exact KLD/PPL/size deltas.

Downloads last month: 2,699

GGUF

Model size: 27B params

Architecture: qwen35

Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit

Model tree for magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF

Base model: Qwen/Qwen3.6-27B

Finetuned: llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved

Quantized (28): this model

Collection including magiccodingman/Qwen3.6-27B-Uncensored-MagicQuant-MTP-GGUF

Magic Quant: MagicQuant is a benchmark-driven GGUF evaluation and hybrid-discovery system. https://github.com/magiccodingman/MagicQuant-Wiki