How to use from llama.cpp
Install from Homebrew (macOS/Linux)
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:
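# Note: the trailing colon expects a quant tag. A hedged example using the
# MQ-IQ4_NL_1 name from the survivor table below (the exact tag depends on
# this repo's file naming, so check the repo's file list first):
llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:MQ-IQ4_NL_1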
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:
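# However you installed it, llama-server listens on http://127.0.0.1:8080 by
# default and exposes an OpenAI-compatible chat endpoint. A minimal smoke test
# with curl once the server is up (the prompt is arbitrary):
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'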
Use Docker
docker model run hf.co/magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:
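# The same tag convention applies here; an illustrative example, again assuming
# the MQ-IQ4_NL_1 tag from the table below matches the repo's file naming:
docker model run hf.co/magiccodingman/Qwen3.6-27B-MagicQuant-GGUF:MQ-IQ4_NL_1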

MagicQuant Hybrids (v2.0) - Qwen3.6-27B

MagicQuant is a benchmark-driven GGUF hybrid discovery and validation system focused on finding real, practical GGUF quants specific to each architecture.

Whether it's a pure baseline model built by llama.cpp, a learned tensor configuration from Unsloth, or a custom-built MagicQuant hybrid, the model table below shows quants that have won dominance checks, survived collapse spaces, and/or were found to be nonlinearly better. Instead of dumping every possible quant type, MagicQuant tests, validates, and brutally murders anything deemed unworthy.

Support MagicQuant

I’m a solo developer working full-time for myself to achieve my dream. I build open-source code on the side. If you like any of my work, buying me a coffee is always appreciated. Otherwise, I hope you enjoy it; maybe give me a star or something, or just send me good vibes. Either way, thank you!

Click here to see ways to support - BTC, PayPal, GitHub Sponsors.

I will update the model with MTP when it's supported in main.


Final survivors

Name         Provider    KLD       Size (GB)  Download
LM-Q8_0      llama.cpp   0.003768  28.60
MQ-Q6_K_1    MagicQuant  0.002845  27.25      Link
MQ-Q6_K_2    MagicQuant  0.003884  25.23      Link
MQ-Q6_K_3    MagicQuant  0.004914  23.66      Link
LM-Q6_K      llama.cpp   0.007249  22.08
MQ-Q5_K_S_1  MagicQuant  0.006477  21.90      Link
MQ-Q5_K_S_2  MagicQuant  0.007617  20.86      Link
LM-Q5_K_S    llama.cpp   0.010790  18.68      Link
UD-Q4_K_XL   Unsloth     0.023521  17.61
MQ-IQ4_NL_1  MagicQuant  0.019687  17.59      Link
LM-IQ4_NL    llama.cpp   0.025714  15.80      Link
LM-IQ4_XS    llama.cpp   0.027015  15.08      Link
MQ-IQ3_M_1   MagicQuant  0.043802  14.49      Link
LM-IQ3_S     llama.cpp   0.064393  12.42      Link
LM-IQ3_XXS   llama.cpp   0.093578  11.19      Link
LM-IQ2_M     llama.cpp   0.163117  10.00      Link
LM-IQ2_S     llama.cpp   0.210251   9.36      Link
LM-IQ2_XXS   llama.cpp   0.302597   8.43      Link
  • Crossed-out models are showcased for reference only.
  • This model architecture produced an unusual anomaly-detection occurrence. The MagicQuant pipeline exploited this anomaly to achieve better quants than are normally achievable. Please read the wiki to understand what a quant anomaly is and how it is utilized.
Provider credits
  • llama.cpp — Baseline quantization formats and llama.cpp tooling.
  • Unsloth — External learned baseline source (UD).
Warning - Is MagicQuant Better? (hint: how you frame the question matters)

External/custom baselines are normalized into MagicQuant's controlled comparison flow. MagicQuant rebuilds a learned baseline under native-source, MagicQuant-controlled conditions, including its own imatrix handling, so that hybrids and external baselines (like Unsloth) can be judged on a more equal footing. That does not mean MagicQuant proved the original upstream artifact or upstream imatrix was worse. These comparisons exist for internal hybrid-search consistency and level-playing-field comparisons, not as a universal judgment of the original creator's exact release artifact.

An easier-to-digest explanation:

MagicQuant compares and benchmarks the model's quant-to-tensor configurations, not the original artifact. There are also different reasons MagicQuant chooses to lift up a winning quant; not all winners are purely "better". It depends heavily on a variety of factors, though choices are always documented in the repo under the manifest folder, so you can always see what decisions the automated system made and why.

So, MagicQuant can confidently tell you: "under the same quantization-to-tensor configurations and an identical imatrix, with this benchmark, I deemed this the winner."
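If you want to sanity-check KLD numbers like the ones in the survivor table, llama.cpp's own perplexity tool can compute KL divergence against saved reference logits. A minimal sketch, not MagicQuant's exact benchmark harness; the model and corpus file names here are placeholders, and exact flags may vary with your llama.cpp version:

# 1) Save reference logits from the highest-precision GGUF you have:
llama-perplexity -m reference.gguf -f eval-corpus.txt --kl-divergence-base base-logits.bin
# 2) Score a quant against those saved logits and report KLD statistics:
llama-perplexity -m quant.gguf -f eval-corpus.txt --kl-divergence-base base-logits.bin --kl-divergence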

Re-Uploading External Provider Baselines

By default, if an external provider like Unsloth is deemed the winner, the repo should generally link directly to the original provider instead of re-hosting the quant. External GGUFs are normally re-uploaded only when a specific winning variant does not already exist (e.g. Heretic models or similar).


Release metadata

  • Final survivor metrics — full file names, KLD, PPL, PPL delta %, byte sizes, download targets, and replacement lineage. PPL delta % is measured against the native/reference PPL when available; negative is better and larger positive values are worse (see the worked formula after this list).
  • Hybrid tensor map — tensor-group assignments and effective-state details for MagicQuant hybrid GGUFs.
  • Clone tensor configs — exact per-GGUF tensor quantization maps for reproducing this final output list in repository clone mode.
  • Isolation samples — isolated base/group probe samples with KLD, PPL, PPL delta %, and size truth.
  • Bad trade details — structured bad-trade pruning decisions from the isolation optimizer.
  • Replacement details — structured details for baselines or anchors removed from the final download table, including reason codes, KLD deltas, PPL delta %, and size deltas.
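For reference, the PPL delta % convention above amounts to the following (a worked formula, assuming the reference PPL is the native/full-precision run mentioned in the first bullet):

PPL delta % = 100 * (PPL_quant - PPL_reference) / PPL_reference

A quant that matches the reference scores 0; negative values mean the quant's perplexity came in below the reference.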
Replacement reason codes
  • STRICT_DOMINANCE — the winner was no larger and had lower real KLD than the removed anchor.
  • NEAR_BASELINE_PREMIUM — the winner used only the configured near-baseline size premium and beat the real linear KLD trade line.
  • INTERIOR_DISCOVERY — the winner was selected as a useful interior point inside a size/KLD gap between anchors.
  • SPACING_COLLAPSE — two candidates were too close in practical output space, so the stronger one was kept.
  • FINAL_DOMINANCE — a later validated survivor dominated this artifact in final real benchmark comparison.

Underlined names in the table replaced, or ultimately inherited the replacement of, another artifact. Hover the name for the short replacement summary, or inspect magicquant-manifest/magicquant.replacements.json for exact KLD/PPL/size deltas.
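To read those replacement records from a shell, a minimal sketch using jq (the path comes from the note above; no particular JSON schema is assumed beyond what that note describes):

jq '.' magicquant-manifest/magicquant.replacements.json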


Model details

  • Base model: Qwen/Qwen3.6-27B
  • Architecture: qwen35
  • Parameters: 27B
  • Format: GGUF (2-bit through 6-bit quantizations)