Qwen3-Coder-Next-UD-TQ3_0 (GGUF)
This repository contains the TQ3_0 quantized version of the Qwen3-Coder-Next model, specifically optimized for the latest NVIDIA hardware.
Model Highlights
- Quantization Method: TurboQuant (TQ3_0), tuned to retain as much model intelligence as possible.
- Target Bitrate: 3.25 bpw (bits per weight), a strategic sweet spot between 3-bit and 4-bit quantization.
- Hardware Used: Quantized on a dedicated NVIDIA GeForce RTX 5090.
- Optimization: Built with a custom-patched llama.cpp (llama-turbo) that supports the high-efficiency TQ3_0 algorithm.
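For reference, a llama.cpp-style CMake build for the patched fork can be sketched as follows. This is a generic sketch: the repository URL is a placeholder (the fork name `llama-turbo` comes from the notes above), and any TQ3_0-specific build flags may differ.

```shell
# Clone the patched fork (URL is illustrative -- substitute the actual repo)
git clone https://github.com/your-org/llama-turbo.git
cd llama-turbo

# Standard llama.cpp CMake build, with CUDA enabled for an RTX 5090
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# The server binary lands in build/bin/
ls build/bin/llama-server
```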
Quantization Details
The TQ3_0 format uses Lloyd-Max quantization together with a Walsh-Hadamard Transform (WHT) to minimize information loss. This version is calibrated to 3.25 bpw, balancing quality and size between 3-bit and 4-bit quantization.
- BPW (Bits Per Weight): 3.25
- Size: Approximately 30.4 GB (ideally suited for 32GB VRAM GPUs like the RTX 5090)
- Efficiency: Balanced for ultra-fast throughput while maintaining high-level coding logic.
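To illustrate the two ingredients named above, here is a minimal NumPy sketch: a fast Walsh-Hadamard transform to decorrelate a weight block, followed by a Lloyd-Max (iterative nearest-centroid) scalar quantizer at 3 bits. This is a toy illustration of the general technique, not the actual TQ3_0 kernel; all function names and parameters here are my own.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=np.float64).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(len(x))  # orthonormal scaling: applying twice recovers the input

def lloyd_max(w, bits=3, iters=25):
    """Iteratively fit 2**bits quantization levels to minimize MSE over w."""
    levels = np.linspace(w.min(), w.max(), 2 ** bits)
    for _ in range(iters):
        # Assign each weight to its nearest level, then move each level
        # to the centroid of the weights assigned to it (1-D k-means).
        idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(len(levels)):
            members = w[idx == k]
            if members.size:
                levels[k] = members.mean()
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels, idx

# Toy usage: transform a 256-weight block, then quantize to 8 levels (3-bit)
rng = np.random.default_rng(0)
block = rng.normal(size=256)
transformed = fwht(block)
levels, idx = lloyd_max(transformed, bits=3)
dequantized = levels[idx]
mse = np.mean((transformed - dequantized) ** 2)
```

The centroid refinement is what distinguishes Lloyd-Max from naive uniform rounding: levels migrate toward dense regions of the weight distribution, cutting reconstruction error at the same bit budget.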
How to Use
To run this model, you need a compatible inference engine that supports TurboQuant.
Using llama-server (Example)
```sh
./llama-server \
  -m Qwen3-Coder-Next-UD-TQ3_0.gguf \
  -ngl 99 \
  -c 32768 \
  --port 8080
```
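Once the server is running, llama-server exposes an OpenAI-compatible HTTP API. The sketch below builds a chat-completion request using only the Python standard library; the localhost URL and port mirror the example above, and the parameter choices (`max_tokens`, `temperature`) are illustrative.

```python
import json
import urllib.request

def build_chat_request(prompt, url="http://localhost:8080/v1/chat/completions"):
    """Construct an OpenAI-compatible chat request for llama-server."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,  # low temperature suits code generation
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires the server from the example above to be running):
# with urllib.request.urlopen(build_chat_request("Write hello world in C")) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```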
Model tree for edwardyoon79/Qwen3-Coder-Next-TQ3_0
Base model: Qwen/Qwen3-Coder-Next