---
base_model:
- Qwen/Qwen3-Coder-Next
base_model_relation: quantized
pipeline_tag: text-generation
tags:
- quantized
- hardware-optimized
- qwen3_next
- tensordyne
license: apache-2.0
---

## 📝 Overview

Tensordyne builds advanced [AI-inference systems](https://www.tensordyne.ai/inference-system), enabling faster, more affordable, and sustainable generative AI.

This repository provides resources to quickly get started with **[Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next)** on the **Tensordyne Inference System and its SDK**.

## 🧩 Model Details

- **Quantization:** post-training quantization of the base model; no fine-tuning or additional training was performed
- **Supported data types:** Tensordyne FP16 (tFP16), Tensordyne FP8 (tFP8), mixed-precision

## ⚙️ Quantization

The Tensordyne SDK offers multiple post-training quantization strategies for converting AI models to run efficiently on the Tensordyne Inference System. Each strategy is fully customizable for your optimization targets. Here we showcase several preselected quantization variants that can be applied on the fly to quantize the model to Tensordyne data types. The calibration-based strategies are defined by quantization configurations provided as `.json` files.

The quantized models are evaluated on 10% of the [WikiText-2 raw v1](https://huggingface.co/datasets/Salesforce/wikitext) test set. A negative relative perplexity drop indicates that the quantized model performs better than the float base model.

| Model Configuration              | Absolute Perplexity | Relative Perplexity Drop vs. BF16 | Details                                                     |
|----------------------------------|---------------------|-----------------------------------|-------------------------------------------------------------|
| BF16                             | 6.351               | –                                 | The baseline model trained in BF16                          |
| layerwise_mixed_precision        | 6.365               | 0.23 %                            | calibration-based mixed-precision: tFP8, outliers in tFP16  |
| calibration_based_tFP8           | 6.498               | 2.33 %                            | calibration-based tFP8 quantization                         |

## 🚀 Getting Started

Refer to the Tensordyne Hugging Face Hub tutorial in our [hosted documentation](https://resources.tensordyne.ai/sdk/) for instructions on using the artifacts provided in this repository. The documentation also provides more information on Tensordyne's quantization strategies and introduces you to our SDK.
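
To make the "Relative Perplexity Drop vs. BF16" column of the Quantization table easier to interpret, the following minimal sketch recomputes it from the absolute perplexities. It assumes the metric is the percentage change of the quantized perplexity relative to the BF16 baseline, which this model card does not state explicitly. The function name `relative_perplexity_drop` is hypothetical and not part of the Tensordyne SDK.

```python
def relative_perplexity_drop(quantized_ppl: float, baseline_ppl: float) -> float:
    """Percentage change in perplexity relative to the baseline.

    Assumed definition (not confirmed by the model card):
        (quantized - baseline) / baseline * 100
    Positive values mean the quantized model is worse than the baseline,
    negative values mean it is better.
    """
    if baseline_ppl <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (quantized_ppl - baseline_ppl) / baseline_ppl * 100.0


# Absolute perplexities copied from the table (rounded to three decimals).
baseline = 6.351
variants = {
    "layerwise_mixed_precision": 6.365,
    "calibration_based_tFP8": 6.498,
}

drops = {
    name: relative_perplexity_drop(ppl, baseline)
    for name, ppl in variants.items()
}

for name, drop in drops.items():
    print(f"{name}: {drop:.2f} %")
```

Recomputed from the rounded table values, this gives roughly 0.22 % and 2.31 %, slightly below the published 0.23 % and 2.33 %. The difference is consistent with the published figures having been derived from unrounded perplexities, though that is an inference rather than something the model card confirms.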