Qwen3-14B-Instruct – DirectML INT4 (ONNX Runtime)

This repository provides Qwen3-14B-Instruct converted to INT4 ONNX and optimized for DirectML using Microsoft Olive and ONNX Runtime GenAI.

It is designed for native Windows GPU inference (Intel Arc, AMD RDNA, NVIDIA RTX) without CUDA and without running a Python server.
Ideal for integration in C# / .NET applications using ONNX Runtime + DirectML.


Model Details

  • Base model: OpenPipe/Qwen3-14B-Instruct
  • Quantization: INT4 (MatMulNBits)
  • Format: ONNX
  • Runtime: ONNX Runtime with DmlExecutionProvider
  • Conversion toolchain: Microsoft Olive + onnxruntime-genai
  • Target hardware:
    • Intel Arc (A770, A750, 130V, etc.)
    • AMD RDNA2 / RDNA3
    • NVIDIA RTX (via DirectML)

Files

Main inference files:

  • model.onnx
  • model.onnx.data ← INT4 weights (≈ 9 GB)
  • genai_config.json
  • tokenizer.json, vocab.json, merges.txt
  • chat_template.jinja
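
The DirectML execution provider is selected through genai_config.json rather than in application code. A minimal sketch of the relevant section (field names follow the onnxruntime-genai config schema; the exact contents of this repository's file may differ):

{
  "model": {
    "decoder": {
      "session_options": {
        "provider_options": [
          { "dml": {} }
        ]
      }
    }
  }
}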

Usage in C# (DirectML)

Example (ONNX Runtime GenAI):

using Microsoft.ML.OnnxRuntimeGenAI;

var modelPath = @"Qwen3-14B-Instruct-DirectML-INT4";

// The DirectML execution provider is selected via genai_config.json,
// so no provider option is needed here.
using var model = new Model(modelPath);
using var tokenizer = new Tokenizer(model);

var sequences = tokenizer.Encode("Explain what a Dutch mortgage deed is.");

using var generatorParams = new GeneratorParams(model);
generatorParams.SetSearchOption("max_length", 1024);
generatorParams.SetSearchOption("temperature", 0.7);

using var generator = new Generator(model, generatorParams);
generator.AppendTokenSequences(sequences);

// Generate token by token until the model finishes.
while (!generator.IsDone())
{
    generator.GenerateNextToken();
}

string output = tokenizer.Decode(generator.GetSequence(0));
Console.WriteLine(output);

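
For interactive use, tokens can be decoded and printed as they are produced. A minimal streaming sketch, assuming the model and tokenizer objects from the example above and a recent onnxruntime-genai release (the TokenizerStream API):

// Streaming variant: decode and print each new token as it is generated.
using var streamParams = new GeneratorParams(model);
streamParams.SetSearchOption("max_length", 1024);

using var streamGenerator = new Generator(model, streamParams);
streamGenerator.AppendTokenSequences(tokenizer.Encode("Explain what a Dutch mortgage deed is."));

using var tokenizerStream = tokenizer.CreateStream();
while (!streamGenerator.IsDone())
{
    streamGenerator.GenerateNextToken();
    // Decode only the newest token and write it immediately.
    Console.Write(tokenizerStream.Decode(streamGenerator.GetSequence(0)[^1]));
}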
Prompt Format

This model supports standard chat-style prompts and works well with Hermes-style system prompts and tool calling.

The included chat_template.jinja can be used to format multi-role conversations.
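
Qwen models follow the ChatML convention. A typical formatted prompt looks like the sketch below, reconstructed from the standard Qwen chat format rather than copied from this repository's chat_template.jinja:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Explain what a Dutch mortgage deed is.<|im_end|>
<|im_start|>assistant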

Performance Notes

  • INT4 allows the 14B model to run on 16 GB VRAM GPUs (Arc 130V, RTX 3060, RX 6800).
  • Throughput depends heavily on the DirectML backend and driver quality.
  • First-token latency may be high due to one-time graph compilation.

License & Attribution

Base model:

  • Qwen3-14B-Instruct by Alibaba / OpenPipe
  • License: see the original model card

Conversion:

  • ONNX export and INT4 quantization performed by Wekkel using Microsoft Olive.
  • This is an independent community conversion, with no affiliation with Alibaba or the Qwen team.