Higgs Audio v3 TTS 4-bit ONNX Vocoder
This repository contains an ONNX export of the Higgs Audio v3 bundled codec decoder/vocoder:
audio_codes [B, 8, T] int64 -> audio_values [B, 1, samples] float32
It is a browser-deployment component for Higgs Audio TTS, not the complete text-to-speech pipeline. The autoregressive Qwen3 text-to-codebook generator is still a separate custom SGLang graph and is not included here.
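A minimal sketch of running the vocoder with ONNX Runtime in Python, assuming the graph has a single input that takes the [B, 8, T] int64 codes and that the first output is the waveform. The helper names `check_codes_shape` and `decode` are hypothetical, and the input name is read from the session rather than assumed. Imports sit inside `decode` so the shape check works without onnxruntime installed.

```python
NUM_CODEBOOKS = 8  # from the [B, 8, T] signature above


def check_codes_shape(shape):
    """Raise ValueError unless shape looks like [B, 8, T]."""
    if len(shape) != 3 or shape[1] != NUM_CODEBOOKS:
        raise ValueError(f"expected [B, {NUM_CODEBOOKS}, T], got {list(shape)}")
    return tuple(shape)


def decode(model_path, audio_codes):
    """Run the vocoder on int64 codes; returns the first graph output."""
    import numpy as np
    import onnxruntime as ort

    codes = np.asarray(audio_codes, dtype=np.int64)
    check_codes_shape(codes.shape)

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Assumption: the graph has exactly one input; its name is looked up, not guessed.
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: codes})
    # Assumption: the first output is audio_values [B, 1, samples] float32.
    return outputs[0]
```

For example, `decode("higgs_audio_v3_vocoder_decode.onnx", codes)` would be called with the `.onnx.data` file sitting next to the `.onnx` file so the external weights can be found.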
Files
- higgs_audio_v3_vocoder_decode.onnx + .onnx.data: fp32 ONNX vocoder.
- higgs_audio_v3_vocoder_decode_matmul4.onnx + .onnx.data: ONNX Runtime weight-only 4-bit quantization for supported MatMul and Gather nodes.
- vocoder_onnx_export_report.json: PyTorch vs ONNX Runtime validation.
- vocoder_onnx_matmul4_report.json: 4-bit ONNX Runtime validation.
Validation
The fp32 vocoder export passed the ONNX checker and ONNX Runtime CPU inference:
- max absolute diff vs PyTorch: 0.0007368475
- mean absolute diff vs PyTorch: 0.0000280040
- mean relative diff vs PyTorch: 0.0053399284
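The three metrics above can be sketched as follows. The exact formulas used in the validation reports are not stated here, so this is an assumption: the relative diff is taken as |reference - candidate| / (|reference| + eps), averaged over all samples. The function name `diff_metrics` and the `eps` parameter are hypothetical.

```python
def diff_metrics(reference, candidate, eps=1e-8):
    """Compare two flat sequences of samples (e.g. PyTorch vs ONNX Runtime output)."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")

    abs_diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    # Assumed definition of relative diff; eps guards against division by zero.
    rel_diffs = [d / (abs(r) + eps) for d, r in zip(abs_diffs, reference)]

    n = len(abs_diffs)
    return {
        "max_abs": max(abs_diffs),
        "mean_abs": sum(abs_diffs) / n,
        "mean_rel": sum(rel_diffs) / n,
    }
```

With array outputs, the same quantities would typically be computed on the flattened waveform tensors.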
The 4-bit ONNX Runtime artifact replaced:
- Gather: 8 -> GatherBlockQuantized: 8
- MatMul: 9 -> MatMulNBits: 9
and passed ONNX Runtime CPU inference against the fp32 ONNX graph.
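One way to check the node replacement listed above is to count operator types in each graph. The sketch below assumes the `onnx` Python package is installed; `count_op_types` and `load_op_counts` are hypothetical helper names. It only inspects top-level nodes, not nested subgraphs.

```python
from collections import Counter


def count_op_types(nodes):
    """Count how often each op_type appears in an iterable of graph nodes."""
    return Counter(node.op_type for node in nodes)


def load_op_counts(model_path):
    """Load only the graph structure (no external weights) and count its ops."""
    import onnx

    # load_external_data=False skips reading the .onnx.data weight file.
    model = onnx.load(model_path, load_external_data=False)
    return count_op_types(model.graph.node)
```

Comparing `load_op_counts` for the fp32 file and for the `_matmul4` file should show the Gather and MatMul counts moving to GatherBlockQuantized and MatMulNBits if the replacement matches the numbers reported here.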
Limitation
This is not a full in-browser TTS model yet. A complete browser version still needs the autoregressive Higgs/Qwen3 text-to-codebook generator exported to an ONNX Runtime Web compatible graph and a JavaScript sampling loop for the 8-codebook delayed-token generation.
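To illustrate what a delayed-token layout for 8 codebooks can look like, here is a sketch of one common scheme in which codebook k is shifted right by k steps. Whether Higgs Audio v3 uses exactly this layout is an assumption, not something this repository confirms. The function names and the `pad` value are hypothetical. It is written in Python for illustration; a browser port would be JavaScript.

```python
def apply_delay(codes, pad=-1):
    """Shift row k right by k steps: [K, T] -> [K, T + K - 1], padded with `pad`."""
    num_codebooks = len(codes)
    return [
        [pad] * k + list(row) + [pad] * (num_codebooks - 1 - k)
        for k, row in enumerate(codes)
    ]


def revert_delay(delayed):
    """Undo apply_delay: [K, T + K - 1] -> [K, T], ready for the vocoder."""
    num_codebooks = len(delayed)
    steps = len(delayed[0]) - (num_codebooks - 1)
    return [list(row[k:k + steps]) for k, row in enumerate(delayed)]
```

Under this assumed layout, a sampling loop would emit one column of the delayed grid per step, and `revert_delay` would realign the rows into the [8, T] block that the vocoder expects per batch item.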
Model tree for Reza2kn/Higgs-Audio-v3-TTS-4bit-ONNX
Base model
bosonai/higgs-audio-v3-tts-4b