Soprano ONNX (KV Cache)

This repository hosts ONNX exports of the Soprano 1.1 80M model with KV caching.

Contents

  • onnx/soprano_backbone_kv_fp32.onnx, soprano_backbone_kv_fp16.onnx, soprano_backbone_kv_int8.onnx (backbone with past_key_values)
  • onnx/soprano_decoder_fp32.onnx + onnx/soprano_decoder_fp32.onnx.data (vocoder decoder)
  • onnx/soprano_decoder_int8.onnx (vocoder decoder)
  • / (repository root: tokenizer assets)
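
As a rough illustration of how the files above fit together, the following Python sketch picks one backbone and one decoder variant, downloads them with huggingface_hub, and prints the backbone's input names and shapes with onnxruntime. It is a sketch under stated assumptions, not the reference pipeline: the helper names (files_for, download, describe_inputs) are hypothetical, the onnx/ prefix on the fp16 and int8 backbone files is assumed from the fp32 entry, and the past_key_values input names are not documented here, which is why the sketch inspects them rather than hard-coding them.

```python
from pathlib import Path

REPO_ID = "KevinAHM/soprano-1.1-onnx"  # repository id on the Hugging Face Hub

# File names are taken from the Contents list. ASSUMPTION: the fp16 and int8
# backbones live under onnx/ like the fp32 one; the list only spells out the
# prefix for the first file.
BACKBONE = {
    "fp32": ["onnx/soprano_backbone_kv_fp32.onnx"],
    "fp16": ["onnx/soprano_backbone_kv_fp16.onnx"],
    "int8": ["onnx/soprano_backbone_kv_int8.onnx"],
}
# The fp32 decoder stores its weights in a separate external-data file,
# which must sit next to the .onnx file when the model is loaded.
DECODER = {
    "fp32": ["onnx/soprano_decoder_fp32.onnx", "onnx/soprano_decoder_fp32.onnx.data"],
    "int8": ["onnx/soprano_decoder_int8.onnx"],
}


def files_for(backbone: str, decoder: str) -> list[str]:
    """Return the repository paths needed for the chosen variants."""
    if backbone not in BACKBONE:
        raise ValueError(f"unknown backbone variant: {backbone!r}")
    if decoder not in DECODER:
        raise ValueError(f"unknown decoder variant: {decoder!r}")
    return BACKBONE[backbone] + DECODER[decoder]


def download(backbone: str = "int8", decoder: str = "int8") -> list[Path]:
    """Download the selected files into the local Hugging Face cache."""
    from huggingface_hub import hf_hub_download  # imported lazily; needs network

    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=name))
        for name in files_for(backbone, decoder)
    ]


def describe_inputs(model_path: Path) -> list[tuple]:
    """List (name, shape, type) for every input of an ONNX model."""
    import onnxruntime as ort

    session = ort.InferenceSession(str(model_path), providers=["CPUExecutionProvider"])
    return [(i.name, i.shape, i.type) for i in session.get_inputs()]


if __name__ == "__main__":
    paths = download()
    # paths[0] is the backbone; its inputs include the KV-cache tensors.
    for name, shape, dtype in describe_inputs(paths[0]):
        print(name, shape, dtype)
```

Because hf_hub_download preserves the repository's directory layout in the cache, the fp32 decoder's .onnx.data file ends up beside its .onnx file, where onnxruntime looks for external data.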

Inference & demo

See the streaming inference code here: https://github.com/KevinAHM/soprano-web-onnx

As of January 2026, these exports are not compatible with WebGPU via onnxruntime-web.

Upstream

Original project: https://github.com/ekwek1/soprano
