DistillNeuCodec ONNX Encoder

ONNX export of the DistillNeuCodec encoder for lightweight voice cloning inference.

Model Description

This is an ONNX-optimized encoder that produces speech codes compatible with the NeuTTS voice cloning pipeline. The encoder extracts acoustic and semantic features from reference audio to enable zero-shot voice cloning.

Verification

This ONNX export produces output codes that are 100% identical to those of the original PyTorch model for every tested audio file:

| Test File  | Duration | Codes | Match  |
|------------|----------|-------|--------|
| dave.wav   | 7.45s    | 373   | ✓ 100% |
| jo.wav     | 13.06s   | 654   | ✓ 100% |
| nellie.wav | 7.33s    | 367   | ✓ 100% |
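The comparison behind this table can be sketched as follows. The helper name `code_match_rate` is illustrative; the original verification script is not included in this repository.

```python
import numpy as np

def code_match_rate(onnx_codes, reference_codes):
    """Fraction of positions where two code sequences agree.

    Returns 0.0 on a length mismatch, since differing lengths already
    mean the two exports diverge.
    """
    a = np.asarray(onnx_codes)
    b = np.asarray(reference_codes)
    if a.shape != b.shape:
        return 0.0
    return float(np.mean(a == b))
```

A result of `1.0` for every file corresponds to the "✓ 100%" entries above.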

Usage

```python
import numpy as np
import soundfile as sf
import onnxruntime

# Load model
sess = onnxruntime.InferenceSession("onnx/distill_neucodec_encoder.onnx")

# Load audio (must be mono, 16 kHz)
audio, sr = sf.read("reference.wav")
assert sr == 16000, f"Audio must be 16kHz, got {sr}Hz"
if audio.ndim > 1:  # downmix multi-channel audio to mono
    audio = audio.mean(axis=1)

# IMPORTANT: pre-pad to a multiple of 320 samples (this appends a full
# 320-sample block even when the length is already aligned)
T = len(audio)
pad_for_wav = 320 - (T % 320)
audio = np.pad(audio, (0, pad_for_wav))

# Run inference
audio_input = audio[np.newaxis, np.newaxis, :].astype(np.float32)
codes = sess.run(None, {"audio": audio_input})[0].flatten().tolist()

print(f"Generated {len(codes)} codes")
```
Input/Output Specification

| Name            | Shape       | Type    | Description                                 |
|-----------------|-------------|---------|---------------------------------------------|
| Input: `audio`  | `[1, 1, T]` | float32 | 16 kHz mono audio; T must be divisible by 320 |
| Output: `codes` | `[1, 1, F]` | int32   | Speech codes, F ≈ T/320                     |
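Because the pre-padding step always appends `320 - (T % 320)` samples, the frame count is deterministic. The helper below is illustrative, and its cross-check reproduces the code counts in the verification table:

```python
def expected_frames(num_samples: int) -> int:
    # Pre-padding appends 320 - (T % 320) samples (a full 320 when T is
    # already aligned), so the frame count is always T // 320 + 1.
    return num_samples // 320 + 1

# Cross-check against the verification table (durations at 16 kHz):
for seconds, codes in [(7.45, 373), (13.06, 654), (7.33, 367)]:
    assert expected_frames(round(seconds * 16000)) == codes
```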

Pre-padding Requirement

⚠️ Important: Input audio length must be padded to a multiple of 320 samples before inference:

```python
T = len(audio)
pad_for_wav = 320 - (T % 320)
audio = np.pad(audio, (0, pad_for_wav))
```

Note that when `T` is already a multiple of 320, this still appends a full 320-sample block. This matches the behavior of the original PyTorch model's `_prepare_audio()` function.
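For convenience, the rule can be wrapped in a small helper (the function name is illustrative, not part of this repository):

```python
import numpy as np

def pad_to_320(audio: np.ndarray) -> np.ndarray:
    """Right-pad with zeros so the length is a multiple of 320.

    When the length is already aligned, a full extra 320-sample block
    is appended, mirroring the original _prepare_audio() behavior.
    """
    pad = 320 - (len(audio) % 320)
    return np.pad(audio, (0, pad))
```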

Files

```
onnx/
├── distill_neucodec_encoder.onnx       # ONNX model
└── distill_neucodec_encoder.onnx.data  # External weights
```

Requirements

```
onnxruntime>=1.16.0
soundfile
numpy
```

License

Apache 2.0 - same as the base model.
