DistillNeuCodec ONNX Encoder

ONNX export of the DistillNeuCodec encoder for lightweight voice cloning inference.

Model Description

This is an ONNX-optimized encoder that produces speech codes compatible with the NeuTTS voice cloning pipeline. The encoder extracts acoustic and semantic features from reference audio to enable zero-shot voice cloning.

Verification

This ONNX export produces output codes identical to those of the original PyTorch model on all three tested audio files:

Test File	Duration	Codes	Match
`dave.wav`	7.45s	373	✓ 100%
`jo.wav`	13.06s	654	✓ 100%
`nellie.wav`	7.33s	367	✓ 100%
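One way such a match percentage could be computed, given two code sequences for the same file, is sketched below. The helper name `code_match_rate` and the choice to count length differences as mismatches are assumptions for illustration; this is not necessarily how the figures in the table were produced.

```python
import numpy as np

def code_match_rate(reference_codes, candidate_codes):
    """Return the fraction of positions where two code sequences agree.

    Hypothetical helper: positions that exist in only one sequence
    are counted as mismatches.
    """
    ref = np.asarray(reference_codes).ravel()
    cand = np.asarray(candidate_codes).ravel()
    longest = max(ref.size, cand.size)
    if longest == 0:
        return 1.0  # two empty sequences agree trivially
    shortest = min(ref.size, cand.size)
    matches = int(np.sum(ref[:shortest] == cand[:shortest]))
    return matches / longest
```

A result of 1.0 for a file corresponds to the "100%" entries in the table.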

Usage

import numpy as np
import soundfile as sf
import onnxruntime

# Load model
sess = onnxruntime.InferenceSession("onnx/distill_neucodec_encoder.onnx")

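# Load audio (must be 16 kHz mono)
audio, sr = sf.read("reference.wav")
assert sr == 16000, f"Audio must be 16kHz, got {sr}Hz"
assert audio.ndim == 1, f"Audio must be mono, got shape {audio.shape}"

# IMPORTANT: Pre-pad to a multiple of 320 samples.
# This always appends 1 to 320 zeros, even if T is already a multiple of 320.
T = len(audio)
pad_for_wav = 320 - (T % 320)
audio = np.pad(audio, (0, pad_for_wav))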

# Run inference
audio_input = audio[np.newaxis, np.newaxis, :].astype(np.float32)
codes = sess.run(None, {"audio": audio_input})[0].flatten().tolist()

print(f"Generated {len(codes)} codes")
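The loading steps above can be gathered into a single helper. The following is a minimal sketch, not part of the original export: the name `prepare_audio` is hypothetical, and averaging channels to obtain mono is an assumption, since the model card only specifies a `[1, 1, T]` input.

```python
import numpy as np

SAMPLE_RATE = 16000  # required sample rate, per the usage example
HOP = 320            # input length must be a multiple of this

def prepare_audio(audio, sr):
    """Turn soundfile-style audio into a padded [1, 1, T] float32 array."""
    if sr != SAMPLE_RATE:
        raise ValueError(f"Audio must be {SAMPLE_RATE} Hz, got {sr} Hz")
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # soundfile returns (frames, channels) for multi-channel files.
        # Downmixing by averaging is an assumption of this sketch.
        audio = audio.mean(axis=1)
    elif audio.ndim != 1:
        raise ValueError(f"Expected 1-D or 2-D audio, got {audio.ndim}-D")
    # Same padding formula as in the usage example.
    pad_for_wav = HOP - (audio.shape[0] % HOP)
    audio = np.pad(audio, (0, pad_for_wav))
    return audio[np.newaxis, np.newaxis, :]
```

With this helper, inference would reduce to `sess.run(None, {"audio": prepare_audio(audio, sr)})[0]`.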

Input/Output Specification

Name	Shape	Type	Description
Input: `audio`	`[1, 1, T]`	float32	16kHz audio, T must be divisible by 320
Output: `codes`	`[1, 1, F]`	int32	Speech codes, F ≈ T/320
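To make the relationship between T and F concrete, the sketch below estimates the code count for a given number of raw samples. It assumes exactly one code per 320 padded samples, which the table only states approximately (F ≈ T/320), and that the padding formula from this card is applied first. The function names are hypothetical.

```python
HOP = 320  # samples per code, assumed from "F ≈ T/320"

def padded_length(num_samples):
    """Length after applying pad_for_wav = 320 - (T % 320)."""
    return num_samples + (HOP - num_samples % HOP)

def expected_code_count(num_samples):
    """Estimated number of codes, assuming one code per HOP padded samples."""
    return padded_length(num_samples) // HOP

# 7.45 s at 16 kHz is 119200 samples, which pads to 119360 samples.
print(expected_code_count(119200))  # 373
```

Under these assumptions, the 7.45 s example gives 373 codes, which agrees with the `dave.wav` row of the verification table. The durations listed there are rounded, so this is only a rough consistency check.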

Pre-padding Requirement

⚠️ Important: Input audio length must be padded to a multiple of 320 samples before inference:

T = len(audio)
pad_for_wav = 320 - (T % 320)
audio = np.pad(audio, (0, pad_for_wav))

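This matches the behavior of the original PyTorch model's _prepare_audio() function. As written, the formula always appends between 1 and 320 zero samples, so audio whose length is already a multiple of 320 gains one full extra 320-sample block.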

Files

onnx/
├── distill_neucodec_encoder.onnx       # ONNX model
└── distill_neucodec_encoder.onnx.data  # External weights
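Because the weights are stored in a separate external data file, both files need to be present, and the `.data` file is normally resolved relative to the `.onnx` file. A small pre-flight check, sketched here with a hypothetical helper name, can give a clearer error than a failed model load:

```python
from pathlib import Path

def check_model_files(model_path):
    """Verify that the .onnx file and its sibling .onnx.data file both exist.

    Hypothetical helper; assumes the external weights file is named
    "<model file name>.data" and lives in the same directory.
    """
    model = Path(model_path)
    data = model.with_name(model.name + ".data")
    missing = [str(p) for p in (model, data) if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing model file(s): " + ", ".join(missing))
    return model
```

For example, `check_model_files("onnx/distill_neucodec_encoder.onnx")` could be called before creating the `InferenceSession`.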

Requirements

onnxruntime>=1.16.0
soundfile
numpy

License

Apache 2.0, the same as the base model (neuphonic/distill-neucodec).
