FormantNet - ONNX
ONNX exports of the FormantNet neural formant tracker
(PaPE 2021 / IS 2021), with fp32, fp16, and int8 dynamic-quantization variants.
The exported model is the LSTM1_noIAIF_DFLoss configuration trained on TIMIT:
experiment mvt33_f6z1sTpF10 (6 formants + 1 antiformant, delta-frequency loss weight 0.15).
Architecture
Input (batch, time, 257): normalized log-spectral envelope, 32 ms window at 16 kHz
LSTM 512 units, unidirectional, return_sequences=True
Dense 20 units, sigmoid
Output (batch, time, 20): raw sigmoid values in [0, 1]
Total parameters: 1,587,220 (6.05 MB fp32).
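The parameter total can be reproduced by hand. The sketch below assumes a standard Keras LSTM (four gates, each with input weights, recurrent weights, and one bias vector) followed by a dense layer with bias; the helper names are illustrative and not part of the repository. It also shows that the 6.05 figure corresponds to 4 bytes per parameter expressed in MiB (2^20 bytes).

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Weights of a standard LSTM layer: 4 gates, each with an input
    kernel, a recurrent kernel, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weights of a fully connected layer with bias."""
    return input_dim * units + units


n_lstm = lstm_params(input_dim=257, units=512)   # LSTM over 257 spectral bins
n_dense = dense_params(input_dim=512, units=20)  # 20 sigmoid outputs
total = n_lstm + n_dense

# fp32 storage: 4 bytes per parameter, reported in MiB
size_mib = total * 4 / 2**20

print(f"LSTM: {n_lstm:,}  Dense: {n_dense:,}  Total: {total:,}")
print(f"fp32 size: {size_mib:.2f} MiB")
```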
Output layout: F1…F6, Fz1 (frequencies) · B1…B6, Bz1 (bandwidths) · A1…A6 (amplitudes),
all as raw sigmoid values in [0, 1]; rescale with the repo's get_rescale_fn() to obtain Hz / dB.
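As a rough illustration of that layout, the 20 output channels can be sliced into three groups. This sketch assumes the channel order matches the order listed above (7 frequencies, then 7 bandwidths, then 6 amplitudes); `split_params` is a hypothetical helper, so check the ordering against the original repository before relying on it. The values stay in [0, 1] and still need rescaling.

```python
import numpy as np

N_FORMANTS = 6       # F1..F6
N_ANTIFORMANTS = 1   # Fz1


def split_params(raw: np.ndarray):
    """Split (..., 20) raw sigmoid outputs into frequency, bandwidth,
    and amplitude groups. Assumes the channel order given in the README."""
    n_res = N_FORMANTS + N_ANTIFORMANTS       # 7 resonances in total
    if raw.shape[-1] != 2 * n_res + N_FORMANTS:
        raise ValueError(f"expected 20 channels, got {raw.shape[-1]}")
    freqs = raw[..., :n_res]                  # F1..F6, Fz1
    bws = raw[..., n_res:2 * n_res]           # B1..B6, Bz1
    amps = raw[..., 2 * n_res:]               # A1..A6 (no antiformant amplitude)
    return freqs, bws, amps


# Stand-in for a model output of shape (batch, time, 20)
raw = np.random.rand(1, 200, 20).astype(np.float32)
freqs, bws, amps = split_params(raw)
print(freqs.shape, bws.shape, amps.shape)
```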
Files
| File | Precision | Size | max_abs vs fp32 |
|---|---|---|---|
| formantnet.onnx | fp32 | 6.36 MB | 0 (reference) |
| formantnet_fp16.onnx | fp16 | 3.18 MB | 4.1 × 10⁻⁴ |
| formantnet_int8.onnx | int8 (dynamic) | 1.61 MB | 9.2 × 10⁻² |
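The max_abs column is presumably the largest elementwise absolute difference between a variant's output and the fp32 output on the same input; the repository's validation script may compute it differently. A minimal sketch of such a metric, with a hypothetical `max_abs_diff` helper and a toy fp16 round-trip standing in for real model outputs:

```python
import numpy as np


def max_abs_diff(reference, candidate) -> float:
    """Largest elementwise absolute difference between two arrays of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    return float(np.max(np.abs(ref - cand)))


# Toy data: values in [0, 1] rounded through fp16 and back.
ref = np.array([[0.10, 0.50, 0.90]], dtype=np.float32)
approx = ref.astype(np.float16).astype(np.float32)
err = max_abs_diff(ref, approx)
print(f"max_abs = {err:.2e}")
```

With real models, `reference` and `candidate` would be the outputs of two `InferenceSession.run` calls on the same input batch.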
Usage
import numpy as np
import onnxruntime as ort
sess = ort.InferenceSession("formantnet.onnx", providers=["CPUExecutionProvider"])
# x: float32 array of shape (batch, time, 257) holding normalized spectral
# envelopes (subtract the training mean, divide by the training std)
x = np.random.randn(1, 200, 257).astype(np.float32)
raw_params = sess.run(None, {"input": x})[0] # (1, 200, 20), values in [0, 1]
Pre-processing (windowing, FFT, envelope smoothing, normalization) and post-processing (rescaling to Hz/dB, formant sorting, binomial smoothing) are not included in the ONNX graph; use the scripts from the original FormantNet repository for those steps.
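To make the expected input shape concrete, here is a rough sketch of the framing and FFT stages only. It is not the repository's implementation: the Hann window, the 5 ms hop, and the dB scaling are assumptions, envelope smoothing is omitted, and the normalization statistics are computed from the toy signal rather than taken from training. It only shows how a 32 ms window at 16 kHz (512 samples) yields 257 bins per frame.

```python
import numpy as np

SAMPLE_RATE = 16_000
WIN_LEN = 512   # 32 ms at 16 kHz, giving 512 // 2 + 1 = 257 rFFT bins
HOP_LEN = 80    # ASSUMED 5 ms hop; not stated in this README


def log_spectra(signal, win_len=WIN_LEN, hop_len=HOP_LEN) -> np.ndarray:
    """Frame a mono signal and return log-magnitude spectra of shape
    (n_frames, win_len // 2 + 1). No envelope smoothing is applied."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim != 1 or len(signal) < win_len:
        raise ValueError("need a 1-D signal at least one window long")
    n_frames = 1 + (len(signal) - win_len) // hop_len
    window = np.hanning(win_len)  # ASSUMED window type
    frames = np.stack([
        signal[i * hop_len: i * hop_len + win_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=-1))
    return 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)


def normalize(feats: np.ndarray, mean: float, std: float) -> np.ndarray:
    """Subtract the mean, divide by the std, and cast to float32."""
    return ((feats - mean) / std).astype(np.float32)


rng = np.random.default_rng(0)
audio = rng.standard_normal(SAMPLE_RATE)   # 1 s of noise as a stand-in
spec = log_spectra(audio)
# Placeholder statistics; real use needs the training mean and std.
x = normalize(spec, spec.mean(), spec.std())[np.newaxis]
print(x.shape, x.dtype)
```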
Conversion
- convert_to_onnx.py: reconstructs the Keras model, loads the TF checkpoint, exports to ONNX (opset 15)
- quantize_onnx.py: generates fp16 and int8 variants with parity checks
- validate_onnx.py: shape, range, and numeric equivalence validation
Requires: tensorflow-macos 2.13, tf2onnx 1.17, onnxruntime, onnxconverter-common.
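For orientation, the fp16 and int8 variants could be produced along the following lines. This is a sketch under stated assumptions and not the contents of quantize_onnx.py: the helper names are hypothetical, and `keep_io_types=True` (which keeps float32 inputs and outputs on the fp16 model) is a choice made here, not one confirmed by the repository. The heavy dependencies are imported inside the function so that the path helper can be used without them.

```python
from pathlib import Path


def variant_paths(fp32_path):
    """Derive output file names such as formantnet_fp16.onnx from the fp32 path."""
    p = Path(fp32_path)
    return {
        "fp16": p.with_name(f"{p.stem}_fp16{p.suffix}"),
        "int8": p.with_name(f"{p.stem}_int8{p.suffix}"),
    }


def make_variants(fp32_path):
    """Write fp16 and dynamically quantized int8 copies of an fp32 ONNX model."""
    # Imported lazily: these packages are only needed for the conversion itself.
    import onnx
    from onnxconverter_common import float16
    from onnxruntime.quantization import QuantType, quantize_dynamic

    out = variant_paths(fp32_path)

    # fp16: convert weights and ops; keep_io_types=True is an assumption
    # that leaves the graph inputs and outputs as float32.
    model = onnx.load(str(fp32_path))
    model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
    onnx.save(model_fp16, str(out["fp16"]))

    # int8: dynamic quantization stores int8 weights and quantizes
    # activations at run time.
    quantize_dynamic(str(fp32_path), str(out["int8"]), weight_type=QuantType.QInt8)
    return out


print(variant_paths("formantnet.onnx"))
```

Calling `make_variants("formantnet.onnx")` would then write the two variant files next to the fp32 model; comparing their outputs with the fp32 model afterwards is still advisable.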
Citation
@inproceedings{sakamoto2021formantnet,
title = {Neural Formant Tracking},
author = {Sakamoto, Yuki and others},
booktitle = {Interspeech 2021},
year = {2021}
}