metadata
license: mit
library_name: onnx
pipeline_tag: audio-classification
tags:
  - onnx
  - audio
  - speech
  - phonetics
  - formants
  - formant-tracking
  - lstm
  - quantized
  - int8
  - fp16
  - acoustic-phonetics
language:
  - en
base_model: ysakamoto/FormantNet
inference: false

FormantNet — ONNX

ONNX exports of the FormantNet neural formant tracker (PaPE 2021 / Interspeech 2021) in three precisions: fp32, fp16, and dynamically quantized int8.

The exported model is the LSTM1_noIAIF_DFLoss configuration trained on TIMIT: experiment mvt33_f6z1sTpF10 (6 formants + 1 antiformant, delta-frequency loss weight 0.15).

Architecture

Input  (batch, time, 257)   — normalized log-spectral envelope, 32 ms / 16 kHz window
LSTM   512 units, unidirectional, return_sequences=True
Dense  20 units, sigmoid
Output (batch, time, 20)    — raw sigmoid [0, 1]

Total parameters: 1,587,220 (6.05 MiB, or about 6.35 MB, as fp32 weights).
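The parameter total can be checked by hand with the standard Keras formulas for an LSTM layer and a Dense layer. The sketch below is only a sanity check; it assumes the 257 input bins come from a 512-point one-sided FFT (32 ms at 16 kHz), which is consistent with the figures above but not stated explicitly. The helper names are illustrative.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim: int, units: int) -> int:
    # weight matrix plus bias vector
    return input_dim * units + units


n_bins = 512 // 2 + 1  # assumed: 512-sample window -> 257 one-sided FFT bins
total = lstm_params(n_bins, 512) + dense_params(512, 20)
fp32_bytes = total * 4  # 4 bytes per float32 weight

print(n_bins)                        # 257
print(total)                         # 1587220
print(round(fp32_bytes / 2**20, 2))  # 6.05 (MiB)
print(round(fp32_bytes / 1e6, 2))    # 6.35 (MB)
```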

Output layout (20 columns, in order): 7 frequencies (F1…F6, Fz1), 7 bandwidths (B1…B6, Bz1), and 6 amplitudes (A1…A6). All are raw sigmoid values in [0, 1]; rescale them with get_rescale_fn() from the original FormantNet repository to obtain Hz and dB.
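As a minimal sketch, the 20 output columns could be split into named groups like this, assuming the column order is exactly the one listed above. The function name and dictionary keys are illustrative, not part of the repository, and the values remain unscaled.

```python
import numpy as np


def split_params(raw: np.ndarray) -> dict:
    """Split a (..., 20) output array into named groups (assumed column order)."""
    if raw.shape[-1] != 20:
        raise ValueError(f"expected 20 output columns, got {raw.shape[-1]}")
    return {
        "formant_freqs": raw[..., 0:6],      # F1..F6
        "antiformant_freq": raw[..., 6:7],   # Fz1
        "formant_bws": raw[..., 7:13],       # B1..B6
        "antiformant_bw": raw[..., 13:14],   # Bz1
        "amplitudes": raw[..., 14:20],       # A1..A6
    }


# Stand-in for a model output: random values in [0, 1)
raw = np.random.rand(1, 200, 20).astype(np.float32)
groups = split_params(raw)
for name, arr in groups.items():
    print(name, arr.shape)
```

Each group is a view on the original array, so rescaling to Hz or dB still has to be applied afterwards.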

Files

| File | Precision | Size | Max abs. difference vs fp32 |
|---|---|---|---|
| formantnet.onnx | fp32 | 6.36 MB | 0 (reference) |
| formantnet_fp16.onnx | fp16 | 3.18 MB | 4.1 × 10⁻⁴ |
| formantnet_int8.onnx | int8 (dynamic) | 1.61 MB | 9.2 × 10⁻² |
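A maximum absolute difference like the one in the last column can be computed as sketched below. This is an assumed reading of the metric, not the code from validate_onnx.py. With the real files, the two arguments would be the outputs of the fp32 session and of a reduced-precision session on the same input; here the reduced-precision output is simulated by rounding synthetic values through float16, so the printed numbers do not reproduce the table.

```python
import numpy as np


def max_abs_diff(reference, candidate) -> float:
    """Largest element-wise absolute difference between two equally shaped arrays."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    return float(np.max(np.abs(ref - cand)))


# Synthetic stand-ins for model outputs in [0, 1)
rng = np.random.default_rng(0)
ref = rng.random((1, 200, 20), dtype=np.float32)
fp16_like = ref.astype(np.float16).astype(np.float32)  # simulated precision loss

print(max_abs_diff(ref, ref))               # 0.0
print(max_abs_diff(ref, fp16_like) < 1e-3)  # True
```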

Usage

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("formantnet.onnx", providers=["CPUExecutionProvider"])

# x: float32 array of shape (batch, time, 257)
# — normalized spectral envelopes (subtract training mean, divide by std)
x = np.random.randn(1, 200, 257).astype(np.float32)
raw_params = sess.run(None, {"input": x})[0]   # (1, 200, 20), values in [0, 1]

Pre-processing (windowing, FFT, envelope smoothing, normalization) and post-processing (rescaling to Hz/dB, formant sorting, binomial smoothing) are not included in the ONNX graph — use the scripts from the original FormantNet repository for those steps.
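To illustrate the expected input shape only, here is a rough sketch of framing and normalization. It is not the original repository's pipeline: the Hann window, the 5 ms hop, the natural-log magnitude, and the epsilon are all assumptions, envelope smoothing is skipped entirely, and the mean and standard deviation are taken from the data itself as placeholders for the training statistics.

```python
import numpy as np

SR = 16_000   # sampling rate in Hz
WIN = 512     # 32 ms window at 16 kHz
HOP = 80      # hypothetical 5 ms hop; the hop size is not stated above


def log_spectra(signal: np.ndarray, win: int = WIN, hop: int = HOP) -> np.ndarray:
    """Frame the signal, apply a Hann window, and return log-magnitude spectra."""
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(win)
    mag = np.abs(np.fft.rfft(frames, axis=-1))  # win // 2 + 1 = 257 bins
    return np.log(mag + 1e-8).astype(np.float32)


def normalize(feats: np.ndarray, mean, std) -> np.ndarray:
    """Subtract the mean and divide by the standard deviation."""
    return ((feats - mean) / std).astype(np.float32)


signal = np.random.default_rng(0).standard_normal(SR)  # 1 s of noise as a stand-in
feats = log_spectra(signal)
# Placeholder statistics; the real model expects the training mean and std.
x = normalize(feats, feats.mean(), feats.std())[None, ...]
print(x.shape)  # (1, 194, 257)
```

An array shaped like x has the right layout for the session call in the usage example, but meaningful formant estimates require the repository's actual pre-processing.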

Conversion

  • convert_to_onnx.py — reconstructs the Keras model, loads the TF checkpoint, exports to ONNX (opset 15)
  • quantize_onnx.py — generates fp16 and int8 variants with parity checks
  • validate_onnx.py — shape, range, and numeric equivalence validation

Requires: tensorflow-macos 2.13, tf2onnx 1.17, onnxruntime, onnxconverter-common.
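The variant-generation step might look roughly like the sketch below, which uses the public float16 converter from onnxconverter-common and the dynamic quantizer from onnxruntime. This is an assumption about what quantize_onnx.py does, not a copy of it; in particular, keep_io_types=True and the QInt8 weight type are illustrative choices, and the function names are hypothetical.

```python
from pathlib import Path


def variant_paths(fp32_path: str) -> dict:
    """Derive output file names that match the naming used in the Files table."""
    p = Path(fp32_path)
    return {
        "fp16": str(p.with_name(f"{p.stem}_fp16{p.suffix}")),
        "int8": str(p.with_name(f"{p.stem}_int8{p.suffix}")),
    }


def make_variants(fp32_path: str) -> dict:
    """Write fp16 and dynamically quantized int8 copies of an fp32 ONNX model."""
    # Imported lazily so variant_paths() works without the ONNX toolchain installed.
    import onnx
    from onnxconverter_common import float16
    from onnxruntime.quantization import QuantType, quantize_dynamic

    out = variant_paths(fp32_path)

    # fp16: convert weights and ops, keeping float32 inputs/outputs (assumed choice)
    model = onnx.load(fp32_path)
    onnx.save(float16.convert_float_to_float16(model, keep_io_types=True), out["fp16"])

    # int8: dynamic quantization of weights; activations are quantized at run time
    quantize_dynamic(fp32_path, out["int8"], weight_type=QuantType.QInt8)
    return out


print(variant_paths("formantnet.onnx"))
# {'fp16': 'formantnet_fp16.onnx', 'int8': 'formantnet_int8.onnx'}
```

Calling make_variants("formantnet.onnx") would write both files next to the fp32 model; a parity check against the fp32 outputs should follow before the variants are used.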

Citation

@inproceedings{sakamoto2021formantnet,
  title     = {Neural Formant Tracking},
  author    = {Sakamoto, Yuki and others},
  booktitle = {Interspeech 2021},
  year      = {2021}
}