---
license: mit
library_name: onnx
pipeline_tag: audio-classification
tags:
- onnx
- audio
- speech
- phonetics
- formants
- formant-tracking
- lstm
- quantized
- int8
- fp16
- acoustic-phonetics
language:
- en
base_model: ysakamoto/FormantNet
inference: false
---

# FormantNet — ONNX

ONNX exports of the [FormantNet](https://github.com/NemoursResearch/FormantNet) neural formant tracker
(PaPE 2021 / Interspeech 2021), in `fp32`, `fp16`, and dynamically quantized `int8` variants.

The exported model is the **LSTM1_noIAIF_DFLoss** configuration trained on TIMIT:
experiment `mvt33_f6z1sTpF10` (6 formants + 1 antiformant, delta-frequency loss weight 0.15).

## Architecture

```
Input   (batch, time, 257)   — normalized log-spectral envelope, 32 ms window at 16 kHz (512 samples, 257 FFT bins)
LSTM    512 units, unidirectional, return_sequences=True
Dense   20 units, sigmoid
Output  (batch, time, 20)    — raw sigmoid [0, 1]
```

Total parameters: **1,587,220** (6.05 MiB, or about 6.35 MB, in fp32).
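
The parameter total can be checked by hand from the layer sizes listed above. The sketch below assumes a standard Keras-style LSTM (four gates, each with input weights, recurrent weights, and a bias) followed by a dense layer with bias; the helper names are illustrative and not part of the repository.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Parameter count of a standard LSTM: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Parameter count of a fully connected layer with bias."""
    return input_dim * units + units


total = lstm_params(257, 512) + dense_params(512, 20)
print(total)                          # 1587220
print(round(total * 4 / 2**20, 2))    # 6.05  (MiB at 4 bytes per fp32 weight)
print(round(total * 4 / 1e6, 2))      # 6.35  (MB, decimal)
```

The same weights are therefore 6.05 MiB or about 6.35 MB depending on the unit, which is consistent with the size of the fp32 ONNX file once graph metadata is included.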

Output layout: F1…F6, Fz1 (frequencies) · B1…B6, Bz1 (bandwidths) · A1…A6 (amplitudes).
All 20 values are raw sigmoid outputs in [0, 1]; rescale them with the repo's `get_rescale_fn()` to obtain Hz / dB.

## Files

| File | Precision | Size | Max abs. error vs. fp32 |
|------|-----------|------|-------------------------|
| `formantnet.onnx` | fp32 | 6.36 MB | 0 (reference) |
| `formantnet_fp16.onnx` | fp16 | 3.18 MB | 4.1 × 10⁻⁴ |
| `formantnet_int8.onnx` | int8 (dynamic) | 1.61 MB | 9.2 × 10⁻² |
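
The error column can be reproduced approximately by running the same input through two variants and taking the largest absolute difference. The sketch below is one plausible way to do that and is not the repository's `quantize_onnx.py`; the helper names, the random test input, and the fp16 input cast are assumptions. The model comparison only runs if the files are present in the working directory.

```python
from pathlib import Path

import numpy as np


def max_abs_error(reference, candidate) -> float:
    """Largest element-wise absolute difference between two output arrays."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    return float(np.max(np.abs(ref - cand)))


def run_model(path: str, x: np.ndarray) -> np.ndarray:
    """Run one ONNX file on x, casting the input if the graph expects float16."""
    import onnxruntime as ort  # imported lazily so the helper above works without it

    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    if inp.type == "tensor(float16)":
        x = x.astype(np.float16)
    return sess.run(None, {inp.name: x})[0]


# Tiny self-check on synthetic arrays.
print(max_abs_error([0.1, 0.5], [0.1, 0.75]))  # 0.25

# Compare the real files only when they are available locally.
if Path("formantnet.onnx").exists() and Path("formantnet_int8.onnx").exists():
    x = np.random.randn(1, 200, 257).astype(np.float32)
    ref = run_model("formantnet.onnx", x)
    print(max_abs_error(ref, run_model("formantnet_int8.onnx", x)))
```

The exact figure depends on the test input, so a result on random data will not necessarily match the table.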

## Usage

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("formantnet.onnx", providers=["CPUExecutionProvider"])

# Look up the input name from the graph rather than hard-coding it.
input_name = sess.get_inputs()[0].name

# x: float32 array of shape (batch, time, 257) holding normalized log-spectral
# envelopes (subtract the training mean, divide by the training std).
# Random data stands in for real features here.
x = np.random.randn(1, 200, 257).astype(np.float32)
raw_params = sess.run(None, {input_name: x})[0]  # (1, 200, 20), values in [0, 1]
```

Pre-processing (windowing, FFT, envelope smoothing, normalization) and post-processing
(rescaling to Hz/dB, formant sorting, binomial smoothing) are **not** included in the ONNX
graph; use the scripts from the original FormantNet repository for those steps.
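
To give a feel for what those steps involve, here is a simplified sketch of two of them: framing plus FFT on the input side, and binomial smoothing on the output side. It is not the repository's implementation. The Hann window, the 5 ms hop, the dB scaling, and the 3-point `[1, 2, 1] / 4` kernel are all assumptions, envelope smoothing is omitted, and the normalization statistics are placeholders rather than the training mean and std.

```python
import numpy as np

SAMPLE_RATE = 16_000
FRAME_LEN = 512   # 32 ms at 16 kHz, giving 512 // 2 + 1 = 257 FFT bins
HOP = 80          # hypothetical 5 ms frame step; not specified in this card


def log_spectra(signal: np.ndarray, hop: int = HOP) -> np.ndarray:
    """Frame, window, and FFT a mono signal into (frames, 257) log-magnitude spectra in dB."""
    if signal.size < FRAME_LEN:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - FRAME_LEN) // hop
    idx = np.arange(FRAME_LEN)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(FRAME_LEN)      # assumed window type
    magnitude = np.abs(np.fft.rfft(frames, axis=-1))
    return 20.0 * np.log10(magnitude + 1e-8)          # small offset avoids log(0)


def binomial_smooth(tracks: np.ndarray) -> np.ndarray:
    """Smooth (time, channels) tracks along time with a [1, 2, 1] / 4 kernel."""
    padded = np.pad(tracks, ((1, 1), (0, 0)), mode="edge")
    return 0.25 * padded[:-2] + 0.5 * padded[1:-1] + 0.25 * padded[2:]


signal = np.random.default_rng(0).standard_normal(SAMPLE_RATE)  # 1 s of noise
feats = log_spectra(signal)
# Placeholder statistics: a real pipeline would use the training mean and std.
x = ((feats - feats.mean()) / feats.std()).astype(np.float32)[None]
print(x.shape)  # (1, 194, 257)
```

The resulting array has the shape the model expects, but the values will only be meaningful once the repository's own envelope smoothing and training statistics are used.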

## Conversion

- `convert_to_onnx.py`: reconstructs the Keras model, loads the TF checkpoint, and exports to ONNX (opset 15)
- `quantize_onnx.py`: generates the fp16 and int8 variants and runs parity checks
- `validate_onnx.py`: validates output shape, value range, and numerical equivalence

Requires: `tensorflow-macos 2.13`, `tf2onnx 1.17`, `onnxruntime`, `onnxconverter-common`.
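
For orientation, the fp16 and int8 variants could be produced from the fp32 file roughly as follows. This is a sketch using the public `onnxconverter-common` and `onnxruntime.quantization` APIs, not a copy of `quantize_onnx.py`; the function name and the choice of `QuantType.QInt8` are assumptions, and the script's actual options may differ.

```python
from pathlib import Path


def make_variants(
    fp32_path: str = "formantnet.onnx",
    fp16_path: str = "formantnet_fp16.onnx",
    int8_path: str = "formantnet_int8.onnx",
) -> None:
    """Write fp16 and dynamically quantized int8 copies of an fp32 ONNX model."""
    # Imported lazily so that defining the function does not require these packages.
    import onnx
    from onnxconverter_common import float16
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # fp16: cast float32 initializers and tensors in the graph to float16.
    model = onnx.load(fp32_path)
    onnx.save(float16.convert_float_to_float16(model), fp16_path)

    # int8: dynamic quantization stores weights as int8 and quantizes
    # activations on the fly at inference time.
    quantize_dynamic(fp32_path, int8_path, weight_type=QuantType.QInt8)  # assumed weight type


# Only convert when the fp32 export is available locally.
if Path("formantnet.onnx").exists():
    make_variants()
```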

## Citation

```bibtex
@inproceedings{sakamoto2021formantnet,
  title     = {Neural Formant Tracking},
  author    = {Sakamoto, Yuki and others},
  booktitle = {Interspeech 2021},
  year      = {2021}
}
```