---
license: mit
library_name: onnx
pipeline_tag: audio-classification
tags:
  - onnx
  - audio
  - speech
  - phonetics
  - formants
  - formant-tracking
  - lstm
  - quantized
  - int8
  - fp16
  - acoustic-phonetics
language:
  - en
base_model: ysakamoto/FormantNet
inference: false
---

# FormantNet — ONNX

ONNX exports of the [FormantNet](https://github.com/NemoursResearch/FormantNet) neural formant tracker
(PaPE 2021 / Interspeech 2021) in three precisions: `fp32`, `fp16`, and dynamically quantized `int8`.

The exported model is the **LSTM1_noIAIF_DFLoss** configuration trained on TIMIT:
experiment `mvt33_f6z1sTpF10` (6 formants + 1 antiformant, delta-frequency loss weight 0.15).

## Architecture

```
Input  (batch, time, 257)   — normalized log-spectral envelope, 32 ms / 16 kHz window
LSTM   512 units, unidirectional, return_sequences=True
Dense  20 units, sigmoid
Output (batch, time, 20)    — raw sigmoid [0, 1]
```

Total parameters: **1,587,220** (6.05 MiB, or about 6.35 MB, of fp32 weights).
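
The count can be reproduced from the layer sizes above. The sketch below assumes the standard Keras LSTM parameterization (four gates, each with an input kernel, a recurrent kernel, and one bias vector); the helper names are illustrative only.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each with an input kernel, a recurrent kernel, and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


total = lstm_params(257, 512) + dense_params(512, 20)
print(f"{total:,}")                     # 1,587,220
print(f"{total * 4 / 1e6:.2f} MB")      # 6.35 MB (4 bytes per fp32 weight)
print(f"{total * 4 / 2**20:.2f} MiB")   # 6.05 MiB
```

The LSTM accounts for 1,576,960 of the parameters, so nearly all of the size reduction from `fp16` or `int8` comes from its weight matrices.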

Output layout (20 channels): F1…F6, Fz1 (7 frequencies) · B1…B6, Bz1 (7 bandwidths) · A1…A6 (6 amplitudes).
All values are raw sigmoid outputs in [0, 1]; rescale them with `get_rescale_fn()` from the original FormantNet repository to obtain Hz and dB.
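
As an illustration of this layout, the sketch below splits the last axis into the three groups. It assumes the 20 channels are stored as contiguous blocks in exactly the order listed; `split_raw_params` is a hypothetical helper, and the values it returns are still unscaled.

```python
import numpy as np

N_FORMANTS, N_ANTIFORMANTS = 6, 1
N_RES = N_FORMANTS + N_ANTIFORMANTS  # 7 resonances, each with a frequency and a bandwidth


def split_raw_params(raw: np.ndarray):
    """Split (..., 20) raw sigmoid output into frequency, bandwidth, and amplitude blocks."""
    expected = 2 * N_RES + N_FORMANTS
    if raw.shape[-1] != expected:
        raise ValueError(f"expected last axis of size {expected}, got {raw.shape[-1]}")
    freqs = raw[..., :N_RES]             # F1..F6, Fz1
    bws = raw[..., N_RES:2 * N_RES]      # B1..B6, Bz1
    amps = raw[..., 2 * N_RES:]          # A1..A6 (the antiformant has no amplitude)
    return freqs, bws, amps


raw = np.random.rand(1, 200, 20).astype(np.float32)  # stand-in for model output
freqs, bws, amps = split_raw_params(raw)
print(freqs.shape, bws.shape, amps.shape)  # (1, 200, 7) (1, 200, 7) (1, 200, 6)
```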

## Files

| File | Precision | Size | Max abs. diff vs fp32 |
|------|-----------|------|-----------------|
| `formantnet.onnx` | fp32 | 6.36 MB | 0 (reference) |
| `formantnet_fp16.onnx` | fp16 | 3.18 MB | 4.1 × 10⁻⁴ |
| `formantnet_int8.onnx` | int8 (dynamic) | 1.61 MB | 9.2 × 10⁻² |
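
The last column reads as the largest element-wise absolute difference between a variant's output and the fp32 output on the same input. A minimal sketch of that metric follows, with synthetic arrays standing in for real model outputs; the function name is illustrative and not taken from the conversion scripts.

```python
import numpy as np


def max_abs_diff(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Largest element-wise absolute difference between two outputs of equal shape."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Upcast both sides so fp16 outputs are compared without further rounding.
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.max(np.abs(diff)))


rng = np.random.default_rng(0)
ref = rng.random((1, 200, 20), dtype=np.float32)  # stand-in for the fp32 output
cand = ref.astype(np.float16)                     # stand-in for a reduced-precision output

print(max_abs_diff(ref, ref))          # 0.0
print(max_abs_diff(ref, cand) < 1e-3)  # True
```

With the real files, `ref` and `cand` would be the outputs of two `InferenceSession` objects run on the same `x`. Depending on how the fp16 file was converted, its input may need to be cast to `float16`; `sess.get_inputs()[0].type` reports the expected element type.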

## Usage

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("formantnet.onnx", providers=["CPUExecutionProvider"])

# x: float32 array of shape (batch, time, 257)
# — normalized spectral envelopes (subtract training mean, divide by std)
x = np.random.randn(1, 200, 257).astype(np.float32)
raw_params = sess.run(None, {"input": x})[0]   # (1, 200, 20), values in [0, 1]
```

Pre-processing (windowing, FFT, envelope smoothing, normalization) and post-processing
(rescaling to Hz/dB, formant sorting, binomial smoothing) are **not** included in the ONNX
graph — use the scripts from the original FormantNet repository for those steps.
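
For orientation only, the sketch below shows how a tensor of the expected shape can be built from a waveform: 32 ms frames at 16 kHz, a 512-point FFT (257 bins), log magnitude, then per-bin normalization. The 5 ms hop, the Hann window, the log floor, and the use of the utterance's own mean and standard deviation are assumptions made for illustration, and the envelope-smoothing step is omitted entirely. The original repository's scripts remain the reference for the features the model was trained on.

```python
import numpy as np

SR = 16_000
WIN = 512   # 32 ms at 16 kHz, giving 257 rfft bins
HOP = 80    # 5 ms hop (assumed for illustration, not taken from the repo)


def log_spectra(wave: np.ndarray) -> np.ndarray:
    """Return (time, 257) log-magnitude spectra in dB for a mono waveform."""
    if wave.shape[0] < WIN:
        raise ValueError("waveform is shorter than one analysis window")
    # All length-WIN windows, then keep every HOP-th one.
    frames = np.lib.stride_tricks.sliding_window_view(wave, WIN)[::HOP]
    spec = np.fft.rfft(frames * np.hanning(WIN), n=WIN, axis=-1)
    return 20.0 * np.log10(np.abs(spec) + 1e-8)  # small floor avoids log(0)


def normalize(feats: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Subtract the mean and divide by the standard deviation, per frequency bin."""
    return ((feats - mean) / std).astype(np.float32)


wave = np.random.default_rng(0).standard_normal(SR)  # 1 s of noise as a stand-in
feats = log_spectra(wave)
# Placeholder statistics: the model expects the training-set mean and std instead.
x = normalize(feats, feats.mean(axis=0), feats.std(axis=0))[np.newaxis]
print(x.shape, x.dtype)  # (1, 194, 257) float32
```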

## Conversion

- `convert_to_onnx.py` — reconstructs the Keras model, loads the TF checkpoint, exports to ONNX (opset 15)
- `quantize_onnx.py` — generates fp16 and int8 variants with parity checks
- `validate_onnx.py` — shape, range, and numeric equivalence validation
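
The shape and range checks amount to a few tests on the model output. A minimal sketch, assuming the output is a NumPy array as returned by `sess.run`; `check_output` is a hypothetical helper and not a function from `validate_onnx.py`.

```python
import numpy as np


def check_output(raw: np.ndarray, batch: int, time: int,
                 n_params: int = 20, tol: float = 0.0) -> None:
    """Raise ValueError if the output has the wrong shape or leaves [0, 1]."""
    expected = (batch, time, n_params)
    if raw.shape != expected:
        raise ValueError(f"expected shape {expected}, got {raw.shape}")
    if not np.all(np.isfinite(raw)):
        raise ValueError("output contains NaN or Inf")
    lo, hi = float(raw.min()), float(raw.max())
    if lo < -tol or hi > 1.0 + tol:
        raise ValueError(f"values outside [0, 1]: min={lo:.4f}, max={hi:.4f}")


good = np.random.default_rng(0).random((1, 200, 20))  # stand-in for model output
check_output(good, batch=1, time=200)                 # passes silently

try:
    check_output(good * 2.0, batch=1, time=200)       # values now exceed 1
except ValueError as err:
    print("rejected:", err)
```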

Requires: `tensorflow-macos 2.13`, `tf2onnx 1.17`, `onnxruntime`, `onnxconverter-common`.

## Citation

```bibtex
@inproceedings{lilley2021formantnet,
  title     = {Unsupervised Training of a {DNN}-based Formant Tracker},
  author    = {Lilley, Jason and Bunnell, H. Timothy},
  booktitle = {Interspeech 2021},
  year      = {2021}
}
```