FredrikKarlssonSpeech / FormantNet

	---
	license: mit
	library_name: onnx
	pipeline_tag: audio-classification
	tags:
	- onnx
	- audio
	- speech
	- phonetics
	- formants
	- formant-tracking
	- lstm
	- quantized
	- int8
	- fp16
	- acoustic-phonetics
	language:
	- en
	base_model: ysakamoto/FormantNet
	inference: false
	---

	# FormantNet — ONNX

	ONNX exports of the [FormantNet](https://github.com/NemoursResearch/FormantNet) neural formant tracker
	(PaPE 2021 / Interspeech 2021), in `fp32`, `fp16`, and dynamically quantized `int8` variants.

	The exported model is the `LSTM1_noIAIF_DFLoss` configuration trained on TIMIT:
	experiment `mvt33_f6z1sTpF10` (6 formants + 1 antiformant, delta-frequency loss weight 0.15).

	## Architecture

	```
	Input   (batch, time, 257): normalized log-spectral envelope, 32 ms window at 16 kHz (512 samples)
	LSTM    512 units, unidirectional, return_sequences=True
	Dense   20 units, sigmoid
	Output  (batch, time, 20): raw sigmoid values in [0, 1]
	```

	Total parameters: 1,587,220 (6.05 MiB, or about 6.35 MB, as fp32 weights).
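
The parameter total can be reproduced from the layer sizes listed above. The sketch below assumes the standard Keras LSTM layout (four gates, each with input weights, recurrent weights, and a single bias vector); the helper names are illustrative and not part of the FormantNet code.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Parameter count of a Keras-style LSTM layer (one bias vector per gate)."""
    # Four gates, each with (input_dim + units) weights per unit plus a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Parameter count of a fully connected layer with bias."""
    return input_dim * units + units


lstm = lstm_params(257, 512)    # 1,576,960
dense = dense_params(512, 20)   # 10,260
total = lstm + dense            # 1,587,220
size_bytes = total * 4          # fp32 stores 4 bytes per parameter

print(f"{total:,} parameters")
print(f"{size_bytes / 2**20:.2f} MiB, {size_bytes / 1e6:.2f} MB")
```

The same arithmetic with 2 bytes (fp16) or 1 byte (int8) per parameter gives a lower bound on the size of the reduced-precision files; the ONNX files themselves are slightly larger because they also store the graph structure.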

	Output layout (20 values per frame): frequencies F1…F6, Fz1 · bandwidths B1…B6, Bz1 · amplitudes A1…A6.
	All are raw sigmoid values in [0, 1]; rescale them with the original repository's `get_rescale_fn()` to obtain Hz / dB.
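
As a sketch of how the 20 output values might be separated, the snippet below maps the layout to index ranges. It assumes the values are stored in exactly the left-to-right order listed above, which should be checked against the original repository. The group names and `split_frame` are hypothetical and chosen only for illustration.

```python
# Index ranges along the last (size-20) axis, following the listed order.
OUTPUT_SLICES = {
    "formant_freqs": slice(0, 6),      # F1..F6
    "antiformant_freq": slice(6, 7),   # Fz1
    "formant_bws": slice(7, 13),       # B1..B6
    "antiformant_bw": slice(13, 14),   # Bz1
    "amplitudes": slice(14, 20),       # A1..A6
}


def split_frame(frame):
    """Split one 20-value output frame into named groups (still in [0, 1])."""
    if len(frame) != 20:
        raise ValueError(f"expected 20 values per frame, got {len(frame)}")
    return {name: frame[sl] for name, sl in OUTPUT_SLICES.items()}


# Dummy frame standing in for one time step of the model output.
frame = [i / 20 for i in range(20)]
groups = split_frame(frame)
print({name: len(values) for name, values in groups.items()})
```

For a full `(batch, time, 20)` NumPy array, the same slices can be applied to the last axis, for example `raw_params[..., OUTPUT_SLICES["formant_freqs"]]`. The values remain unscaled until they are passed through the repository's rescaling function.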

	## Files

	| File | Precision | Size | Max abs. diff vs fp32 |
	|------|-----------|------|-----------------------|
	| `formantnet.onnx` | fp32 | 6.36 MB | 0 (reference) |
	| `formantnet_fp16.onnx` | fp16 | 3.18 MB | 4.1 × 10⁻⁴ |
	| `formantnet_int8.onnx` | int8 (dynamic) | 1.61 MB | 9.2 × 10⁻² |

	## Usage

	```python
	import numpy as np
	import onnxruntime as ort

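	sess = ort.InferenceSession("formantnet.onnx", providers=["CPUExecutionProvider"])
	input_name = sess.get_inputs()[0].name  # read the input name from the graph instead of hard-coding it

	# x: float32 array of shape (batch, time, 257) holding normalized spectral
	# envelopes (subtract the training mean, divide by the training std).
	# Random data stands in for real features here.
	x = np.random.randn(1, 200, 257).astype(np.float32)
	raw_params = sess.run(None, {input_name: x})[0]  # (1, 200, 20), values in [0, 1]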
	```

	Pre-processing (windowing, FFT, envelope smoothing, normalization) and post-processing
	(rescaling to Hz/dB, formant sorting, binomial smoothing) are not included in the ONNX
	graph; use the scripts from the original FormantNet repository for those steps.
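
To make the expected input shape concrete, here is a rough NumPy sketch of the windowing, FFT, and normalization steps. It is not the original pipeline: the 5 ms hop, the Hann window, and the dB scaling are assumptions, envelope smoothing is omitted, and the normalization statistics are placeholders for the training-set mean and standard deviation. Treat the original repository's scripts as the reference.

```python
import numpy as np

SAMPLE_RATE = 16_000
WIN_LEN = 512   # 32 ms at 16 kHz; rfft of 512 samples gives 257 bins
HOP_LEN = 80    # 5 ms stride: an assumption, not stated in this card


def log_spectra(signal: np.ndarray) -> np.ndarray:
    """Return (frames, 257) log-magnitude spectra in dB for a mono signal."""
    n_frames = 1 + (len(signal) - WIN_LEN) // HOP_LEN
    # Build a (frames, WIN_LEN) index matrix to slice overlapping windows.
    idx = np.arange(WIN_LEN)[None, :] + HOP_LEN * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(WIN_LEN)
    magnitude = np.abs(np.fft.rfft(frames, axis=-1))
    return 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)


def normalize(spec: np.ndarray, mean: float, std: float) -> np.ndarray:
    """Standardize the features and cast to the float32 the model expects."""
    return ((spec - mean) / std).astype(np.float32)


# One second of noise stands in for real 16 kHz speech.
rng = np.random.default_rng(0)
signal = rng.standard_normal(SAMPLE_RATE)

spec = log_spectra(signal)
# Placeholder statistics; real use needs the training mean and std.
x = normalize(spec, spec.mean(), spec.std())[None, ...]
print(x.shape, x.dtype)
```

The leading axis added at the end is the batch dimension, so `x` has the `(batch, time, 257)` layout described in the Usage section.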

	## Conversion

	- `convert_to_onnx.py` — reconstructs the Keras model, loads the TF checkpoint, exports to ONNX (opset 15)
	- `quantize_onnx.py` — generates fp16 and int8 variants with parity checks
	- `validate_onnx.py` — shape, range, and numeric equivalence validation

	Requires: `tensorflow-macos 2.13`, `tf2onnx 1.17`, `onnxruntime`, `onnxconverter-common`.
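
The parity figures in the Files table are presumably maximum absolute differences between the outputs of a variant and the fp32 reference on the same input. A minimal sketch of that metric is shown below; `max_abs_diff` is a hypothetical helper, not a function from `quantize_onnx.py`, and the arrays are stand-ins for real model outputs.

```python
import numpy as np


def max_abs_diff(reference, candidate) -> float:
    """Largest element-wise absolute difference between two output arrays."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    return float(np.max(np.abs(ref - cand)))


# Stand-in arrays; in practice both would come from running the fp32 model
# and a reduced-precision variant on identical input.
ref_out = np.array([[0.25, 0.5, 0.75]], dtype=np.float32)
var_out = np.array([[0.25, 0.5, 0.8125]], dtype=np.float32)

print(max_abs_diff(ref_out, var_out))
```

Because the outputs are sigmoid values in [0, 1], the difference is expressed as a fraction of each parameter's output range rather than in Hz or dB.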

	## Citation

	```bibtex
	@inproceedings{sakamoto2021formantnet,
	title = {Neural Formant Tracking},
	author = {Sakamoto, Yuki and others},
	booktitle = {Interspeech 2021},
	year = {2021}
	}
	```