	---
	license: apache-2.0
	library_name: c
	base_model: FireRedTeam/FireRedVAD
	tags:
	- voice-activity-detection
	- vad
	- audio-event-detection
	- aed
	- streaming
	- dfsmn
	- c
	- embedded
	language:
	- multilingual
	---

	# FireRedVAD-C — FRVD weights for the pure-C inference engine

	Pre-converted weights for running
	[FireRedTeam/FireRedVAD](https://huggingface.co/FireRedTeam/FireRedVAD)
	on the zero-dependency C inference engine used by `mod_fireredvad`
	(FreeSWITCH module) and `fireredvad-dart` (Flutter package).

	The PyTorch checkpoints ship as `model.pth.tar` files and require
	torch + kaldi at inference time. This repo strips them down to a single
	flat float32 blob plus a JSON CMVN file, suitable for embedding in C,
	Dart, or any runtime that just wants `fread()` + matmul.

	## Files

	| File | Size | Description |
	| --- | --- | --- |
	| `fireredvad.bin` | 4.41 MiB | FRVD weights (VAD + AED), little-endian float32 |
	| `fireredvad.json` | 3.2 KB | CMVN stats (`means`, `inv_std`), 80 bins each |
	| `export_frvd.py` | - | Reproducible export script (PyTorch → FRVD) |
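A minimal sketch of reading `fireredvad.json` and normalizing one feature frame. It assumes `means` and `inv_std` are flat arrays of 80 numbers and that normalization is `(x - mean) * inv_std` per bin, the usual CMVN form suggested by the field name `inv_std`; check the C or Dart engine for the exact convention. `parse_cmvn`, `load_cmvn` and `apply_cmvn` are hypothetical helper names.

```python
import json


def parse_cmvn(data):
    """Return (means, inv_std) as float lists from the decoded JSON object."""
    means = [float(v) for v in data["means"]]
    inv_std = [float(v) for v in data["inv_std"]]
    if len(means) != len(inv_std):
        raise ValueError("means and inv_std must have the same length")
    return means, inv_std


def load_cmvn(path):
    """Read and parse a CMVN JSON file such as fireredvad.json."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_cmvn(json.load(f))


def apply_cmvn(frame, means, inv_std):
    """Normalize one frame bin by bin as (x - mean) * inv_std (assumed convention)."""
    if len(frame) != len(means):
        raise ValueError(f"expected {len(means)} bins, got {len(frame)}")
    return [(x - m) * s for x, m, s in zip(frame, means, inv_std)]
```

Typical use is `means, inv_std = load_cmvn("fireredvad.json")` once, then `apply_cmvn(frame, means, inv_std)` for every 80-bin log-mel frame before it enters the network.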

	## Source models

	- VAD: [FireRedTeam/FireRedVAD/Stream-VAD](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/Stream-VAD), a streaming-trained DFSMN run without lookahead at inference (causal).
	- AED: [FireRedTeam/FireRedVAD/AED](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/AED), a non-streaming DFSMN with lookahead and 3 output classes (speech / music / noise).
	- CMVN: Kaldi `cmvn.ark` from the same upstream repo, converted to JSON.
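The JSON's `means` and `inv_std` can be derived from Kaldi's global CMVN statistics. A Kaldi stats matrix has two rows of D+1 values: row 0 holds per-bin sums with the frame count in its last column, and row 1 holds per-bin sums of squares. This sketch assumes the stats were already read into nested lists (the export script lists `kaldiio` for that step) and applies a variance floor of 1e-20, which is an assumption; `export_frvd.py` may clamp differently. `kaldi_stats_to_cmvn` is a hypothetical name.

```python
import math


def kaldi_stats_to_cmvn(stats, var_floor=1e-20):
    """Convert a Kaldi global CMVN stats matrix into (means, inv_std).

    stats[0][:D] = per-bin sum, stats[0][D] = frame count,
    stats[1][:D] = per-bin sum of squares.
    var_floor is an assumed value, not taken from export_frvd.py.
    """
    dim = len(stats[0]) - 1
    count = stats[0][dim]
    if count <= 0:
        raise ValueError("frame count must be positive")
    means, inv_std = [], []
    for d in range(dim):
        mean = stats[0][d] / count
        var = stats[1][d] / count - mean * mean  # E[x^2] - E[x]^2
        means.append(mean)
        inv_std.append(1.0 / math.sqrt(max(var, var_floor)))
    return means, inv_std
```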

	## Architecture

	VAD and AED share the same DFSMN topology; they differ only in the lookahead filters and the output layer:

	| Hyperparameter | VAD (Stream-VAD) | AED |
	| --- | --- | --- |
	| Input dim (mel bins) | 80 | 80 |
	| Hidden | 256 | 256 |
	| Projection | 128 | 128 |
	| FSMN blocks (R) | 8 | 8 |
	| Lookback order (N1) | 20 | 20 |
	| Lookahead order (N2) | 20 (skipped at inference; not exported) | 20 |
	| Output classes | 1 (sigmoid) | 3 (softmax) |
	| Parameters (float32 values) | 567,937 | 588,931 |
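The parameter counts in the table follow directly from the hyperparameters and the read-order layouts given under the binary format. This arithmetic sketch uses a hypothetical function name and treats fsmn0 as filters only, with the remaining R - 1 blocks each carrying `fc1` (weight and bias), a bias-free `fc2`, and their filters, exactly as the layouts list them.

```python
def dfsmn_param_count(n_classes, lookahead_order,
                      d_in=80, hidden=256, proj=128,
                      fsmn_sites=8, lookback_order=20):
    """Count float32 values in one FRVD section (VAD or AED)."""
    taps = proj * (lookback_order + lookahead_order)       # depthwise filters per FSMN site
    n = d_in * hidden + hidden                             # inp_fc1 weight + bias
    n += hidden * proj + proj                              # inp_fc2 weight + bias
    n += taps                                              # fsmn0: filters only
    block = proj * hidden + hidden + hidden * proj + taps  # fc1 w+b, fc2 w (no bias), filters
    n += (fsmn_sites - 1) * block
    n += proj * hidden + hidden                            # out_fc1 weight + bias
    n += hidden * n_classes + n_classes                    # out_fc2 weight + bias
    return n


vad = dfsmn_param_count(n_classes=1, lookahead_order=0)  # VAD lookahead is not exported
aed = dfsmn_param_count(n_classes=3, lookahead_order=20)
print(vad, aed, 8 + 4 * (vad + aed))  # 567937 588931 4627480
```

The last number is the implied size of `fireredvad.bin` in bytes (8-byte header plus both float32 sections), which is about 4.41 MiB.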

	## FRVD binary format

	```text
	offset  size              field
	0       4 bytes           magic = "FRVD"
	4       4 bytes (u32 LE)  version = 1
	8       float32[] (LE)    VAD weights (see fireredvad.h::VadWeights)
	...     float32[] (LE)    AED weights (see fireredvad.h::AedWeights)
	```
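A minimal sketch of validating the header and locating both weight sections with Python's `struct` module. The section sizes are hard-coded from the Architecture table, and rejecting trailing bytes is an assumption about the file rather than something the format description states; `parse_frvd_header` is a hypothetical name.

```python
import struct

VAD_FLOATS = 567_937  # VAD parameter count from the Architecture table
AED_FLOATS = 588_931  # AED parameter count from the Architecture table
HEADER = struct.Struct("<4sI")  # 4-byte magic + little-endian uint32 version


def parse_frvd_header(blob):
    """Validate the FRVD header and return (vad_offset, aed_offset) in bytes."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short for an FRVD header")
    magic, version = HEADER.unpack_from(blob, 0)
    if magic != b"FRVD":
        raise ValueError(f"bad magic {magic!r}")
    if version != 1:
        raise ValueError(f"unsupported FRVD version {version}")
    vad_offset = HEADER.size                  # weights start right after the 8-byte header
    aed_offset = vad_offset + 4 * VAD_FLOATS  # AED follows the VAD floats directly
    expected = aed_offset + 4 * AED_FLOATS
    if len(blob) != expected:                 # assumption: no trailing data
        raise ValueError(f"expected {expected} bytes, got {len(blob)}")
    return vad_offset, aed_offset
```

Called as `parse_frvd_header(open("fireredvad.bin", "rb").read())`, it should report the AED section at byte 8 + 4 × 567,937 = 2,271,756.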

	VAD layout (in read order):

	- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`
	- `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
	- `fsmn0_lookback[128*20]`
	- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20]}`
	- `out_fc1_w[128*256]`, `out_fc1_b[256]`
	- `out_fc2_w[256*1]`, `out_fc2_b[1]`

	AED layout (in read order) adds a lookahead filter at every FSMN site and uses a 3-class output:

	- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`
	- `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
	- `fsmn0_lookback[128*20]`, `fsmn0_lookahead[128*20]`
	- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20], lookahead[128*20]}`
	- `out_fc1_w[128*256]`, `out_fc1_b[256]`
	- `out_fc2_w[256*3]`, `out_fc2_b[3]`
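Both layouts can be expressed as one ordered list of `(name, size)` pairs and used to slice a section out of the blob. In this sketch the per-block names (`fsmn1_fc1_w` through `fsmn7_lookahead`) are illustrative, not the field names in `fireredvad.h`; `frvd_layout` and `unpack_section` are hypothetical helpers.

```python
import struct

P, H, K = 128, 256, 20  # projection size, hidden size, filter order


def frvd_layout(aed):
    """Yield (name, n_floats) in read order for the VAD (aed=False) or AED (aed=True) section."""
    n_out = 3 if aed else 1
    yield "inp_fc1_w", 80 * H
    yield "inp_fc1_b", H
    yield "inp_fc2_w", H * P
    yield "inp_fc2_b", P
    yield "fsmn0_lookback", P * K
    if aed:
        yield "fsmn0_lookahead", P * K
    for i in range(1, 8):  # the 7 repeated FSMN blocks
        yield f"fsmn{i}_fc1_w", P * H
        yield f"fsmn{i}_fc1_b", H
        yield f"fsmn{i}_fc2_w", H * P
        yield f"fsmn{i}_lookback", P * K
        if aed:
            yield f"fsmn{i}_lookahead", P * K
    yield "out_fc1_w", P * H
    yield "out_fc1_b", H
    yield "out_fc2_w", H * n_out
    yield "out_fc2_b", n_out


def unpack_section(blob, offset, aed):
    """Read consecutive little-endian float32 arrays; return (dict, offset after the section)."""
    arrays = {}
    for name, n in frvd_layout(aed):
        arrays[name] = struct.unpack_from(f"<{n}f", blob, offset)
        offset += 4 * n
    return arrays, offset
```

Reading the VAD section from offset 8 returns the offset where the AED section begins; reading AED from there should end exactly at the file length.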

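	Linear weights are stored row-major as `[in, out]`, i.e. the transpose of PyTorch's
	`Linear.weight`, which has shape `[out, in]`. Depthwise Conv1d filters are stored as
	`[P, K]`, where P is the projection size (128) and K the filter order (20).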
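A sketch of what the `[in, out]` convention means for a forward pass, plus the transpose from PyTorch's `[out, in]` weight. It assumes the `[P, K]` filters are flattened row-major like the linear weights; which frame offset a given tap `k` applies to is not specified in this README, so `filter_tap` only shows indexing. All three function names are hypothetical.

```python
def linear_in_out(x, w, b):
    """Compute y = x @ W + b, with W flattened row-major as [in, out]."""
    n_in, n_out = len(x), len(b)
    if len(w) != n_in * n_out:
        raise ValueError("weight length does not match [in, out]")
    y = list(b)
    for i, xi in enumerate(x):
        row = i * n_out  # row i holds the weights from input i to every output
        for j in range(n_out):
            y[j] += xi * w[row + j]
    return y


def torch_to_in_out(weight):
    """Flatten a PyTorch-style [out][in] nested list into row-major [in, out]."""
    n_out, n_in = len(weight), len(weight[0])
    return [weight[j][i] for i in range(n_in) for j in range(n_out)]


def filter_tap(filters, p, k, order=20):
    """Element [p, k] of a [P, K] filter bank, assuming row-major storage."""
    return filters[p * order + k]
```

Storing `[in, out]` lets the inner loop walk one contiguous row per input value, which suits a plain C matmul over a flat `float` array.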

	## Usage

	### Download

	```python
	from huggingface_hub import hf_hub_download

	bin_path = hf_hub_download(
	    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.bin"
	)
	json_path = hf_hub_download(
	    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.json"
	)
	```

	Or with the CLI:

	```bash
	huggingface-cli download eschmidbauer/fireredvad-c --local-dir models/
	```

	### C (FreeSWITCH module)

	[`mod_fireredvad`](https://github.com/vector-ventures/mod_fireredvad)
	loads the files directly:

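	```c
	#include "fireredvad.h"  /* engine header; defines VadWeights and AedWeights */

	Cmvn cmvn;       /* 80-bin means / inv_std from fireredvad.json */
	VadWeights vad;  /* first weight section of fireredvad.bin */
	AedWeights aed;  /* second weight section of fireredvad.bin */

	fireredvad_load_cmvn("fireredvad.json", &cmvn);
	fireredvad_load_weights("fireredvad.bin", &vad, &aed);
	```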

	### Dart (Flutter)

	[`fireredvad-dart`](https://github.com/voxcom-us/fireredvad-dart)
	bundles the same files as Flutter assets and parses them in pure Dart.

	## Reproducing

	The `export_frvd.py` script downloads the upstream PyTorch checkpoints and writes
	`fireredvad.bin` and `fireredvad.json` byte-identical to the files published here:

	```bash
	uv run export_frvd.py
	```

	Dependencies (handled automatically by `uv` from the inline PEP 723
	metadata): `torch`, `numpy`, `kaldiio`, `huggingface_hub`, `fireredvad`.
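To check the byte-identical claim after a re-export, compare hashes of the regenerated files with the downloaded copies. A minimal sketch with `hashlib`; it assumes the script writes its output into the current directory, and `sha256_file` and `same_bytes` are hypothetical helpers.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bytes(path_a, path_b):
    """True when both files have identical contents (by SHA-256)."""
    return sha256_file(path_a) == sha256_file(path_b)
```

For example, `same_bytes("fireredvad.bin", bin_path)`, with `bin_path` from the download snippet, should be `True` if the export reproduces the published weights.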

	## License

	Apache 2.0, inherited from the upstream FireRedVAD release. The original
	model authors retain credit for training; this repo only provides a
	repackaged binary form.

	## Citation

	```bibtex
	@misc{fireredvad,
	title = {FireRedVAD: A SOTA Industrial-Grade Voice Activity
	Detection \& Audio Event Detection},
	author = {Xu, Kaituo and Li, Wenpeng and Huang, Kai and Liu, Kun},
	year = {2026},
	howpublished = {\url{https://github.com/FireRedTeam/FireRedVAD}},
	}
	```