---
license: apache-2.0
library_name: c
base_model: FireRedTeam/FireRedVAD
tags:
- voice-activity-detection
- vad
- audio-event-detection
- aed
- streaming
- dfsmn
- c
- embedded
language:
- multilingual
---

# FireRedVAD-C: FRVD weights for the pure-C inference engine

Pre-converted weights for running [FireRedTeam/FireRedVAD](https://huggingface.co/FireRedTeam/FireRedVAD) on the zero-dependency C inference engine used by `mod_fireredvad` (FreeSWITCH module) and `fireredvad-dart` (Flutter package).

The PyTorch checkpoints ship as `model.pth.tar` files and require torch + kaldi at inference time. This repo strips them down to a single flat float32 blob plus a JSON CMVN file, suitable for embedding in C, Dart, or any runtime that just wants `fread()` + matmul.

## Files

| File | Size | Description |
| --- | --- | --- |
| `fireredvad.bin` | 4.41 MB | FRVD weights: VAD + AED, little-endian float32 |
| `fireredvad.json` | 3.2 KB | CMVN stats (`means`, `inv_std`), 80 bins |
| `export_frvd.py` | - | Reproducible export script (PyTorch → FRVD) |

## Source models

- **VAD**: [FireRedTeam/FireRedVAD/Stream-VAD](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/Stream-VAD), a streaming-trained DFSMN; no lookahead used at inference (causal).
- **AED**: [FireRedTeam/FireRedVAD/AED](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/AED), a non-streaming DFSMN with lookahead, 3-class (speech / music / noise).
- **CMVN**: kaldi `cmvn.ark` from the same upstream repo, converted to JSON.

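
The CMVN file stores per-bin `means` and `inv_std` for the 80 mel bins. This README does not spell out the normalization formula, so the following is only a minimal sketch that assumes the conventional form `(x - mean) * inv_std`. The names `apply_cmvn` and `N_MEL` are hypothetical and are not taken from `fireredvad.h`.

```c
#include <stddef.h>

#define N_MEL 80  /* mel bins per frame, per the Files table */

/* Hypothetical helper: normalize one 80-bin feature frame in place.
 * Assumes y[i] = (x[i] - means[i]) * inv_std[i]; the real engine may differ. */
static void apply_cmvn(float *frame, const float *means, const float *inv_std)
{
    for (size_t i = 0; i < N_MEL; ++i)
        frame[i] = (frame[i] - means[i]) * inv_std[i];
}
```

Storing the inverse standard deviation lets the per-frame loop use a multiply instead of a divide.
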
## Architecture

DFSMN with shared topology for VAD and AED:

| | VAD (Stream-VAD) | AED |
| --- | --- | --- |
| Input dim (mel bins) | 80 | 80 |
| Hidden | 256 | 256 |
| Projection | 128 | 128 |
| FSMN blocks (R) | 8 | 8 |
| Lookback order (N1) | 20 | 20 |
| Lookahead order (N2) | 20 (skipped at inference) | 20 |
| Output classes | 1 (sigmoid) | 3 (softmax) |
| Parameters | 567,937 | 588,931 |

## FRVD binary format

```text
offset  size                   field
0       4 bytes                magic = "FRVD"
4       uint32 little-endian   version = 1
8       float32[]              VAD weights (see fireredvad.h::VadWeights)
...     float32[]              AED weights (see fireredvad.h::AedWeights)
```

VAD layout (in read order):

- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`
- `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`
- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20]}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*1]`, `out_fc2_b[1]`

AED layout adds lookahead at every FSMN site and uses 3-class output:

- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`, `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`, `fsmn0_lookahead[128*20]`
- 7 × `{fc1_w, fc1_b, fc2_w, lookback, lookahead}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*3]`, `out_fc2_b[3]`

Linear weights are stored row-major as `[in, out]` (PyTorch's `Linear.weight` transposed). Depthwise Conv1d filters are stored as `[P, K]`.

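
The layout above can be sanity-checked before loading. The sketch below counts the floats for each model from the listed read order, validates the 8-byte header, and shows how a row-major `[in, out]` weight matrix would be indexed in a linear layer. All function names here (`frvd_param_count`, `frvd_expected_size`, `frvd_check_header`, `linear_in_out`) are hypothetical helpers written for illustration, not part of `fireredvad.h`, and the size check assumes the file contains no padding or trailing data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MEL = 80, HID = 256, PROJ = 128, ORDER = 20, BLOCKS = 8 };

/* Float count for one model, following the read order in this README.
 * n_out: 1 for VAD, 3 for AED. lookahead: 0 for VAD, 1 for AED. */
static size_t frvd_param_count(size_t n_out, int lookahead)
{
    size_t taps = (size_t)PROJ * ORDER * (lookahead ? 2 : 1); /* lookback (+ lookahead) */
    size_t n = 0;
    n += (size_t)MEL * HID + HID;      /* inp_fc1_w, inp_fc1_b */
    n += (size_t)HID * PROJ + PROJ;    /* inp_fc2_w, inp_fc2_b */
    n += taps;                         /* fsmn0 memory filters */
    n += (size_t)(BLOCKS - 1) *
         ((size_t)PROJ * HID + HID + (size_t)HID * PROJ + taps); /* 7 blocks */
    n += (size_t)PROJ * HID + HID;     /* out_fc1_w, out_fc1_b */
    n += (size_t)HID * n_out + n_out;  /* out_fc2_w, out_fc2_b */
    return n;
}

/* Expected file size in bytes: 8-byte header plus float32 weights
 * (assumes nothing else is stored in the file). */
static size_t frvd_expected_size(void)
{
    return 8 + 4 * (frvd_param_count(1, 0) + frvd_param_count(3, 1));
}

/* Returns 0 if hdr is a valid "FRVD" version-1 header, -1 otherwise.
 * The version is decoded byte by byte so the check is endian-independent. */
static int frvd_check_header(const unsigned char hdr[8])
{
    uint32_t version = (uint32_t)hdr[4] | ((uint32_t)hdr[5] << 8) |
                       ((uint32_t)hdr[6] << 16) | ((uint32_t)hdr[7] << 24);
    if (memcmp(hdr, "FRVD", 4) != 0)
        return -1;
    return version == 1 ? 0 : -1;
}

/* y = x * W + b, with W stored row-major as [in, out]:
 * element (i, o) lives at w[i * n_out + o]. */
static void linear_in_out(const float *x, const float *w, const float *b,
                          float *y, size_t n_in, size_t n_out)
{
    for (size_t o = 0; o < n_out; ++o)
        y[o] = b[o];
    for (size_t i = 0; i < n_in; ++i)
        for (size_t o = 0; o < n_out; ++o)
            y[o] += x[i] * w[i * n_out + o];
}
```

With the dimensions from the Architecture table, the counts come to 567,937 floats for VAD and 588,931 for AED, which gives 4,627,480 bytes in total. That agrees with the 4.41 MB listed in the Files table when read as MiB. With the `[in, out]` layout the inner loop walks `w` contiguously, so each input element scales one full row.
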
## Usage

### Download

```python
from huggingface_hub import hf_hub_download

bin_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.bin"
)
json_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.json"
)
```

Or with the CLI:

```bash
huggingface-cli download eschmidbauer/fireredvad-c --local-dir models/
```

### C (FreeSWITCH module)

[`mod_fireredvad`](https://github.com/vector-ventures/mod_fireredvad) loads the files directly:

```c
#include "fireredvad.h"

Cmvn cmvn;        /* CMVN stats parsed from the JSON file */
VadWeights vad;   /* Stream-VAD weights */
AedWeights aed;   /* AED weights */

fireredvad_load_cmvn("fireredvad.json", &cmvn);
fireredvad_load_weights("fireredvad.bin", &vad, &aed);
```

### Dart (Flutter)

[`fireredvad-dart`](https://github.com/voxcom-us/fireredvad-dart) bundles the same files as Flutter assets and parses them in pure Dart.

## Reproducing

The `export_frvd.py` script downloads the upstream PyTorch checkpoints and writes `fireredvad.bin` and `fireredvad.json` files that are byte-identical to the ones in this repo:

```bash
uv run export_frvd.py
```

Dependencies (handled automatically by `uv` from the inline PEP 723 metadata): `torch`, `numpy`, `kaldiio`, `huggingface_hub`, `fireredvad`.

## License

Apache 2.0, inherited from the upstream FireRedVAD release. The original model authors retain credit for training; this repo only provides a repackaged binary form.

## Citation

```bibtex
@misc{fireredvad,
  title        = {FireRedVAD: A SOTA Industrial-Grade Voice Activity Detection \& Audio Event Detection},
  author       = {Xu, Kaituo and Li, Wenpeng and Huang, Kai and Liu, Kun},
  year         = {2026},
  howpublished = {\url{https://github.com/FireRedTeam/FireRedVAD}},
}
```