---
license: apache-2.0
library_name: c
base_model: FireRedTeam/FireRedVAD
tags:
- voice-activity-detection
- vad
- audio-event-detection
- aed
- streaming
- dfsmn
- c
- embedded
language:
- multilingual
---

# FireRedVAD-C: FRVD weights for the pure-C inference engine

Pre-converted weights for running
[FireRedTeam/FireRedVAD](https://huggingface.co/FireRedTeam/FireRedVAD)
on the zero-dependency C inference engine used by `mod_fireredvad`
(FreeSWITCH module) and `fireredvad-dart` (Flutter package).

The upstream PyTorch checkpoints ship as `model.pth.tar` files and require
torch and kaldi at inference time. This repo reduces them to a single
flat float32 blob plus a JSON CMVN file, suitable for embedding in C,
Dart, or any runtime that only needs `fread()` and a matmul.

## Files

| File | Size | Description |
| --- | --- | --- |
| `fireredvad.bin` | 4.41 MiB | FRVD weights: VAD + AED, little-endian float32 |
| `fireredvad.json` | 3.2 KB | CMVN stats (`means`, `inv_std`), 80 bins |
| `export_frvd.py` | n/a | Reproducible export script (PyTorch → FRVD) |

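
The CMVN file carries per-bin `means` and `inv_std` values. The sketch below shows how such stats are conventionally applied to one 80-bin feature frame, assuming the usual normalization `(x - mean) * inv_std` and assuming both arrays are top-level keys in the JSON. The function names `load_cmvn` and `apply_cmvn` are hypothetical and are not part of the C engine's API.

```python
import json

def load_cmvn(path):
    """Read the CMVN JSON and return (means, inv_std) as lists of floats.

    Assumes `means` and `inv_std` are top-level arrays of equal length.
    """
    with open(path, "r", encoding="utf-8") as f:
        stats = json.load(f)
    means, inv_std = stats["means"], stats["inv_std"]
    if len(means) != len(inv_std):
        raise ValueError("means and inv_std must have the same length")
    return means, inv_std

def apply_cmvn(frame, means, inv_std):
    """Normalize one feature frame: subtract the mean, multiply by 1/std."""
    if len(frame) != len(means):
        raise ValueError(f"expected {len(means)} bins, got {len(frame)}")
    return [(x - m) * s for x, m, s in zip(frame, means, inv_std)]
```

Storing the inverse standard deviation lets the runtime use a multiply per bin instead of a divide.
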
## Source models

- **VAD**:
  [FireRedTeam/FireRedVAD/Stream-VAD](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/Stream-VAD),
  a streaming-trained DFSMN; no lookahead is used at inference (causal).
- **AED**:
  [FireRedTeam/FireRedVAD/AED](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/AED),
  a non-streaming DFSMN with lookahead, 3-class (speech / music / noise).
- **CMVN**: Kaldi `cmvn.ark` from the same upstream repo, converted to JSON.

## Architecture

DFSMN with shared topology for VAD and AED:

| | VAD (Stream-VAD) | AED |
| --- | --- | --- |
| Input dim (mel bins) | 80 | 80 |
| Hidden | 256 | 256 |
| Projection | 128 | 128 |
| FSMN blocks (R) | 8 | 8 |
| Lookback order (N1) | 20 | 20 |
| Lookahead order (N2) | 20 (skipped at inference) | 20 |
| Output classes | 1 (sigmoid) | 3 (softmax) |
| Parameters | 567,937 | 588,931 |

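
The parameter counts in the table can be checked by hand from the dimensions. The sketch below is one way to do that arithmetic, assuming the tensor inventory given in the FRVD binary format section: biased `fc1` layers, bias-free `fc2` layers inside the repeated blocks, and no stored lookahead filters for VAD. The function name `count_params` is hypothetical.

```python
def count_params(mel=80, hidden=256, proj=128, blocks=8,
                 lookback=20, lookahead=0, classes=1):
    """Count the float32 parameters implied by the architecture table."""
    # Depthwise memory filters stored at each of the `blocks` FSMN sites.
    memory = proj * (lookback + lookahead)
    # Input projection: fc1 and fc2, both with a bias.
    inp = mel * hidden + hidden + hidden * proj + proj
    # One repeated block: fc1 with a bias, fc2 without.
    block = proj * hidden + hidden + hidden * proj
    # Output head: fc1 and fc2, both with a bias.
    out = proj * hidden + hidden + hidden * classes + classes
    # The first FSMN site follows the input projection, so only
    # blocks - 1 sites carry their own fc1/fc2 pair.
    return inp + blocks * memory + (blocks - 1) * block + out

vad = count_params(lookahead=0, classes=1)    # stored VAD tensors only
aed = count_params(lookahead=20, classes=3)
file_bytes = 8 + 4 * (vad + aed)              # 8-byte header + float32 payload
print(vad, aed, file_bytes)                   # 567937 588931 4627480
```

If the file holds nothing beyond the header and the two weight sections, 4,627,480 bytes is about 4.41 MiB, which agrees with the size listed under Files.
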
## FRVD binary format

```text
offset  size                  field
0       4 bytes               magic = "FRVD"
4       uint32 little-endian  version = 1
8       float32[]             VAD weights (see fireredvad.h::VadWeights)
...     float32[]             AED weights (see fireredvad.h::AedWeights)
```

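
For tooling outside the C engine, the 8-byte header can be validated before reading any weights. This is a minimal sketch using only the Python standard library; the function name `read_frvd_header` is hypothetical, and rejecting every version other than 1 is an assumption based on the table above.

```python
import struct

# 4-byte magic followed by a little-endian uint32 version (8 bytes, no padding).
HEADER = struct.Struct("<4sI")

def read_frvd_header(f):
    """Validate the FRVD header and leave `f` positioned at the first float32."""
    raw = f.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("file too short for an FRVD header")
    magic, version = HEADER.unpack(raw)
    if magic != b"FRVD":
        raise ValueError(f"bad magic: {magic!r}")
    if version != 1:
        raise ValueError(f"unsupported FRVD version: {version}")
    return version
```

Typical use is `with open("fireredvad.bin", "rb") as f: read_frvd_header(f)`, after which the float32 payload starts at byte offset 8.
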
VAD layout (in read order):

- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`
- `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`
- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20]}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*1]`, `out_fc2_b[1]`

AED layout adds lookahead at every FSMN site and uses 3-class output:

- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`,
  `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`, `fsmn0_lookahead[128*20]`
- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20], lookahead[128*20]}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*3]`, `out_fc2_b[3]`

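
Because the tensors are stored back to back, each one's position follows from the sizes of those before it. The sketch below derives byte offsets for the VAD section from the list above. The `block{i}_...` names are hypothetical labels for the 7 repeated blocks, and the AED start offset assumes that section begins immediately after the last VAD tensor.

```python
def vad_layout():
    """Return (name, float_count) pairs in read order for the VAD section."""
    layout = [("inp_fc1_w", 80 * 256), ("inp_fc1_b", 256),
              ("inp_fc2_w", 256 * 128), ("inp_fc2_b", 128),
              ("fsmn0_lookback", 128 * 20)]
    for i in range(1, 8):  # the 7 repeated blocks
        layout += [(f"block{i}_fc1_w", 128 * 256), (f"block{i}_fc1_b", 256),
                   (f"block{i}_fc2_w", 256 * 128), (f"block{i}_lookback", 128 * 20)]
    layout += [("out_fc1_w", 128 * 256), ("out_fc1_b", 256),
               ("out_fc2_w", 256 * 1), ("out_fc2_b", 1)]
    return layout

def byte_offsets(layout, start=8):
    """Map each tensor name to (byte offset, float count); data starts after the header."""
    offsets, pos = {}, start
    for name, count in layout:
        offsets[name] = (pos, count)
        pos += 4 * count  # 4 bytes per float32
    return offsets, pos   # pos is the first byte after this section

offsets, aed_start = byte_offsets(vad_layout())
```

Under these assumptions `aed_start` comes out to 2,271,756, that is, 8 header bytes plus 4 × 567,937 VAD parameters.
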
Linear weights are stored row-major as `[in, out]` (PyTorch's
`Linear.weight` transposed). Depthwise Conv1d filters are stored as
`[P, K]`, where `P` = 128 is the projection dimension and `K` = 20 is
the filter order.

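
To make the `[in, out]` convention concrete, the following sketch indexes a flat weight buffer the way a plain loop-based matmul would, and shows the transpose needed when starting from a PyTorch-style `[out, in]` matrix. Both function names are hypothetical; this illustrates the storage order only and is not the engine's actual kernel.

```python
def linear(x, w, b, n_in, n_out):
    """Compute y = x @ W + b with W stored flat, row-major, shape [n_in, n_out]."""
    if len(x) != n_in or len(w) != n_in * n_out or len(b) != n_out:
        raise ValueError("shape mismatch")
    y = list(b)
    for i in range(n_in):
        row = i * n_out              # start of input i's row in the flat buffer
        xi = x[i]
        for o in range(n_out):
            y[o] += xi * w[row + o]  # consecutive outputs are contiguous in memory
    return y

def transpose_from_torch(weight, n_out, n_in):
    """Flatten a PyTorch-style [out, in] nested list into flat [in, out] order."""
    return [weight[o][i] for i in range(n_in) for o in range(n_out)]
```

With this order the inner loop walks the weight buffer sequentially, which suits a simple scalar C implementation.
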
## Usage

### Download

```python
from huggingface_hub import hf_hub_download

bin_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.bin"
)
json_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.json"
)
```

Or with the CLI:

```bash
huggingface-cli download eschmidbauer/fireredvad-c --local-dir models/
```

### C (FreeSWITCH module)

[`mod_fireredvad`](https://github.com/vector-ventures/mod_fireredvad)
loads the files directly:

```c
Cmvn cmvn;
VadWeights vad;
AedWeights aed;

fireredvad_load_cmvn("fireredvad.json", &cmvn);
fireredvad_load_weights("fireredvad.bin", &vad, &aed);
```

### Dart (Flutter)

[`fireredvad-dart`](https://github.com/voxcom-us/fireredvad-dart)
bundles the same files as Flutter assets and parses them in pure Dart.

## Reproducing

The `export_frvd.py` script downloads the upstream PyTorch checkpoints
and writes byte-identical `fireredvad.bin` + `fireredvad.json`:

```bash
uv run export_frvd.py
```

Dependencies (handled automatically by `uv` from the inline PEP 723
metadata): `torch`, `numpy`, `kaldiio`, `huggingface_hub`, `fireredvad`.

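
One way to check the byte-identical claim locally is to compare digests of the downloaded file and a freshly exported one. This is a generic sketch using the standard library; the helper names are hypothetical, and the paths passed in depend on where the download and the export script put their files.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_bytes(path_a, path_b):
    """True when both files have identical contents (compared by digest)."""
    return sha256_of(path_a) == sha256_of(path_b)
```
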
## License

Apache 2.0, inherited from the upstream FireRedVAD release. The original
model authors retain credit for training; this repo only provides a
repackaged binary form.

## Citation

```bibtex
@misc{fireredvad,
  title = {FireRedVAD: A SOTA Industrial-Grade Voice Activity
           Detection \& Audio Event Detection},
  author = {Xu, Kaituo and Li, Wenpeng and Huang, Kai and Liu, Kun},
  year = {2026},
  howpublished = {\url{https://github.com/FireRedTeam/FireRedVAD}},
}
```