---
license: apache-2.0
library_name: c
base_model: FireRedTeam/FireRedVAD
tags:
- voice-activity-detection
- vad
- audio-event-detection
- aed
- streaming
- dfsmn
- c
- embedded
language:
- multilingual
---
# FireRedVAD-C — FRVD weights for the pure-C inference engine
Pre-converted weights for running
[FireRedTeam/FireRedVAD](https://huggingface.co/FireRedTeam/FireRedVAD)
on the zero-dependency C inference engine used by `mod_fireredvad`
(FreeSWITCH module) and `fireredvad-dart` (Flutter package).
The PyTorch checkpoints ship as `model.pth.tar` files and require
torch + kaldi at inference time. This repo strips them down to a single
flat float32 blob plus a JSON CMVN file, suitable for embedding in C,
Dart, or any runtime that just wants `fread()` + matmul.
## Files
| File | Size | Description |
| --- | --- | --- |
| `fireredvad.bin` | 4.41 MiB | FRVD weights (VAD + AED), little-endian float32 |
| `fireredvad.json` | 3.2 KB | CMVN stats (`means`, `inv_std`) — 80 bins |
| `export_frvd.py` | — | Reproducible export script (PyTorch → FRVD) |
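
CMVN is applied per mel bin before a frame reaches the network. A minimal sketch, assuming the JSON keys are exactly `means` and `inv_std` as listed in the table and that normalization is the usual `(x - mean) * inv_std`; the helper names `load_cmvn` and `apply_cmvn` are hypothetical and not part of the C engine:

```python
import json


def load_cmvn(path):
    """Load CMVN stats; the key names are assumed from the Files table."""
    with open(path, "r", encoding="utf-8") as f:
        stats = json.load(f)
    return stats["means"], stats["inv_std"]


def apply_cmvn(frame, means, inv_std):
    """Normalize one feature frame: (x - mean) * inv_std, bin by bin."""
    if not (len(frame) == len(means) == len(inv_std)):
        raise ValueError("frame and CMVN stats must have the same length")
    return [(x - m) * s for x, m, s in zip(frame, means, inv_std)]
```

Storing the inverse standard deviation lets the runtime normalize with a multiply instead of a divide.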
## Source models
- **VAD**:
[FireRedTeam/FireRedVAD/Stream-VAD](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/Stream-VAD)
— streaming-trained DFSMN, no lookahead used at inference (causal).
- **AED**:
[FireRedTeam/FireRedVAD/AED](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/AED)
— non-streaming DFSMN with lookahead, 3-class (speech / music / noise).
- **CMVN**: kaldi `cmvn.ark` from the same upstream repo, converted to JSON.
## Architecture
DFSMN with shared topology for VAD and AED:
| | VAD (Stream-VAD) | AED |
| --- | --- | --- |
| Input dim (mel bins) | 80 | 80 |
| Hidden | 256 | 256 |
| Projection | 128 | 128 |
| FSMN blocks (R) | 8 | 8 |
| Lookback order (N1) | 20 | 20 |
| Lookahead order (N2) | 20 (not exported; skipped at inference) | 20 |
| Output classes | 1 (sigmoid) | 3 (softmax) |
| Parameters (as stored in `fireredvad.bin`) | 567,937 | 588,931 |
## FRVD binary format
```text
offset  size                  field
0       4 bytes               magic = "FRVD"
4       uint32 little-endian  version = 1
8       float32[]             VAD weights (see fireredvad.h::VadWeights)
...     float32[]             AED weights (see fireredvad.h::AedWeights)
```
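
A quick way to sanity-check a downloaded file is to read the 8-byte header. This is a sketch based only on the table above; `read_frvd_header` is a hypothetical helper, not an API of the C engine:

```python
import struct


def read_frvd_header(path):
    """Check the 4-byte magic and return the little-endian uint32 version."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) != 8:
        raise ValueError("file too short for an FRVD header")
    # "<4sI": little-endian, 4 raw bytes followed by one unsigned 32-bit int.
    magic, version = struct.unpack("<4sI", header)
    if magic != b"FRVD":
        raise ValueError(f"bad magic: {magic!r}")
    return version
```

For the file in this repo, the table above implies a return value of 1.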
VAD layout (in read order):
- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`
- `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`
- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20]}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*1]`, `out_fc2_b[1]`
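
The VAD layout can be cross-checked against the parameter count in the Architecture table by summing the array sizes. A small sketch; the function and argument names are illustrative only:

```python
def vad_float_count(mel=80, hidden=256, proj=128, order=20, blocks=8):
    """Number of float32 values in the VAD section, following the read order."""
    inp = mel * hidden + hidden + hidden * proj + proj     # inp_fc1 + inp_fc2
    fsmn0 = proj * order                                   # fsmn0_lookback
    # Each later block: fc1_w, fc1_b, fc2_w (no bias), lookback.
    block = proj * hidden + hidden + hidden * proj + proj * order
    out = proj * hidden + hidden + hidden * 1 + 1          # out_fc1 + out_fc2
    return inp + fsmn0 + (blocks - 1) * block + out
```

With the default sizes this returns 567,937, the VAD parameter count listed above.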
AED layout adds lookahead at every FSMN site and uses 3-class output:
- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`,
`inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`, `fsmn0_lookahead[128*20]`
- 7 × `{fc1_w, fc1_b, fc2_w, lookback, lookahead}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*3]`, `out_fc2_b[3]`
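
The same bookkeeping for AED also gives the expected size of `fireredvad.bin`. This sketch assumes the seven repeated blocks use the same shapes as the VAD blocks plus a `[128*20]` lookahead filter (the list above omits those shapes); the names are illustrative only:

```python
VAD_FLOATS = 567_937  # VAD parameter count from the Architecture table


def aed_float_count(mel=80, hidden=256, proj=128, order=20, blocks=8, classes=3):
    """Number of float32 values in the AED section, following the read order."""
    inp = mel * hidden + hidden + hidden * proj + proj
    fsmn0 = 2 * proj * order                               # lookback + lookahead
    # Assumed per-block shapes: fc1_w, fc1_b, fc2_w, lookback, lookahead.
    block = proj * hidden + hidden + hidden * proj + 2 * proj * order
    out = proj * hidden + hidden + hidden * classes + classes
    return inp + fsmn0 + (blocks - 1) * block + out


def expected_file_size():
    """8-byte header plus 4 bytes per float32 for the VAD and AED sections."""
    return 8 + 4 * (VAD_FLOATS + aed_float_count())
```

With the defaults, `aed_float_count()` is 588,931 and `expected_file_size()` is 4,627,480 bytes (about 4.41 MiB), which agrees with the Architecture and Files tables.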
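Linear weights are stored row-major as `[in, out]`, i.e. PyTorch's
`Linear.weight` (which is `[out, in]`) transposed. Depthwise Conv1d
filters are stored as `[P, K]`, where `P` is the number of channels (the
projection size, 128) and `K` is the filter order (20), matching the
`lookback[128*20]` entries above.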
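
With a flat row-major `[in, out]` buffer, the weight connecting input `i` to output `j` sits at index `i * out + j`. A minimal pure-Python sketch of a linear layer over that layout; `linear` is a hypothetical helper and not the engine's actual kernel:

```python
def linear(x, w, b, n_in, n_out):
    """y[j] = b[j] + sum_i x[i] * w[i * n_out + j], with w flat and row-major."""
    if len(x) != n_in or len(w) != n_in * n_out or len(b) != n_out:
        raise ValueError("shape mismatch")
    y = list(b)
    for i in range(n_in):
        xi = x[i]
        row = i * n_out  # start of the contiguous row of weights for input i
        for j in range(n_out):
            y[j] += xi * w[row + j]
    return y
```

In this layout the inner loop walks memory contiguously, so each input value is multiplied against one contiguous row of weights.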
## Usage
### Download
```python
from huggingface_hub import hf_hub_download

bin_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.bin"
)
json_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.json"
)
```
Or with the CLI:
```bash
huggingface-cli download eschmidbauer/fireredvad-c --local-dir models/
```
### C (FreeSWITCH module)
[`mod_fireredvad`](https://github.com/vector-ventures/mod_fireredvad)
loads the files directly:
```c
Cmvn cmvn;
VadWeights vad;
AedWeights aed;
fireredvad_load_cmvn("fireredvad.json", &cmvn);
fireredvad_load_weights("fireredvad.bin", &vad, &aed);
```
### Dart (Flutter)
[`fireredvad-dart`](https://github.com/voxcom-us/fireredvad-dart)
bundles the same files as Flutter assets and parses them in pure Dart.
## Reproducing
The `export_frvd.py` script downloads the upstream PyTorch checkpoints
and writes `fireredvad.bin` + `fireredvad.json` that are byte-identical
to the files in this repo:
```bash
uv run export_frvd.py
```
Dependencies (handled automatically by `uv` from the inline PEP 723
metadata): `torch`, `numpy`, `kaldiio`, `huggingface_hub`, `fireredvad`.
## License
Apache 2.0, inherited from the upstream FireRedVAD release. The original
model authors retain credit for training; this repo only provides a
repackaged binary form.
## Citation
```bibtex
@misc{fireredvad,
  title        = {FireRedVAD: A SOTA Industrial-Grade Voice Activity Detection \& Audio Event Detection},
  author       = {Xu, Kaituo and Li, Wenpeng and Huang, Kai and Liu, Kun},
  year         = {2026},
  howpublished = {\url{https://github.com/FireRedTeam/FireRedVAD}},
}
```