---
license: apache-2.0
library_name: c
base_model: FireRedTeam/FireRedVAD
tags:
- voice-activity-detection
- vad
- audio-event-detection
- aed
- streaming
- dfsmn
- c
- embedded
language:
- multilingual
---
# FireRedVAD-C — FRVD weights for the pure-C inference engine
Pre-converted weights for running
[FireRedTeam/FireRedVAD](https://huggingface.co/FireRedTeam/FireRedVAD)
on the zero-dependency C inference engine used by `mod_fireredvad`
(FreeSWITCH module) and `fireredvad-dart` (Flutter package).
The PyTorch checkpoints ship as `model.pth.tar` files and require
torch + kaldi at inference time. This repo strips them down to a single
flat float32 blob plus a JSON CMVN file, suitable for embedding in C,
Dart, or any runtime that just wants `fread()` + matmul.
## Files
| File | Size | Description |
| --- | --- | --- |
| `fireredvad.bin` | 4.41 MiB | FRVD weights (VAD + AED), little-endian float32 |
| `fireredvad.json` | 3.2 KB | CMVN stats (`means`, `inv_std`) — 80 bins |
| `export_frvd.py` | — | Reproducible export script (PyTorch → FRVD) |
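
The CMVN stats are meant to be applied per mel bin before the features reach the network. A minimal sketch, assuming `fireredvad.json` is a JSON object with top-level `means` and `inv_std` arrays (the key names come from the table above; the exact nesting is an assumption) and that features arrive as a `[frames, 80]` matrix. `load_cmvn` and `apply_cmvn` are hypothetical helper names, not part of any shipped API.

```python
import json

import numpy as np


def load_cmvn(path):
    """Load CMVN stats; assumes top-level `means` and `inv_std` arrays."""
    with open(path, "r", encoding="utf-8") as f:
        stats = json.load(f)
    means = np.asarray(stats["means"], dtype=np.float32)
    inv_std = np.asarray(stats["inv_std"], dtype=np.float32)
    if means.shape != inv_std.shape:
        raise ValueError("means and inv_std must have the same length")
    return means, inv_std


def apply_cmvn(feats, means, inv_std):
    """Normalize a [frames, bins] feature matrix: (x - mean) * inv_std."""
    feats = np.asarray(feats, dtype=np.float32)
    if feats.shape[-1] != means.shape[0]:
        raise ValueError("feature dimension does not match CMVN stats")
    return (feats - means) * inv_std
```

Storing the inverse standard deviation lets the runtime normalize with a multiply instead of a divide, which suits a small C inner loop.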
## Source models
- **VAD**:
[FireRedTeam/FireRedVAD/Stream-VAD](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/Stream-VAD)
— streaming-trained DFSMN, no lookahead used at inference (causal).
- **AED**:
[FireRedTeam/FireRedVAD/AED](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/AED)
— non-streaming DFSMN with lookahead, 3-class (speech / music / noise).
- **CMVN**: kaldi `cmvn.ark` from the same upstream repo, converted to JSON.
## Architecture
DFSMN with shared topology for VAD and AED:
| | VAD (Stream-VAD) | AED |
| --- | --- | --- |
| Input dim (mel bins) | 80 | 80 |
| Hidden | 256 | 256 |
| Projection | 128 | 128 |
| FSMN blocks (R) | 8 | 8 |
| Lookback order (N1) | 20 | 20 |
| Lookahead order (N2) | 20 (skipped at inference) | 20 |
| Output classes | 1 (sigmoid) | 3 (softmax) |
| Parameters | 567,937 | 588,931 |
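
The two heads differ in their final activation. The sketch below shows both, assuming the inputs are the raw logits produced by the last output layer and that the AED class order follows the speech / music / noise listing above; that order is an assumption and should be checked against the upstream model. The function names are hypothetical.

```python
import numpy as np

# Assumed class order, taken from the AED description above.
AED_LABELS = ("speech", "music", "noise")


def vad_probability(logit):
    """Sigmoid over the single VAD logit, giving a speech probability."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logit, dtype=np.float64)))


def aed_probabilities(logits):
    """Numerically stable softmax over the last axis (3 AED classes)."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max(axis=-1, keepdims=True)  # avoid overflow in exp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```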
## FRVD binary format
```text
offset  size                  field
0       4 bytes               magic = "FRVD"
4       uint32 little-endian  version = 1
8       float32[]             VAD weights (see fireredvad.h::VadWeights)
...     float32[]             AED weights (see fireredvad.h::AedWeights)
```
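
Before feeding the blob to a runtime, the header can be checked cheaply. This is a minimal sketch based only on the layout above (4-byte magic, little-endian `uint32` version, then float32 data); `read_frvd_header` is a hypothetical helper, not a function from the C engine.

```python
import struct

FRVD_MAGIC = b"FRVD"
FRVD_HEADER_SIZE = 8  # 4-byte magic + uint32 version


def read_frvd_header(path):
    """Return (version, payload_float_count) for an FRVD file."""
    with open(path, "rb") as f:
        header = f.read(FRVD_HEADER_SIZE)
        if len(header) != FRVD_HEADER_SIZE:
            raise ValueError("file too short for an FRVD header")
        # "<4sI": little-endian, 4 raw bytes, then one unsigned 32-bit int.
        magic, version = struct.unpack("<4sI", header)
        if magic != FRVD_MAGIC:
            raise ValueError(f"bad magic: {magic!r}")
        payload = f.read()
    if len(payload) % 4 != 0:
        raise ValueError("payload is not a whole number of float32 values")
    return version, len(payload) // 4
```

A loader would typically reject any version other than 1 and compare the float count against the expected total for the two models.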
VAD layout (in read order; no lookahead filters are stored, since they are skipped at inference):
- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`
- `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`
- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20]}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*1]`, `out_fc2_b[1]`
AED layout adds lookahead at every FSMN site and uses 3-class output:
- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`,
`inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`, `fsmn0_lookahead[128*20]`
- 7 × `{fc1_w, fc1_b, fc2_w, lookback, lookahead}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*3]`, `out_fc2_b[3]`
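
The two layouts can be cross-checked against the parameter counts in the architecture table. The sketch below simply sums the array sizes in read order; it assumes the AED per-block `fc1_w`, `fc1_b`, and `fc2_w` have the same shapes as in the VAD layout, since the AED list does not spell them out. `dfsmn_param_count` is a hypothetical helper.

```python
def dfsmn_param_count(n_out, has_lookahead, in_dim=80, hidden=256,
                      proj=128, blocks=8, order=20):
    """Count float32 values for one model, following the read order above."""
    taps = proj * order * (2 if has_lookahead else 1)
    count = in_dim * hidden + hidden      # inp_fc1_w, inp_fc1_b
    count += hidden * proj + proj         # inp_fc2_w, inp_fc2_b
    count += taps                         # fsmn0 memory filters
    per_block = proj * hidden + hidden    # fc1_w, fc1_b
    per_block += hidden * proj            # fc2_w (no bias listed)
    per_block += taps                     # lookback (+ lookahead for AED)
    count += (blocks - 1) * per_block     # the 7 remaining FSMN blocks
    count += proj * hidden + hidden       # out_fc1_w, out_fc1_b
    count += hidden * n_out + n_out       # out_fc2_w, out_fc2_b
    return count


vad_params = dfsmn_param_count(n_out=1, has_lookahead=False)
aed_params = dfsmn_param_count(n_out=3, has_lookahead=True)
expected_bytes = 8 + 4 * (vad_params + aed_params)  # header + float32 payload
print(vad_params, aed_params, expected_bytes)
# → 567937 588931 4627480
```

Both counts match the table, and the total of 4,627,480 bytes is about 4.41 MiB, which gives a quick sanity check on a downloaded `fireredvad.bin`.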
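Linear weights are stored row-major as `[in, out]`, i.e. PyTorch's
`Linear.weight` (shape `[out, in]`) transposed. Depthwise Conv1d filters
(the FSMN lookback and lookahead taps) are stored as `[P, K]`, with
P = 128 projection channels and K = 20 taps per channel.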
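
The `[in, out]` convention means a dense layer reduces to a plain `x @ W + b` with no transpose at inference time. The following sketch illustrates the relationship to a PyTorch-style `[out, in]` matrix using NumPy only; both function names are hypothetical and the code is not taken from `export_frvd.py`.

```python
import numpy as np


def linear_from_flat(x, flat_w, flat_b, in_dim, out_dim):
    """Apply a dense layer whose weights are stored row-major as [in, out]."""
    w = np.asarray(flat_w, dtype=np.float32).reshape(in_dim, out_dim)
    b = np.asarray(flat_b, dtype=np.float32)
    return np.asarray(x, dtype=np.float32) @ w + b


def flatten_torch_weight(weight_out_in):
    """Turn a PyTorch-style [out, in] matrix into the flat [in, out] order."""
    return np.ascontiguousarray(np.asarray(weight_out_in).T).ravel()
```

With this ordering, the weights feeding all outputs from one input element sit next to each other in memory, so a C loop can walk the blob sequentially while accumulating into the output vector.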
## Usage
### Download
```python
from huggingface_hub import hf_hub_download
bin_path = hf_hub_download(
repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.bin"
)
json_path = hf_hub_download(
repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.json"
)
```
Or with the CLI:
```bash
huggingface-cli download eschmidbauer/fireredvad-c --local-dir models/
```
### C (FreeSWITCH module)
[`mod_fireredvad`](https://github.com/vector-ventures/mod_fireredvad)
loads the files directly:
```c
Cmvn cmvn;
VadWeights vad;
AedWeights aed;
fireredvad_load_cmvn("fireredvad.json", &cmvn);
fireredvad_load_weights("fireredvad.bin", &vad, &aed);
```
### Dart (Flutter)
[`fireredvad-dart`](https://github.com/voxcom-us/fireredvad-dart)
bundles the same files as Flutter assets and parses them in pure Dart.
## Reproducing
The `export_frvd.py` script downloads the upstream PyTorch checkpoints
and writes `fireredvad.bin` and `fireredvad.json` files that are
byte-identical to the ones in this repo:
```bash
uv run export_frvd.py
```
Dependencies (handled automatically by `uv` from the inline PEP 723
metadata): `torch`, `numpy`, `kaldiio`, `huggingface_hub`, `fireredvad`.
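
One way to confirm that a fresh export matches the published files is to compare digests. This is a generic sketch using only the standard library; the helper names are hypothetical, and the paths in the usage note are placeholders.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks until f.read() returns the empty-bytes sentinel.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True when both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `files_identical("fireredvad.bin", "models/fireredvad.bin")` would compare a local export with a copy fetched into `models/`.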
## License
Apache 2.0, inherited from the upstream FireRedVAD release. The original
model authors retain credit for training; this repo only provides a
repackaged binary form.
## Citation
```bibtex
@misc{fireredvad,
title = {FireRedVAD: A SOTA Industrial-Grade Voice Activity
Detection \& Audio Event Detection},
author = {Xu, Kaituo and Li, Wenpeng and Huang, Kai and Liu, Kun},
year = {2026},
howpublished = {\url{https://github.com/FireRedTeam/FireRedVAD}},
}
```