---
license: apache-2.0
library_name: c
base_model: FireRedTeam/FireRedVAD
tags:
- voice-activity-detection
- vad
- audio-event-detection
- aed
- streaming
- dfsmn
- c
- embedded
language:
- multilingual
---

# FireRedVAD-C — FRVD weights for the pure-C inference engine

Pre-converted weights for running
[FireRedTeam/FireRedVAD](https://huggingface.co/FireRedTeam/FireRedVAD)
on the zero-dependency C inference engine used by `mod_fireredvad`
(FreeSWITCH module) and `fireredvad-dart` (Flutter package).

The PyTorch checkpoints ship as `model.pth.tar` files and require
torch + kaldi at inference time. This repo strips them down to a single
flat float32 blob plus a JSON CMVN file, suitable for embedding in C,
Dart, or any runtime that just wants `fread()` + matmul.

## Files

| File | Size | Description |
| --- | --- | --- |
| `fireredvad.bin` | 4.41 MiB | FRVD weights (VAD + AED), little-endian float32 |
| `fireredvad.json` | 3.2 KB | CMVN stats (`means`, `inv_std`) — 80 bins |
| `export_frvd.py` | — | Reproducible export script (PyTorch → FRVD) |

## Source models

- **VAD**:
  [FireRedTeam/FireRedVAD/Stream-VAD](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/Stream-VAD)
  — streaming-trained DFSMN, no lookahead used at inference (causal).
- **AED**:
  [FireRedTeam/FireRedVAD/AED](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/AED)
  — non-streaming DFSMN with lookahead, 3-class (speech / music / noise).
- **CMVN**: kaldi `cmvn.ark` from the same upstream repo, converted to JSON.
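
As a rough illustration of how the JSON stats might be consumed, the sketch below loads `means` and `inv_std` and normalizes one 80-bin frame. The key names come from the Files table; treating them as top-level keys, the formula `(x - mean) * inv_std`, and the helper names `load_cmvn` and `apply_cmvn` are assumptions, not the engine's actual API.

```python
import json

NUM_BINS = 80  # mel bins, per the Files table


def load_cmvn(path):
    """Read `means` and `inv_std` from a CMVN JSON file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        stats = json.load(f)
    # Assumption: both arrays are top-level keys of the JSON object.
    means, inv_std = stats["means"], stats["inv_std"]
    if len(means) != NUM_BINS or len(inv_std) != NUM_BINS:
        raise ValueError("expected 80 CMVN bins")
    return means, inv_std


def apply_cmvn(frame, means, inv_std):
    """Normalize one frame; assumes y = (x - mean) * inv_std per bin."""
    return [(x - m) * s for x, m, s in zip(frame, means, inv_std)]
```

Storing the inverse standard deviation lets a runtime normalize with a multiply instead of a divide, which is presumably why the file carries `inv_std` rather than `std`.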

## Architecture

Both models are DFSMNs with the same topology; they differ only in
lookahead handling and in the output layer:

| | VAD (Stream-VAD) | AED |
| --- | --- | --- |
| Input dim (mel bins) | 80 | 80 |
| Hidden | 256 | 256 |
| Projection | 128 | 128 |
| FSMN blocks (R) | 8 | 8 |
| Lookback order (N1) | 20 | 20 |
| Lookahead order (N2) | 20 (skipped at inference) | 20 |
| Output classes | 1 (sigmoid) | 3 (softmax) |
| Parameters (as stored in FRVD) | 567,937 | 588,931 |
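
The parameter counts in the table can be cross-checked from the dimensions alone. The sketch below is one way to do that; it assumes the bias placement listed under "FRVD binary format" (biased `fc1`, unbiased `fc2` inside the repeated blocks) and that block 0 contributes only its memory filters. The function name `dfsmn_param_count` is hypothetical.

```python
def dfsmn_param_count(in_dim, hidden, proj, blocks, lookback, lookahead, out_dim):
    """Count stored float32 parameters for the DFSMN layout described in this card."""
    # Input stack: inp_fc1 and inp_fc2, both with bias.
    fc_in = in_dim * hidden + hidden + hidden * proj + proj
    # Depthwise memory filters at one FSMN site.
    memory = proj * (lookback + lookahead)
    # One repeated block: fc1 (with bias), fc2 (no bias), memory filters.
    block = proj * hidden + hidden + hidden * proj + memory
    # Output stack: out_fc1 and out_fc2, both with bias.
    fc_out = proj * hidden + hidden + hidden * out_dim + out_dim
    # Block 0 is memory-only; the remaining blocks - 1 are full blocks.
    return fc_in + memory + (blocks - 1) * block + fc_out


# VAD stores no lookahead filters, so its lookahead order is 0 here.
vad_params = dfsmn_param_count(80, 256, 128, 8, 20, 0, 1)
aed_params = dfsmn_param_count(80, 256, 128, 8, 20, 20, 3)
print(vad_params, aed_params)  # 567937 588931
```

With an 8-byte header, these counts imply a file of `8 + 4 * (567937 + 588931) = 4,627,480` bytes.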

## FRVD binary format

```text
offset  size                  field
0       4 bytes               magic = "FRVD"
4       uint32 little-endian  version = 1
8       float32[]             VAD weights (see fireredvad.h::VadWeights)
...     float32[]             AED weights (see fireredvad.h::AedWeights)
```
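
A reader for this header can be very small. The following is a minimal sketch using only the standard library; `read_frvd_header` is a hypothetical helper name, and rejecting any version other than 1 is an assumption about how a loader would want to behave.

```python
import struct

FRVD_MAGIC = b"FRVD"
FRVD_VERSION = 1
HEADER_SIZE = 8  # 4-byte magic + little-endian uint32 version


def read_frvd_header(buf):
    """Validate the 8-byte FRVD header and return the version number."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("file too short for FRVD header")
    # "<4sI": little-endian, 4 raw bytes, then one unsigned 32-bit integer.
    magic, version = struct.unpack_from("<4sI", buf, 0)
    if magic != FRVD_MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    if version != FRVD_VERSION:
        raise ValueError(f"unsupported version: {version}")
    return version
```

Typical use would be to pass in the first 8 bytes of `fireredvad.bin`, opened in binary mode, before touching the float payload.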

VAD layout (in read order):

- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`
- `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`
- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20]}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*1]`, `out_fc2_b[1]`

AED layout adds lookahead at every FSMN site and uses 3-class output:

- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`,
  `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`, `fsmn0_lookahead[128*20]`
- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20], lookahead[128*20]}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*3]`, `out_fc2_b[3]`
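
Because the payload has no per-tensor headers, a reader has to walk the layout in order to find each tensor. The sketch below does this for the VAD section only; the `blk{i}_...` labels and the helper names are hypothetical, and the dimensions are taken from the lists above.

```python
import struct

IN, H, P, K, BLOCKS, OUT = 80, 256, 128, 20, 8, 1  # VAD dimensions


def vad_layout():
    """Return [(name, float_count)] in FRVD read order for the VAD section."""
    layout = [
        ("inp_fc1_w", IN * H), ("inp_fc1_b", H),
        ("inp_fc2_w", H * P), ("inp_fc2_b", P),
        ("fsmn0_lookback", P * K),
    ]
    for i in range(1, BLOCKS):  # the 7 full blocks after fsmn0
        layout += [
            (f"blk{i}_fc1_w", P * H), (f"blk{i}_fc1_b", H),
            (f"blk{i}_fc2_w", H * P), (f"blk{i}_lookback", P * K),
        ]
    layout += [
        ("out_fc1_w", P * H), ("out_fc1_b", H),
        ("out_fc2_w", H * OUT), ("out_fc2_b", OUT),
    ]
    return layout


def vad_offsets(base=8):
    """Map each tensor name to (byte_offset, float_count); base skips the header."""
    offsets, pos = {}, base
    for name, count in vad_layout():
        offsets[name] = (pos, count)
        pos += 4 * count  # 4 bytes per float32
    return offsets, pos  # pos is the byte offset where the AED section starts


def read_tensor(buf, offsets, name):
    """Unpack one tensor as a tuple of little-endian float32 values."""
    off, count = offsets[name]
    return struct.unpack_from(f"<{count}f", buf, off)
```

The same walk extends to AED by adding a `lookahead` entry after each `lookback` and setting `OUT = 3`.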

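Linear weights are stored row-major as `[in, out]`, i.e. PyTorch's
`Linear.weight` (shape `[out, in]`) transposed. Depthwise Conv1d filters
(the lookback and lookahead memories) are stored as `[P, K]`, where
`P` = 128 is the projection dimension and `K` = 20 is the filter order.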
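
To make the storage order concrete, here is a small sketch of a dense layer over a flat `[in, out]` buffer, plus the transpose an exporter would apply to a PyTorch-shaped weight. Both function names are hypothetical and the code is written for clarity, not speed.

```python
def linear(x, w, b, n_in, n_out):
    """y = x @ W + b with W stored flat, row-major, shape [n_in, n_out]."""
    y = list(b)
    for i in range(n_in):
        xi = x[i]
        row = i * n_out  # index of W[i][0] in the flat buffer
        for j in range(n_out):
            y[j] += xi * w[row + j]
    return y


def transpose_torch_weight(w_torch, n_out, n_in):
    """Flatten a PyTorch-style [n_out][n_in] weight into [n_in, n_out] order."""
    return [w_torch[j][i] for i in range(n_in) for j in range(n_out)]
```

With this order the inner loop reads `w` contiguously for each input element. By the same row-major convention, tap `k` of channel `p` in a `[P, K]` filter sits at index `p * K + k`.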

## Usage

### Download

```python
from huggingface_hub import hf_hub_download

bin_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.bin"
)
json_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.json"
)
```

Or with the CLI:

```bash
huggingface-cli download eschmidbauer/fireredvad-c --local-dir models/
```

### C (FreeSWITCH module)

[`mod_fireredvad`](https://github.com/vector-ventures/mod_fireredvad)
loads the files directly:

```c
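#include "fireredvad.h"  /* declares VadWeights and AedWeights */

Cmvn cmvn;
VadWeights vad;
AedWeights aed;

/* Load the CMVN stats, then both weight sets from the single FRVD file. */
fireredvad_load_cmvn("fireredvad.json", &cmvn);
fireredvad_load_weights("fireredvad.bin", &vad, &aed);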
```

### Dart (Flutter)

[`fireredvad-dart`](https://github.com/voxcom-us/fireredvad-dart)
bundles the same files as Flutter assets and parses them in pure Dart.

## Reproducing

The `export_frvd.py` script downloads the upstream PyTorch checkpoints
and writes `fireredvad.bin` + `fireredvad.json` that are byte-identical
to the files in this repo:

```bash
uv run export_frvd.py
```

Dependencies (handled automatically by `uv` from the inline PEP 723
metadata): `torch`, `numpy`, `kaldiio`, `huggingface_hub`, `fireredvad`.
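
One way to check the byte-identical claim is to compare digests of the exported file and the downloaded one. The sketch below uses only `hashlib`; the helper names are hypothetical, and the file paths are whatever your export and download steps produced.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_bytes("fireredvad.bin", "models/fireredvad.bin")` would compare a fresh export against the copy fetched with the CLI command above.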

## License

Apache 2.0, inherited from the upstream FireRedVAD release. The original
model authors retain credit for training; this repo only provides a
repackaged binary form.

## Citation

```bibtex
@misc{fireredvad,
  title  = {FireRedVAD: A SOTA Industrial-Grade Voice Activity
            Detection \& Audio Event Detection},
  author = {Xu, Kaituo and Li, Wenpeng and Huang, Kai and Liu, Kun},
  year   = {2026},
  howpublished = {\url{https://github.com/FireRedTeam/FireRedVAD}},
}
```