---
license: apache-2.0
library_name: c
base_model: FireRedTeam/FireRedVAD
tags:
- voice-activity-detection
- vad
- audio-event-detection
- aed
- streaming
- dfsmn
- c
- embedded
language:
- multilingual
---

# FireRedVAD-C — FRVD weights for the pure-C inference engine

Pre-converted weights for running
[FireRedTeam/FireRedVAD](https://huggingface.co/FireRedTeam/FireRedVAD)
on the zero-dependency C inference engine used by `mod_fireredvad`
(FreeSWITCH module) and `fireredvad-dart` (Flutter package).

The PyTorch checkpoints ship as `model.pth.tar` files and require
torch + kaldi at inference time. This repo strips them down to a single
flat float32 blob plus a JSON CMVN file, suitable for embedding in C,
Dart, or any runtime that just wants `fread()` + matmul.

## Files

| File | Size | Description |
| --- | --- | --- |
| `fireredvad.bin` | 4.41 MiB | FRVD weights (VAD + AED), little-endian float32 |
| `fireredvad.json` | 3.2 KB | CMVN stats (`means`, `inv_std`) — 80 bins |
| `export_frvd.py` | — | Reproducible export script (PyTorch → FRVD) |
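
As a rough illustration of how the CMVN file might be consumed, the sketch below normalizes one feature frame. It assumes the JSON holds two top-level arrays named `means` and `inv_std` (the names listed in the table) and that normalization is `(x - mean) * inv_std` per bin. The formula, the `apply_cmvn` helper, and the 3-bin sample data are illustrative assumptions, not taken from the engine source.

```python
import json

# Hypothetical, truncated stand-in for fireredvad.json (the real file has 80 bins).
cmvn_text = '{"means": [10.0, 12.0, 8.0], "inv_std": [0.5, 0.25, 2.0]}'


def apply_cmvn(frame, means, inv_std):
    """Normalize one feature frame per bin: (x - mean) * inv_std (assumed formula)."""
    if not (len(frame) == len(means) == len(inv_std)):
        raise ValueError("frame and CMVN vectors must have the same length")
    return [(x - m) * s for x, m, s in zip(frame, means, inv_std)]


cmvn = json.loads(cmvn_text)
normalized = apply_cmvn([12.0, 8.0, 8.5], cmvn["means"], cmvn["inv_std"])
print(normalized)  # [1.0, -1.0, 1.0]
```

Storing the inverse standard deviation rather than the standard deviation lets the runtime normalize with a multiply instead of a divide.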

## Source models

- **VAD**:
  [FireRedTeam/FireRedVAD/Stream-VAD](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/Stream-VAD)
  — streaming-trained DFSMN, no lookahead used at inference (causal).
- **AED**:
  [FireRedTeam/FireRedVAD/AED](https://huggingface.co/FireRedTeam/FireRedVAD/tree/main/AED)
  — non-streaming DFSMN with lookahead, 3-class (speech / music / noise).
- **CMVN**: kaldi `cmvn.ark` from the same upstream repo, converted to JSON.

## Architecture

DFSMN with shared topology for VAD and AED:

| | VAD (Stream-VAD) | AED |
| --- | --- | --- |
| Input dim (mel bins) | 80 | 80 |
| Hidden | 256 | 256 |
| Projection | 128 | 128 |
| FSMN blocks (R) | 8 | 8 |
| Lookback order (N1) | 20 | 20 |
| Lookahead order (N2) | 20 (not exported; skipped at inference) | 20 |
| Output classes | 1 (sigmoid) | 3 (softmax) |
| Parameters | 567,937 | 588,931 |

## FRVD binary format

```text
offset  size                  field
0       4 bytes               magic = "FRVD"
4       uint32 little-endian  version = 1
8       float32[]             VAD weights (see fireredvad.h::VadWeights)
...     float32[]             AED weights (see fireredvad.h::AedWeights)
```
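
The header described above can be checked with a few lines of standard-library Python. This is a minimal sketch of a reader, not the engine's loader: `parse_frvd_header` and `read_floats` are hypothetical helper names, and the demo blob is synthetic rather than real weights.

```python
import struct

HEADER = struct.Struct("<4sI")  # 4-byte magic + little-endian uint32 version


def parse_frvd_header(blob: bytes) -> int:
    """Validate the FRVD header and return the byte offset of the first weight."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short for an FRVD header")
    magic, version = HEADER.unpack_from(blob, 0)
    if magic != b"FRVD":
        raise ValueError(f"bad magic: {magic!r}")
    if version != 1:
        raise ValueError(f"unsupported version: {version}")
    return HEADER.size


def read_floats(blob: bytes, offset: int, count: int):
    """Read `count` little-endian float32 values; return them and the next offset."""
    values = struct.unpack_from(f"<{count}f", blob, offset)
    return list(values), offset + 4 * count


# Tiny synthetic blob for illustration only, not real model weights.
demo = b"FRVD" + struct.pack("<I", 1) + struct.pack("<3f", 0.5, -1.0, 2.0)
offset = parse_frvd_header(demo)
weights, offset = read_floats(demo, offset, 3)
print(offset, weights)  # 20 [0.5, -1.0, 2.0]
```

Reading explicitly as little-endian (`<`) keeps the parser correct on big-endian hosts, where a raw `fread()` into a float array would need a byte swap.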

VAD layout (in read order):

- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`
- `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`
- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20]}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*1]`, `out_fc2_b[1]`
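
The list above fixes every tensor's position in the file. As a sanity check, the sketch below rebuilds the VAD layout as `(name, float_count)` pairs, sums it, and derives byte offsets. The per-block names such as `fsmn1_fc1_w` and the `offsets` helper are made up for illustration; only the shapes and ordering come from the list.

```python
# Dimensions from the Architecture table and the VAD layout list.
IN, HID, PROJ, TAPS, BLOCKS = 80, 256, 128, 20, 8


def vad_layout():
    """Return (name, float_count) pairs in read order."""
    layout = [
        ("inp_fc1_w", IN * HID), ("inp_fc1_b", HID),
        ("inp_fc2_w", HID * PROJ), ("inp_fc2_b", PROJ),
        ("fsmn0_lookback", PROJ * TAPS),
    ]
    for i in range(1, BLOCKS):  # the 7 remaining FSMN blocks
        layout += [
            (f"fsmn{i}_fc1_w", PROJ * HID), (f"fsmn{i}_fc1_b", HID),
            (f"fsmn{i}_fc2_w", HID * PROJ), (f"fsmn{i}_lookback", PROJ * TAPS),
        ]
    layout += [
        ("out_fc1_w", PROJ * HID), ("out_fc1_b", HID),
        ("out_fc2_w", HID * 1), ("out_fc2_b", 1),
    ]
    return layout


def offsets(layout, start=8):
    """Map each tensor name to its byte offset; `start` skips the 8-byte header."""
    table, pos = {}, start
    for name, count in layout:
        table[name] = pos
        pos += 4 * count  # float32 = 4 bytes
    return table, pos


vad = vad_layout()
vad_floats = sum(count for _, count in vad)
table, vad_end = offsets(vad)
print(vad_floats, table["inp_fc1_b"], vad_end)  # 567937 81928 2271756
```

The total of 567,937 floats agrees with the VAD parameter count in the Architecture table, and `vad_end` is where the AED weights would begin if nothing is stored between the two sections.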

AED layout adds lookahead at every FSMN site and uses 3-class output:

- `inp_fc1_w[80*256]`, `inp_fc1_b[256]`,
  `inp_fc2_w[256*128]`, `inp_fc2_b[128]`
- `fsmn0_lookback[128*20]`, `fsmn0_lookahead[128*20]`
- 7 × `{fc1_w[128*256], fc1_b[256], fc2_w[256*128], lookback[128*20], lookahead[128*20]}`
- `out_fc1_w[128*256]`, `out_fc1_b[256]`
- `out_fc2_w[256*3]`, `out_fc2_b[3]`
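
The same arithmetic works for AED. The sketch below assumes the per-block shapes match the VAD ones plus a `[128*20]` lookahead filter, and that the file contains nothing beyond the 8-byte header and the two weight sections; both are inferences from this README rather than confirmed details of the exporter.

```python
IN, HID, PROJ, TAPS, BLOCKS, CLASSES = 80, 256, 128, 20, 8, 3

inp = IN * HID + HID + HID * PROJ + PROJ                  # input fc1 + fc2
fsmn0 = 2 * PROJ * TAPS                                   # lookback + lookahead
block = PROJ * HID + HID + HID * PROJ + 2 * PROJ * TAPS   # one of the 7 later blocks
out = PROJ * HID + HID + HID * CLASSES + CLASSES          # output fc1 + fc2

aed_floats = inp + fsmn0 + (BLOCKS - 1) * block + out
vad_floats = 567_937                                      # from the Architecture table
file_bytes = 8 + 4 * (vad_floats + aed_floats)            # header + float32 payload

print(aed_floats, file_bytes)  # 588931 4627480
```

The result matches the 588,931 AED parameters in the Architecture table, and 4,627,480 bytes is about 4.41 MiB, in line with the size listed under Files.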

Linear weights are stored row-major as `[in, out]` (PyTorch's
`Linear.weight` transposed). Depthwise Conv1d filters are stored as
`[P, K]`, where `P` is the projection dimension (128) and `K` is the
filter order (20).
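
To make the `[in, out]` convention concrete, the sketch below flattens a PyTorch-style `[out, in]` matrix into the stored order and runs a dense layer over it by direct indexing. It is a toy 3-to-2 layer in plain Python; `linear` and `transpose_to_in_out` are illustrative names, not functions from the C engine or the export script.

```python
def linear(x, w, b, n_in, n_out):
    """y[o] = b[o] + sum_i x[i] * w[i * n_out + o], with w row-major [in, out]."""
    y = list(b)
    for i in range(n_in):
        row = i * n_out  # start of input i's row in the flat array
        for o in range(n_out):
            y[o] += x[i] * w[row + o]
    return y


def transpose_to_in_out(w_out_in, n_out, n_in):
    """Flatten a PyTorch-style [out, in] nested list into row-major [in, out]."""
    return [w_out_in[o][i] for i in range(n_in) for o in range(n_out)]


# torch_weight mimics Linear.weight with shape [out=2, in=3].
torch_weight = [[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]]
w = transpose_to_in_out(torch_weight, n_out=2, n_in=3)
y = linear([1.0, 0.0, -1.0], w, [0.5, 0.5], n_in=3, n_out=2)
print(w)  # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
print(y)  # [-1.5, -1.5]
```

With this ordering the inner loop walks contiguous memory while accumulating into all outputs for one input element, which is one plausible reason to store the transpose.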

## Usage

### Download

```python
from huggingface_hub import hf_hub_download

bin_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.bin"
)
json_path = hf_hub_download(
    repo_id="eschmidbauer/fireredvad-c", filename="fireredvad.json"
)
```

Or with the CLI:

```bash
huggingface-cli download eschmidbauer/fireredvad-c --local-dir models/
```

### C (FreeSWITCH module)

[`mod_fireredvad`](https://github.com/vector-ventures/mod_fireredvad)
loads the files directly:

```c
Cmvn cmvn;
VadWeights vad;
AedWeights aed;

fireredvad_load_cmvn("fireredvad.json", &cmvn);
fireredvad_load_weights("fireredvad.bin", &vad, &aed);
```

### Dart (Flutter)

[`fireredvad-dart`](https://github.com/voxcom-us/fireredvad-dart)
bundles the same files as Flutter assets and parses them in pure Dart.

## Reproducing

The `export_frvd.py` script downloads the upstream PyTorch checkpoints
and writes byte-identical `fireredvad.bin` + `fireredvad.json`:

```bash
uv run export_frvd.py
```

Dependencies (handled automatically by `uv` from the inline PEP 723
metadata): `torch`, `numpy`, `kaldiio`, `huggingface_hub`, `fireredvad`.

## License

Apache 2.0, inherited from the upstream FireRedVAD release. The original
model authors retain credit for training; this repo only provides a
repackaged binary form.

## Citation

```bibtex
@misc{fireredvad,
  title  = {FireRedVAD: A SOTA Industrial-Grade Voice Activity
            Detection \& Audio Event Detection},
  author = {Xu, Kaituo and Li, Wenpeng and Huang, Kai and Liu, Kun},
  year   = {2026},
  howpublished = {\url{https://github.com/FireRedTeam/FireRedVAD}},
}
```