# wav2vec-vm-finetune-c
Pre-exported weights for eschmidbauer/wav2vec-vm-finetune-c,
a zero-dependency C re-implementation of the
jakeBland/wav2vec-vm-finetune
voicemail detector. No PyTorch, no ONNX Runtime, no BLAS at runtime; the only
runtime dependencies are libm and libpthread.
This repository contains the model weights in a custom layout consumed by
the `vm_detect` C binary. It does not contain PyTorch / safetensors
checkpoints; those live in the upstream repo.
## Contents
| Directory | Size | Notes |
|---|---|---|
| `weights-fp32/` | ~1.26 GB | every tensor as raw float32 |
| `weights-int8/` | ~355 MB | large MatMul weights as int8 + per-row float32 scale; conv / LayerNorm / biases / pos_conv remain float32 |
Each directory holds one `manifest.json` plus per-module subdirectories:

```
weights-fp32/
  manifest.json
  feature_extractor/      conv0..conv6 (weight, bias, norm_weight, norm_bias)
  feature_projection/     norm_*, proj.*
  pos_conv/               weight.bin, bias.bin (weight-norm collapsed)
  encoder_norm/           weight.bin, bias.bin
  encoder/layer_{0..23}/  ln1.*, attn_{q,k,v,out}.*, ln2.*, ffn_{in,out}.*
  classifier/             projector.*, out.*
```
For the INT8 variant each quantized 2D weight is a pair:

```
<stem>.q8.bin     int8,    shape [M, K], row-major
<stem>.scale.bin  float32, shape [M]    (one scale per output row)
```
The C loader auto-detects these and dequantizes to float32 at model load time.
## Model details
Fine-tune of facebook/wav2vec2-large for binary voicemail detection, taken
unchanged from jakeBland/wav2vec-vm-finetune.
| Property | Value |
|---|---|
| Task | Binary audio classification (human vs voicemail) |
| Sample rate | 16 kHz, mono |
| Input length | 32,000 samples (2 s), raw float32 PCM |
| Hidden size | 1024 |
| FFN size | 4096 |
| Attention heads | 16 (64-dim each) |
| Encoder layers | 24 |
| Classifier proj | 256 |
| Labels | 0: human, 1: voicemail |
See `manifest.json` (identical in both directories) for the full tensor list
and shapes.
## Usage
### Download
```shell
pip install huggingface_hub
huggingface-cli download eschmidbauer/wav2vec-vm-finetune-c \
  --local-dir . --local-dir-use-symlinks False
```
### Build and run the C inference binary
```shell
git clone https://github.com/eschmidbauer/wav2vec-vm-finetune-c
cd wav2vec-vm-finetune-c
make -C c

# preprocess an mp3/wav to 16 kHz mono float32 PCM (32,000 samples)
python prep_audio.py my_clip.mp3

# run inference against the fp32 or int8 weights
c/vm_detect path/to/weights-fp32 my_clip.f32
c/vm_detect path/to/weights-int8 my_clip.f32
```
Example output:
```
loaded weights-fp32 in 160 ms
my_clip.f32  voicemail  (human=0.034, voicemail=0.966)  [1703 ms]
```
Concurrent batch mode (shared model, N pthread workers):
```shell
c/vm_detect weights-fp32 --workers 4 clips/*.f32
```
See the project README for build flags, profiling env vars (`WAV2VEC_PROF`,
`WAV2VEC_DUMP`), and the NEON SGEMM micro-kernel details.
## Re-generating these weights
The files here were produced by `extract_weights.py` from the upstream
PyTorch checkpoint:

```shell
python extract_weights.py                            # jakeBland/wav2vec-vm-finetune
MODEL_ID=other-user/model python extract_weights.py
python extract_weights.py path/to/local/dir
```
Re-runs are idempotent (delete `weights-fp32/` or `weights-int8/` to force
regeneration).
## Intended use and limitations
Designed for call-flow systems that need to decide, from the first ~2 seconds of audio, whether the other end is a live human or a voicemail greeting. It inherits the biases, language coverage, and failure modes of the upstream fine-tune: accuracy will degrade on clips outside the training distribution (non-English speech, heavy background noise, very short greetings).
The INT8 variant trades a small amount of accuracy for a ~3.5× weight-size reduction and faster load time; behavior is otherwise identical.
## License
MIT; see LICENSE in the source repository. Upstream weights are subject
to the license of jakeBland/wav2vec-vm-finetune.
## Citation
```bibtex
@misc{wav2vec-vm-finetune-c,
  author = {Emmanuel Schmidbauer},
  title  = {wav2vec-vm-finetune-c: zero-dependency C inference for a wav2vec2 voicemail detector},
  year   = {2026},
  url    = {https://github.com/eschmidbauer/wav2vec-vm-finetune-c}
}
```