wav2vec-vm-finetune-c

Pre-exported weights for eschmidbauer/wav2vec-vm-finetune-c, a zero-dependency C re-implementation of the jakeBland/wav2vec-vm-finetune voicemail detector. No PyTorch, no ONNX Runtime, no BLAS at runtime; only libm and libpthread.

This repository contains the model weights in a custom layout consumed by the vm_detect C binary. It does not contain PyTorch / safetensors checkpoints; those live in the upstream repo.

Contents

Directory      Size      Notes
weights-fp32/  ~1.26 GB  every tensor as raw float32
weights-int8/  ~355 MB   large MatMul weights as int8 + per-row float32 scale;
                         conv / LayerNorm / biases / pos_conv remain float32

Each directory holds one manifest.json plus per-module subdirectories:

weights-fp32/
  manifest.json
  feature_extractor/   conv0..conv6 (weight, bias, norm_weight, norm_bias)
  feature_projection/  norm_*, proj.*
  pos_conv/            weight.bin, bias.bin  (weight-norm collapsed)
  encoder_norm/        weight.bin, bias.bin
  encoder/layer_{0..23}/
                       ln1.*, attn_{q,k,v,out}.*, ln2.*, ffn_{in,out}.*
  classifier/          projector.*, out.*

For the INT8 variant each quantized 2D weight is a pair:

<stem>.q8.bin     int8,    shape [M, K] row-major
<stem>.scale.bin  float32, shape [M]    (one scale per output row)

The C loader auto-detects these and dequantizes to float32 at model load time.

Model details

Fine-tune of facebook/wav2vec2-large for binary voicemail detection, taken unchanged from jakeBland/wav2vec-vm-finetune.

Task             Binary audio classification (human vs voicemail)
Sample rate      16 kHz, mono
Input length     32,000 samples (2 s), raw float32 PCM
Hidden size      1024
FFN size         4096
Attention heads  16 (64-dim each)
Encoder layers   24
Classifier proj  256
Labels           0: human, 1: voicemail

See manifest.json (identical in both directories) for the full tensor list and shapes.

Usage

Download

pip install huggingface_hub
huggingface-cli download eschmidbauer/wav2vec-vm-finetune-c \
  --local-dir . --local-dir-use-symlinks False

Build and run the C inference binary

git clone https://github.com/eschmidbauer/wav2vec-vm-finetune-c
cd wav2vec-vm-finetune-c
make -C c

# preprocess an mp3/wav to 16 kHz mono float32 PCM (32,000 samples)
python prep_audio.py my_clip.mp3

# run inference against the fp32 or int8 weights
c/vm_detect path/to/weights-fp32 my_clip.f32
c/vm_detect path/to/weights-int8 my_clip.f32

Example output:

loaded weights-fp32 in 160 ms
my_clip.f32  voicemail  (human=0.034, voicemail=0.966)  [1703 ms]

Concurrent batch mode (shared model, N pthread workers):

c/vm_detect weights-fp32 --workers 4 clips/*.f32

See the project README for build flags, profiling env vars (WAV2VEC_PROF, WAV2VEC_DUMP), and the NEON SGEMM micro-kernel details.

Re-generating these weights

The files here were produced by extract_weights.py from the upstream PyTorch checkpoint:

python extract_weights.py                     # jakeBland/wav2vec-vm-finetune
MODEL_ID=other-user/model python extract_weights.py
python extract_weights.py path/to/local/dir

Re-runs are idempotent (delete weights-fp32/ or weights-int8/ to force regeneration).

Intended use and limitations

Designed for call-flow systems that need to decide, from the first ~2 seconds of audio, whether the other end is a live human or a voicemail greeting. Inherits the biases, language coverage, and failure modes of the upstream fine-tune: clips outside the training distribution (non-English, heavy background noise, very short greetings) will degrade accuracy.

The INT8 variant trades a small amount of accuracy for a ~3.5x weight-size reduction and faster load time; behavior is otherwise identical.

License

MIT; see LICENSE in the source repository. Upstream weights are subject to the license of jakeBland/wav2vec-vm-finetune.

Citation

@misc{wav2vec-vm-finetune-c,
  author = {Emmanuel Schmidbauer},
  title  = {wav2vec-vm-finetune-c: zero-dependency C inference for a wav2vec2 voicemail detector},
  year   = {2026},
  url    = {https://github.com/eschmidbauer/wav2vec-vm-finetune-c}
}