Audio Classification
Transformers
ONNX
Japanese
multilingual
voice-activity-detection
vad
whisper
speech-detection
asmr
japanese
whispered-speech
Supported in CrispASR (C++ runtime, GGUF format)
#3, opened by cstr
Hi! We've added native support for this model in CrispASR, our unified C++ ASR toolkit.
What's included:
- ONNX → GGUF converter with graph topology tracing for anonymous tensor mapping
- Full C++ runtime: whisper-base encoder (6 layers) + 2-layer TransformerDecoder + frame classifier
- Supports both F32 (113 MB) and Q4_K-quantized (22 MB) weights
- Auto-download: `--vad -vm whisper-vad` fetches the Q4_K GGUF automatically
- Integrated into the VAD dispatch pipeline alongside Silero VAD and FireRedVAD
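The post doesn't describe how the converter recovers names for anonymous tensors. As a rough illustration of the general idea only (not CrispASR's converter), the sketch below walks ONNX-style nodes in topological order and gives each auto-named initializer a stable positional name based on the op that consumes it. PyTorch ONNX exports often produce such names (for example `onnx::MatMul_1234`); the `(op_type, input_names)` node representation, the `onnx::` prefix check, and the naming scheme are all hypothetical.

```python
from collections import Counter

# Hypothetical sketch, NOT CrispASR's converter.
# Nodes are (op_type, input_names) tuples already in topological order,
# mirroring the node list of an ONNX graph.
def name_anonymous_initializers(nodes, initializers, anon_prefix="onnx::"):
    """Map each anonymous initializer to '<op>_<k>'.

    k counts the anonymous initializers already seen for that op type,
    so names depend only on graph order, not on the exporter's numbering.
    """
    mapping = {}
    op_counts = Counter()
    for op_type, inputs in nodes:
        for name in inputs:
            # Rename only stored weights that carry an auto-generated name.
            if name in initializers and name.startswith(anon_prefix) and name not in mapping:
                mapping[name] = f"{op_type.lower()}_{op_counts[op_type]}"
                op_counts[op_type] += 1
    return mapping


nodes = [
    ("MatMul", ["x", "onnx::MatMul_10"]),
    ("Add", ["h", "encoder.bias"]),
    ("MatMul", ["h2", "onnx::MatMul_11"]),
]
print(name_anonymous_initializers(nodes, {"onnx::MatMul_10", "encoder.bias", "onnx::MatMul_11"}))
# {'onnx::MatMul_10': 'matmul_0', 'onnx::MatMul_11': 'matmul_1'}
```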
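The F32 (113 MB) and Q4_K (22 MB) sizes can be sanity-checked with a little arithmetic. This sketch assumes F32 stores 4 bytes per weight, treats "MB" as 10^6 bytes, and uses ggml's Q4_K block layout (256 weights packed into 144 bytes, i.e. 4.5 bits per weight). It suggests roughly 28M weights; quantizing all of them to Q4_K would give about 16 MB, so the remaining gap to 22 MB presumably comes from tensors kept at higher precision. That last point is an inference, not something the post states.

```python
# Back-of-the-envelope size check for the two GGUF variants.
# Assumptions: 4 bytes per F32 weight, "MB" = 10**6 bytes, and ggml's Q4_K block:
# 2 fp16 scales (d, dmin) + 12 bytes of packed sub-block scales/mins + 128 bytes
# of 4-bit quants = 144 bytes per 256 weights.

F32_BYTES_PER_WEIGHT = 4.0
Q4_K_BLOCK_WEIGHTS = 256
Q4_K_BLOCK_BYTES = 2 + 2 + 12 + 128  # = 144


def q4_k_bits_per_weight() -> float:
    return Q4_K_BLOCK_BYTES * 8 / Q4_K_BLOCK_WEIGHTS


def estimate(f32_mb: float) -> tuple:
    """Return (approx. weight count, size in MB if every weight were Q4_K)."""
    params = f32_mb * 1e6 / F32_BYTES_PER_WEIGHT
    all_q4k_mb = params * q4_k_bits_per_weight() / 8 / 1e6
    return params, all_q4k_mb


params, all_q4k_mb = estimate(113)
print(f"~{params / 1e6:.1f}M weights")
print(f"Q4_K bits per weight: {q4_k_bits_per_weight()}")
print(f"all-Q4_K size: ~{all_q4k_mb:.1f} MB (reported GGUF: 22 MB)")
```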
Pre-built GGUFs: cstr/whisper-vad-encdec-asmr-GGUF
Usage:
crispasr --backend whisper -m auto --auto-download --vad -vm whisper-vad -f audio.wav
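To run the same command from a Python batch script, it can be wrapped with `subprocess`. This is a hypothetical helper: the flags are copied verbatim from the usage line above, and because the post doesn't describe `crispasr`'s output format, the sketch simply returns the completed process so the caller can inspect `stdout`.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def crispasr_vad_command(audio: str | Path) -> list[str]:
    """Build the argv for the usage line from the post (flags copied verbatim)."""
    return [
        "crispasr",
        "--backend", "whisper",
        "-m", "auto",
        "--auto-download",
        "--vad",
        "-vm", "whisper-vad",
        "-f", str(audio),
    ]


def run_crispasr(audio: str | Path) -> subprocess.CompletedProcess:
    """Hypothetical wrapper: run crispasr on one file, raising on a non-zero exit."""
    if shutil.which("crispasr") is None:
        raise FileNotFoundError("crispasr is not on PATH")
    return subprocess.run(
        crispasr_vad_command(audio), check=True, capture_output=True, text=True
    )
```

Keeping argv construction separate from execution means the command can be logged or tested without the binary installed.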
Tested on English and German audio of various lengths. The model produces clean VAD segmentation (comparable to FireRedVAD) but adds about 1 s of overhead for the Whisper encoder pass. See the benchmark comparison in our README.
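The post doesn't say how per-frame classifier output becomes speech segments. A common post-processing approach, sketched here as a generic illustration and not as CrispASR's code, is to threshold per-frame speech probabilities, merge segments separated by short gaps, and drop very short bursts. The `threshold`, `min_gap_sec`, and `min_speech_sec` defaults are hypothetical. `frame_sec` is left as a parameter: the Whisper encoder emits one frame per 20 ms, but whether this model's classifier keeps that rate is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds


def frames_to_segments(probs, frame_sec, threshold=0.5, min_speech_sec=0.25, min_gap_sec=0.3):
    """Generic VAD post-processing: threshold, merge short gaps, drop short segments."""
    # 1. Collect runs of frames whose speech probability is >= threshold.
    raw = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            raw.append(Segment(start * frame_sec, i * frame_sec))
            start = None
    if start is not None:  # speech runs to the end of the input
        raw.append(Segment(start * frame_sec, len(probs) * frame_sec))

    # 2. Merge segments separated by less than min_gap_sec of silence.
    merged = []
    for seg in raw:
        if merged and seg.start - merged[-1].end < min_gap_sec:
            merged[-1].end = seg.end
        else:
            merged.append(seg)

    # 3. Drop segments shorter than min_speech_sec.
    return [s for s in merged if s.end - s.start >= min_speech_sec]
```

For example, `frames_to_segments(probs, frame_sec=0.02)` would interpret `probs` as 20 ms frames under the assumption above.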
Thanks for publishing the model!