# Wav2Vec2 Base 960h – ONNX
ONNX export of wav2vec2-base-960h, a Wav2Vec2 model fine-tuned on 960 hours of LibriSpeech for automatic speech recognition using CTC decoding.
Mirrored for use with inference4j, an inference-only AI library for Java.
## Original Source
- Repository: Xenova (originally facebook/wav2vec2-base-960h)
- License: MIT
## Usage with inference4j

```java
try (Wav2Vec2 model = Wav2Vec2.fromPretrained("models/wav2vec2-base-960h")) {
    Transcription result = model.transcribe(Path.of("audio.wav"));
    System.out.println(result.text());
}
```
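The model expects raw 16 kHz mono float32 audio in the range [-1, 1]. If your source is signed 16-bit PCM (as in a typical WAV file), the normalization step can be sketched as below; this is a hypothetical helper, not part of the inference4j API, and it assumes the samples are already mono and at 16 kHz:

```java
/** Hypothetical helper: convert signed 16-bit PCM samples to the
 *  float32 waveform in [-1, 1] that the model expects. */
public class PcmToFloat {
    public static float[] toFloat(short[] pcm) {
        float[] out = new float[pcm.length];
        for (int i = 0; i < pcm.length; i++) {
            // Divide by 32768 so the short range [-32768, 32767]
            // maps into [-1.0, ~1.0).
            out[i] = pcm[i] / 32768f;
        }
        return out;
    }
}
```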
## Model Details
| Property | Value |
|---|---|
| Architecture | Wav2Vec2 Base (12 transformer layers) |
| Task | Automatic speech recognition (CTC decoding) |
| Training data | LibriSpeech 960h |
| Input | 16 kHz mono audio (float32 waveform) |
| Output | CTC logits → greedy-decoded text |
| Original framework | PyTorch (HuggingFace Transformers) |
| ONNX export | By Xenova (Transformers.js) |
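The "greedy-decoded text" step above takes the per-frame CTC logits, picks the argmax token at each frame, collapses consecutive repeats, and drops blanks. A minimal standalone sketch of that algorithm is shown below; the `BLANK` index and the tiny vocabulary are illustrative assumptions (the real wav2vec2 character vocabulary uses `<pad>` as the blank and `|` as the word delimiter):

```java
/** Sketch of greedy CTC decoding over per-frame logits (not the
 *  inference4j implementation; blank index assumed to be 0). */
public class CtcGreedyDecoder {
    static final int BLANK = 0;  // assumed CTC blank token index

    /** Argmax each frame, collapse repeats, drop blanks, map ids to text. */
    public static String decode(float[][] logits, String[] vocab) {
        StringBuilder sb = new StringBuilder();
        int prev = -1;
        for (float[] frame : logits) {
            // Argmax over the vocabulary dimension.
            int best = 0;
            for (int i = 1; i < frame.length; i++) {
                if (frame[i] > frame[best]) best = i;
            }
            // Emit only on a change of token, skipping blanks.
            if (best != prev && best != BLANK) {
                sb.append("|".equals(vocab[best]) ? " " : vocab[best]);
            }
            prev = best;
        }
        return sb.toString().trim();
    }
}
```

Real CTC output interleaves many blank and repeated frames between characters, which is why the collapse step is required before the logits become readable text.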
## License
This model is licensed under the MIT License. Original model by Facebook AI, ONNX export by Xenova.