Wav2Vec2 Base 960h - ONNX

ONNX export of wav2vec2-base-960h, a Wav2Vec2 model fine-tuned on 960 hours of LibriSpeech for automatic speech recognition using CTC decoding.

Mirrored for use with inference4j, an inference-only AI library for Java.

Original Source

Usage with inference4j

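// Load the ONNX model from a local directory; try-with-resources closes it when done.
try (Wav2Vec2 model = Wav2Vec2.fromPretrained("models/wav2vec2-base-960h")) {
    // The model expects 16 kHz mono audio (see Model Details).
    Transcription result = model.transcribe(Path.of("audio.wav"));
    System.out.println(result.text()); // greedy-decoded transcript
}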
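Because the model expects 16 kHz mono float32 audio, it can help to validate a WAV file before transcribing it. The following is a minimal sketch using only the JDK's `javax.sound.sampled` API; the class name `WavInputCheck` and its helper methods are hypothetical and not part of inference4j. Whether inference4j resamples or normalizes audio internally is not stated here, so treat this as an optional pre-check rather than a required step.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

// Hypothetical helper (not part of inference4j): checks a WAV file against
// the 16 kHz mono input described in Model Details.
public class WavInputCheck {

    // True if the format is 16-bit signed PCM, 16 kHz, single channel.
    public static boolean isModelReady(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleRate() == 16000f
                && format.getChannels() == 1
                && format.getSampleSizeInBits() == 16;
    }

    // Converts 16-bit signed PCM bytes to float samples in [-1.0, 1.0).
    public static float[] pcm16ToFloat(byte[] pcm, boolean bigEndian) {
        float[] samples = new float[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int hi = bigEndian ? pcm[2 * i] : pcm[2 * i + 1];
            int lo = bigEndian ? pcm[2 * i + 1] : pcm[2 * i];
            short sample = (short) ((hi << 8) | (lo & 0xFF));
            samples[i] = sample / 32768f;
        }
        return samples;
    }

    public static void main(String[] args) throws Exception {
        File file = new File(args.length > 0 ? args[0] : "audio.wav");
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            AudioFormat format = in.getFormat();
            if (!isModelReady(format)) {
                throw new IllegalArgumentException("Expected 16 kHz mono 16-bit PCM, got: " + format);
            }
            float[] waveform = pcm16ToFloat(in.readAllBytes(), format.isBigEndian());
            System.out.println(waveform.length + " samples at 16 kHz");
        }
    }
}
```

Files that fail the check would need to be resampled or downmixed with an external tool before use.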

Model Details

| Property | Value |
|---|---|
| Architecture | Wav2Vec2 Base (12 transformer layers) |
| Task | Automatic speech recognition (CTC decoding) |
| Training data | LibriSpeech 960h |
| Input | 16 kHz mono audio (float32 waveform) |
| Output | CTC logits → greedy-decoded text |
| Original framework | PyTorch (HuggingFace Transformers) |
| ONNX export | By Xenova (Transformers.js) |
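To illustrate the step from CTC logits to text, here is a minimal sketch of greedy CTC decoding: take the highest-scoring token per frame, collapse consecutive repeats, then drop the blank token. The class `GreedyCtcDecoder`, the toy four-token vocabulary, and the choice of index 0 as blank and `|` as word delimiter are assumptions made for illustration; the real vocabulary ships with the model, and inference4j's internal decoder may differ.

```java
import java.util.List;

// Illustrative greedy CTC decoder (hypothetical; not inference4j's implementation).
public class GreedyCtcDecoder {

    // Index of the highest-scoring token in one frame of logits.
    public static int argmax(float[] frame) {
        int best = 0;
        for (int i = 1; i < frame.length; i++) {
            if (frame[i] > frame[best]) {
                best = i;
            }
        }
        return best;
    }

    // logits: [frames][vocabSize]. Collapses repeated tokens, removes blanks,
    // and maps the word delimiter token to a space.
    public static String decode(float[][] logits, List<String> vocab,
                                int blankId, String wordDelimiter) {
        StringBuilder text = new StringBuilder();
        int previous = -1;
        for (float[] frame : logits) {
            int id = argmax(frame);
            if (id != previous && id != blankId) {
                String token = vocab.get(id);
                text.append(token.equals(wordDelimiter) ? " " : token);
            }
            previous = id;
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        // Toy vocabulary for the example only.
        List<String> vocab = List.of("<pad>", "|", "H", "I");
        float[][] logits = {
            {0.1f, 0.0f, 0.9f, 0.0f}, // H
            {0.1f, 0.0f, 0.8f, 0.1f}, // H again (collapsed)
            {0.9f, 0.0f, 0.1f, 0.0f}, // blank
            {0.0f, 0.1f, 0.2f, 0.7f}, // I
            {0.0f, 0.9f, 0.1f, 0.0f}, // word delimiter
            {0.0f, 0.1f, 0.2f, 0.7f}, // I
        };
        System.out.println(decode(logits, vocab, 0, "|")); // prints "HI I"
    }
}
```

A blank between two identical tokens keeps both, which is how CTC represents doubled letters. Greedy decoding is fast but ignores language context; beam search with a language model is a common alternative.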

License

This model is licensed under the MIT License. Original model by Facebook AI, ONNX export by Xenova.
