Wav2Vec2 Base 960h — ONNX

ONNX export of wav2vec2-base-960h, a Wav2Vec2 model fine-tuned on 960 hours of LibriSpeech for automatic speech recognition using CTC decoding.

Mirrored for use with inference4j, an inference-only AI library for Java.

Original Source

Repository: Xenova (originally facebook/wav2vec2-base-960h)
License: MIT

Usage with inference4j

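// Load the model; try-with-resources closes it when the block exits.
try (Wav2Vec2 model = Wav2Vec2.fromPretrained("models/wav2vec2-base-960h")) {
    // Transcribe a WAV file and print the recognized text.
    Transcription result = model.transcribe(Path.of("audio.wav"));
    System.out.println(result.text());
}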
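
The model takes a 16 kHz mono float32 waveform as input. Whether `transcribe(Path)` performs this conversion internally is not stated here, so the following is only a minimal sketch of the usual step for raw audio, assuming signed 16-bit little-endian mono PCM samples. The class name `PcmToFloat` and its method are hypothetical helpers, not part of inference4j.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of inference4j.
final class PcmToFloat {
    private PcmToFloat() {}

    /**
     * Converts signed 16-bit little-endian mono PCM bytes to float samples
     * in the range [-1.0, 1.0). Assumes the audio is already 16 kHz mono;
     * no resampling or channel mixing is done here.
     */
    static float[] pcm16ToFloat(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("PCM16 data must have an even byte count");
        }
        ByteBuffer buffer = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        float[] samples = new float[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            // Divide by 2^15 so that -32768 maps to -1.0.
            samples[i] = buffer.getShort() / 32768f;
        }
        return samples;
    }
}
```

Audio recorded at another sample rate or with more than one channel would need resampling and downmixing before this step.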

Model Details

Property	Value
Architecture	Wav2Vec2 Base (12 transformer layers)
Task	Automatic speech recognition (CTC decoding)
Training data	LibriSpeech 960h
Input	16 kHz mono audio (float32 waveform)
Output	CTC logits → greedy-decoded text
Original framework	PyTorch (HuggingFace Transformers)
ONNX export	By Xenova (Transformers.js)
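
The output row above mentions greedy decoding of CTC logits. As a rough illustration of that technique (not inference4j's actual implementation), the sketch below picks the highest-scoring token per frame, collapses consecutive repeats, drops the blank token, and maps a word-delimiter token to a space. The class `CtcGreedyDecoder`, the toy vocabulary, the blank index, and the `|` delimiter are assumptions for the example; the real values come from the model's vocabulary file.

```java
// Hypothetical helper illustrating greedy CTC decoding; not part of inference4j.
final class CtcGreedyDecoder {
    private CtcGreedyDecoder() {}

    /**
     * @param logits        scores shaped [timeSteps][vocabSize]
     * @param vocab         token string for each vocabulary index
     * @param blankId       index of the CTC blank token
     * @param wordDelimiter token that marks a word boundary
     */
    static String decode(float[][] logits, String[] vocab, int blankId, String wordDelimiter) {
        StringBuilder text = new StringBuilder();
        int previous = -1;
        for (float[] frame : logits) {
            // Argmax over the vocabulary for this time step.
            int best = 0;
            for (int i = 1; i < frame.length; i++) {
                if (frame[i] > frame[best]) {
                    best = i;
                }
            }
            // Emit only when the token changes and is not the blank.
            if (best != previous && best != blankId) {
                String token = vocab[best];
                text.append(token.equals(wordDelimiter) ? " " : token);
            }
            previous = best;
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        // Toy vocabulary: index 0 is the blank, index 1 the word delimiter.
        String[] vocab = {"<pad>", "|", "A", "B"};
        float[][] logits = {
            {0f, 0f, 1f, 0f}, // A
            {0f, 0f, 1f, 0f}, // A (repeat, collapsed)
            {1f, 0f, 0f, 0f}, // blank (separates the two A's)
            {0f, 0f, 1f, 0f}, // A
            {0f, 1f, 0f, 0f}, // word delimiter
            {0f, 0f, 0f, 1f}, // B
            {0f, 0f, 0f, 1f}, // B (repeat, collapsed)
        };
        System.out.println(decode(logits, vocab, 0, "|")); // prints "AA B"
    }
}
```

The blank frame between the two runs of `A` is what lets CTC represent a genuinely doubled letter; without it the repeats would collapse into one.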

License

This model is licensed under the MIT License. Original model by Facebook AI, ONNX export by Xenova.
