library_name: onnx
tags:
  - wav2vec2
  - speech-to-text
  - automatic-speech-recognition
  - ctc
  - audio
  - onnx
  - inference4j
license: mit
pipeline_tag: automatic-speech-recognition

Wav2Vec2 Base 960h — ONNX

ONNX export of wav2vec2-base-960h, a Wav2Vec2 model fine-tuned on 960 hours of LibriSpeech for automatic speech recognition using CTC decoding.

Mirrored for use with inference4j, an inference-only AI library for Java.

Original Source

Usage with inference4j

try (Wav2Vec2 model = Wav2Vec2.fromPretrained("models/wav2vec2-base-960h")) {
    Transcription result = model.transcribe(Path.of("audio.wav"));
    System.out.println(result.text());
}
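The model expects 16 kHz mono audio, so it can help to inspect a WAV file before passing it to `transcribe`. The following is a minimal sketch using only the JDK's `javax.sound.sampled` package. The class `AudioFormatCheck` and the method `matchesModelInput` are hypothetical names introduced here and are not part of inference4j. Whether inference4j resamples or downmixes audio on its own is not stated in this README, so treat the check as a precaution rather than a requirement.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper, not part of inference4j.
class AudioFormatCheck {

    /** Returns true if the format is 16 kHz, single-channel audio. */
    static boolean matchesModelInput(AudioFormat format) {
        return Math.round(format.getSampleRate()) == 16_000
                && format.getChannels() == 1;
    }

    public static void main(String[] args) throws Exception {
        // Path defaults to the file name used in the usage example above.
        File wav = new File(args.length > 0 ? args[0] : "audio.wav");
        AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();
        if (matchesModelInput(format)) {
            System.out.println("Audio is 16 kHz mono.");
        } else {
            System.out.println("Unexpected format: " + format
                    + " (consider converting to 16 kHz mono first)");
        }
    }
}
```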

Model Details

| Property | Value |
|---|---|
| Architecture | Wav2Vec2 Base (12 transformer layers) |
| Task | Automatic speech recognition (CTC decoding) |
| Training data | LibriSpeech 960h |
| Input | 16 kHz mono audio (float32 waveform) |
| Output | CTC logits → greedy-decoded text |
| Original framework | PyTorch (HuggingFace Transformers) |
| ONNX export | By Xenova (Transformers.js) |
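To illustrate the "CTC logits → greedy-decoded text" step, here is a minimal sketch of greedy CTC decoding: take the highest-scoring token per frame, collapse consecutive repeats, then drop blank tokens. The class `GreedyCtcDecoder`, the toy vocabulary, the blank index, and the use of `|` as a word delimiter are assumptions made for illustration. This is not inference4j's implementation, and the real token table should be read from the model's own vocabulary file.

```java
// Hypothetical sketch of greedy CTC decoding, not inference4j's actual code.
class GreedyCtcDecoder {

    /**
     * @param logits  one row of scores per audio frame, one column per token
     * @param vocab   token strings indexed by token id (assumed layout)
     * @param blankId id of the CTC blank token (assumed to be known)
     */
    static String decode(float[][] logits, String[] vocab, int blankId) {
        StringBuilder text = new StringBuilder();
        int previous = -1;
        for (float[] frame : logits) {
            // Argmax over the vocabulary for this frame.
            int best = 0;
            for (int i = 1; i < frame.length; i++) {
                if (frame[i] > frame[best]) {
                    best = i;
                }
            }
            // Emit only when the token changes and is not the blank.
            if (best != previous && best != blankId) {
                text.append(vocab[best]);
            }
            previous = best;
        }
        // Assumption: "|" marks word boundaries and maps to a space.
        return text.toString().replace('|', ' ').trim();
    }

    public static void main(String[] args) {
        String[] vocab = {"<pad>", "|", "H", "I"}; // toy vocabulary
        float[][] logits = {
            {0, 0, 1, 0}, {0, 0, 1, 0}, // H, H (repeat collapses)
            {1, 0, 0, 0},               // blank
            {0, 0, 0, 1},               // I
            {0, 1, 0, 0},               // word delimiter
            {0, 0, 1, 0}, {0, 0, 0, 1}  // H, I
        };
        System.out.println(decode(logits, vocab, 0)); // prints "HI HI"
    }
}
```

A blank between two identical tokens keeps both, which is how CTC represents doubled letters: the frame sequence `L, <pad>, L` decodes to `LL`, while `L, L` decodes to `L`.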

License

This model is licensed under the MIT License. Original model by Facebook AI, ONNX export by Xenova.