---
library_name: onnx
tags:
- wav2vec2
- speech-to-text
- automatic-speech-recognition
- ctc
- audio
- onnx
- inference4j
license: mit
pipeline_tag: automatic-speech-recognition
---

# Wav2Vec2 Base 960h — ONNX

ONNX export of [wav2vec2-base-960h](https://huggingface.co/Xenova/wav2vec2-base-960h), a Wav2Vec2 model fine-tuned on 960 hours of LibriSpeech for automatic speech recognition using CTC decoding.

Mirrored for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.

## Original Source

- **Repository:** [Xenova/wav2vec2-base-960h](https://huggingface.co/Xenova/wav2vec2-base-960h) (ONNX export of [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h))
- **License:** MIT

## Usage with inference4j

```java
try (Wav2Vec2 model = Wav2Vec2.fromPretrained("models/wav2vec2-base-960h")) {
    Transcription result = model.transcribe(Path.of("audio.wav"));
    System.out.println(result.text());
}
```

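The model consumes raw 16 kHz mono float32 samples (see Model Details below). If your audio is already a 16 kHz mono 16-bit PCM WAV file, the read-and-normalize step can be sketched with the JDK's built-in `javax.sound.sampled` API. This is a minimal sketch under that assumption; `WavToFloat` and its methods are illustrative helpers, not part of the inference4j API:

```java
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WavToFloat {

    /** Converts little-endian 16-bit PCM bytes to float32 samples in [-1, 1). */
    static float[] pcm16leToFloat(byte[] bytes) {
        float[] out = new float[bytes.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = bytes[2 * i] & 0xFF;       // low byte, unsigned
            int hi = bytes[2 * i + 1];          // high byte, sign-extended
            out[i] = ((hi << 8) | lo) / 32768f; // scale to [-1, 1)
        }
        return out;
    }

    /** Reads a WAV file assumed to already be 16 kHz, mono, 16-bit PCM. */
    static float[] readMono16k(File wav) throws Exception {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            return pcm16leToFloat(in.readAllBytes());
        }
    }
}
```

Audio at other sample rates or channel counts must be resampled/downmixed first; the sketch deliberately skips that step.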
## Model Details

| Property | Value |
|----------|-------|
| Architecture | Wav2Vec2 Base (12 transformer layers) |
| Task | Automatic speech recognition (CTC decoding) |
| Training data | LibriSpeech 960h |
| Input | 16 kHz mono audio (float32 waveform) |
| Output | CTC logits → greedy-decoded text |
| Original framework | PyTorch (Hugging Face Transformers) |
| ONNX export | By Xenova (Transformers.js) |

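The "CTC logits → greedy-decoded text" row summarizes the decoding path: the model emits one logit vector per audio frame, and greedy CTC decoding takes the argmax token per frame, collapses consecutive repeats, and drops the blank token. A minimal sketch with a toy five-token vocabulary; `CtcGreedyDecoder` and this vocabulary are illustrative only (the real model ships its own token list, where index 0 is the blank/pad token and `|` marks word boundaries):

```java
public class CtcGreedyDecoder {
    // Toy vocabulary for illustration; index 0 is the CTC blank token,
    // and "|" is the word delimiter, mirroring Wav2Vec2's conventions.
    static final String[] VOCAB = {"<pad>", "|", "C", "A", "T"};
    static final int BLANK = 0;

    /** Greedy CTC decode: argmax per frame, collapse repeats, drop blanks. */
    static String decode(float[][] logits) {
        StringBuilder sb = new StringBuilder();
        int prev = -1;
        for (float[] frame : logits) {
            // Argmax over the vocabulary for this frame.
            int best = 0;
            for (int i = 1; i < frame.length; i++) {
                if (frame[i] > frame[best]) best = i;
            }
            // Emit only non-blank tokens that differ from the previous frame.
            if (best != BLANK && best != prev) {
                sb.append(VOCAB[best].equals("|") ? " " : VOCAB[best]);
            }
            prev = best;
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Frame argmaxes: C, C (repeat), <pad>, A, T  ->  "CAT"
        float[][] logits = {
            {0.1f, 0.0f, 0.9f, 0.0f, 0.0f},
            {0.1f, 0.0f, 0.8f, 0.1f, 0.0f},
            {0.9f, 0.0f, 0.0f, 0.1f, 0.0f},
            {0.0f, 0.1f, 0.0f, 0.9f, 0.0f},
            {0.0f, 0.0f, 0.1f, 0.0f, 0.9f},
        };
        System.out.println(decode(logits)); // prints "CAT"
    }
}
```

The blank token is what lets CTC represent genuine double letters: "LL" is emitted as `L <pad> L`, which collapses to "LL" rather than "L".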
## License

This model is licensed under the [MIT License](https://opensource.org/licenses/MIT). Original model by [Facebook AI](https://huggingface.co/facebook/wav2vec2-base-960h); ONNX export by [Xenova](https://huggingface.co/Xenova).