---
library_name: onnx
tags:
  - wav2vec2
  - speech-to-text
  - automatic-speech-recognition
  - ctc
  - audio
  - onnx
  - inference4j
license: mit
pipeline_tag: automatic-speech-recognition
---

# Wav2Vec2 Base 960h — ONNX

ONNX export of [wav2vec2-base-960h](https://huggingface.co/Xenova/wav2vec2-base-960h), a Wav2Vec2 model fine-tuned on 960 hours of LibriSpeech for automatic speech recognition using CTC decoding.

Mirrored for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.

## Original Source

- **Repository:** [Xenova/wav2vec2-base-960h](https://huggingface.co/Xenova/wav2vec2-base-960h), an ONNX conversion of [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h)
- **License:** MIT

## Usage with inference4j

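```java
import java.nio.file.Path;

// try-with-resources closes the model when the block exits
try (Wav2Vec2 model = Wav2Vec2.fromPretrained("models/wav2vec2-base-960h")) {
    // Transcribe a WAV file; the model expects 16 kHz mono audio
    Transcription result = model.transcribe(Path.of("audio.wav"));
    System.out.println(result.text());
}
```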
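
`transcribe` accepts a file path, but the network itself consumes a 16 kHz mono float32 waveform. The sketch below shows one way such input could be produced from raw 16-bit little-endian PCM samples. The class `WaveformPrep` and its methods are hypothetical and not part of inference4j. The zero-mean, unit-variance step assumes the normalization the Hugging Face `Wav2Vec2FeatureExtractor` applies for this checkpoint (`do_normalize=true`); that inference4j performs an equivalent step internally is an assumption.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of inference4j): turns raw 16-bit little-endian
 * PCM bytes from a 16 kHz mono recording into a float32 waveform.
 */
final class WaveformPrep {

    /** Converts 16-bit signed little-endian PCM bytes to floats in [-1, 1). */
    static float[] pcm16ToFloat(byte[] pcm) {
        float[] out = new float[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = pcm[2 * i] & 0xFF;  // low byte first, treated as unsigned
            int hi = pcm[2 * i + 1];     // high byte keeps its sign
            short sample = (short) ((hi << 8) | lo);
            out[i] = sample / 32768f;
        }
        return out;
    }

    /** Zero-mean, unit-variance normalization (assumed to mirror do_normalize=true). */
    static float[] normalize(float[] x) {
        double mean = 0;
        for (float v : x) mean += v;
        mean /= x.length;
        double var = 0;
        for (float v : x) var += (v - mean) * (v - mean);
        var /= x.length;
        double std = Math.sqrt(var + 1e-7); // small epsilon avoids division by zero on silence
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) out[i] = (float) ((x[i] - mean) / std);
        return out;
    }

    public static void main(String[] args) {
        // Four samples (0, 16384, -16384, 32767) encoded little-endian.
        byte[] pcm = {0x00, 0x00, 0x00, 0x40, 0x00, (byte) 0xC0, (byte) 0xFF, 0x7F};
        float[] wave = pcm16ToFloat(pcm);
        System.out.println(Arrays.toString(wave));
        System.out.println(Arrays.toString(normalize(wave)));
    }
}
```

Dividing by 32768 maps the full signed 16-bit range onto [-1, 1); stereo input or other sample rates would need downmixing and resampling first, which this sketch does not cover.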

## Model Details

| Property | Value |
|----------|-------|
| Architecture | Wav2Vec2 Base (12 transformer layers) |
| Task | Automatic speech recognition (CTC decoding) |
| Training data | LibriSpeech 960h |
| Input | 16 kHz mono audio (float32 waveform) |
| Output | CTC logits → greedy-decoded text |
| Original framework | PyTorch (Hugging Face Transformers) |
| ONNX export | By Xenova (Transformers.js) |

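The output row describes CTC logits followed by greedy decoding. A minimal sketch of greedy CTC decoding in plain Java: take the highest-scoring token per frame, collapse consecutive repeats, drop the blank token, and map the word-delimiter token to a space. The class `CtcGreedy`, its `decode` method, and the toy vocabulary are hypothetical; the sketch assumes the Hugging Face convention for this checkpoint, where `<pad>` acts as the CTC blank and `|` marks word boundaries, and it is not inference4j's actual decoder.

```java
import java.util.List;

/**
 * Hypothetical greedy CTC decoder (not the inference4j implementation).
 * logits[t][v] is the score of vocabulary entry v at frame t.
 */
final class CtcGreedy {

    static String decode(float[][] logits, List<String> vocab, int blankId, String wordDelimiter) {
        StringBuilder text = new StringBuilder();
        int previous = -1;
        for (float[] frame : logits) {
            // 1. Pick the highest-scoring token for this frame (arg-max).
            int best = 0;
            for (int v = 1; v < frame.length; v++) {
                if (frame[v] > frame[best]) best = v;
            }
            // 2. Collapse consecutive repeats and 3. drop the blank token.
            if (best != previous && best != blankId) {
                String token = vocab.get(best);
                // 4. The word-delimiter token becomes a space.
                text.append(token.equals(wordDelimiter) ? " " : token);
            }
            previous = best;
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        // Toy vocabulary for illustration only; the real checkpoint defines its own.
        List<String> vocab = List.of("<pad>", "|", "H", "I");
        // Per-frame arg-max: H H <pad> I | I
        float[][] logits = {
            {0f, 0f, 5f, 0f},
            {0f, 0f, 5f, 0f},
            {5f, 0f, 0f, 0f},
            {0f, 0f, 0f, 5f},
            {0f, 5f, 0f, 0f},
            {0f, 0f, 0f, 5f},
        };
        System.out.println(decode(logits, vocab, 0, "|")); // prints "HI I"
    }
}
```

Because a blank frame resets `previous`, a letter repeated across a blank (for example `L <pad> L`) is emitted twice, which is how CTC spells double letters. Wav2Vec2's convolutional feature encoder has a total stride of 320 samples, so at 16 kHz each logit frame covers roughly 20 ms of audio.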
## License

This model is licensed under the [MIT License](https://opensource.org/licenses/MIT). Original model by [Facebook AI](https://huggingface.co/facebook/wav2vec2-base-960h), ONNX export by [Xenova](https://huggingface.co/Xenova).