Committed by vccarvalho11
Commit a4f3a03 · verified · 1 parent: 5230420

Upload wav2vec2-base-960h ONNX model

Files changed (3)
  1. README.md +49 -0
  2. model.onnx +3 -0
  3. vocab.json +34 -0
README.md ADDED
@@ -0,0 +1,49 @@
+ ---
+ library_name: onnx
+ tags:
+ - wav2vec2
+ - speech-to-text
+ - automatic-speech-recognition
+ - ctc
+ - audio
+ - onnx
+ - inference4j
+ license: mit
+ pipeline_tag: automatic-speech-recognition
+ ---
+
+ # Wav2Vec2 Base 960h — ONNX
+
+ ONNX export of [wav2vec2-base-960h](https://huggingface.co/Xenova/wav2vec2-base-960h), a Wav2Vec2 model fine-tuned on 960 hours of LibriSpeech for automatic speech recognition using CTC decoding.
+
+ Mirrored for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.
+
+ ## Original Source
+
+ - **Repository:** [Xenova (originally facebook/wav2vec2-base-960h)](https://huggingface.co/Xenova/wav2vec2-base-960h)
+ - **License:** mit
+
+ ## Usage with inference4j
+
+ ```java
+ try (Wav2Vec2 model = Wav2Vec2.fromPretrained("models/wav2vec2-base-960h")) {
+     Transcription result = model.transcribe(Path.of("audio.wav"));
+     System.out.println(result.text());
+ }
+ ```
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | Architecture | Wav2Vec2 Base (12 transformer layers) |
+ | Task | Automatic speech recognition (CTC decoding) |
+ | Training data | LibriSpeech 960h |
+ | Input | 16 kHz mono audio (float32 waveform) |
+ | Output | CTC logits → greedy-decoded text |
+ | Original framework | PyTorch (HuggingFace Transformers) |
+ | ONNX export | By Xenova (Transformers.js) |
+
+ ## License
+
+ This model is licensed under the [MIT License](https://opensource.org/licenses/MIT). Original model by [Facebook AI](https://huggingface.co/facebook/wav2vec2-base-960h), ONNX export by [Xenova](https://huggingface.co/Xenova).
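The Model Details table in the README above lists the input as 16 kHz mono audio in a float32 waveform. A minimal sketch of producing that representation from 16-bit little-endian PCM samples might look like the following. `PcmToFloat` and its methods are hypothetical names, not part of inference4j, and the zero-mean, unit-variance normalization step is an assumption about the preprocessing this export expects; the library may already perform both steps internally when `transcribe` is called.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of inference4j.
final class PcmToFloat {
    private PcmToFloat() {}

    /** Converts 16-bit little-endian mono PCM bytes to floats in [-1, 1). */
    static float[] fromPcm16Le(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("PCM16 data must have an even byte count");
        }
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            // A signed 16-bit sample spans [-32768, 32767]; scale it to [-1, 1).
            out[i] = buf.getShort() / 32768.0f;
        }
        return out;
    }

    /** Zero-mean, unit-variance normalization (assumed preprocessing step). */
    static float[] normalize(float[] x) {
        double mean = 0.0;
        for (float v : x) {
            mean += v;
        }
        mean /= x.length;
        double variance = 0.0;
        for (float v : x) {
            variance += (v - mean) * (v - mean);
        }
        variance /= x.length;
        // The small epsilon avoids division by zero on silent input.
        double scale = 1.0 / Math.sqrt(variance + 1e-7);
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = (float) ((x[i] - mean) * scale);
        }
        return out;
    }
}
```

Audio recorded at a different sample rate or with more than one channel would need resampling and downmixing first, which this sketch does not cover.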
model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e46614273f03ff4b87923a965e417fa72004825522cb007c9c25633b8475490d
+ size 377887594
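The Git LFS pointer above records the SHA-256 digest (`oid`) and the byte size of the actual `model.onnx`. A downloaded copy can be checked against those two values; the sketch below is one way to do that using only the JDK. `LfsPointerCheck` is a hypothetical helper name, not part of any library mentioned here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration.
final class LfsPointerCheck {
    private LfsPointerCheck() {}

    /** Streams the input and returns its SHA-256 digest as lowercase hex. */
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /** True when the file's size and SHA-256 digest both match the pointer values. */
    static boolean matchesPointer(Path file, String expectedOid, long expectedSize) {
        try (InputStream in = Files.newInputStream(file)) {
            // The size check is cheap and short-circuits before hashing a large file.
            return Files.size(file) == expectedSize
                    && sha256Hex(in).equalsIgnoreCase(expectedOid);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For this commit, the call would pass the path of the downloaded file together with the `oid` value after the `sha256:` prefix and the size `377887594L`.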
vocab.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "'": 27,
+   "</s>": 2,
+   "<pad>": 0,
+   "<s>": 1,
+   "<unk>": 3,
+   "A": 7,
+   "B": 24,
+   "C": 19,
+   "D": 14,
+   "E": 5,
+   "F": 20,
+   "G": 21,
+   "H": 11,
+   "I": 10,
+   "J": 29,
+   "K": 26,
+   "L": 15,
+   "M": 17,
+   "N": 9,
+   "O": 8,
+   "P": 23,
+   "Q": 30,
+   "R": 13,
+   "S": 12,
+   "T": 6,
+   "U": 16,
+   "V": 25,
+   "W": 18,
+   "X": 28,
+   "Y": 22,
+   "Z": 31,
+   "|": 4
+ }
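The README describes the output as CTC logits that are greedy-decoded to text, and `vocab.json` above supplies the token-to-id mapping. A minimal sketch of greedy CTC decoding over that vocabulary follows: take the highest-scoring id per frame, collapse consecutive repeats, drop blanks, and turn the `|` token into a space. It assumes that `<pad>` (id 0) acts as the CTC blank, as is conventional for Wav2Vec2 checkpoints, and that logits arrive as a `[frames][32]` array. `CtcGreedyDecoder` is a hypothetical class and is not inference4j's implementation.

```java
// Hypothetical decoder for illustration; not inference4j's implementation.
final class CtcGreedyDecoder {
    // id -> token table, inverted from the vocab.json entries in this commit.
    private static final String[] ID_TO_TOKEN = {
        "<pad>", "<s>", "</s>", "<unk>", "|", "E", "T", "A",
        "O", "N", "I", "H", "S", "R", "D", "L",
        "U", "M", "W", "C", "F", "G", "Y", "P",
        "B", "V", "K", "'", "X", "J", "Q", "Z"
    };
    private static final int BLANK_ID = 0;          // assumption: <pad> is the CTC blank
    private static final int WORD_DELIMITER_ID = 4; // "|" separates words

    private CtcGreedyDecoder() {}

    /** Returns the index of the largest value; ties resolve to the first index. */
    static int argmax(float[] row) {
        int best = 0;
        for (int i = 1; i < row.length; i++) {
            if (row[i] > row[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Greedy decode: per-frame argmax followed by CTC collapsing. */
    static String decode(float[][] logits) {
        int[] ids = new int[logits.length];
        for (int t = 0; t < logits.length; t++) {
            ids[t] = argmax(logits[t]);
        }
        return decodeIds(ids);
    }

    /** Collapses repeated ids, drops blanks, and maps the remaining ids to text. */
    static String decodeIds(int[] ids) {
        StringBuilder text = new StringBuilder();
        int previous = -1;
        for (int id : ids) {
            if (id != previous && id != BLANK_ID) {
                if (id == WORD_DELIMITER_ID) {
                    text.append(' ');
                } else if (id > WORD_DELIMITER_ID) {
                    text.append(ID_TO_TOKEN[id]);
                }
                // ids 1..3 (<s>, </s>, <unk>) are skipped in this sketch.
            }
            previous = id;
        }
        return text.toString().trim();
    }
}
```

A blank between two identical ids is what lets a doubled letter survive collapsing: the id sequence `11, 5, 15, 0, 15, 8` decodes to `HELLO`, whereas `11, 5, 15, 15, 8` would decode to `HELO`.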