Hochien committed on
Commit de27887 · verified · 1 Parent(s): 3393b85

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +44 -3
README.md CHANGED
@@ -1,3 +1,44 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - en
+ - multilingual
+ license: apache-2.0
+ tags:
+ - onnx
+ - audio
+ - automatic-speech-recognition
+ - phoneme-recognition
+ - wav2vec2
+ base_model: facebook/wav2vec2-lv-60-espeak-cv-ft
+ ---
+
+ # Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)
+
+ This is an **ONNX export** of the [facebook/wav2vec2-lv-60-espeak-cv-ft](https://huggingface.co/facebook/wav2vec2-lv-60-espeak-cv-ft) model.
+
+ It is designed for client-side inference in the **Music Video Maker** application to perform forced alignment of lyrics to audio.
+
+ ## Model Details
+
+ - **Original Model**: `facebook/wav2vec2-lv-60-espeak-cv-ft`
+ - **Format**: ONNX (Open Neural Network Exchange)
+ - **Precision**: FP16 (Float16)
+ - **Output**: IPA phoneme logits (vocabulary size 392)
+ - **Sample Rate**: 16 kHz
+
+ ## Usage
+
+ This model is intended for use with ONNX Runtime (e.g., via `ort` in Rust or `onnxruntime` in Python).
+
+ ### Input
+ - **Name**: `audio`
+ - **Shape**: `[batch_size, samples]`
+ - **Type**: Float32 tensor
+
+ ### Output
+ - **Name**: `logits`
+ - **Shape**: `[batch_size, frames, 392]` (392 is the vocab size)
+
+ ## License
+
+ This model is a derivative of the original `facebook/wav2vec2-lv-60-espeak-cv-ft` model and retains the **Apache 2.0** license.
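
The input and output description in the README can be exercised with a short Python sketch. It is not part of this commit and is only a rough illustration. The tensor names `audio` and `logits` come from the README. The model file path is a hypothetical placeholder. The kernel and stride values are the standard wav2vec2 feature-encoder settings, assumed to be unchanged by the export. The sketch also assumes the waveform is already mono, sampled at 16 kHz, and preprocessed the way the export expects.

```python
from typing import Sequence

# Assumption: the export keeps the standard wav2vec2 feature-encoder layout.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # stated in the README


def expected_frames(num_samples: int) -> int:
    """Estimate the `frames` dimension of `logits` for a clip of `num_samples`."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # clip too short to produce any output frame
        length = (length - kernel) // stride + 1
    return length


def run_logits(model_path: str, waveform: Sequence[float]):
    """Run the ONNX model on one mono 16 kHz clip and return the logits array."""
    # Imported lazily so expected_frames() works without these packages installed.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Input `audio`: Float32 tensor shaped [batch_size, samples]; here batch_size = 1.
    audio = np.asarray(waveform, dtype=np.float32)[np.newaxis, :]
    (logits,) = session.run(["logits"], {"audio": audio})
    return logits  # shaped [1, frames, 392] according to the README


if __name__ == "__main__":
    # One second of audio at 16 kHz.
    print(expected_frames(SAMPLE_RATE))
    # "model.onnx" is a hypothetical file name; substitute the real path:
    # logits = run_logits("model.onnx", [0.0] * SAMPLE_RATE)
```

Under these assumptions, `expected_frames(16000)` evaluates to 49, which corresponds to roughly one frame per 20 ms of audio (the strides multiply to 320 samples). This makes it a convenient check on the `frames` dimension returned by `run_logits`.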