---
language:
- en
- multilingual
license: apache-2.0
tags:
- onnx
- audio
- automatic-speech-recognition
- phoneme-recognition
- wav2vec2
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft
---

# Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)

This is an **ONNX export** of the [facebook/wav2vec2-lv-60-espeak-cv-ft](https://huggingface.co/facebook/wav2vec2-lv-60-espeak-cv-ft) model.

It is designed for client-side inference in the **UltrClick ContentPro** application to perform forced alignment of lyrics to audio.
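Forced alignment with a CTC phoneme model is commonly done with a Viterbi search over the CTC state graph. The following is a minimal NumPy sketch of that technique; it is not necessarily how UltrClick ContentPro implements it. It assumes `log_probs` is the log-softmax of the model's `logits` for a single clip (shape `[frames, 392]`), that the lyrics have already been converted to phoneme token ids using the original model's `vocab.json` (that step is not shown), and that the CTC blank is id `0` (the `<pad>` token; verify this against `vocab.json`). The function name `ctc_forced_align` is hypothetical.

```python
import numpy as np


def ctc_forced_align(log_probs, targets, blank=0):
    """Viterbi CTC alignment: return one (first_frame, last_frame) span per target token."""
    log_probs = np.asarray(log_probs, dtype=np.float64)
    T, L = log_probs.shape[0], len(targets)
    if L == 0:
        raise ValueError("targets must be non-empty")
    # Extended label sequence: blank, y1, blank, y2, ..., yL, blank
    S = 2 * L + 1
    ext = np.full(S, blank, dtype=np.int64)
    ext[1::2] = targets
    alpha = np.full((T, S), -np.inf)       # best log-score ending in state s at frame t
    back = np.zeros((T, S), dtype=np.int64)  # predecessor state for backtracking
    alpha[0, 0] = log_probs[0, blank]
    alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay in s, advance from s-1, or skip a blank from s-2
            best_prev, best_score = s, alpha[t - 1, s]
            if s >= 1 and alpha[t - 1, s - 1] > best_score:
                best_prev, best_score = s - 1, alpha[t - 1, s - 1]
            if (s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]
                    and alpha[t - 1, s - 2] > best_score):
                best_prev, best_score = s - 2, alpha[t - 1, s - 2]
            alpha[t, s] = best_score + log_probs[t, ext[s]]
            back[t, s] = best_prev
    # A valid path ends on the last token or on the trailing blank
    s = S - 1 if alpha[T - 1, S - 1] >= alpha[T - 1, S - 2] else S - 2
    if not np.isfinite(alpha[T - 1, s]):
        raise ValueError("too few frames for the target sequence")
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t, s]
    # Odd states are tokens; record the first and last frame spent on each
    spans = [[None, None] for _ in range(L)]
    for t, state in enumerate(path):
        if state % 2 == 1:
            k = (state - 1) // 2
            if spans[k][0] is None:
                spans[k][0] = t
            spans[k][1] = t
    return [tuple(span) for span in spans]
```

The returned spans are frame indices; converting them to timestamps depends on the model's frame rate at 16 kHz. The pure-Python inner loop keeps the recursion readable but is slow for long songs, so a production version would vectorize over states or use an existing CTC alignment routine.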

## Model Details

-   **Original Model**: `facebook/wav2vec2-lv-60-espeak-cv-ft`
-   **Format**: ONNX (Open Neural Network Exchange)
-   **Precision**: FP16 (Float16)
-   **Output**: Logits over the IPA phoneme vocabulary (392 tokens)
-   **Sample Rate**: 16 kHz

## Usage

This model is intended to be run with ONNX Runtime (e.g., via the `ort` crate in Rust or the `onnxruntime` package in Python).
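A minimal Python sketch of running the model with `onnxruntime`, using only the input and output names documented in this card (`audio`, `logits`). The helper names `load_session` and `infer_logits` are hypothetical, and `onnxruntime` is imported inside `load_session` so the shaping logic can be exercised without it installed.

```python
import numpy as np


def load_session(model_path):
    """Create a CPU ONNX Runtime session for the exported model."""
    import onnxruntime as ort  # imported lazily; only needed for real inference
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def infer_logits(session, audio):
    """Run one mono 16 kHz waveform through the model; return logits of shape [frames, 392]."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    batch = audio[np.newaxis, :]  # [batch_size=1, samples]
    (logits,) = session.run(["logits"], {"audio": batch})
    return logits[0]  # drop the batch dimension
```

Usage, where `model.onnx` is a placeholder for the actual path of the exported file: `logits = infer_logits(load_session("model.onnx"), waveform)`.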

### Input
-   **Name**: `audio`
-   **Shape**: `[batch_size, samples]`
-   **Type**: Float32 tensor of 16 kHz audio samples
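One way to produce that Float32 input from a WAV file, sketched with Python's standard `wave` module and NumPy. It assumes the file is already mono, 16-bit PCM at 16 kHz (resampling is not shown) and scales samples to the range [-1, 1). The original Hugging Face feature extractor for wav2vec2 models can additionally apply zero-mean, unit-variance normalization; whether this ONNX export expects that is not stated here, so check the original model's preprocessing config. The function name `read_wav_16k_mono` is hypothetical.

```python
import wave

import numpy as np


def read_wav_16k_mono(path):
    """Read a mono 16-bit PCM WAV at 16 kHz into a float32 array in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16 kHz audio, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        pcm = wav.readframes(wav.getnframes())
    samples = np.frombuffer(pcm, dtype="<i2").astype(np.float32)  # little-endian int16
    return samples / 32768.0  # stays float32
```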

### Output
-   **Name**: `logits`
-   **Shape**: `[batch_size, frames, 392]` (392 is the vocab size)
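A quick way to inspect these logits is greedy CTC decoding: take the argmax per frame, merge repeated ids, and drop blanks. The sketch below also converts frame indices to seconds, assuming the standard wav2vec 2.0 convolutional feature encoder stride of 320 samples (20 ms per frame at 16 kHz) and a blank id of `0` (the `<pad>` token; verify against the original model's `vocab.json`). Mapping ids back to IPA symbols requires that `vocab.json` and is not shown. The function name `greedy_ctc_decode` is hypothetical.

```python
import numpy as np

FRAME_STRIDE_SAMPLES = 320  # assumed wav2vec 2.0 feature-encoder stride
SAMPLE_RATE = 16_000


def greedy_ctc_decode(logits, blank_id=0):
    """Return (token_id, start_s, end_s) segments from [frames, vocab] logits."""
    best = np.asarray(logits).argmax(axis=-1)  # best id per frame
    segments = []
    prev = None
    for frame, token in enumerate(best.tolist()):
        if token != blank_id and token != prev:
            segments.append([token, frame, frame])  # start a new segment
        elif token != blank_id and token == prev:
            segments[-1][2] = frame  # extend the current segment
        prev = token
    frame_s = FRAME_STRIDE_SAMPLES / SAMPLE_RATE
    return [(tok, start * frame_s, (end + 1) * frame_s) for tok, start, end in segments]
```

Greedy decoding ignores the lyrics entirely, so it is mainly useful as a sanity check; aligning known lyrics calls for a constrained search such as CTC forced alignment.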

## License

This model is a derivative of the original `facebook/wav2vec2-lv-60-espeak-cv-ft` model and retains the **Apache 2.0** license.