---
language:
- en
- multilingual
license: apache-2.0
tags:
- onnx
- audio
- automatic-speech-recognition
- phoneme-recognition
- wav2vec2
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft
---

# Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)

This is an **ONNX export** of the [facebook/wav2vec2-lv-60-espeak-cv-ft](https://huggingface.co/facebook/wav2vec2-lv-60-espeak-cv-ft) model.

It is designed for client-side inference in the **UltrClick ContentPro** application to perform forced alignment of lyrics to audio.
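
Forced alignment with a CTC model like this one is commonly done by finding the single most likely path through the frame-level log-probabilities that emits a known phoneme sequence, with optional blanks between phonemes. The sketch below is a minimal Viterbi version of that idea in NumPy. It is not the ContentPro implementation. It assumes that the blank (`<pad>`) token has ID 0, as is typical for wav2vec2 CTC vocabularies, and that the lyrics have already been converted to phoneme IDs from this model's vocabulary. `logits` is one utterance, i.e. `output[0]` with shape `[frames, 392]`.

```python
import numpy as np

BLANK_ID = 0  # assumption: the <pad> token (ID 0) acts as the CTC blank


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def ctc_force_align(logits: np.ndarray, tokens: list[int], blank: int = BLANK_ID) -> list[tuple[int, int, int]]:
    """Viterbi-align `tokens` to one utterance's logits of shape [frames, vocab].

    Returns one (token_id, start_frame, end_frame) triple per token; end is exclusive.
    """
    log_probs = log_softmax(logits.astype(np.float64))
    num_frames = log_probs.shape[0]
    # Interleave blanks: blank, t1, blank, t2, ..., tN, blank
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    num_states = len(ext)

    score = np.full((num_frames, num_states), -np.inf)
    back = np.zeros((num_frames, num_states), dtype=np.int64)
    # A path may start on the leading blank or directly on the first token
    score[0, 0] = log_probs[0, ext[0]]
    if num_states > 1:
        score[0, 1] = log_probs[0, ext[1]]
    for t in range(1, num_frames):
        for s in range(num_states):
            # Predecessors: stay in s, advance from s - 1, or skip a blank from s - 2
            # (skipping is only allowed between two different tokens)
            preds = [s]
            if s >= 1:
                preds.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                preds.append(s - 2)
            best = max(preds, key=lambda p: score[t - 1, p])
            score[t, s] = score[t - 1, best] + log_probs[t, ext[s]]
            back[t, s] = best

    # A complete path ends on the trailing blank or on the last token
    finals = [num_states - 1] if num_states == 1 else [num_states - 1, num_states - 2]
    state = max(finals, key=lambda s: score[-1, s])
    if np.isneginf(score[-1, state]):
        raise ValueError("tokens cannot be aligned within the available frames")

    # Backtrack to find the state occupied at every frame
    path = [state]
    for t in range(num_frames - 1, 0, -1):
        state = int(back[t, state])
        path.append(state)
    path.reverse()

    # Odd states are tokens; each token occupies a contiguous run of frames
    spans: list[list[int]] = []
    for t, s in enumerate(path):
        if s % 2 == 1:
            k = s // 2
            if len(spans) == k:
                spans.append([tokens[k], t, t + 1])
            else:
                spans[k][2] = t + 1
    return [(tok, start, end) for tok, start, end in spans]
```

The double loop runs in pure Python, which keeps the recursion easy to read but is slow for long songs; a vectorized version over states, or an existing CTC forced-alignment routine, would be the practical choice at scale.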

## Model Details

-   **Original Model**: `facebook/wav2vec2-lv-60-espeak-cv-ft`
-   **Format**: ONNX (Open Neural Network Exchange)
-   **Precision**: FP16 (Float16)
-   **Output**: Logits over a 392-token IPA phoneme vocabulary
-   **Sample Rate**: 16 kHz
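
Because the model outputs per-frame logits rather than text, a phoneme transcript has to be decoded from them. A common approach for CTC-trained wav2vec2 models is greedy decoding: take the arg-max token for each frame, merge consecutive repeats, then drop blanks. A minimal NumPy sketch, assuming the blank (`<pad>`) token has ID 0; `id_to_token` is a hypothetical ID-to-symbol table that would be built from the original model's tokenizer vocabulary.

```python
import numpy as np

BLANK_ID = 0  # assumption: <pad> (ID 0) is the CTC blank


def greedy_ctc_decode(logits: np.ndarray, blank: int = BLANK_ID) -> list[int]:
    """Collapse per-frame arg-max IDs from [frames, vocab] logits into a token sequence."""
    best = logits.argmax(axis=-1)
    tokens = []
    prev = None
    for tok in best.tolist():
        # Keep a token only when it differs from the previous frame and is not blank
        if tok != prev and tok != blank:
            tokens.append(tok)
        prev = tok
    return tokens


def ids_to_phonemes(ids: list[int], id_to_token: dict[int, str]) -> str:
    """Join phoneme symbols with spaces; `id_to_token` is a hypothetical lookup table."""
    return " ".join(id_to_token[i] for i in ids)
```

A blank between two identical arg-max tokens keeps them as separate phonemes, which is how CTC represents genuine repeats.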

## Usage

This model is intended for use with ONNX Runtime (for example, via the `ort` crate in Rust or the `onnxruntime` package in Python).
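
A minimal Python sketch of running the model with `onnxruntime`. The file name `model.onnx` and the helper names are assumptions (use the actual file in this repository); the input name `audio` and output name `logits` come from the Input and Output sections of this card. `run_logits` only needs an object with a `run(output_names, input_feed)` method, which is the signature of `onnxruntime.InferenceSession.run`.

```python
import numpy as np


def load_session(model_path: str = "model.onnx"):
    """Create a CPU ONNX Runtime session (the default path is an assumption)."""
    import onnxruntime as ort  # imported lazily so run_logits has no hard dependency

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_logits(session, waveform: np.ndarray) -> np.ndarray:
    """Run one mono 16 kHz waveform through the model and return [frames, 392] logits."""
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[np.newaxis, :]  # add the batch dimension: [1, samples]
    (logits,) = session.run(["logits"], {"audio": audio})
    return logits[0]  # drop the batch dimension
```

With the model file present, usage would be `logits = run_logits(load_session(), waveform)`.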

### Input
-   **Name**: `audio`
-   **Shape**: `[batch_size, samples]` (raw mono waveform sampled at 16 kHz)
-   **Type**: Float32 tensor
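
Since the input is a raw waveform, audio generally has to be converted to mono float32 at 16 kHz before inference. The sketch below downmixes channels, resamples with linear interpolation (a rough stand-in; a band-limited resampler such as those in `librosa` or `torchaudio` gives better quality), and optionally applies the zero-mean, unit-variance normalization that Hugging Face's `Wav2Vec2FeatureExtractor` performs when `do_normalize` is enabled. Whether this ONNX export expects normalized input is an assumption to check against the original model's preprocessor configuration; `prepare_audio` is a hypothetical helper name.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate stated in this model card


def prepare_audio(samples: np.ndarray, sample_rate: int, normalize: bool = True) -> np.ndarray:
    """Return a [1, samples] float32 batch at 16 kHz.

    `samples` is [num_samples] (mono) or [num_samples, channels].
    """
    audio = np.asarray(samples, dtype=np.float64)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # downmix channels to mono
    if sample_rate != TARGET_SR:
        # Linear-interpolation resampling: a rough stand-in for a band-limited resampler
        num_out = int(round(audio.shape[0] * TARGET_SR / sample_rate))
        src_times = np.arange(audio.shape[0]) / sample_rate
        dst_times = np.arange(num_out) / TARGET_SR
        audio = np.interp(dst_times, src_times, audio)
    if normalize:
        # Zero mean, unit variance (assumed to match the original feature extractor)
        audio = (audio - audio.mean()) / np.sqrt(audio.var() + 1e-7)
    return audio.astype(np.float32)[np.newaxis, :]
```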

### Output
-   **Name**: `logits`
-   **Shape**: `[batch_size, frames, 392]`, where 392 is the vocabulary size
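
For alignment, each output frame has to be mapped back to a time in the audio. In the standard wav2vec2 configuration, the convolutional feature encoder uses kernel sizes (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2), for an overall hop of 320 samples (20 ms at 16 kHz). The sketch below assumes this export keeps that configuration (compare `conv_kernel` and `conv_stride` in the original model's `config.json`); the function names are hypothetical.

```python
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)  # assumed standard wav2vec2 feature encoder
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000


def num_frames(num_samples: int) -> int:
    """Number of output frames for an input of `num_samples` samples (no padding)."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = max((length - kernel) // stride + 1, 0)
    return length


def frame_to_seconds(frame: int) -> float:
    """Start time of an output frame, using the overall hop of the encoder."""
    hop = 1
    for stride in CONV_STRIDES:
        hop *= stride  # 5 * 2**6 = 320 samples
    return frame * hop / SAMPLE_RATE
```

For example, one second of audio (16,000 samples) yields 49 frames, and a phoneme spanning frames 50 to 75 covers roughly 1.0 s to 1.5 s.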

## License

This model is a derivative of the original `facebook/wav2vec2-lv-60-espeak-cv-ft` model and retains the **Apache 2.0** license.