---
language:
- en
- multilingual
license: apache-2.0
tags:
- onnx
- audio
- automatic-speech-recognition
- phoneme-recognition
- wav2vec2
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft
---

# Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)

This is an **ONNX export** of the [facebook/wav2vec2-lv-60-espeak-cv-ft](https://huggingface.co/facebook/wav2vec2-lv-60-espeak-cv-ft) model. It is designed for client-side inference in the **UltrClick ContentPro** application to perform forced alignment of lyrics to audio.

## Model Details

- **Original Model**: `facebook/wav2vec2-lv-60-espeak-cv-ft`
- **Format**: ONNX (Open Neural Network Exchange)
- **Precision**: FP16 (Float16)
- **Output**: IPA phoneme logits (vocabulary size 392)
- **Sample Rate**: 16 kHz

## Usage

This model is intended to be used with ONNX Runtime (e.g., via `ort` in Rust or `onnxruntime` in Python).

### Input

- **Name**: `audio`
- **Shape**: `[batch_size, samples]`
- **Type**: Float32 tensor

### Output

- **Name**: `logits`
- **Shape**: `[batch_size, frames, 392]` (392 is the vocabulary size)

## License

This model is a derivative of the original `facebook/wav2vec2-lv-60-espeak-cv-ft` model and retains the **Apache 2.0** license.
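
## Example: Frame Count and Greedy Decoding

As a rough illustration of the input and output shapes listed under Usage, the sketch below estimates the `frames` dimension from the number of input samples and collapses per-frame argmax ids with greedy CTC decoding. It assumes the export keeps the standard Wav2Vec2 convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2) and that id 0 is the CTC blank; neither is stated in this card, and the function names `expected_frames` and `greedy_ctc_collapse` are hypothetical helpers, not part of any library.

```python
# Standard Wav2Vec2 feature-encoder layout.
# Assumption: the ONNX export leaves this layout unchanged.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def expected_frames(num_samples: int) -> int:
    """Estimate the `frames` dimension of `logits` for one input of `num_samples`."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # input too short to produce a single frame
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


def greedy_ctc_collapse(frame_ids, blank_id=0):
    """Collapse per-frame argmax ids: merge repeats, then drop blanks.

    Assumption: `blank_id` 0 is the CTC blank (pad) token; check the
    tokenizer's vocabulary before relying on it.
    """
    collapsed = []
    previous = None
    for token_id in frame_ids:
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


if __name__ == "__main__":
    # One second of 16 kHz audio.
    print(expected_frames(16000))  # 49
    # Hypothetical per-frame argmax ids over the 392-entry vocabulary.
    print(greedy_ctc_collapse([0, 5, 5, 0, 5, 7, 7, 0]))  # [5, 5, 7]
```

Under these assumptions the encoder advances 320 samples per frame, so each frame covers roughly 20 ms of audio at 16 kHz, which is the time resolution available to forced alignment.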