---
language:
- en
- multilingual
license: apache-2.0
tags:
- onnx
- audio
- automatic-speech-recognition
- phoneme-recognition
- wav2vec2
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft
---
# Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)
This is an **ONNX export** of the [facebook/wav2vec2-lv-60-espeak-cv-ft](https://huggingface.co/facebook/wav2vec2-lv-60-espeak-cv-ft) model.
It is intended for client-side inference in the **UltrClick ContentPro** application, where its phoneme logits are used for forced alignment of lyrics to audio.
## Model Details
- **Original Model**: `facebook/wav2vec2-lv-60-espeak-cv-ft`
- **Format**: ONNX (Open Neural Network Exchange)
- **Precision**: FP16 (Float16)
- **Output**: IPA phoneme logits (vocabulary size: 392)
- **Sample Rate**: 16 kHz
## Usage
This model is intended for use with ONNX Runtime (e.g., via the `ort` crate in Rust or the `onnxruntime` package in Python).
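A minimal Python sketch of one forward pass with `onnxruntime` might look like the following. It assumes the model file has been downloaded locally as `model.onnx`, which is a hypothetical path since this README does not name the file. The tensor names `audio` and `logits` are the ones documented in the Input and Output sections; the helper names `make_feed` and `run_model` are illustrative only.

```python
import numpy as np

def make_feed(waveform):
    """Build the input dict: float32 audio of shape [batch_size, samples]."""
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[np.newaxis, :]  # add the batch dimension
    if audio.ndim != 2:
        raise ValueError("expected shape [samples] or [batch_size, samples]")
    return {"audio": audio}

def run_model(model_path, waveform):
    """Run one forward pass and return logits of shape [batch, frames, vocab]."""
    # Imported lazily so make_feed can be used without onnxruntime installed.
    import onnxruntime as ort
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    (logits,) = session.run(["logits"], make_feed(waveform))
    return logits

# Example call (requires the model file; "model.onnx" is an assumed path):
# logits = run_model("model.onnx", np.zeros(16000, dtype=np.float32))
```

Creating the session is the expensive step, so an application that processes many clips would likely build it once and reuse it rather than calling `run_model` per clip.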
### Input
- **Name**: `audio`
- **Shape**: `[batch_size, samples]`
- **Type**: Float32 tensor
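
As a sketch of how 16-bit PCM samples (already mono and resampled to 16 kHz) could be turned into this tensor: the zero-mean, unit-variance step below mirrors what the Hugging Face `Wav2Vec2FeatureExtractor` does when `do_normalize` is enabled. Whether this export expects that normalization is an assumption, so it is worth checking against the original model's preprocessing configuration. The function name is illustrative.

```python
import numpy as np

def pcm16_to_model_input(pcm, normalize=True):
    """Convert int16 PCM samples to a float32 array of shape [1, samples]."""
    # Scale int16 values into the range [-1.0, 1.0).
    samples = np.asarray(pcm, dtype=np.int16).astype(np.float32) / 32768.0
    if normalize:
        # Assumption: same normalization as Wav2Vec2FeatureExtractor.
        samples = (samples - samples.mean()) / np.sqrt(samples.var() + 1e-7)
    # Add the batch dimension and make sure the dtype is float32.
    return samples[np.newaxis, :].astype(np.float32)
```

Pass `normalize=False` to get only the scaling step if normalization is handled elsewhere in the pipeline.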
### Output
- **Name**: `logits`
- **Shape**: `[batch_size, frames, 392]` (392 is the vocab size)
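
For alignment, each frame of `logits` has to be mapped to a token and a time. The sketch below does greedy CTC decoding and records the start time of each emitted token. It assumes the blank token has ID 0 and that frames are 20 ms apart (the wav2vec 2.0 convolutional stride of 320 samples at 16 kHz). Neither value is stated in this README, so both should be verified against the original model's vocabulary and configuration. Note that this is plain greedy decoding, not forced alignment against a known transcript.

```python
import numpy as np

def greedy_ctc_decode(logits, blank_id=0, frame_seconds=0.02):
    """Return [(token_id, start_time_seconds), ...] for one utterance.

    `logits` has shape [frames, vocab]. `blank_id` and `frame_seconds`
    are assumptions, not values documented for this export.
    """
    best = np.argmax(logits, axis=-1)  # most likely token per frame
    tokens = []
    previous = blank_id
    for frame, token in enumerate(best):
        token = int(token)
        # Emit a token only when it starts a new run and is not the blank.
        if token != previous and token != blank_id:
            tokens.append((token, frame * frame_seconds))
        previous = token
    return tokens
```

For a batched output, decode one utterance at a time, for example `greedy_ctc_decode(logits[0])`.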
## License
This model is a derivative of the original `facebook/wav2vec2-lv-60-espeak-cv-ft` model and retains the **Apache 2.0** license.