Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)
This is an ONNX export of the facebook/wav2vec2-lv-60-espeak-cv-ft model.
It is designed for client-side inference in the UltrClick ContentPro application to perform forced alignment of lyrics to audio.
Model Details
- Original Model: facebook/wav2vec2-lv-60-espeak-cv-ft
- Format: ONNX (Open Neural Network Exchange)
- Precision: FP16 (Float16)
- Output: IPA phoneme logits (vocabulary size 392)
- Sample Rate: 16 kHz
Usage
This model is intended for use with ONNX Runtime (e.g., via the ort crate in Rust or the onnxruntime package in Python).
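As a rough illustration of the Python route, the sketch below loads the model with onnxruntime and runs one forward pass. The tensor names audio and logits come from the Input and Output sections of this card; the helper names load_session and run_logits, the file name model.onnx, and the choice of the CPU execution provider are assumptions made for the example, not part of this repository.

```python
import numpy as np


def load_session(model_path):
    """Create an ONNX Runtime session for the exported model (CPU provider assumed)."""
    # Imported lazily so run_logits can be used with any object exposing .run().
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_logits(session, audio):
    """Run one forward pass and return the logits array.

    `audio` is a 1-D waveform or a [batch_size, samples] array sampled at 16 kHz.
    """
    audio = np.asarray(audio, dtype=np.float32)  # the card specifies a Float32 input
    if audio.ndim == 1:
        audio = audio[np.newaxis, :]  # add the batch dimension
    # "audio" and "logits" are the tensor names documented in this card.
    return session.run(["logits"], {"audio": audio})[0]
```

A call might look like `logits = run_logits(load_session("model.onnx"), waveform)`, where model.onnx is a placeholder for wherever the exported file is stored.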
Input
- Name: audio
- Shape: [batch_size, samples]
- Type: Float32 tensor
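One way to produce a tensor of this shape is sketched below. It reads a 16-bit mono WAV file with Python's standard wave module and scales it to float32 in the range [-1, 1). The helper name load_wav_as_input is hypothetical, and this card does not say whether the export expects any further normalization (such as zero mean and unit variance), so none is applied here.

```python
import wave

import numpy as np

EXPECTED_RATE = 16_000  # sample rate listed under Model Details


def load_wav_as_input(path):
    """Read a 16-bit mono WAV file and return a float32 array of shape [1, samples]."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())
    pcm = np.frombuffer(raw, dtype="<i2")     # little-endian int16 samples
    audio = pcm.astype(np.float32) / 32768.0  # scale to [-1, 1)
    return audio[np.newaxis, :]               # add the batch dimension
```

Audio at other sample rates or with multiple channels would need resampling or downmixing first, which this sketch deliberately rejects rather than guesses at.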
Output
- Name: logits
- Shape: [batch_size, frames, 392] (392 is the vocab size)
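For alignment it helps to know how the frames dimension relates to the number of input samples. The sketch below assumes the export keeps the standard wav2vec 2.0 convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, with no padding), which gives one frame per 320 samples, or 20 ms at 16 kHz. That layout and the function names are assumptions, so it is worth checking the result against the actual logits shape.

```python
# Assumed layout of the standard wav2vec 2.0 feature encoder (not stated in this card).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # Hz, from Model Details


def expected_frames(num_samples):
    """Number of output frames for a waveform of `num_samples` samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1  # unpadded 1-D convolution
    return length


def frame_start_seconds(frame_index):
    """Approximate start time of a frame, using the combined stride as the hop size."""
    hop = 1
    for stride in CONV_STRIDES:
        hop *= stride  # 5 * 2**6 = 320 samples
    return frame_index * hop / SAMPLE_RATE
```

Under these assumptions, one second of audio (16000 samples) yields 49 frames, and frame 50 starts at 1.0 s.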
License
This model is a derivative of the original facebook/wav2vec2-lv-60-espeak-cv-ft model and retains the Apache 2.0 license.