---
language:
- en
- multilingual
license: apache-2.0
tags:
- onnx
- audio
- automatic-speech-recognition
- phoneme-recognition
- wav2vec2
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft
---
# Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)

This is an **ONNX export** of the [facebook/wav2vec2-lv-60-espeak-cv-ft](https://huggingface.co/facebook/wav2vec2-lv-60-espeak-cv-ft) model.
It is intended for client-side inference in the **Music Video Maker** application, where it is used to force-align lyrics to audio.

## Model Details

-   **Original Model**: `facebook/wav2vec2-lv-60-espeak-cv-ft`
-   **Format**: ONNX (Open Neural Network Exchange)
-   **Precision**: FP16 (Float16)
-   **Output**: Per-frame IPA phoneme logits (vocabulary size 392)
-   **Sample Rate**: 16 kHz, mono
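Because the number of output frames depends on the input length, it helps to know how many frames to expect and how they map to time. The sketch below assumes the export keeps wav2vec2's standard convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2), which gives a total stride of 320 samples. `num_frames` and `frame_to_seconds` are hypothetical helper names.

```python
# (kernel_size, stride) of each 1-D convolution in wav2vec2's feature encoder.
# Assumption: the ONNX export keeps the original checkpoint's default encoder layout.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
SAMPLE_RATE = 16_000


def num_frames(num_samples: int) -> int:
    """Number of logit frames produced for `num_samples` input samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            return 0
        # An unpadded convolution maps length L to floor((L - kernel) / stride) + 1.
        length = (length - kernel) // stride + 1
    return length


def frame_to_seconds(frame_index: int) -> float:
    """Start time of a frame, using the encoder's total stride (5 * 2**6 = 320 samples)."""
    hop = 1
    for _, stride in CONV_LAYERS:
        hop *= stride
    return frame_index * hop / SAMPLE_RATE
```

With these settings, one second of audio (16 000 samples) yields 49 frames spaced 20 ms apart, and the shortest input that produces a frame is 400 samples (25 ms).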

## Usage

This model is intended to run with ONNX Runtime, for example through the `ort` crate in Rust or the `onnxruntime` package in Python.
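A minimal Python sketch of running the model with the `onnxruntime` package follows. It uses the `audio` input and `logits` output names documented in the Input and Output sections; the model path is a placeholder, since the ONNX file name is not stated here, and `load_session` and `run_logits` are hypothetical helper names.

```python
import numpy as np


def load_session(model_path: str):
    """Open the ONNX model on the CPU execution provider."""
    # Imported here so run_logits can be used (and tested) without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_logits(session, waveform: np.ndarray) -> np.ndarray:
    """Return logits of shape [frames, 392] for one mono 16 kHz waveform."""
    # The `audio` input is float32 with shape [batch_size, samples]; use a batch of one.
    batch = np.asarray(waveform, dtype=np.float32)[np.newaxis, :]
    # session.run(output_names, input_feed) returns one array per requested output.
    (logits,) = session.run(["logits"], {"audio": batch})
    return logits[0]
```

For example, `run_logits(load_session("model.onnx"), waveform)` runs one clip. Sending one clip per call avoids padding, since the export documents no attention-mask input.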

### Input

-   **Name**: `audio`
-   **Shape**: `[batch_size, samples]` (raw mono waveform sampled at 16 kHz)
-   **Type**: `float32` tensor
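Audio from a decoder usually needs converting before it matches this input. The sketch below scales signed integer PCM to [-1, 1), downmixes to mono, resamples to 16 kHz, and adds the batch axis. The resampling uses plain linear interpolation as a simple stand-in, not a band-limited resampler. The optional normalization mirrors the per-utterance zero-mean, unit-variance step of the Hugging Face `Wav2Vec2FeatureExtractor`; whether this export expects it is an assumption to check against your own outputs. `prepare_audio` is a hypothetical helper name.

```python
import numpy as np

TARGET_RATE = 16_000


def prepare_audio(samples: np.ndarray, sample_rate: int, normalize: bool = True) -> np.ndarray:
    """Convert PCM audio ([n] mono or [n, channels]) to a [1, samples] float32 tensor."""
    x = np.asarray(samples)
    if np.issubdtype(x.dtype, np.integer):
        # Scale signed integer PCM (e.g. int16) to [-1, 1) using the dtype's range.
        x = x.astype(np.float32) / float(np.iinfo(x.dtype).max + 1)
    x = x.astype(np.float32)
    if x.ndim == 2:
        x = x.mean(axis=1)  # downmix channels to mono
    if sample_rate != TARGET_RATE:
        # Linear-interpolation resampling onto a 16 kHz time grid.
        n_out = int(round(x.shape[0] * TARGET_RATE / sample_rate))
        t_in = np.arange(x.shape[0]) / sample_rate
        t_out = np.arange(n_out) / TARGET_RATE
        x = np.interp(t_out, t_in, x).astype(np.float32)
    if normalize:
        # Assumption: zero-mean / unit-variance normalization as in Wav2Vec2FeatureExtractor.
        x = (x - x.mean()) / np.sqrt(x.var() + 1e-7)
    return x[np.newaxis, :].astype(np.float32)
```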

### Output

-   **Name**: `logits`
-   **Shape**: `[batch_size, frames, 392]`, where 392 is the vocabulary size and consecutive frames are about 20 ms apart
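These logits can drive CTC forced alignment of known lyrics. The following is a minimal Viterbi sketch, not necessarily how Music Video Maker implements it. It assumes the lyrics have already been converted to token ids of this model's vocabulary (for example by phonemizing with eSpeak and looking the phonemes up in the original checkpoint's `vocab.json`), that the export keeps the original token order, and that id 0 (`<pad>`) is the CTC blank. `ctc_force_align` is a hypothetical helper name.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the vocabulary axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def ctc_force_align(logits: np.ndarray, tokens: list[int], blank: int = 0) -> list[tuple[int, int, int]]:
    """Viterbi CTC alignment of `tokens` against `logits` of shape [frames, vocab].

    Returns (token_id, first_frame, last_frame) for every target token, in order.
    Assumption: `blank` is the CTC blank id (taken to be 0, the `<pad>` token).
    """
    log_probs = log_softmax(np.asarray(logits, dtype=np.float64))
    num_t = log_probs.shape[0]
    # Interleave blanks: [blank, t1, blank, t2, ..., tN, blank].
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    num_s = len(ext)

    score = np.full((num_t, num_s), -np.inf)
    back = np.zeros((num_t, num_s), dtype=np.int64)  # best predecessor state
    score[0, 0] = log_probs[0, ext[0]]
    if num_s > 1:
        score[0, 1] = log_probs[0, ext[1]]

    for t in range(1, num_t):
        for s in range(num_s):
            # Allowed moves: stay, advance one state, or skip a blank between different tokens.
            best, arg = score[t - 1, s], s
            if s >= 1 and score[t - 1, s - 1] > best:
                best, arg = score[t - 1, s - 1], s - 1
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2] and score[t - 1, s - 2] > best:
                best, arg = score[t - 1, s - 2], s - 2
            score[t, s] = best + log_probs[t, ext[s]]
            back[t, s] = arg

    # A valid path ends on the trailing blank or on the last token.
    s = num_s - 1
    if num_s > 1 and score[-1, num_s - 2] > score[-1, s]:
        s = num_s - 2
    if not np.isfinite(score[-1, s]):
        raise ValueError("audio is too short for this token sequence")

    # Trace the best path backwards, recording the state occupied at each frame.
    states = [0] * num_t
    for t in range(num_t - 1, -1, -1):
        states[t] = s
        s = int(back[t, s])

    # Group consecutive frames on the same token position (odd extended index 2k + 1).
    spans: list[list[int]] = []
    for t, s in enumerate(states):
        if s % 2 == 0:
            continue  # blank state
        k = s // 2
        if spans and spans[-1][0] == k:
            spans[-1][2] = t
        else:
            spans.append([k, t, t])
    return [(tokens[k], start, end) for k, start, end in spans]
```

Frame indices convert to seconds by multiplying by 0.02 (320 samples at 16 kHz). The double loop is pure Python and costs O(frames × tokens), so full-length songs would likely need a vectorized or compiled version.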

## License

This model is a derivative of the original `facebook/wav2vec2-lv-60-espeak-cv-ft` model and retains its **Apache 2.0** license.