Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)

This is an ONNX export of the facebook/wav2vec2-lv-60-espeak-cv-ft model.

It is designed for client-side inference in the UltrClick ContentPro application to perform forced alignment of lyrics to audio.

Model Details

  • Original Model: facebook/wav2vec2-lv-60-espeak-cv-ft
  • Format: ONNX (Open Neural Network Exchange)
  • Precision: FP16 (Float16)
  • Output: logits over an IPA phoneme vocabulary of 392 tokens
  • Sample Rate: 16 kHz
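The number of output frames depends on the input length. Below is a minimal sketch for predicting the frame count and converting frame indices to seconds. It assumes the export keeps the upstream wav2vec2 convolutional feature encoder: kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, with no padding. Check these values against the original model's config.json. Under that assumption each frame advances 320 samples, which is 20 ms at 16 kHz. The names `num_frames`, `frame_to_seconds` and `HOP_SAMPLES` are illustrative, not part of the model.

```python
import math

# Assumption: conv feature-encoder geometry copied from the upstream
# wav2vec2 config; verify against facebook/wav2vec2-lv-60-espeak-cv-ft.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000
HOP_SAMPLES = math.prod(CONV_STRIDES)  # samples between consecutive frames


def num_frames(num_samples: int) -> int:
    """Predict how many logit frames the model emits for `num_samples` samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # input too short for this conv layer
        # Unpadded 1-D convolution: floor((L - k) / s) + 1
        length = (length - kernel) // stride + 1
    return length


def frame_to_seconds(frame_index: int) -> float:
    """Start time (in seconds) of a given output frame."""
    return frame_index * HOP_SAMPLES / SAMPLE_RATE
```

For example, one second of audio (16000 samples) yields 49 frames. The shortest input that produces a frame is 400 samples (25 ms).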

Usage

This model is intended to be run with ONNX Runtime (for example, via the ort crate in Rust or the onnxruntime package in Python).
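A minimal Python sketch of loading the model and running one forward pass with onnxruntime follows. It uses the input and output names documented below (`audio`, `logits`). The model path you pass to `load_session` (for example "model.onnx") is hypothetical. The helper names are illustrative. onnxruntime is imported lazily so the inference helper can be exercised without the package installed.

```python
import numpy as np


def load_session(model_path: str):
    """Create a CPU inference session for the exported model."""
    import onnxruntime as ort  # lazy import; requires the onnxruntime package

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_logits(session, audio: np.ndarray) -> np.ndarray:
    """Run one forward pass and return logits of shape [batch_size, frames, 392]."""
    audio = np.ascontiguousarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[np.newaxis, :]  # add the batch dimension
    # InferenceSession.run(output_names, input_feed) returns a list of arrays.
    (logits,) = session.run(["logits"], {"audio": audio})
    return logits
```

Typical use is `logits = run_logits(load_session("model.onnx"), samples)`, where `samples` is 16 kHz mono audio. Passing the session in as a parameter also makes the helper easy to test with a stub.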

Input

  • Name: audio
  • Shape: [batch_size, samples]
  • Type: Float32 tensor
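A sketch of turning decoded audio into the `audio` input: mono, float32, shape [1, samples]. It assumes the samples are already resampled to 16 kHz and scaled to [-1, 1]. The per-clip zero-mean, unit-variance normalization mirrors what the upstream Hugging Face `Wav2Vec2FeatureExtractor` does when `do_normalize` is enabled. Whether this ONNX export expects it is an assumption, so it is exposed as a flag. `prepare_audio` is a hypothetical helper name.

```python
import numpy as np


def prepare_audio(samples: np.ndarray, normalize: bool = True) -> np.ndarray:
    """Build the model's `audio` input (float32, shape [1, samples]) from 16 kHz audio."""
    x = np.asarray(samples, dtype=np.float32)
    if x.ndim == 2:
        # Assume a [samples, channels] layout and downmix to mono.
        x = x.mean(axis=1)
    if normalize:
        # Zero-mean, unit-variance per clip (assumed to match the upstream
        # feature extractor's do_normalize behaviour).
        x = (x - x.mean()) / np.sqrt(x.var() + 1e-7)
    return x[np.newaxis, :].astype(np.float32)
```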

Output

  • Name: logits
  • Shape: [batch_size, frames, 392] (392 is the vocab size)
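Forced alignment maps each phoneme of the known lyrics to a span of frames in these logits. The card does not say how ContentPro performs this step. Below is a sketch of one standard approach, CTC Viterbi forced alignment, on a single batch item's logits ([frames, 392]). It makes two assumptions. First, the lyrics have already been converted to phoneme token ids with the original model's vocabulary. Second, the CTC blank is token id 0. Check both against the upstream tokenizer files. `ctc_force_align` is a hypothetical name.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the vocabulary axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def ctc_force_align(logits, tokens, blank=0):
    """Return (token_index, start_frame, end_frame) per target token, ends inclusive."""
    log_probs = log_softmax(np.asarray(logits, dtype=np.float64))
    T = log_probs.shape[0]
    ext = [blank]  # target with blanks interleaved: blank, t1, blank, t2, ..., blank
    for tok in tokens:
        ext += [tok, blank]
    S = len(ext)
    dp = np.full((T, S), -np.inf)  # best log-score ending at (frame, state)
    back = np.zeros((T, S), dtype=np.int64)  # predecessor state for backtracking
    dp[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        dp[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [(dp[t - 1, s], s)]  # stay in the same state
            if s >= 1:
                cands.append((dp[t - 1, s - 1], s - 1))  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((dp[t - 1, s - 2], s - 2))  # skip a blank
            best, prev = max(cands)
            dp[t, s] = best + log_probs[t, ext[s]]
            back[t, s] = prev
    # A valid path ends on the last token or on the trailing blank.
    s = S - 2 if S > 1 and dp[T - 1, S - 2] > dp[T - 1, S - 1] else S - 1
    if not np.isfinite(dp[T - 1, s]):
        raise ValueError("too few frames for the target token sequence")
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t, s]
    spans = {}
    for t, state in enumerate(path):
        if state % 2 == 1:  # odd states are real tokens, even states are blanks
            k = state // 2
            start = spans[k][0] if k in spans else t
            spans[k] = (start, t)
    return [(k, *spans[k]) for k in sorted(spans)]
```

The double loop is O(frames x states) in pure Python. For full-length songs, vectorizing the inner loop over states is advisable.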

License

This model is a derivative of the original facebook/wav2vec2-lv-60-espeak-cv-ft model and retains the Apache 2.0 license.
