wav2vec2-alignment / README.md
metadata
language:
  - en
  - multilingual
license: apache-2.0
tags:
  - onnx
  - audio
  - automatic-speech-recognition
  - phoneme-recognition
  - wav2vec2
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft

Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)

This is an ONNX export of the facebook/wav2vec2-lv-60-espeak-cv-ft model.

It is designed for client-side inference in the UltrClick ContentPro application to perform forced alignment of lyrics to audio.

Model Details

  • Original Model: facebook/wav2vec2-lv-60-espeak-cv-ft
  • Format: ONNX (Open Neural Network Exchange)
  • Precision: FP16 (Float16)
  • Output: IPA phoneme logits (vocabulary size 392)
  • Sample Rate: 16 kHz

Usage

This model is intended for use with ONNX Runtime, for example via the ort crate in Rust or the onnxruntime package in Python.
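A minimal Python sketch of a single inference call is shown here, using the tensor names documented in the Input and Output sections. The helper names `load_session` and `run_logits` are illustrative and not part of this repository, and the model path passed to `load_session` is a placeholder for wherever the ONNX file is stored locally. The output is cast to float32 because the output dtype of the export is not stated in this card.

```python
import numpy as np


def load_session(model_path):
    """Create an ONNX Runtime session; model_path is a placeholder local file path."""
    # Imported here so run_logits below can be used with any session-like object.
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_logits(session, waveform):
    """Run one mono 16 kHz waveform and return logits of shape [frames, vocab]."""
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    batch = audio[np.newaxis, :]  # add the batch axis: [1, samples]
    (logits,) = session.run(["logits"], {"audio": batch})
    # Cast in case the FP16 export also emits float16 logits (an assumption).
    return np.asarray(logits, dtype=np.float32)[0]
```

A call would look like `run_logits(load_session("model.onnx"), waveform)`, where `model.onnx` is a hypothetical file name.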

Input

  • Name: audio
  • Shape: [batch_size, samples] (raw waveform sampled at 16 kHz)
  • Type: Float32 tensor
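The following sketch shows one way to build a tensor of this shape and type from 16-bit PCM samples. It assumes the audio is already mono at 16 kHz, and it assumes the export expects the zero-mean, unit-variance normalization that the original model's feature extractor applies. If normalization is already part of the exported graph, that step should be skipped. `prepare_audio` is an illustrative name, not part of this repository.

```python
import numpy as np


def prepare_audio(pcm16, eps=1e-7):
    """Turn mono 16-bit PCM samples (already at 16 kHz) into a [1, samples] float32 batch."""
    samples = np.asarray(pcm16, dtype=np.float32) / 32768.0  # scale to [-1, 1)
    # Zero-mean, unit-variance normalization (assumed to be required; see the note above).
    samples = (samples - samples.mean()) / np.sqrt(samples.var() + eps)
    return samples[np.newaxis, :].astype(np.float32)
```

The result can be passed directly as the `audio` input.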

Output

  • Name: logits
  • Shape: [batch_size, frames, 392], where 392 is the vocabulary size
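To relate the `frames` axis to time, the sketch below estimates the frame count for a clip and collapses per-frame argmax ids into timed runs. It assumes the export keeps the standard wav2vec 2.0 feature encoder (one frame every 320 samples, i.e. 20 ms at 16 kHz, with a 400-sample receptive field) and that token id 0 is the CTC blank. Both are assumptions to check against the model's vocabulary and configuration. This is plain greedy CTC decoding, not the forced-alignment procedure itself, and the function names are illustrative.

```python
SAMPLE_RATE = 16_000
FRAME_STRIDE = 320      # samples between frames (assumed standard wav2vec 2.0 encoder)
RECEPTIVE_FIELD = 400   # samples covered by one frame (same assumption)


def num_frames(num_samples):
    """Estimate the `frames` dimension for a clip of num_samples samples."""
    if num_samples < RECEPTIVE_FIELD:
        return 0
    return (num_samples - RECEPTIVE_FIELD) // FRAME_STRIDE + 1


def greedy_ctc_segments(logits, blank_id=0):
    """Collapse per-frame argmax ids into (token_id, start_sec, end_sec) runs.

    logits is one batch item of shape [frames, vocab]; blank_id=0 is an assumption.
    """
    frame_sec = FRAME_STRIDE / SAMPLE_RATE
    segments = []
    prev = blank_id
    for i, row in enumerate(logits):
        token = max(range(len(row)), key=lambda k: row[k])  # argmax over the vocab axis
        if token != blank_id:
            if token == prev:
                # Same token as the previous frame: extend the current run.
                token_id, start, _ = segments[-1]
                segments[-1] = (token_id, start, (i + 1) * frame_sec)
            else:
                segments.append((token, i * frame_sec, (i + 1) * frame_sec))
        prev = token
    return segments
```

Token ids can then be mapped to IPA symbols with the vocabulary of the original model. A blank between two identical ids starts a new run, which is the usual CTC collapsing rule.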

License

This model is a derivative of the original facebook/wav2vec2-lv-60-espeak-cv-ft model and retains the Apache 2.0 license.