wav2vec2-alignment / README.md

Hochien

metadata

language:
  - en
  - multilingual
license: apache-2.0
tags:
  - onnx
  - audio
  - automatic-speech-recognition
  - phoneme-recognition
  - wav2vec2
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft

Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)

This is an ONNX export of the facebook/wav2vec2-lv-60-espeak-cv-ft model.

It is intended for client-side inference in the UltrClick ContentPro application, where it is used for forced alignment of lyrics to audio.

Model Details

Original Model: facebook/wav2vec2-lv-60-espeak-cv-ft
Format: ONNX (Open Neural Network Exchange)
Precision: FP16 (Float16)
Output: IPA phoneme logits (vocabulary size 392)
Sample Rate: 16 kHz

Usage

This model is intended for use with ONNX Runtime (e.g., via the ort crate in Rust or the onnxruntime package in Python).
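As a rough illustration of the Python route, the sketch below runs one forward pass with onnxruntime. The tensor names `audio` and `logits` come from the Input and Output sections of this README. The file name `model.onnx`, the helper names, and the choice of the CPU execution provider are assumptions made for the example, not something the repository confirms.

```python
def load_session(model_path):
    """Create a CPU inference session for the exported model."""
    # Imported lazily so run_logits() can be used and tested without
    # onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_logits(session, audio):
    """Run one forward pass.

    `audio` should be a float32 array of shape [batch_size, samples]
    sampled at 16 kHz. The result has shape [batch_size, frames, 392].
    """
    # "audio" and "logits" are the tensor names documented in this README.
    (logits,) = session.run(["logits"], {"audio": audio})
    return logits


def demo(model_path="model.onnx"):  # hypothetical file name
    """Feed one second of silence through the model and print the shape."""
    import numpy as np

    session = load_session(model_path)
    audio = np.zeros((1, 16000), dtype=np.float32)
    logits = run_logits(session, audio)
    print(logits.shape)
```

Calling `demo()` requires the ONNX file to be present locally. In a real pipeline the silence would be replaced by decoded audio.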

Input

Name: audio
Shape: [batch_size, samples]
Type: Float32 tensor
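One way to produce an input of this shape is sketched below using only the Python standard library. It assumes the source is a mono, 16-bit PCM WAV file already sampled at 16 kHz. The function names are illustrative, and resampling or decoding other formats is out of scope here.

```python
import array
import sys
import wave

EXPECTED_RATE = 16000  # the model's sample rate, per Model Details


def read_wav_as_float(path):
    """Read a mono 16-bit PCM WAV file and scale samples to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [sample / 32768.0 for sample in pcm]


def to_batch(samples):
    """Wrap a single clip as a batch of one: shape [1, samples]."""
    return [samples]
```

The nested list can then be converted with `numpy.asarray(batch, dtype=numpy.float32)` before it is passed to the runtime. This README does not say whether the export expects extra normalization. The original Hugging Face feature extractor typically applies zero-mean, unit-variance scaling, so it is worth checking that against the exported graph.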

Output

Name: logits
Shape: [batch_size, frames, 392] (392 is the vocab size)
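For forced alignment, the `frames` axis has to be mapped back to time. The sketch below assumes the export keeps the convolutional feature encoder of the standard wav2vec 2.0 architecture (seven layers with a combined stride of 320 samples, roughly 20 ms per frame at 16 kHz). That is an assumption about this export, so the predicted frame count should be compared with the actual `logits` shape.

```python
# Kernel sizes and strides of the standard wav2vec 2.0 feature encoder.
# Assumed here; this README does not list them for the export.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16000


def num_frames(num_samples):
    """Predict the size of the `frames` axis for a clip of `num_samples`."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution, clamped at zero.
        length = max((length - kernel) // stride + 1, 0)
    return length


def frame_start_seconds(frame_index):
    """Approximate start time of a frame, using the combined stride."""
    hop = 1
    for stride in CONV_STRIDES:
        hop *= stride  # 5 * 2**6 = 320 samples
    return frame_index * hop / SAMPLE_RATE


print(num_frames(16000))        # 49
print(frame_start_seconds(50))  # 1.0
```

Under these assumptions, a phoneme assigned to frames `i` through `j` spans roughly `frame_start_seconds(i)` to `frame_start_seconds(j + 1)`.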

License

This model is a derivative of the original facebook/wav2vec2-lv-60-espeak-cv-ft model and retains the Apache 2.0 license.