Hochien/wav2vec2-alignment

---
language:
- en
- multilingual
license: apache-2.0
tags:
- onnx
- audio
- automatic-speech-recognition
- phoneme-recognition
- wav2vec2
base_model: facebook/wav2vec2-lv-60-espeak-cv-ft
---

# Wav2Vec2-LV-60-Espeak-CV-FT (ONNX)

This is an ONNX export of the [facebook/wav2vec2-lv-60-espeak-cv-ft](https://huggingface.co/facebook/wav2vec2-lv-60-espeak-cv-ft) model.

It is designed for client-side inference in the UltrClick ContentPro application to perform forced alignment of lyrics to audio.

## Model Details

- Original Model: `facebook/wav2vec2-lv-60-espeak-cv-ft`
- Format: ONNX (Open Neural Network Exchange)
- Precision: FP16 (Float16)
- Output: IPA phoneme logits (vocabulary size 392)
- Sample Rate: 16 kHz

## Usage

This model is intended for use with ONNX Runtime (e.g., via `ort` in Rust or `onnxruntime` in Python).
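
A minimal Python sketch of one inference call follows. It uses the tensor names `audio` and `logits` documented in this card. The model path passed to `run_model` is whatever file you downloaded; this card does not name it. The `normalize` step is an assumption: the original Hugging Face feature extractor applies zero-mean, unit-variance scaling by default, but this card does not say whether the export expects it, so drop it if your results look wrong.

```python
import math

SAMPLE_RATE = 16_000  # sample rate stated in this model card


def normalize(samples):
    """Scale a mono waveform (list of floats) to zero mean and unit variance.

    Assumption: mirrors the default preprocessing of the original model's
    feature extractor; this card does not confirm the export needs it.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    scale = 1.0 / math.sqrt(var + 1e-7)
    return [(x - mean) * scale for x in samples]


def run_model(model_path, samples):
    """Run one mono 16 kHz waveform through the model and return its logits."""
    # Imported lazily so normalize() can be used without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Add a batch dimension: [samples] -> [1, samples], Float32 as documented.
    audio = np.asarray(normalize(samples), dtype=np.float32)[np.newaxis, :]
    (logits,) = session.run(["logits"], {"audio": audio})
    return logits  # shape [1, frames, 392]
```

Audio recorded at another rate should be resampled to 16 kHz and mixed down to mono before calling `run_model`.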

### Input
- Name: `audio`
- Shape: `[batch_size, samples]`
- Type: Float32 tensor

### Output
- Name: `logits`
- Shape: `[batch_size, frames, 392]` (392 is the vocab size)
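
For alignment, it helps to know how `frames` relates to `samples`. The sketch below assumes the export keeps the standard wav2vec 2.0 convolutional feature encoder (kernel sizes and strides as listed in the code), which gives one frame per 320 samples, or 20 ms at 16 kHz. This card does not confirm that configuration, so check the numbers against the actual output shape of your model.

```python
import math

# Standard wav2vec 2.0 feature-encoder layout (assumed unchanged in this export).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # from this model card


def num_frames(num_samples):
    """Number of output frames for an input of `num_samples` samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


def frame_start_seconds(frame_index):
    """Approximate start time in seconds of a given output frame."""
    hop = math.prod(CONV_STRIDES)  # 320 samples between consecutive frames
    return frame_index * hop / SAMPLE_RATE


print(num_frames(16_000))       # 49 frames for one second of audio
print(frame_start_seconds(50))  # 1.0
```

A forced aligner can use `frame_start_seconds` to turn frame indices on the alignment path into timestamps for each phoneme or lyric token.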

## License

This model is a derivative of the original `facebook/wav2vec2-lv-60-espeak-cv-ft` model and retains the Apache 2.0 license.