---
library_name: onnx
tags:
- wav2vec2
- speech-to-text
- automatic-speech-recognition
- ctc
- audio
- onnx
- inference4j
license: mit
pipeline_tag: automatic-speech-recognition
---

# Wav2Vec2 Base 960h — ONNX

ONNX export of [wav2vec2-base-960h](https://huggingface.co/Xenova/wav2vec2-base-960h), a Wav2Vec2 model fine-tuned on 960 hours of LibriSpeech for automatic speech recognition using CTC decoding.

Mirrored for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.

## Original Source

- **Repository:** [Xenova/wav2vec2-base-960h](https://huggingface.co/Xenova/wav2vec2-base-960h) (ONNX export of [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h))
- **License:** MIT

## Usage with inference4j

```java
try (Wav2Vec2 model = Wav2Vec2.fromPretrained("models/wav2vec2-base-960h")) {
    Transcription result = model.transcribe(Path.of("audio.wav"));
    System.out.println(result.text());
}
```

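The model consumes raw 16 kHz mono float32 samples (see Model Details below). If your audio is already a 16 kHz mono 16-bit PCM WAV file, the read-and-normalize step can be sketched with the JDK's built-in `javax.sound.sampled` API. This is a minimal sketch under that assumption; `WavToFloat` and its methods are illustrative helpers, not part of the inference4j API:

```java
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WavToFloat {

    /** Converts little-endian 16-bit PCM bytes to float32 samples in [-1, 1). */
    static float[] pcm16leToFloat(byte[] bytes) {
        float[] out = new float[bytes.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = bytes[2 * i] & 0xFF;       // low byte, unsigned
            int hi = bytes[2 * i + 1];          // high byte, sign-extended
            out[i] = ((hi << 8) | lo) / 32768f; // scale to [-1, 1)
        }
        return out;
    }

    /** Reads a WAV file assumed to already be 16 kHz, mono, 16-bit PCM. */
    static float[] readMono16k(File wav) throws Exception {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            return pcm16leToFloat(in.readAllBytes());
        }
    }
}
```

Audio at other sample rates or channel counts must be resampled/downmixed first; the sketch deliberately skips that step.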
## Model Details

| Property | Value |
|----------|-------|
| Architecture | Wav2Vec2 Base (12 transformer layers) |
| Task | Automatic speech recognition (CTC decoding) |
| Training data | LibriSpeech 960h |
| Input | 16 kHz mono audio (float32 waveform) |
| Output | CTC logits → greedy-decoded text |
| Original framework | PyTorch (Hugging Face Transformers) |
| ONNX export | By Xenova (Transformers.js) |

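The "CTC logits → greedy-decoded text" row summarizes the decoding path: the model emits one logit vector per audio frame, and greedy CTC decoding takes the argmax token per frame, collapses consecutive repeats, and drops the blank token. A minimal sketch with a toy five-token vocabulary; `CtcGreedyDecoder` and this vocabulary are illustrative only (the real model ships its own token list, where index 0 is the blank/pad token and `|` marks word boundaries):

```java
public class CtcGreedyDecoder {
    // Toy vocabulary for illustration; index 0 is the CTC blank token,
    // and "|" is the word delimiter, mirroring Wav2Vec2's conventions.
    static final String[] VOCAB = {"<pad>", "|", "C", "A", "T"};
    static final int BLANK = 0;

    /** Greedy CTC decode: argmax per frame, collapse repeats, drop blanks. */
    static String decode(float[][] logits) {
        StringBuilder sb = new StringBuilder();
        int prev = -1;
        for (float[] frame : logits) {
            // Argmax over the vocabulary for this frame.
            int best = 0;
            for (int i = 1; i < frame.length; i++) {
                if (frame[i] > frame[best]) best = i;
            }
            // Emit only non-blank tokens that differ from the previous frame.
            if (best != BLANK && best != prev) {
                sb.append(VOCAB[best].equals("|") ? " " : VOCAB[best]);
            }
            prev = best;
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Frame argmaxes: C, C (repeat), <pad>, A, T  ->  "CAT"
        float[][] logits = {
            {0.1f, 0.0f, 0.9f, 0.0f, 0.0f},
            {0.1f, 0.0f, 0.8f, 0.1f, 0.0f},
            {0.9f, 0.0f, 0.0f, 0.1f, 0.0f},
            {0.0f, 0.1f, 0.0f, 0.9f, 0.0f},
            {0.0f, 0.0f, 0.1f, 0.0f, 0.9f},
        };
        System.out.println(decode(logits)); // prints "CAT"
    }
}
```

The blank token is what lets CTC represent genuine double letters: "LL" is emitted as `L <pad> L`, which collapses to "LL" rather than "L".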
## License

This model is licensed under the [MIT License](https://opensource.org/licenses/MIT). Original model by [Facebook AI](https://huggingface.co/facebook/wav2vec2-base-960h); ONNX export by [Xenova](https://huggingface.co/Xenova).