Dataset: openslr/librispeech_asr
ONNX export of facebook/wav2vec2-base-960h for asr.js.
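- Input: `[1, samples]` float32 (16 kHz mono waveform)
- Output: `[1, frames, 32]` float32 (CTC logits over the 32-token character vocabulary)
- Weights: stored as ONNX external data in `model.onnx.data`

```typescript
import { loadSpeechModel } from '@asrjs/speech-recognition';
// Assumption: the aligner factory ships in the same package; adjust the
// import path if your asr.js build exposes it elsewhere.
import { createWav2Vec2AlignerFromLogits } from '@asrjs/speech-recognition';

// Load the ONNX graph, its external weights, and the CTC vocabulary from the Hub.
const model = await loadSpeechModel('facebook/wav2vec2-base-960h', {
  source: {
    kind: 'huggingface',
    repoId: 'ysdede/wav2vec2-base-960h-onnx',
    modelFilename: 'model.onnx',
    modelDataFilename: 'model.onnx.data',
    tokenizerFilename: 'vocab.json',
  },
});

// `audio` is assumed to be a Float32Array of 16 kHz mono samples, decoded elsewhere.

// ASR
const transcript = await model.transcribe(audio);

// Forced alignment (WhisperX-style)
const logits = await model.session.executor.extractLogits(audio);
const aligner = createWav2Vec2AlignerFromLogits(logits);
const alignment = aligner.align({ transcript: 'your transcript' });
```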
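The aligner's internals are not described here. As a rough illustration of what WhisperX-style CTC forced alignment computes, the hypothetical `ctcForcedAlign` below runs Viterbi over the blank-extended target sequence on a plain `number[][]` logit matrix (frames × vocab, batch dimension removed). It assumes the blank id is 0 and converts frames to seconds with wav2vec2's 320-sample convolutional stride (20 ms per frame at 16 kHz). It is a sketch under those assumptions, not asr.js's implementation.

```typescript
// Hypothetical sketch of CTC forced alignment (Viterbi over the blank-extended
// target sequence). Illustrative only; not asr.js's implementation.

interface TokenSpan {
  tokenId: number;
  startFrame: number; // inclusive
  endFrame: number; // inclusive
  startSec: number;
  endSec: number;
}

// Assumption: 320-sample frame stride at 16 kHz, i.e. 20 ms per output frame.
const SECONDS_PER_FRAME = 320 / 16000;

// Numerically stable log-softmax over one frame of logits.
function logSoftmax(row: number[]): number[] {
  const max = Math.max(...row);
  const logSum = max + Math.log(row.reduce((acc, x) => acc + Math.exp(x - max), 0));
  return row.map((x) => x - logSum);
}

function ctcForcedAlign(logits: number[][], targets: number[], blankId = 0): TokenSpan[] {
  const T = logits.length;
  if (T === 0) throw new Error("no frames to align");
  const logp = logits.map(logSoftmax);

  // Blank-extended targets: [blank, y1, blank, y2, ..., yL, blank].
  const ext: number[] = [blankId];
  for (const y of targets) ext.push(y, blankId);
  const S = ext.length;

  // dp[t][s]: best log-probability of any path ending in state s at frame t.
  const dp = Array.from({ length: T }, () => new Array<number>(S).fill(-Infinity));
  const back = Array.from({ length: T }, () => new Array<number>(S).fill(-1));
  dp[0][0] = logp[0][ext[0]];
  if (S > 1) dp[0][1] = logp[0][ext[1]];

  for (let t = 1; t < T; t++) {
    for (let s = 0; s < S; s++) {
      // Predecessors: stay in s, advance from s-1, or skip the blank at s-1
      // when ext[s] is a token that differs from ext[s-2].
      let best = dp[t - 1][s];
      let arg = s;
      if (s >= 1 && dp[t - 1][s - 1] > best) {
        best = dp[t - 1][s - 1];
        arg = s - 1;
      }
      if (s >= 2 && ext[s] !== blankId && ext[s] !== ext[s - 2] && dp[t - 1][s - 2] > best) {
        best = dp[t - 1][s - 2];
        arg = s - 2;
      }
      if (best === -Infinity) continue;
      dp[t][s] = best + logp[t][ext[s]];
      back[t][s] = arg;
    }
  }

  // A valid path must end on the last token or the trailing blank.
  let state = S - 1;
  if (S > 1 && dp[T - 1][S - 2] > dp[T - 1][S - 1]) state = S - 2;
  if (dp[T - 1][state] === -Infinity) throw new Error("transcript needs more frames than the audio has");

  // Backtrack to recover the state occupied at every frame.
  const states = new Array<number>(T);
  for (let t = T - 1; t >= 0; t--) {
    states[t] = state;
    state = back[t][state];
  }

  // Odd states are target tokens; each token occupies a contiguous run of frames.
  const spans: TokenSpan[] = [];
  for (let t = 0; t < T; t++) {
    if (states[t] % 2 === 0) continue; // blank frame
    const i = (states[t] - 1) / 2;
    if (i === spans.length) {
      spans.push({ tokenId: targets[i], startFrame: t, endFrame: t, startSec: 0, endSec: 0 });
    } else {
      spans[i].endFrame = t;
    }
  }
  return spans.map((sp) => ({
    ...sp,
    startSec: sp.startFrame * SECONDS_PER_FRAME,
    endSec: (sp.endFrame + 1) * SECONDS_PER_FRAME,
  }));
}
```

To align real text with this sketch, map each character of the uppercased transcript to its `vocab.json` id (spaces to `|`) and pass the ids as `targets`; characters missing from the vocabulary have to be dropped or normalized first. The frame-to-seconds conversion is an approximation, since each frame's receptive field is slightly wider than its 20 ms stride.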
Files:

- `model.onnx`: ONNX model graph (1.8 MB)
- `model.onnx.data`: external weight data (360 MB)
- `config.json`: original Hugging Face model config
- `vocab.json`: character vocabulary, including the CTC blank token
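Because the model outputs `[1, frames, 32]` CTC logits, turning them into text means taking the best token per frame, collapsing repeats, and dropping blanks. The hypothetical `greedyCtcDecode` below sketches that best-path decoding against a parsed `vocab.json`. It assumes the blank id is 0 (`<pad>` in the upstream facebook/wav2vec2-base-960h vocabulary), that `|` marks word boundaries, and that the logits arrive as a plain `number[][]` with the batch dimension removed. Whether `model.transcribe` uses exactly this decoder is not stated here.

```typescript
// Hypothetical helper: greedy (best-path) CTC decoding of a [frames][vocab] logit matrix.
// Assumptions: blankId is the CTC blank index and "|" is the word delimiter.

type Vocab = Record<string, number>;

function greedyCtcDecode(
  logits: number[][],
  vocab: Vocab,
  blankId = 0,
  wordDelimiter = "|",
): string {
  // Invert token -> id into id -> token.
  const idToToken = new Map<number, string>();
  for (const [token, id] of Object.entries(vocab)) idToToken.set(id, token);

  let prev = -1;
  let text = "";
  for (const frame of logits) {
    // Argmax over the vocabulary for this frame.
    let best = 0;
    for (let v = 1; v < frame.length; v++) {
      if (frame[v] > frame[best]) best = v;
    }
    // Collapse consecutive repeats, then drop blanks.
    if (best !== prev && best !== blankId) {
      const token = idToToken.get(best) ?? "";
      // Skip special tokens such as <s>, </s>, <unk>.
      if (!/^<.*>$/.test(token)) text += token === wordDelimiter ? " " : token;
    }
    prev = best;
  }
  // Normalize runs of word delimiters into single spaces.
  return text.replace(/\s+/g, " ").trim();
}
```

A blank between two identical tokens keeps both (for example `I`, blank, `I` decodes to `II`), which is how CTC represents doubled letters.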