---
title: VisualEars FA32M FastConformer FP16 WebGPU
emoji: 🎙️
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: FA32M Persian ASR FP16 WebGPU demo
---
# VisualEars FA32M FastConformer Persian ASR WebGPU Demo
Browser WebGPU demo for [`Reza2kn/visualears-fastconformer-fa32m-streaming-bpe1024-onnx-fp16`](https://huggingface.co/Reza2kn/visualears-fastconformer-fa32m-streaming-bpe1024-onnx-fp16).
This revision uses the corrected FA32M input contract: NeMo-compatible fbank features (`normalize=NA`, preemphasis, centered STFT, Slaney mel filters) plus the `processed_signal_length` ONNX input. This fixes the earlier symptom in which the app detected speech and finished utterances but decoded mostly empty transcripts.
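
To make the pieces of that contract concrete, here is a minimal pure-Python sketch of three of them: preemphasis, the frame count of a centered STFT (the kind of value a length input such as `processed_signal_length` would carry), and the Slaney mel scale. The function names, the `0.97` preemphasis coefficient, and the 160-sample hop are illustrative assumptions, not values read from this Space's code or the exported model; check them against the model's own preprocessor config.

```python
import math


def preemphasis(samples, coeff=0.97):
    """Apply y[0] = x[0], y[t] = x[t] - coeff * x[t-1].

    coeff=0.97 is an assumed, commonly used value.
    """
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - coeff * prev)
    return out


def centered_frame_count(num_samples, hop_length=160):
    """Number of frames from a centered STFT.

    Assumes the signal is padded by n_fft // 2 on each side with an even
    n_fft, which gives 1 + num_samples // hop_length frames.
    hop_length=160 (10 ms at 16 kHz) is an assumption.
    """
    return 1 + num_samples // hop_length


def hz_to_mel_slaney(freq_hz):
    """Slaney mel scale: linear below 1 kHz, logarithmic above."""
    linear_step = 200.0 / 3.0          # Hz per mel in the linear region
    break_hz = 1000.0                  # start of the logarithmic region
    break_mel = break_hz / linear_step # 15.0
    log_step = math.log(6.4) / 27.0    # mel step size in the log region
    if freq_hz < break_hz:
        return freq_hz / linear_step
    return break_mel + math.log(freq_hz / break_hz) / log_step
```

Under these assumptions, one second of 16 kHz audio yields `centered_frame_count(16000)` = 101 frames. Passing the true frame count alongside a padded feature tensor lets the encoder distinguish real frames from padding.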
Parity gate run before the switch: **267/269 exact transcript matches (99.26%)** against the source PyTorch NeMo preprocessor + encoder + auxiliary CTC path, with an ONNX non-empty transcript rate of **266/269 (98.88%)** on the short/noisy VisualEars269 set.
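
The two figures above are simple counts over paired transcripts. A minimal sketch of how such a report could be computed is shown below; `parity_report` is a hypothetical helper, and comparing raw strings without any text normalization is an assumption, since the exact matching rule used for the gate is not stated here.

```python
def parity_report(reference, candidate):
    """Compare candidate transcripts against reference transcripts.

    reference: transcripts from the reference pipeline
    candidate: transcripts from the pipeline under test, same order
    Returns counts and percentages for exact matches and non-empty outputs.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")

    total = len(reference)
    # Exact match on raw strings (assumption: no normalization applied).
    exact = sum(1 for r, c in zip(reference, candidate) if r == c)
    # A transcript counts as non-empty if it has any non-whitespace text.
    non_empty = sum(1 for c in candidate if c.strip())

    def pct(count):
        return round(100.0 * count / total, 2) if total else 0.0

    return {
        "total": total,
        "exact": exact,
        "non_empty": non_empty,
        "exact_pct": pct(exact),
        "non_empty_pct": pct(non_empty),
    }
```

For example, `parity_report(["a", "b", "c"], ["a", "", "c"])` reports 2 exact matches and 2 non-empty transcripts out of 3.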