	---
	title: VisualEars FA32M FastConformer FP16 WebGPU
	emoji: 🎙️
	colorFrom: green
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.35.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: FA32M Persian ASR FP16 WebGPU demo
	---

	# VisualEars FA32M FastConformer Persian ASR WebGPU Demo

	Browser WebGPU demo for [`Reza2kn/visualears-fastconformer-fa32m-streaming-bpe1024-onnx-fp16`](https://huggingface.co/Reza2kn/visualears-fastconformer-fa32m-streaming-bpe1024-onnx-fp16).

	This revision uses the corrected FA32M input contract: NeMo-compatible fbank features (`normalize=NA`, preemphasis, centered STFT, Slaney mel filters) plus the `processed_signal_length` ONNX input. This fixes the earlier symptom in which the app detected speech and finished utterances but decoded mostly empty transcripts.
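Two pieces of that contract are easy to get wrong, so here is a minimal Python sketch of them: the preemphasis filter and the frame count that would be passed as `processed_signal_length`. The parameter values (`coeff=0.97`, `n_fft=512`, `hop_length=160`) and both function names are assumptions for illustration, not values taken from this app or confirmed against the model's config. The frame-count rule is the usual one for a centered STFT and should be checked against the actual preprocessor.

```python
def preemphasize(samples, coeff=0.97):
    """Apply a first-order preemphasis filter.

    y[0] = x[0]; y[t] = x[t] - coeff * x[t - 1] for t >= 1.
    coeff=0.97 is an assumed value, not read from the model config.
    """
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - coeff * prev)
    return out


def centered_stft_frames(num_samples, n_fft=512, hop_length=160):
    """Number of frames produced by a centered STFT.

    A centered STFT pads the signal by n_fft // 2 on each side, so the
    frame count depends on the padded length. n_fft and hop_length are
    assumed values (hop_length=160 is 10 ms at 16 kHz).
    """
    padded = num_samples + 2 * (n_fft // 2)
    return 1 + (padded - n_fft) // hop_length


# One second of 16 kHz audio under the assumed parameters.
frames = centered_stft_frames(16000)
```

If the feature tensor is padded to a fixed shape, the length input is what tells the encoder how many frames are real, which is consistent with the blank-transcript symptom described above when that input is missing.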

	Parity gate before the switch: 267/269 exact transcript matches (99.26%) against the source PyTorch NeMo path (preprocessor + encoder + auxiliary CTC), with an ONNX non-empty transcript rate of 266/269 (98.88%) on the short/noisy VisualEars269 set.
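The two gate metrics can be computed as in the following sketch. It assumes transcripts are compared as raw strings and that "non-empty" means non-whitespace; the actual gate may normalize text differently. `parity_report` is a hypothetical helper, not part of this repository.

```python
def parity_report(reference, candidate):
    """Compare candidate (e.g. ONNX) transcripts with reference transcripts.

    Returns the total count, the exact-match count, and the number of
    non-empty candidate transcripts, with both rates as percentages.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    total = len(reference)
    if total == 0:
        raise ValueError("at least one utterance is required")
    # Exact string equality; any text normalization is left to the caller.
    exact = sum(1 for r, c in zip(reference, candidate) if r == c)
    # A transcript counts as non-empty if it has any non-whitespace character.
    non_empty = sum(1 for c in candidate if c.strip())
    return {
        "total": total,
        "exact": exact,
        "exact_pct": round(100 * exact / total, 2),
        "non_empty": non_empty,
        "non_empty_pct": round(100 * non_empty / total, 2),
    }


# Toy data for illustration only, not taken from VisualEars269.
report = parity_report(["salam", "khoobi", ""], ["salam", "khobi", ""])
```

Note that an empty reference matched by an empty candidate counts as an exact match but not as a non-empty transcript, so the two rates can differ in either direction.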