Clean up debug diagnostics now that WebGPU works end-to-end 2cf4acc verified Reza2kn committed 4 days ago
Use INT4 encoder (MatMulNBits — WebGPU-supported); ORT 1.23 keeps webgpu fp16 fix for decoders b8c2d24 verified Reza2kn committed 4 days ago
Per-step diagnostics: pinpoint which ORT call crashes 7be4ffd verified Reza2kn committed 4 days ago
Upgrade onnxruntime-web 1.20 → 1.23 (webgpu bundle); revert WASM forcing — 1.23 fixes fp16 transformer NaNs e6b3306 verified Reza2kn committed 4 days ago
Force decoders to WASM (WebGPU fp16 in ort-web 1.20 returns all-NaN on this transformer) d62ebc2 verified Reza2kn committed 4 days ago
Diagnostics: logit type/len/health/first-8/top-5 with Number() coercion 29d0bb3 verified Reza2kn committed 4 days ago
Await tensor.getData() for WebGPU outputs (audio_embeds + logits) so data is actually copied back to CPU 1320aec verified Reza2kn committed 4 days ago
fp16: use canonical u16 bit-pattern viewed as Float16Array; diagnostic top-5 dump ed37d13 verified Reza2kn committed 4 days ago
Encoder: prefer static INT8 (QLinearConv); INT4 fallback. Recovers 92.7%-class quality in browser 9e54b79 verified Reza2kn committed 4 days ago
Float16Array for fp16 tensors; per-session WebGPU fallback (decoders stay on webgpu) af482b4 verified Reza2kn committed 4 days ago
Encoder: try INT8 first, auto-fallback to INT4 if ConvInteger unsupported in browser a821180 verified Reza2kn committed 4 days ago
Use ort.webgpu bundle + auto-fallback to wasm if webgpu init fails 96f08e1 verified Reza2kn committed 4 days ago
Use INT8 encoder + INT4 decoder (91.9% accuracy); force-English prompt default 61dfe9b verified Reza2kn committed 4 days ago
Switch Space to static SDK: pure browser inference via onnxruntime-web a4c397e verified Reza2kn committed 4 days ago
Switch backend to INT4 ONNX models from Reza2kn/mega-asr-onnx 888006f verified Reza2kn committed 4 days ago
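
Several commits above treat fp16 tensors as raw 16-bit patterns and dump logit health and top-5 diagnostics to track down all-NaN outputs. The following is a minimal TypeScript sketch of that idea, not the repository's code. The names `fp16BitsToNumber` and `summarizeLogits` are hypothetical, and the manual decoder is a stand-in for the `Float16Array` view the commits mention, on the assumption that `Float16Array` may not exist in every runtime.

```typescript
// Hypothetical helpers for illustration only; not taken from the repository.

/** Decode one IEEE 754 half-precision bit pattern (0..65535) to a JS number. */
function fp16BitsToNumber(bits: number): number {
  const sign = (bits & 0x8000) !== 0 ? -1 : 1;
  const exponent = (bits >> 10) & 0x1f; // 5 exponent bits
  const fraction = bits & 0x03ff;       // 10 fraction bits
  if (exponent === 0) {
    // Zero or subnormal: fraction * 2^-24
    return sign * fraction * Math.pow(2, -24);
  }
  if (exponent === 0x1f) {
    // All-ones exponent: infinity when the fraction is zero, otherwise NaN
    return fraction === 0 ? sign * Infinity : NaN;
  }
  // Normal number: (1 + fraction / 1024) * 2^(exponent - 15)
  return sign * (1 + fraction / 1024) * Math.pow(2, exponent - 15);
}

/** Count NaNs and return the indices of the k largest non-NaN logits. */
function summarizeLogits(
  bits: Uint16Array,
  k: number
): { nanCount: number; top: number[] } {
  const values: number[] = [];
  for (let i = 0; i < bits.length; i++) {
    values.push(fp16BitsToNumber(bits[i]));
  }
  const nanCount = values.filter((v) => Number.isNaN(v)).length;
  const top = values
    .map((v, i) => ({ v, i }))
    .filter((e) => !Number.isNaN(e.v))
    .sort((a, b) => b.v - a.v) // descending by value
    .slice(0, k)
    .map((e) => e.i);
  return { nanCount, top };
}
```

A call such as `summarizeLogits(logitBits, 5)` gives a quick health check: a `nanCount` equal to the tensor length would match the all-NaN symptom described in the commit that forced decoders to WASM.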