OmniASR v2 (300M) - Optimized 4-bit ONNX

This repository contains the first standalone, 4-bit quantized version of Meta's OmniASR v2, optimized specifically for on-device inference on mobile hardware.

Key Improvements by Edison dos Santos:

  • Zero Dependencies: Unlike other distributions, this model does NOT require specialized ASR libraries; it runs on plain onnxruntime.
  • Mobile Optimized: The 4-bit quantization targets the MatMul nodes inside the Transformer layers with a block size of 32, tailored for ARM-based chipsets (tested on a MediaTek Dimensity 6300). A reproduction sketch follows this list.
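
As a rough illustration, the sketch below shows how this kind of blockwise 4-bit MatMul quantization can be produced with onnxruntime's MatMul4BitsQuantizer. The file names and the symmetric-quantization setting are assumptions for illustration, not the exact recipe used for this checkpoint.

```python
# Sketch: blockwise 4-bit MatMul quantization with onnxruntime.
# File names and is_symmetric are assumptions, not the recipe used here.
import onnx
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

model = onnx.load("model_fp32.onnx")

# Quantize MatMul weights in 4-bit blocks of 32 values each.
quantizer = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quantizer.process()

# Save the quantized graph for deployment.
quantizer.model.save_model_to_file("model_q4.onnx", use_external_data_format=False)
```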

Usage

Run inference_example.py (make sure an audio.wav file is present in the working directory). See the Ghost Assistant's Technical Report for more details and benchmarks. A minimal inference sketch, using only onnxruntime, follows.
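
If you prefer not to use the script, the sketch below shows the general shape of CTC inference with plain onnxruntime. The input name, the 16 kHz mono expectation, the (batch, samples) input layout, and the blank token id 0 are assumptions based on typical CTC exports; check the actual graph and inference_example.py for the real values.

```python
# Minimal CTC inference sketch with plain onnxruntime.
# Input name/shape, the 16 kHz assumption, and blank id 0 are guesses;
# verify against the exported graph and inference_example.py.
import numpy as np
import onnxruntime as ort
import soundfile as sf

sess = ort.InferenceSession("model_q4.onnx", providers=["CPUExecutionProvider"])

audio, sr = sf.read("audio.wav", dtype="float32")
assert sr == 16000, "resample to 16 kHz before running the model"
if audio.ndim > 1:                # down-mix stereo to mono
    audio = audio.mean(axis=1)

input_name = sess.get_inputs()[0].name
logits = sess.run(None, {input_name: audio[np.newaxis, :]})[0]  # (1, frames, vocab)

# Greedy CTC decode: argmax per frame, collapse repeats, drop the blank token.
ids = logits[0].argmax(axis=-1)
prev, tokens = -1, []
for i in ids:
    if i != prev and i != 0:      # 0 assumed to be the CTC blank id
        tokens.append(int(i))
    prev = i
print(tokens)  # map ids to characters with the model's vocabulary file
```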
