SimFonX
/

whisper-onnx-optimized

Automatic Speech Recognition

SimFonX committed on May 22, 2025

Commit

140a297

1 Parent(s): c6fcff6

Update README.md

README.md +87 -3

@@ -1,3 +1,87 @@
----
-license: mit
----

+# Whisper ONNX Optimized Models
+Optimized Whisper ONNX models packaged for easy deployment. Each zip contains all necessary files for inference.
+## Models Available
+| Model | Language | Size | Target Use | Download |
+|-------|----------|------|------------|----------|
+| **Medium English** | English-only | ~486MB | High quality English transcription | [whisper-medium-en-onnx.zip](medium-en/whisper-medium-en-onnx.zip) |
+| **Small English** | English-only | ~85MB | Fast English transcription | [whisper-small-en-onnx.zip](small-en/whisper-small-en-onnx.zip) |
+| **Small Multilingual** | 99 languages | ~110MB | Fast multilingual transcription | [whisper-small-multilingual-onnx.zip](small-multilingual/whisper-small-multilingual-onnx.zip) |
+| **Medium Multilingual** | 99 languages | ~295MB | High quality multilingual | [whisper-medium-multilingual-onnx.zip](medium-multilingual/whisper-medium-multilingual-onnx.zip) |
+| **Large v3 Turbo** | 99 languages | ~530MB | Best quality, fastest large model | [whisper-large-v3-turbo-onnx.zip](large-v3-turbo/whisper-large-v3-turbo-onnx.zip) |
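The download links in the table are paths relative to the repository root. A minimal sketch of fetching and unpacking one zip follows. It assumes the repository id is `SimFonX/whisper-onnx-optimized` (taken from the page header) and that the files live on the `main` branch; the helper names `zip_url` and `download_and_extract` are hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path

REPO_ID = "SimFonX/whisper-onnx-optimized"  # assumption: taken from the page header
REVISION = "main"  # assumption: files are on the default branch


def zip_url(zip_path: str, repo_id: str = REPO_ID, revision: str = REVISION) -> str:
    """Build the direct-download URL for a file in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{zip_path}"


def download_and_extract(zip_path: str, dest_dir: str) -> Path:
    """Download one model zip and unpack it into dest_dir (needs network access)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / Path(zip_path).name
    urllib.request.urlretrieve(zip_url(zip_path), archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```

For example, `download_and_extract("small-en/whisper-small-en-onnx.zip", "models/small-en")` would place the extracted files under `models/small-en`.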
+## Size Comparison vs GGML Q5_0
+The models compared below are **smaller** than the equivalent GGML Q5_0 models:
+- Medium English: 486MB vs 515MB GGML ✅ (-29MB)
+- Small models: ~85-110MB vs 182MB GGML ✅ (-72 to -97MB)
+- Large v3 Turbo: 530MB vs 574MB GGML ✅ (-44MB)
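The savings can be recomputed from the listed sizes. The short sketch below uses only the approximate figures quoted in this README; the per-model split of the "Small models" line (85 MB English, 110 MB multilingual) is taken from the table above, and the function name `savings` is hypothetical.

```python
# Approximate sizes in MB as (ONNX, GGML Q5_0), copied from this README.
SIZES_MB = {
    "Medium English": (486, 515),
    "Small English": (85, 182),
    "Small Multilingual": (110, 182),
    "Large v3 Turbo": (530, 574),
}


def savings(sizes):
    """Return how many MB smaller each ONNX package is than its GGML counterpart."""
    return {name: ggml - onnx for name, (onnx, ggml) in sizes.items()}


for name, saved in savings(SIZES_MB).items():
    print(f"{name}: {saved} MB smaller")
```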
+## Contents of Each Zip
+Each zip file contains 7 files needed for inference:
+### ONNX Model Files
+- `encoder_model_quantized.onnx` - Audio encoder (processes mel spectrograms)
+- `decoder_model_merged_quantized.onnx` - Text decoder (generates transcription)
+- `decoder_with_past_model_quantized.onnx` - Optimized decoder with KV caching
+### Configuration Files
+- `config.json` - Model configuration
+- `generation_config.json` - Generation parameters
+- `preprocessor_config.json` - Audio preprocessing settings
+- `tokenizer.json` - Tokenizer vocabulary
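After extracting a zip, it can be useful to confirm that all seven files are present before creating any sessions. This is a minimal sketch that assumes the files sit directly in the extraction directory; the helper `missing_files` is hypothetical.

```python
from pathlib import Path

# The seven file names listed in this README.
EXPECTED_FILES = [
    "encoder_model_quantized.onnx",
    "decoder_model_merged_quantized.onnx",
    "decoder_with_past_model_quantized.onnx",
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer.json",
]


def missing_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_files("path/to/extracted/model")` means the package is complete.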
+## Usage
+### C# with ONNX Runtime
+```csharp
+using System.IO;
+using Microsoft.ML.OnnxRuntime;
+// Download and extract the zip first
+var modelPath = "path/to/extracted/model/";
+// Initialize with DirectML support
+var sessionOptions = new SessionOptions();
+sessionOptions.AppendExecutionProvider_DML(0);
+var encoderSession = new InferenceSession(
+    Path.Combine(modelPath, "encoder_model_quantized.onnx"), sessionOptions);
+var decoderSession = new InferenceSession(
+    Path.Combine(modelPath, "decoder_with_past_model_quantized.onnx"), sessionOptions);
+```
+### Python with ONNX Runtime
+```python
+import onnxruntime as ort
+# Prefer DirectML and fall back to CPU (on NVIDIA GPUs with onnxruntime-gpu, use 'CUDAExecutionProvider' instead)
+providers = ['DmlExecutionProvider', 'CPUExecutionProvider']
+encoder_session = ort.InferenceSession('encoder_model_quantized.onnx', providers=providers)
+decoder_session = ort.InferenceSession('decoder_with_past_model_quantized.onnx', providers=providers)
+```
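The provider list can also be built from whatever the installed ONNX Runtime package actually offers. The sketch below is one possible approach, not the author's method: `pick_providers` is a hypothetical helper, and the preference order (DirectML, then CUDA, then CPU) is an assumption.

```python
# Assumed preference order; adjust to your hardware.
PREFERRED = ["DmlExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Keep preferred providers that are available, always ending with CPU."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")  # CPU is the last-resort fallback
    return chosen
```

With ONNX Runtime installed, `pick_providers(ort.get_available_providers())` yields a list that can be passed as `providers=` to `ort.InferenceSession`.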
+## Features
+- ✅ **DirectML Support** - Works with any DirectX 12 GPU (AMD, Intel, NVIDIA)
+- ✅ **CUDA Support** - Accelerated inference on NVIDIA GPUs
+- ✅ **CPU Fallback** - Automatic fallback to CPU if GPU unavailable
+- ✅ **Quantized** - INT8/INT4 quantization for smaller size and faster inference
+- ✅ **Complete** - All files needed for inference included
+## Model Sources
+These models are repackaged from:
+- [Distil-Whisper](https://huggingface.co/distil-whisper) (English models)
+- [ONNX Community](https://huggingface.co/onnx-community) (Multilingual models)
+## License
+Models inherit their original licenses:
+- Distil-Whisper models: MIT License
+- Whisper models: MIT License
+## Version History
+- **v1.0.0** - Initial release with 5 optimized models