| --- |
| license: apache-2.0 |
| base_model: openai/whisper-medium |
| pipeline_tag: automatic-speech-recognition |
| library_name: openasr |
| tags: |
| - automatic-speech-recognition |
| - speech-to-text |
| - openasr |
| - oasr |
| - whisper-medium |
| --- |
| |
| <div align="center"> |
|
|
| # Whisper Medium · OpenASR |
|
|
| **High-accuracy multilingual Whisper at 769M parameters** |
|
|
| [License](https://huggingface.co/openai/whisper-medium/blob/main/README.md) |
| [GitHub](https://github.com/QuintinShaw/openasr) |
| [Website](https://openasr.org) |
| [Upstream model](https://huggingface.co/openai/whisper-medium) |
|
|
| Native speech-to-text in the **[OpenASR](https://github.com/QuintinShaw/openasr)** runtime, |
| engineered for peak performance on CPU & GPU, with **no Python at inference time**. |
|
|
| </div> |
|
|
| --- |
|
|
| ## Highlights |
|
|
| - **Multilingual ASR**: transcribes many languages and can translate speech to English |
| - **769M parameters**: near-large accuracy with a more manageable footprint |
| - **Weak-supervision scale**: trained with Whisper's 680k-hour labelled speech corpus |
| - **Native in OpenASR**: `.oasr` packs run with no Python at inference, engineered for peak performance on CPU & GPU |
|
|
| ## Quickstart |
|
|
| ```bash |
| # 1. Install the OpenASR CLI · https://openasr.org |
| # 2. Pull a build (pick a quant; see the table below) |
| openasr pull whisper-medium:q8 |
| |
| # 3. Transcribe |
| openasr transcribe audio.wav --model whisper-medium |
| ``` |
|
|
| All builds for this model: |
|
|
| ```bash |
| openasr pull whisper-medium:fp16 |
| openasr pull whisper-medium:q8 |
| openasr pull whisper-medium:q4 |
| ``` |
|
|
| ## Available builds |
|
|
| Quant | File (`.oasr`) | Size | RAM peak | RTF · M1 CPU | RTF · M1 GPU | JFK ΔWER vs fp16 | |
|:------|:---------------|-----:|---------:|-------------:|-------------:|-----------------:| |
| fp16 | `whisper-medium-fp16.oasr` | 1.53 GB | 4.03 GB | 0.62× | 0.61× | 0.0% | |
| q8_0 | `whisper-medium-q8_0.oasr` | 874 MB | 2.17 GB | 0.46× | 0.41× | 0.0% | |
| q4_k | `whisper-medium-q4_k.oasr` | 522 MB | 1.54 GB | 0.51× | 0.39× | 0.0% | |
|
|
| <sub>RTF = real-time factor (processing time divided by audio duration) on the fixed 11 s JFK clip |
| (**lower is faster**); RAM peak is measured per pack in an isolated subprocess. JFK ΔWER compares |
| each quantized build's JFK transcript to this model's fp16 JFK transcript, so it measures |
| quantization drift rather than absolute recognition accuracy. **q8_0** is the recommended default: |
| near-reference quality at a little over half the fp16 file size and peak RAM.</sub> |
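
The two metrics in the table can be reproduced by hand with a little arithmetic. The sketch below is illustrative only: `rtf` and `wer_percent` are hypothetical helper names, not part of the OpenASR CLI, and the normalization (lowercasing and whitespace splitting) is an assumption that may differ from how the table values were produced.

```bash
#!/usr/bin/env bash
# Illustrative helpers only; these are NOT part of the OpenASR CLI.

# rtf <processing_seconds> <audio_seconds>
# Real-time factor = processing time / audio duration (lower is faster).
rtf() {
  awk -v t="$1" -v d="$2" 'BEGIN {
    if (d <= 0) { print "audio duration must be positive" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", t / d
  }'
}

# wer_percent <reference_transcript> <hypothesis_transcript>
# Word error rate as a percentage: word-level edit distance divided by
# the number of reference words.
# Assumption: text is only lowercased and split on whitespace.
wer_percent() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = split(tolower(ref), r, " ")   # reference words
    m = split(tolower(hyp), h, " ")   # hypothesis words
    if (n == 0) { print "reference transcript is empty" > "/dev/stderr"; exit 1 }
    # Standard dynamic-programming edit distance over words.
    for (i = 0; i <= n; i++) d[i, 0] = i
    for (j = 0; j <= m; j++) d[0, j] = j
    for (i = 1; i <= n; i++) {
      for (j = 1; j <= m; j++) {
        best = d[i-1, j-1] + (r[i] == h[j] ? 0 : 1)           # substitution or match
        if (d[i-1, j] + 1 < best) best = d[i-1, j] + 1        # deletion
        if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1        # insertion
        d[i, j] = best
      }
    }
    printf "%.1f\n", 100 * d[n, m] / n
  }'
}
```

For example, `rtf 5.5 11` prints `0.50`, meaning a hypothetical 5.5 s run over an 11 s clip. To mirror the ΔWER column, the fp16 transcript would be passed as the reference and a quantized build's transcript as the hypothesis; identical transcripts give `0.0`.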
| |
| ## About Whisper Medium |
| |
| Whisper Medium is OpenAI's 769M-parameter multilingual Whisper checkpoint. It uses the standard |
| Whisper encoder-decoder architecture for automatic speech recognition and speech translation, |
| trained with large-scale weak supervision on 680k hours of labelled speech. Medium delivers |
| much of the large model's accuracy at a smaller footprint, a strong choice when quality matters |
| but the largest checkpoint is too heavy. This OpenASR repo repackages the original |
| `openai/whisper-medium` weights as `.oasr` packs that run natively in the OpenASR runtime with |
| no Python at inference time. For most users the q8_0 build is the recommended default; q4_k is |
| for tighter memory budgets and fp16 is for verification or maximum fidelity. |
| |
| ## How these packs were made |
| |
| Converted from [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) with the OpenASR importer: |
| |
| ```bash |
| # Replace <src>, <out>, and <quant>; <quant> is one of fp16, q8-0, q4-k |
| openasr model-pack import-whisper-local <src> <out>.oasr \ |
|   --package-id whisper-medium --quantization <quant> |
| ``` |
| |
| The `.oasr` container is GGUF-backed; packs use zero-copy mmap weight binding and graph |
| buffer reuse to keep peak memory low. |
| |
| ## License |
| |
| These packs **inherit the upstream model's license: Apache-2.0** |
| ([source](https://huggingface.co/openai/whisper-medium/blob/main/README.md)). OpenASR packaging retains the upstream copyright and |
| NOTICE; the only modifications are format conversion and quantization. |
| |
| ## Acknowledgements |
| |
| This pack is a redistribution of **Whisper Medium**, released by **OpenAI** |
| ([openai/whisper-medium](https://huggingface.co/openai/whisper-medium)). |
| All credit for the original model, training recipe, and weights belongs to OpenAI. The |
| upstream Hugging Face model card declares Apache-2.0 licensing; OpenASR only converts the |
| weights into `.oasr` packages and adds quantized builds for local runtime use. |
| |
| ## Links |
| |
| - **OpenASR**: <https://github.com/QuintinShaw/openasr> |
| - **Website**: <https://openasr.org> |
| - **Upstream model**: [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) |
| |