File size: 4,910 Bytes
7e54438
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
fe7b669
7e54438
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
---
license: apache-2.0
base_model: openai/whisper-medium.en
pipeline_tag: automatic-speech-recognition
library_name: openasr
tags:
  - automatic-speech-recognition
  - speech-to-text
  - openasr
  - oasr
  - whisper-medium.en
---

<div align="center">

# Whisper Medium (English) Β· OpenASR

**High-accuracy English-only Whisper at 769M parameters**

[![License](https://img.shields.io/badge/license-Apache--2.0-2563eb.svg)](https://huggingface.co/openai/whisper-medium.en/blob/main/README.md)
[![Format](https://img.shields.io/badge/format-.oasr-7c3aed.svg)](https://github.com/QuintinShaw/openasr)
[![Runtime](https://img.shields.io/badge/runtime-OpenASR-111827.svg)](https://openasr.org)
[![Base model](https://img.shields.io/badge/base-whisper--medium.en-f59e0b.svg)](https://huggingface.co/openai/whisper-medium.en)

Native speech-to-text in the **[OpenASR](https://github.com/QuintinShaw/openasr)** runtime β€”
engineered for peak performance on CPU & GPU, **no Python at inference time**.

</div>

---

## ✨ Highlights

- πŸ‡¬πŸ‡§ **English-only** β€” specialized for English, typically more accurate on English than the same-size multilingual model
- 🎯 **769M parameters** β€” near-large English accuracy with a more manageable footprint
- 🌐 **Weak-supervision scale** β€” trained with Whisper's 680k-hour labelled speech corpus
- πŸ¦€ **Native in OpenASR** β€” `.oasr` packs run with no Python at inference, engineered for peak performance on CPU & GPU

## πŸš€ Quickstart

```bash
# 1. Install the OpenASR CLI  Β·  https://openasr.org
# 2. Pull a build (pick a quant β€” see the table below)
openasr pull whisper-medium.en:q8

# 3. Transcribe
openasr transcribe audio.wav --model whisper-medium.en
```

All builds for this model:

```bash
openasr pull whisper-medium.en:fp16
openasr pull whisper-medium.en:q8
openasr pull whisper-medium.en:q4
```

## πŸ“¦ Available builds

| Quant | File (`.oasr`) | Size | RAM peak | RTF Β· M1 CPU | RTF Β· M1 GPU | JFK Ξ”WER vs fp16 |
|:------|:---------------|-----:|---------:|-------------:|-------------:|-----------------:|
| fp16 | `whisper-medium.en-fp16.oasr` | 1.53 GB | 4.42 GB | 0.43Γ— | 0.35Γ— | 0.0% |
| q8_0 | `whisper-medium.en-q8_0.oasr` | 874 MB | 2.17 GB | 0.31Γ— | 0.26Γ— | 0.0% |
| q4_k | `whisper-medium.en-q4_k.oasr` | 522 MB | 1.54 GB | 0.30Γ— | 0.25Γ— | 0.0% |

<sub>RTF = real-time factor on the fixed 11s JFK clip (**lower is faster**); RAM peak measured per pack
in an isolated subprocess. JFK Ξ”WER compares each quantized build's JFK transcript to this model's
fp16 JFK transcript, so it measures quantization drift rather than absolute recognition accuracy.
**q8_0** is the recommended default β€” near-reference quality at a fraction of the
footprint.</sub>

## 🧠 About Whisper Medium (English)

Whisper Medium.en is OpenAI's 769M-parameter English-only Whisper checkpoint. It uses the standard
Whisper encoder-decoder architecture for automatic speech recognition, trained with large-scale
weak supervision on 680k hours of labelled speech. As an English-specialized model it tends to
outperform the same-size multilingual Whisper on English audio, delivering high English accuracy
without the footprint of the largest checkpoints. This OpenASR repo repackages the original
`openai/whisper-medium.en` weights as `.oasr` packs that run natively in the OpenASR runtime with
no Python at inference time. For most users the q8_0 build is the recommended default; q4_k is
for tighter memory budgets and fp16 is for verification or maximum fidelity.

## βš™οΈ How these packs were made

Converted from [openai/whisper-medium.en](https://huggingface.co/openai/whisper-medium.en) with the OpenASR importer:

```bash
openasr model-pack import whisper <src> <out>.oasr \
  --package-id whisper-medium.en --quantization {fp16,q8-0,q4-k}
```

The `.oasr` container is GGUF-backed; packs use zero-copy mmap weight binding and graph
buffer reuse to keep peak memory low.

## βš–οΈ License

These packs **inherit the upstream model's license: Apache-2.0**
([source](https://huggingface.co/openai/whisper-medium.en/blob/main/README.md)). OpenASR packaging retains the upstream copyright and
NOTICE; the only modifications are format conversion and quantization.

## πŸ™ Acknowledgements

This pack is a redistribution of **Whisper Medium.en**, released by **OpenAI**
([openai/whisper-medium.en](https://huggingface.co/openai/whisper-medium.en)).
All credit for the original model, training recipe, and weights belongs to OpenAI. The
upstream Hugging Face model card declares Apache-2.0 licensing; OpenASR only converts the
weights into `.oasr` packages and adds quantized builds for local runtime use.

## πŸ”— Links

- πŸ¦€ **OpenASR** β€” <https://github.com/QuintinShaw/openasr>
- 🌐 **Website** β€” <https://openasr.org>
- πŸ€— **Upstream model** β€” [openai/whisper-medium.en](https://huggingface.co/openai/whisper-medium.en)