Commit 97d3c20 · verified
publish whisper-medium OpenASR packs
- .gitattributes +1 -0
- README.md +113 -0
- whisper-medium-fp16.oasr +3 -0
- whisper-medium-q4_k.oasr +3 -0
- whisper-medium-q8_0.oasr +3 -0
.gitattributes
ADDED
@@ -0,0 +1 @@
*.oasr filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,113 @@
---
license: apache-2.0
base_model: openai/whisper-medium
pipeline_tag: automatic-speech-recognition
library_name: openasr
tags:
- automatic-speech-recognition
- speech-to-text
- openasr
- oasr
- whisper-medium
---

<div align="center">

# Whisper Medium · OpenASR

**High-accuracy multilingual Whisper at 769M parameters**

[License: Apache-2.0](https://huggingface.co/openai/whisper-medium/blob/main/README.md)
[GitHub](https://github.com/QuintinShaw/openasr)
[Website](https://openasr.org)
[Upstream model](https://huggingface.co/openai/whisper-medium)

Native speech-to-text in the **[OpenASR](https://github.com/QuintinShaw/openasr)** runtime,
engineered for peak performance on CPU & GPU, with **no Python at inference time**.

</div>

---

## Highlights

- **Multilingual ASR**: transcribes many languages and can translate speech into English
- **769M parameters**: near-large accuracy with a more manageable footprint
- **Weak-supervision scale**: trained on Whisper's 680k-hour labelled speech corpus
- **Native in OpenASR**: `.oasr` packs run with no Python at inference time, engineered for peak performance on CPU & GPU

## Quickstart

```bash
# 1. Install the OpenASR CLI · https://openasr.org
# 2. Pull a build (pick a quant from the table below)
openasr pull whisper-medium:q8

# 3. Transcribe
openasr transcribe audio.wav --model whisper-medium
```

All builds for this model:

```bash
openasr pull whisper-medium:fp16
openasr pull whisper-medium:q8
openasr pull whisper-medium:q4
```

## Available builds

| Quant | File (`.oasr`) | Size | RAM peak | RTF · M1 CPU | RTF · M1 GPU | JFK ΔWER vs fp16 |
|:------|:---------------|-----:|---------:|-------------:|-------------:|-----------------:|
| fp16 | `whisper-medium-fp16.oasr` | 1.53 GB | 4.03 GB | 0.62× | 0.61× | 0.0% |
| q8_0 | `whisper-medium-q8_0.oasr` | 874 MB | 2.17 GB | 0.46× | 0.41× | 0.0% |
| q4_k | `whisper-medium-q4_k.oasr` | 522 MB | 1.54 GB | 0.51× | 0.39× | 0.0% |

<sub>RTF = real-time factor (processing time divided by audio duration) on the fixed 11 s JFK clip; **lower is faster**.
RAM peak is measured per pack in an isolated subprocess. JFK ΔWER compares each quantized build's JFK transcript
with this model's fp16 JFK transcript, so it measures quantization drift rather than absolute recognition accuracy.
**q8_0** is the recommended default: near-reference quality (0.0% JFK ΔWER) at about 57% of the fp16 pack size.</sub>
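
To reproduce the two metric columns on your own audio, here is a minimal bash sketch of both definitions. It is not OpenASR's benchmark harness: `rtf` and `wer` are hypothetical helper names, and the transcript normalisation in `wer` (lower-casing, dropping punctuation) is an assumption about how transcripts are compared.

```bash
# rtf PROCESSING_SECONDS AUDIO_SECONDS -> real-time factor (lower is faster)
rtf() {
  awk -v t="$1" -v d="$2" 'BEGIN { printf "%.2f\n", t / d }'
}

# wer REFERENCE HYPOTHESIS -> word error rate in percent:
# (substitutions + deletions + insertions) / reference word count,
# computed as a word-level Levenshtein distance after lower-casing
# and dropping punctuation (the normalisation is an assumption).
wer() {
  awk -v ref="$1" -v hyp="$2" '
    function norm(s) {
      s = tolower(s)
      gsub(/[\t\n\r]/, " ", s)     # treat tabs/newlines as word breaks
      gsub(/[^a-z0-9 ]/, "", s)    # drop punctuation
      return s
    }
    BEGIN {
      n = split(norm(ref), r, " ")
      m = split(norm(hyp), h, " ")
      for (i = 0; i <= n; i++) d[i, 0] = i   # delete every reference word
      for (j = 0; j <= m; j++) d[0, j] = j   # insert every hypothesis word
      for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
          best = d[i-1, j-1] + (r[i] != h[j])              # match or substitution
          if (d[i-1, j] + 1 < best) best = d[i-1, j] + 1   # deletion
          if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1   # insertion
          d[i, j] = best
        }
      }
      printf "%.1f%%\n", (n > 0 ? 100 * d[n, m] / n : 0)
    }'
}

rtf 6.82 11   # prints 0.62
wer "ask not what your country can do for you" "ask not what your country can do for me"   # prints 11.1%
```

Passing the fp16 transcript as the reference and a quantized build's transcript as the hypothesis gives one reading of the ΔWER column. Read the other way, an RTF of 0.62 on the 11 s clip corresponds to roughly 6.8 s of processing.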

## About Whisper Medium

Whisper Medium is OpenAI's 769M-parameter multilingual Whisper checkpoint. It uses the standard
Whisper encoder-decoder architecture for automatic speech recognition and speech translation,
and was trained with large-scale weak supervision on 680k hours of labelled speech. Medium delivers
much of the large model's accuracy at a smaller footprint, which makes it a strong choice when quality
matters but the largest checkpoint is too heavy. This OpenASR repo repackages the original
`openai/whisper-medium` weights as `.oasr` packs that run natively in the OpenASR runtime with
no Python at inference time. For most users the q8_0 build is the recommended default; q4_k is
for tighter memory budgets, and fp16 is for verification or maximum fidelity.
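
The size trade-off between builds comes down to storage bits per weight. The bash sketch below divides each pack's byte count (taken from the Git LFS pointers in this commit) by the card's nominal 769M parameters; `bits_per_param` is a hypothetical helper, and the results are approximate because 769M is a rounded figure.

```bash
# bits_per_param BYTES PARAMS -> effective storage bits per parameter (hypothetical helper)
bits_per_param() {
  awk -v bytes="$1" -v params="$2" 'BEGIN { printf "%.2f\n", bytes * 8 / params }'
}

# Byte counts from the LFS pointers; 769M is the card's rounded parameter count.
while read -r quant bytes; do
  printf '%-5s %s bits/param\n' "$quant" "$(bits_per_param "$bytes" 769000000)"
done <<'EOF'
fp16 1534887520
q8_0 874284640
q4_k 521963104
EOF
```

This prints about 15.97, 9.10 and 5.43 bits per parameter. If the packs follow GGML's block formats, q8_0 stores 8.5 bits per weight and q4_K about 4.5. The higher figures here would be consistent with some tensors kept at higher precision plus container metadata. That is an inference; this card does not document it.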

## How these packs were made

Converted from [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) with the OpenASR importer:

```bash
# One run per build: <src> is the local copy of the upstream weights, <out> the pack name,
# and --quantization takes one of fp16, q8-0 or q4-k.
openasr model-pack import-whisper-local <src> <out>.oasr \
  --package-id whisper-medium --quantization {fp16,q8-0,q4-k}
```

The `.oasr` container is GGUF-backed. Packs use zero-copy mmap weight binding (tensors are read directly
from the memory-mapped file instead of being copied into separate buffers) and graph buffer reuse to keep
peak memory low.
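
To check the GGUF backing on a downloaded pack, the bash sketch below reads the first eight bytes. It assumes, without confirmation from this card, that a `.oasr` file starts directly with a standard GGUF header: the ASCII magic `GGUF` followed by a little-endian uint32 format version. `peek_gguf` is a hypothetical helper, and a mismatch may simply mean OpenASR wraps the GGUF payload differently, not that the file is corrupt.

```bash
# peek_gguf FILE -> prints the GGUF version if FILE starts with a GGUF header.
# Assumption: the .oasr pack begins with a plain GGUF header at offset 0.
peek_gguf() {
  local f="$1" magic version
  magic=$(head -c 4 "$f")
  if [ "$magic" != "GGUF" ]; then
    echo "$f: no GGUF magic at offset 0"
    return 1
  fi
  # Bytes 4-7 hold the format version as a little-endian uint32; od uses host
  # byte order, so this reading is only correct on little-endian machines.
  version=$(od -An -t u4 -j 4 -N 4 "$f" | tr -d ' \n')
  echo "$f: GGUF v$version"
}
```

Usage: `peek_gguf whisper-medium-q8_0.oasr`, with the path pointing at wherever your copy of the pack lives.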

## License

These packs **inherit the upstream model's license: Apache-2.0**
([source](https://huggingface.co/openai/whisper-medium/blob/main/README.md)). OpenASR packaging retains the upstream copyright and
NOTICE; the only modifications are format conversion and quantization.

## Acknowledgements

These packs redistribute **Whisper Medium**, released by **OpenAI**
([openai/whisper-medium](https://huggingface.co/openai/whisper-medium)).
All credit for the original model, training recipe, and weights belongs to OpenAI. The
upstream Hugging Face model card declares Apache-2.0 licensing; OpenASR only converts the
weights into `.oasr` packages and adds quantized builds for local runtime use.

## Links

- **OpenASR**: <https://github.com/QuintinShaw/openasr>
- **Website**: <https://openasr.org>
- **Upstream model**: [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)
whisper-medium-fp16.oasr
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:08a80860d71f72728ad9676f3f3e7ef45d460c9d38a4eb11199607ed374da200
size 1534887520
whisper-medium-q4_k.oasr
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:180a31a5e134b7d7e1cfc21c8859000f644f14e8b41d52142ff757fe72e63390
size 521963104
whisper-medium-q8_0.oasr
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e663d322bcaa5743c3e4b3dac680f0b6c79f87edb9d7f1b9147a09329278c37
size 874284640
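
Each pointer records the SHA-256 (`oid`) and byte count (`size`) of the real pack, so a local copy can be checked against it. Here is a minimal bash sketch that assumes you already have the `.oasr` file on disk. This commit does not say where `openasr pull` stores packs, so the path is up to you. `verify_lfs` and `sha256_of` are hypothetical helpers built on the standard `sha256sum` / `shasum` tools.

```bash
# sha256_of FILE -> hex SHA-256 digest (GNU coreutils, with a macOS fallback)
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# verify_lfs FILE EXPECTED_OID EXPECTED_SIZE -> "FILE: OK" or a mismatch message (exit 1)
verify_lfs() {
  local file="$1" oid="$2" size="$3" actual_size actual_oid
  actual_size=$(wc -c < "$file" | tr -d ' ')   # cheap check first
  if [ "$actual_size" != "$size" ]; then
    echo "$file: size mismatch ($actual_size != $size)"
    return 1
  fi
  actual_oid=$(sha256_of "$file")              # full hash only if the size matches
  if [ "$actual_oid" != "$oid" ]; then
    echo "$file: sha256 mismatch"
    return 1
  fi
  echo "$file: OK"
}
```

For example, `verify_lfs whisper-medium-q8_0.oasr 5e663d322bcaa5743c3e4b3dac680f0b6c79f87edb9d7f1b9147a09329278c37 874284640` should print `whisper-medium-q8_0.oasr: OK` for an intact download. Inside a Git clone of this repo, `git lfs fsck` performs a comparable consistency check on the local LFS object store.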