Commit cbe07da (verified)
Parent(s): none
publish whisper-large-v3 OpenASR packs
- .gitattributes +1 -0
- README.md +114 -0
- whisper-large-v3-fp16.oasr +3 -0
- whisper-large-v3-q4_k.oasr +3 -0
- whisper-large-v3-q8_0.oasr +3 -0
.gitattributes
ADDED
@@ -0,0 +1 @@
*.oasr filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,114 @@
---
license: apache-2.0
base_model: openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
library_name: openasr
tags:
- automatic-speech-recognition
- speech-to-text
- openasr
- oasr
- whisper-large-v3
---

<div align="center">

# Whisper Large v3 · OpenASR

**OpenAI's most accurate Whisper, the v3 large checkpoint**

Native speech-to-text in the **[OpenASR](https://github.com/QuintinShaw/openasr)** runtime,
engineered for peak performance on CPU & GPU, **no Python at inference time**.

</div>

---

## Highlights

- **Multilingual ASR**: transcribes a wide range of languages and can translate speech to English
- **1.55B parameters**: the full-size Whisper, OpenAI's highest-accuracy checkpoint
- **v3 improvements**: trained on a larger, more diverse corpus with 128 mel bins for better robustness
- **Native in OpenASR**: `.oasr` packs run with no Python at inference, engineered for peak performance on CPU & GPU

## Quickstart

```bash
# 1. Install the OpenASR CLI · https://openasr.org
# 2. Pull a build (pick a quant; see the table below)
openasr pull whisper-large-v3:q8

# 3. Transcribe
openasr transcribe audio.wav --model whisper-large-v3
```

All builds for this model:

```bash
openasr pull whisper-large-v3:fp16
openasr pull whisper-large-v3:q8
openasr pull whisper-large-v3:q4
```

## Available builds

| Quant | File (`.oasr`) | Size | RAM peak | RTF · M1 CPU | RTF · M1 GPU | JFK ΔWER vs fp16 |
|:------|:---------------|-----:|---------:|-------------:|-------------:|-----------------:|
| fp16 | `whisper-large-v3-fp16.oasr` | 3.09 GB | 4.70 GB | 1.17× | 1.13× | 0.0% |
| q8_0 | `whisper-large-v3-q8_0.oasr` | 1.71 GB | 4.05 GB | 0.65× | 0.46× | 0.0% |
| q4_k | `whisper-large-v3-q4_k.oasr` | 978 MB | 2.46 GB | 0.61× | 0.49× | 0.0% |

<sub>RTF = real-time factor on the fixed 11 s JFK clip (**lower is faster**); RAM peak measured per pack
in an isolated subprocess. JFK ΔWER compares each quantized build's JFK transcript to this model's
fp16 JFK transcript, so it measures quantization drift rather than absolute recognition accuracy.
**q8_0** is the recommended default: near-reference quality at a fraction of the
footprint.</sub>
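
As a rough illustration of how to read the RTF columns: the real-time factor is processing time divided by audio duration, so multiplying it by the clip length gives an estimated wall-clock time. The sketch below applies the M1 CPU figures to the 11 s clip. The helper name `estimate_seconds` is hypothetical, and extrapolating the same factor to longer audio or other hardware is an assumption, not something measured here.

```bash
# Estimate wall-clock processing time from a real-time factor (RTF).
# RTF = processing_time / audio_duration, so processing_time = RTF * duration.
# estimate_seconds is a hypothetical helper, not part of the OpenASR CLI.
estimate_seconds() {
  local rtf="$1" duration_s="$2"
  awk -v r="$rtf" -v d="$duration_s" 'BEGIN { printf "%.2f\n", r * d }'
}

# M1 CPU figures from the table, applied to the 11 s JFK clip.
estimate_seconds 1.17 11   # fp16: slower than real time
estimate_seconds 0.65 11   # q8_0
estimate_seconds 0.61 11   # q4_k
```

An RTF above 1.0 means the clip takes longer to transcribe than it takes to play, which is the case for the fp16 build on this machine.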

## About Whisper Large v3

Whisper Large v3 is OpenAI's 1.55B-parameter multilingual Whisper checkpoint, the most accurate
member of the family. It uses the standard Whisper encoder-decoder architecture for automatic
speech recognition and speech translation; v3 was trained on a larger and more diverse labelled
corpus and uses 128 mel-frequency bins (up from 80 in earlier checkpoints), improving robustness
across languages and conditions over earlier large checkpoints. This OpenASR repo repackages the
original `openai/whisper-large-v3` weights as `.oasr` packs that run natively in the OpenASR
runtime with no Python at inference time. For most users the q8_0 build is the recommended
default; q4_k is for tighter memory budgets and fp16 is for verification or maximum fidelity.
For a faster large-grade option, see `whisper-large-v3-turbo`, a pruned and fine-tuned variant
of this model.
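
To make the trade-off between the three builds more concrete, one can estimate the average number of bits stored per parameter from the file sizes. The sketch below uses the byte sizes recorded in this commit's Git LFS pointers and assumes the approximate 1.55B parameter count quoted above. The result is only a file-level average (the packs also hold metadata and possibly some unquantized tensors), and `bits_per_param` is a hypothetical helper.

```bash
# Average bits per parameter = file_size_bytes * 8 / parameter_count.
# Assumption: ~1.55e9 parameters, taken from the approximate figure in this card.
# bits_per_param is a hypothetical helper, not part of the OpenASR CLI.
bits_per_param() {
  local size_bytes="$1" params="$2"
  awk -v s="$size_bytes" -v p="$params" 'BEGIN { printf "%.2f\n", s * 8 / p }'
}

PARAMS=1550000000
bits_per_param 3088750656 "$PARAMS"   # fp16 pack
bits_per_param 1712494656 "$PARAMS"   # q8_0 pack
bits_per_param 978491456  "$PARAMS"   # q4_k pack
```

Under these assumptions the fp16 pack lands just under 16 bits per parameter, with q8_0 near 9 and q4_k near 5.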

## How these packs were made

Converted from [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) with the OpenASR importer:

```bash
# <src> and <out> are placeholders; choose one --quantization value per run.
openasr model-pack import-whisper-local <src> <out>.oasr \
  --package-id whisper-large-v3 --quantization {fp16,q8-0,q4-k}
```

The `.oasr` container is GGUF-backed; packs use zero-copy mmap weight binding and graph
buffer reuse to keep peak memory low.

## License

These packs **inherit the upstream model's license: Apache-2.0**
([source](https://huggingface.co/openai/whisper-large-v3/blob/main/README.md)). OpenASR packaging retains the upstream copyright and
NOTICE; the only modifications are format conversion and quantization.

## Acknowledgements

This pack is a redistribution of **Whisper Large v3**, released by **OpenAI**
([openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)).
All credit for the original model, training recipe, and weights belongs to OpenAI. The
upstream Hugging Face model card declares Apache-2.0 licensing; OpenASR only converts the
weights into `.oasr` packages and adds quantized builds for local runtime use.

## Links

- **OpenASR**: <https://github.com/QuintinShaw/openasr>
- **Website**: <https://openasr.org>
- **Upstream model**: [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
whisper-large-v3-fp16.oasr
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8a0052daa836ecc10cd5e71f3dffce21af001b47a624501e5ef08d536a732598
size 3088750656
whisper-large-v3-q4_k.oasr
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2585fb2c9f5506266a88963a80eddb07a759d334416fac851b61a40b0a18df71
size 978491456
whisper-large-v3-q8_0.oasr
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2ea5a9cf974b0524ee570feb50c96d2b754b1108312a880f7778bf55caf49327
size 1712494656
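
Each pointer above records the SHA-256 digest (`oid`) and byte size of the real pack, so a downloaded `.oasr` file can be checked against them. The following is a minimal sketch using standard tools only; `verify_lfs_object` is a hypothetical helper and not an OpenASR or Git LFS command.

```bash
# Check a local file against the oid (SHA-256) and size from its Git LFS pointer.
# verify_lfs_object is a hypothetical helper, not part of Git LFS or OpenASR.
verify_lfs_object() {
  local file="$1" expected_oid="$2" expected_size="$3"
  local actual_size actual_oid

  # Compare the size first: it is cheap and catches truncated downloads.
  actual_size=$(wc -c < "$file" | tr -d '[:space:]')
  if [ "$actual_size" != "$expected_size" ]; then
    echo "size mismatch: got $actual_size, expected $expected_size" >&2
    return 1
  fi

  # sha256sum ships with GNU coreutils; fall back to shasum (e.g. on macOS).
  if command -v sha256sum >/dev/null 2>&1; then
    actual_oid=$(sha256sum "$file" | awk '{print $1}')
  else
    actual_oid=$(shasum -a 256 "$file" | awk '{print $1}')
  fi
  if [ "$actual_oid" != "$expected_oid" ]; then
    echo "sha256 mismatch: got $actual_oid" >&2
    return 1
  fi

  echo "ok: $file"
}

# Usage (commented out so the sketch runs without the 1.7 GB pack present):
# verify_lfs_object whisper-large-v3-q8_0.oasr \
#   2ea5a9cf974b0524ee570feb50c96d2b754b1108312a880f7778bf55caf49327 1712494656
```

A size mismatch usually points to an interrupted download or to a checkout that still contains the small pointer file instead of the pack itself.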