Update README.md
README.md
CHANGED
@@ -33,6 +33,27 @@ Table: **Word Error Rate (WER)** comparison between KBLab's Whisper models and t
 
 We provide checkpoints in different formats: `Hugging Face`, `whisper.cpp` (GGML), `onnx`, and `ctranslate2` (used in `faster-whisper` and `WhisperX`).
 
+### 2025-05-13 Update!
+The default when loading our models through Hugging Face is **Stage 2**.
+As of May 2025, there are two additional **Stage 2** versions besides the default, **Subtitle** and **Strict**, which differ in transcription style.
+Specifying `revision="subtitle"` in `.from_pretrained()` loads the version with a more condensed transcription style.
+Specifying `revision="strict"` in `.from_pretrained()` loads the more verbatim version.
+Below is an example of how this argument is passed to `.from_pretrained()`:
+```python
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+model_id = "KBLab/kb-whisper-medium"
+model = AutoModelForSpeechSeq2Seq.from_pretrained(
+    model_id, torch_dtype=torch_dtype, use_safetensors=True, cache_dir="cache", revision="strict"
+)
+```
+The three model versions range in verbosity from **Subtitle** (least verbose), through **Stage 2** (default), to **Strict** (most verbose).
+
+
 #### Hugging Face
 
 Inference example for using `KB-Whisper` with Hugging Face:
@@ -66,6 +87,7 @@ generate_kwargs = {"task": "transcribe", "language": "sv"}
 res = pipe("audio.mp3",
            chunk_length_s=30,
            generate_kwargs={"task": "transcribe", "language": "sv"})
+print(res)
 ```
 
 #### Faster-whisper
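
The three transcription styles named in the update (**Subtitle**, **Stage 2**, **Strict**) can be selected programmatically. What follows is a minimal sketch and not part of the KBLab README: the helper name `from_pretrained_kwargs` and the style key `"stage2"` are hypothetical, and it assumes that leaving out `revision` loads the default **Stage 2** checkpoint, as the README text states. It only builds keyword arguments and does not download any model.

```python
# Hypothetical helper, not part of the KBLab README. It only builds keyword
# arguments for `.from_pretrained()`; it does not download or load a model.

# Styles listed from least to most verbose, following the README update.
STYLE_TO_REVISION = {
    "subtitle": "subtitle",  # most condensed transcription style
    "stage2": None,          # default; assumed to need no `revision` argument
    "strict": "strict",      # most verbatim transcription style
}


def from_pretrained_kwargs(style="stage2"):
    """Return keyword arguments for `.from_pretrained()` for the given style."""
    if style not in STYLE_TO_REVISION:
        raise ValueError(
            f"unknown style {style!r}; expected one of {sorted(STYLE_TO_REVISION)}"
        )
    kwargs = {"use_safetensors": True}
    revision = STYLE_TO_REVISION[style]
    if revision is not None:
        # Only pass `revision` for the non-default styles.
        kwargs["revision"] = revision
    return kwargs


if __name__ == "__main__":
    for style in STYLE_TO_REVISION:
        print(style, from_pretrained_kwargs(style))
```

The returned dictionary can be unpacked into the call shown in the README, for example `AutoModelForSpeechSeq2Seq.from_pretrained(model_id, **from_pretrained_kwargs("subtitle"))`.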