Update README.md
README.md CHANGED

ru_whisper_small is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small).

## Intended uses & limitations

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

# ... (model/processor loading and feature extraction not shown in this excerpt)
predicted_ids = model.generate(input_features)

# decode token ids to text, keeping the special tokens
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)

# or drop the special tokens
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```

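The diff elides the setup between the imports and the decode calls above. For reference, here is a minimal end-to-end sketch following the standard Whisper model-card flow; the repo id `your-username/ru_whisper_small` and the dummy dataset are placeholders, not values confirmed by this card:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

# placeholder repo id; substitute the actual ru_whisper_small checkpoint
model_id = "your-username/ru_whisper_small"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# any 16 kHz speech sample works; this dummy dataset is only for illustration
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# raw audio -> log-Mel input features
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features

# generate token ids, then decode them to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
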
## Long-Form Transcription
The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can be used to transcribe audio samples of arbitrary length. This is possible through the Transformers `pipeline` method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True`:

```python
import torch
from transformers import pipeline
from datasets import load_dataset

# ... (pipeline construction with chunk_length_s=30 and dataset loading not shown in this excerpt)
prediction = pipe(sample.copy(), batch_size=8)["text"]

# we can also return timestamps for the predictions
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
```

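The middle of the snippet is likewise elided in the diff; a minimal sketch of the chunked pipeline it implies, with the same placeholder repo id, might look like this:

```python
import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# chunk_length_s=30 enables the chunked long-form algorithm
pipe = pipeline(
    "automatic-speech-recognition",
    model="your-username/ru_whisper_small",  # placeholder repo id
    chunk_length_s=30,
    device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# batched inference over the 30-second chunks
prediction = pipe(sample.copy(), batch_size=8)["text"]

# sequence-level timestamps for each chunk
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
```
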
## Faster Inference with Speculative Decoding
Speculative Decoding was proposed in [*Fast Inference from Transformers via Speculative Decoding*](https://arxiv.org/abs/2211.17192) by Yaniv Leviathan et al. from Google. It works on the premise that a faster assistant model very often generates the same tokens as a larger main model: the main model only has to verify the assistant's draft tokens, which it can do in a single forward pass, so the output is guaranteed to match standard decoding while generation runs faster.

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# ... (main-model and assistant-model loading plus the pipe = pipeline(...) construction not shown in this excerpt)
sample = dataset[0]["audio"]
result = pipe(sample)
print(result["text"])
```

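The elided setup presumably pairs the fine-tuned model with a smaller draft model. Below is a sketch following the speculative-decoding recipe from the Transformers documentation; both repo ids (`your-username/ru_whisper_small` as the main model, `openai/whisper-tiny` as the assistant) are assumptions, not values confirmed by this card:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# main model (placeholder repo id)
model_id = "your-username/ru_whisper_small"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# assumed draft model: a smaller checkpoint that shares Whisper's tokenizer
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-tiny", torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)

# the assistant drafts tokens; the main model verifies them during generation
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    generate_kwargs={"assistant_model": assistant_model},
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]
result = pipe(sample)
print(result["text"])
```
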
### Training hyperparameters