---
license: apache-2.0
---

# faster-whisper-large-v3
|
|
This is the Whisper large-v3 model, converted to be used with [faster-whisper](https://github.com/guillaumekln/faster-whisper).
|
|
## Using
|
|
You can choose between monkey-patching faster-whisper 0.9.0 (until it is updated upstream) or using my fork (which is
easier).
|
|
|
|
### Using my fork
|
|
First, install it by executing:
|
|
```shell
pip install -U 'transformers[torch]>=4.35.0' https://github.com/PythonicCafe/faster-whisper/archive/refs/heads/feature/large-v3.zip#egg=faster-whisper
```
|
|
Then, use it as you would the regular faster-whisper:
|
|
```python
import time

import faster_whisper


filename = "my-audio.mp3"
initial_prompt = "My podcast recording"  # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"

model = faster_whisper.WhisperModel(model_size, device=device, compute_type=compute_type)

segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)

# `segments` is a generator: transcription actually runs as you iterate over it
start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")
```
|
|
|
|
### Monkey-patching faster-whisper 0.9.0
|
|
Make sure you have the latest version:
|
|
```shell
pip install -U 'faster-whisper>=0.9.0'
```
|
|
Then, use it with a few small changes:
|
|
```python
import time

import faster_whisper.transcribe
import faster_whisper.utils


# Monkey patch 1 (add the model to the list of known models)
faster_whisper.utils._MODELS["large-v3"] = "turicas/faster-whisper-large-v3"

# Monkey patch 2 (fix the tokenizer)
faster_whisper.transcribe.Tokenizer.encode = lambda self, text: self.tokenizer.encode(text, add_special_tokens=False)

filename = "my-audio.mp3"
initial_prompt = "My podcast recording"  # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"

model = faster_whisper.transcribe.WhisperModel(model_size, device=device, compute_type=compute_type)

# Monkey patch 3 (change n_mels: large-v3 uses 128 mel bins instead of 80)
from faster_whisper.feature_extractor import FeatureExtractor
model.feature_extractor = FeatureExtractor(feature_size=128)

# Monkey patch 4 (change the tokenizer)
from transformers import AutoProcessor
model.hf_tokenizer = AutoProcessor.from_pretrained("openai/whisper-large-v3").tokenizer
model.hf_tokenizer.token_to_id = lambda token: model.hf_tokenizer.convert_tokens_to_ids(token)

segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)

# `segments` is a generator: transcription actually runs as you iterate over it
start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")
```
|
|
## Converting
|
|
If you'd like to convert the model yourself, execute:
|
|
```shell
pip install -U 'ctranslate2>=3.21.0' 'transformers>=4.35.0' 'OpenNMT-py==2.*' sentencepiece
ct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2
```
|
|
Then, the files will be at `whisper-large-v3-ct2/`.
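The converted directory can then be loaded directly, since `WhisperModel` accepts a local path as well as a model size name. A minimal sketch (the `whisper-large-v3-ct2` path is the output directory from the command above; the existence check is just a convenience):

```python
import os

model_dir = "whisper-large-v3-ct2"  # output of ct2-transformers-converter

if os.path.isdir(model_dir):
    import faster_whisper

    # A local path works in place of a model size name like "large-v3"
    model = faster_whisper.WhisperModel(model_dir, device="cpu", compute_type="float32")
else:
    print(f"{model_dir} not found; run the conversion command above first")
```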
|
|
|
|
## License
|
|
These files have the same license as the original [openai/whisper-large-v3 model](https://huggingface.co/openai/whisper-large-v3): Apache 2.0.
|
|