Whisper distil-large-v3 model for CTranslate2

This repository contains the conversion of distil-whisper/distil-large-v3 to the CTranslate2 model format.

This model can be used in CTranslate2 or in projects based on CTranslate2, such as faster-whisper.

Example

import base64

import requests

ENDPOINT_URL = "https://endpoints.huggingface.cloud"  # replace with your endpoint URL
HF_TOKEN     = "hf_token"                             # replace with your HF access token
AUDIO_FILE   = "audio.mp3"                            # path to your local audio file

vad_params = {
    "min_silence_duration_ms": 100,
    "speech_pad_ms": 30,
    "min_speech_duration_ms": 40,
    "neg_threshold": 0.2,
}

headers = {"Authorization": f"Bearer {HF_TOKEN}"}

def trans_fast(audiofile, params):
    """
    audiofile:  path to the audio file
    params:     dict containing
                    - 'parameters': dict of options passed to the transcription model
                    - 'batched' (optional): defaults to True, enabling faster_whisper batched inference
    """
    with open(audiofile, "rb") as f:
        data = f.read()  # read the raw audio bytes
    encoded_audio = base64.b64encode(data).decode("utf-8")  # base64-encode for JSON transport
    # Send the encoded audio and parameters to the endpoint
    payload = {"inputs": encoded_audio, **params}
    response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()


params = {
    # parameters passed to the faster_whisper transcription call
    "parameters": {"language": "en", "vad_parameters": vad_params},
    # whether to use batched mode (defaults to True)
    # "batched": True,
}
# Example usage
transcript = trans_fast(AUDIO_FILE, params)
print(transcript)

Conversion details

The original model was converted with the following command:

ct2-transformers-converter --model distil-whisper/distil-large-v3 --output_dir faster-distil-whisper-large-v3 \
    --copy_files tokenizer.json preprocessor_config.json --quantization float16

Note that the model weights are saved in FP16. This type can be changed at load time with the compute_type option in CTranslate2.

More information

For more information about the original model, see the distil-whisper/distil-large-v3 model card.
