# Whisper distil-large-v3 model for CTranslate2

This repository contains the distil-whisper/distil-large-v3 model converted to the CTranslate2 format.

This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper.
## Example
```python
import base64

import requests

ENDPOINT_URL = "https://endpoints.huggingface.cloud"  # replace with your endpoint URL
HF_TOKEN = "hf_token"  # replace with your HF token
AUDIO_FILE = "audio.mp3"  # path to your local audio file

vad_params = {
    "min_silence_duration_ms": 100,
    "speech_pad_ms": 30,
    "min_speech_duration_ms": 40,
    "neg_threshold": 0.2,
}

headers = {"Authorization": f"Bearer {HF_TOKEN}"}

def trans_fast(audiofile, params):
    """
    audiofile: path to the audio file
    params: dict containing
      - 'parameters': dict passed to the transcription model
      - 'batched' (optional): defaults to True to use faster_whisper batched inference
    """
    with open(audiofile, "rb") as f:
        data = f.read()  # read the raw audio bytes
    encoded_audio = base64.b64encode(data).decode("utf-8")  # base64-encode to send for transcription
    # Send the encoded audio and parameters
    payload = {"inputs": encoded_audio, **params}
    response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
    return response.json()

params = {
    # dict of parameters for faster_whisper transcription
    "parameters": {"language": "en", "vad_parameters": vad_params},
    # whether or not to use batched mode (defaults to True)
    # "batched": True,
}

# Example usage
transcript = trans_fast(AUDIO_FILE, params)
print(transcript)
```
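The payload construction above can be checked in isolation. This sketch (using stand-in bytes rather than a real audio file) verifies that the base64 text sent in the `inputs` field round-trips back to the original audio bytes on the server side:

```python
import base64

# Stand-in for raw audio bytes read from disk
data = b"\x00\x01\x02RIFF-fake-audio"

# Client side: encode the bytes as UTF-8 text so they fit in a JSON payload
encoded_audio = base64.b64encode(data).decode("utf-8")
payload = {"inputs": encoded_audio, "parameters": {"language": "en"}}

# Server side: decode the field back into the original bytes
decoded = base64.b64decode(payload["inputs"])
assert decoded == data
```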
## Conversion details
The original model was converted with the following command:

```shell
ct2-transformers-converter --model distil-whisper/distil-large-v3 --output_dir faster-distil-whisper-large-v3 \
    --copy_files tokenizer.json preprocessor_config.json --quantization float16
```
Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the `compute_type` option in CTranslate2.
## More information
For more information about the original model, see its model card.