---
tags:
- audio
- automatic-speech-recognition
- whisper
- ctranslate2
- faster-whisper
- whisperx
license: apache-2.0
base_model: vinai/PhoWhisper-large
pipeline_tag: automatic-speech-recognition
---

# PhoWhisper Large - CTranslate2 Version (Float32)

This repository contains the [vinai/PhoWhisper-large](https://huggingface.co/vinai/PhoWhisper-large) model converted to the **CTranslate2** format in full **Float32** precision.

Because the weights are stored in Float32, you can choose the computation precision when you load the model (e.g., `float16`, `bfloat16`, or `int8`), as long as CTranslate2 and your hardware (GPU or CPU) support that type. CTranslate2 converts the weights to the requested type at load time.

This version can be used directly with CTranslate2-based libraries such as [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and [WhisperX](https://github.com/m-bain/whisperX).

## Model Details

- **Original Model**: [vinai/PhoWhisper-large](https://huggingface.co/vinai/PhoWhisper-large)
- **Format**: CTranslate2 (CT2)
- **Quantization**: None (full `float32` precision)

---

## How to Use

### 1. Using with WhisperX (Python API)

You can load this model directly into WhisperX and choose the runtime precision with `compute_type`:

```python
import whisperx

device = "cuda"  # or "cpu"
batch_size = 16  # reduce if you run out of GPU memory
# float16 is fast on GPU but generally unavailable on CPU, so fall back to int8 there
compute_type = "float16" if device == "cuda" else "int8"

# Load the model (choose "float32", "float16", "int8", ...)
model = whisperx.load_model(
    "qnaug/phowhisper-large-ctranslate2",
    device=device,
    compute_type=compute_type,
)

# Transcribe Vietnamese audio
audio = whisperx.load_audio("sample_audio.mp3")
result = model.transcribe(audio, batch_size=batch_size, language="vi")

# Optional: refine timestamps with a forced-alignment model
model_a, metadata = whisperx.load_align_model(language_code="vi", device=device)
result_aligned = whisperx.align(result["segments"], model_a, metadata, audio, device)

print(result_aligned["segments"])
```

### 2. Using with WhisperX (CLI)

```bash
whisperx --model qnaug/phowhisper-large-ctranslate2 \
  --language vi \
  --device cuda \
  --compute_type float16 \
  sample_audio.mp3
```

### 3. Using with faster-whisper (Python API)

```python
from faster_whisper import WhisperModel

# Load the model in float16 on GPU (choose "float32", "float16", "int8", ...)
model = WhisperModel(
    "qnaug/phowhisper-large-ctranslate2",
    device="cuda",
    compute_type="float16",  # on CPU, use "int8" or "float32" instead
)

# Transcribe; `segments` is a generator, so decoding runs as you iterate over it
segments, info = model.transcribe("sample_audio.mp3", beam_size=5, language="vi")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

---

## How the Model Was Converted

This model was converted with the `ct2-transformers-converter` tool using the following command. No `--quantization` option was passed, so the weights are kept in Float32:

```bash
ct2-transformers-converter --model vinai/PhoWhisper-large \
  --output_dir ./phowhisper-large-ctranslate2 \
  --copy_files tokenizer.json preprocessor_config.json
```

## Credits

All credit goes to the authors of the original model, **VinAI Research**. If you use this model in your research, please cite the original PhoWhisper repository/paper.
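---

## Choosing a Runtime Precision

The usage examples in this card hard-code `float16` for GPU. Because this repository stores Float32 weights, the precision can also be picked programmatically from what the current device supports. The snippet below is a minimal sketch under stated assumptions: the helper `choose_compute_type` and its preference order are hypothetical, for illustration only, and are not part of faster-whisper, WhisperX, or CTranslate2.

```python
from typing import Iterable

# Hypothetical preference order per device (an assumption, not an official
# recommendation): lighter/faster compute types first, float32 last.
PREFERRED_COMPUTE_TYPES = {
    "cuda": ["float16", "int8_float16", "bfloat16", "int8", "float32"],
    "cpu": ["int8", "int8_float32", "float32"],
}


def choose_compute_type(device: str, supported: Iterable[str]) -> str:
    """Return the first preferred compute type that `supported` contains.

    `supported` is the set of compute types available on `device`, for example
    the value returned by ctranslate2.get_supported_compute_types(device).
    """
    available = set(supported)
    for compute_type in PREFERRED_COMPUTE_TYPES.get(device, ["float32"]):
        if compute_type in available:
            return compute_type
    # The stored weights are float32, so full precision is the safe fallback.
    return "float32"


# Example with a hand-written set of supported types (no hardware query needed).
print(choose_compute_type("cpu", {"float32", "int8", "int8_float32"}))  # int8
```

On a real system, `supported` can come from `ctranslate2.get_supported_compute_types(device)`, and the result can be passed as `compute_type` to `WhisperModel` or `whisperx.load_model`. CTranslate2 also accepts `compute_type="auto"`, which selects the fastest supported type for you; a helper like this is only useful if you want to control the order yourself.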