---
tags:
- audio
- automatic-speech-recognition
- whisper
- ctranslate2
- faster-whisper
- whisperx
license: apache-2.0
base_model: vinai/PhoWhisper-large
pipeline_tag: automatic-speech-recognition
---

# PhoWhisper Large - CTranslate2 Version (Float32)

This repository contains the [vinai/PhoWhisper-large](https://huggingface.co/vinai/PhoWhisper-large) model converted to the **CTranslate2** format in full **Float32** precision. 

Because the weights are stored in Float32, you can choose the runtime precision when you load the model (e.g., `float16`, `bfloat16`, or `int8`) to suit your hardware (GPU or CPU). CTranslate2 converts the weights to the requested type at load time.
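As a rough illustration of matching precision to hardware, the sketch below picks a `compute_type` from a set of supported types. The helper names and the preference orders are assumptions made for this example, not part of any library; only `ctranslate2.get_supported_compute_types` is an existing CTranslate2 function.

```python
from typing import Iterable, Set

# Illustrative preference orders (an assumption): faster, smaller types first.
GPU_PREFERENCE = ("float16", "bfloat16", "int8_float16", "int8", "float32")
CPU_PREFERENCE = ("int8", "float32")


def pick_compute_type(device: str, supported: Iterable[str]) -> str:
    """Return the first preferred compute type that is in `supported`."""
    preference = GPU_PREFERENCE if device == "cuda" else CPU_PREFERENCE
    available = set(supported)
    for candidate in preference:
        if candidate in available:
            return candidate
    # Fall back to the precision the weights are stored in.
    return "float32"


def detect_supported(device: str) -> Set[str]:
    """Ask CTranslate2 which compute types the given device supports."""
    import ctranslate2  # imported lazily so the helper above has no dependencies

    return set(ctranslate2.get_supported_compute_types(device))
```

With CTranslate2 installed, `pick_compute_type("cuda", detect_supported("cuda"))` would return a value you can pass as `compute_type` in the examples below.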

The converted model works with libraries built on CTranslate2, such as [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and [WhisperX](https://github.com/m-bain/whisperX).

## Model Details
- **Original Model**: [vinai/PhoWhisper-large](https://huggingface.co/vinai/PhoWhisper-large)
- **Format**: CTranslate2 (CT2)
- **Quantization**: None (Full `float32` precision)

---

## How to Use

### 1. Using with WhisperX (Python API)
You can load this model directly into WhisperX and specify your preferred runtime precision using `compute_type`:

```python
import whisperx

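device = "cuda"  # or "cpu"
batch_size = 16  # reduce if you run out of GPU memory

# Load the model in float16 for fast GPU inference.
# On CPU, use "int8" or "float32"; CTranslate2 generally rejects float16 there.
model = whisperx.load_model(
    "qnaug/phowhisper-large-ctranslate2",
    device=device,
    compute_type="float16",  # Choose: "float32", "float16", "int8"
)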

# Transcribe audio
audio = whisperx.load_audio("sample_audio.mp3")
result = model.transcribe(audio, batch_size=batch_size, language="vi")

# Optional: Align timestamps
model_a, metadata = whisperx.load_align_model(language_code="vi", device=device)
result_aligned = whisperx.align(result["segments"], model_a, metadata, audio, device)

print(result_aligned["segments"])
```

### 2. Using with WhisperX (CLI)
```bash
whisperx --model qnaug/phowhisper-large-ctranslate2 --language vi --device cuda --compute_type float16 sample_audio.mp3
```

### 3. Using with faster-whisper (Python API)
```python
from faster_whisper import WhisperModel

# Load the model in Float16
model = WhisperModel(
    "qnaug/phowhisper-large-ctranslate2", 
    device="cuda", 
    compute_type="float16" # Choose: "float32", "float16", "int8"
)

# Transcribe. `segments` is a generator, so decoding only runs as you iterate over it.
segments, info = model.transcribe("sample_audio.mp3", beam_size=5, language="vi")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
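If you want subtitles rather than printed lines, the segments from the loop above can be turned into SRT text. This is a minimal sketch: `srt_timestamp` and `to_srt` are hypothetical helper names, and `Segment` is a stand-in dataclass that only mirrors the `start`, `end`, and `text` attributes used above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    # Hypothetical stand-in with the three attributes read in the loop above.
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable) -> str:
    """Build SRT text from objects exposing .start, .end and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_line = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_line}\n{seg.text.strip()}\n")
    # Join the cues so that a blank line separates them.
    return "\n".join(blocks)
```

Passing the `segments` generator returned by `model.transcribe(...)` to `to_srt` should work the same way, since it relies only on those three attributes.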

---

## How the Model Was Converted
This model was converted using the `ct2-transformers-converter` tool with the following command. No `--quantization` flag was passed, so the weights were not quantized:

```bash
ct2-transformers-converter --model vinai/PhoWhisper-large \
    --output_dir ./phowhisper-large-ctranslate2 \
    --copy_files tokenizer.json preprocessor_config.json
```
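The same conversion can presumably be reproduced from Python with CTranslate2's `TransformersConverter` class. The sketch below is not the command that was actually run; `convert_phowhisper` is a hypothetical wrapper name, and it assumes `ctranslate2`, `transformers`, and `torch` are installed.

```python
def convert_phowhisper(output_dir: str = "./phowhisper-large-ctranslate2") -> str:
    """Convert vinai/PhoWhisper-large to CTranslate2 format without quantization."""
    # Imported lazily so that defining this function does not require ctranslate2.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        "vinai/PhoWhisper-large",
        # Same extra files as the --copy_files option in the command above.
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    # quantization=None leaves the weights unquantized, like omitting --quantization.
    return converter.convert(output_dir, quantization=None, force=False)
```

Calling `convert_phowhisper()` downloads the original checkpoint and writes the converted model to `output_dir`, so expect it to need several gigabytes of disk space and memory.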

## Credits
All credit goes to the authors of the original model, **VinAI Research**. If you use this model in your research, please cite the original PhoWhisper repository or paper.