---
tags:
- audio
- automatic-speech-recognition
- whisper
- ctranslate2
- faster-whisper
- whisperx
license: apache-2.0
base_model: vinai/PhoWhisper-large
pipeline_tag: automatic-speech-recognition
---
# PhoWhisper Large - CTranslate2 Version (Float32)
This repository contains the [vinai/PhoWhisper-large](https://huggingface.co/vinai/PhoWhisper-large) model converted to the **CTranslate2** format in full **Float32** precision.
Because the weights are stored in Float32, you can choose the precision at load time (e.g., `float16`, `bfloat16`, or `int8`) to suit your hardware (GPU/CPU); CTranslate2 converts the weights when the model is loaded.
This version is fully compatible with libraries like [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and [WhisperX](https://github.com/m-bain/whisperX).
## Model Details
- **Original Model**: [vinai/PhoWhisper-large](https://huggingface.co/vinai/PhoWhisper-large)
- **Format**: CTranslate2 (CT2)
- **Quantization**: None (Full `float32` precision)
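
To give a feel for what each runtime precision costs in memory, here is a rough back-of-the-envelope sketch. The parameter count is an assumption (about 1.55 billion, the figure reported for Whisper large, assuming PhoWhisper-large keeps that architecture), and `approx_weight_gb` is a hypothetical helper. It counts weights only and ignores quantization scales, buffers, and activations.

```python
# Assumption: ~1.55e9 parameters, the figure reported for Whisper large.
PARAMS = 1.55e9
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def approx_weight_gb(dtype, params=PARAMS):
    """Rough weight size in GB (10**9 bytes); weights only."""
    return params * BYTES_PER_WEIGHT[dtype] / 1e9


for dtype in BYTES_PER_WEIGHT:
    print(f"{dtype}: ~{approx_weight_gb(dtype):.1f} GB")
```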
---
## How to Use
### 1. Using with WhisperX (Python API)
You can load this model directly into WhisperX and specify your preferred runtime precision using `compute_type`:
```python
import whisperx
device = "cuda" # or "cpu"
batch_size = 16
# Load the model in Float16 for fast GPU inference
# (on CPU, "int8" or "float32" is usually the right choice instead)
model = whisperx.load_model(
    "qnaug/phowhisper-large-ctranslate2",
    device=device,
    compute_type="float16",  # Choose: "float32", "float16", "int8"
)
# Transcribe audio
audio = whisperx.load_audio("sample_audio.mp3")
result = model.transcribe(audio, batch_size=batch_size, language="vi")
# Optional: Align timestamps
model_a, metadata = whisperx.load_align_model(language_code="vi", device=device)
result_aligned = whisperx.align(result["segments"], model_a, metadata, audio, device)
print(result_aligned["segments"])
```
### 2. Using with WhisperX (CLI)
```bash
whisperx --model qnaug/phowhisper-large-ctranslate2 --language vi --device cuda --compute_type float16 sample_audio.mp3
```
### 3. Using with faster-whisper (Python API)
```python
from faster_whisper import WhisperModel
# Load the model in Float16
model = WhisperModel(
    "qnaug/phowhisper-large-ctranslate2",
    device="cuda",
    compute_type="float16",  # Choose: "float32", "float16", "int8"
)
# Transcribe
segments, info = model.transcribe("sample_audio.mp3", beam_size=5, language="vi")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
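
Both loaders above take a `compute_type` string, and which values work depends on the device. The sketch below shows one way to pick a value automatically. `pick_compute_type` and `supported_types` are hypothetical helper names, and the preference order is only an assumption that you may want to change. `ctranslate2.get_supported_compute_types` is the CTranslate2 call that reports what a device supports.

```python
def pick_compute_type(supported,
                      preferences=("float16", "int8_float16", "int8", "float32")):
    """Return the first preferred compute type found in `supported`."""
    for name in preferences:
        if name in supported:
            return name
    # float32 matches the stored weights, so use it as a last resort.
    return "float32"


def supported_types(device):
    """Ask CTranslate2 which compute types the given device supports."""
    import ctranslate2  # imported lazily so pick_compute_type has no dependencies
    return ctranslate2.get_supported_compute_types(device)


# Example (requires ctranslate2 to be installed):
# compute_type = pick_compute_type(supported_types("cuda"))
```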
---
## How the Model Was Converted
This model was converted using the `ct2-transformers-converter` tool with the following command:
```bash
ct2-transformers-converter --model vinai/PhoWhisper-large \
    --output_dir ./phowhisper-large-ctranslate2 \
    --copy_files tokenizer.json preprocessor_config.json
```
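
The same conversion can also be sketched with the CTranslate2 Python API, which may be convenient inside a script. This is an illustrative equivalent rather than the command actually used for this repository. `convert_phowhisper` is a hypothetical wrapper name, and the sketch assumes the `ctranslate2` and `transformers` packages are installed.

```python
def convert_phowhisper(output_dir="./phowhisper-large-ctranslate2"):
    """Convert vinai/PhoWhisper-large to CTranslate2 without quantization."""
    # Imported lazily: needs the ctranslate2 and transformers packages.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        "vinai/PhoWhisper-large",
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    # quantization=None applies no quantization, so the weights should keep
    # the precision of the source checkpoint.
    return converter.convert(output_dir, quantization=None)
```

Calling `convert_phowhisper()` downloads the original checkpoint and writes the converted model to `output_dir`, so expect it to need several gigabytes of disk space and memory.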
## Credits
All credits go to the authors of the original model: **VinAI Research**. If you use this model in your research, please cite the original PhoWhisper repository/paper.