🇻🇳 Whisper Vietnamese CTranslate2

This repository contains a Vietnamese automatic speech recognition (ASR) model fine-tuned from openai/whisper-small and converted to the CTranslate2 format. It is optimized for fast inference on CPU or GPU.
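The card says the model runs on CPU or GPU but does not say which precision to use. As a minimal sketch, assuming you want the GPU whenever CUDA is available, the hypothetical helper `choose_device` below returns a device and a CTranslate2 `compute_type`. The `float16` (GPU) and `int8` (CPU) choices are common CTranslate2 settings assumed here, not values tested by the author.

```python
def choose_device(cuda_device_count: int) -> tuple[str, str]:
    """Return a (device, compute_type) pair for ctranslate2.models.Whisper.

    Hypothetical helper: float16 on GPU and int8 on CPU are assumed
    defaults, not settings documented for this model.
    """
    if cuda_device_count > 0:
        return "cuda", "float16"
    return "cpu", "int8"
```

In Step 4 of the example, you could call `choose_device(ctranslate2.get_cuda_device_count())` and pass the two values to `ctranslate2.models.Whisper` through its `device` and `compute_type` arguments.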


🚀 Try it Online

👉 Test this model directly in its Hugging Face Space


🧪 Example Usage (Python)

import ctranslate2
import librosa
import transformers
from huggingface_hub import snapshot_download

# Step 1: Download the CTranslate2 model from Hugging Face
model_repo = "duonguyen/whisper-vietnamese-ct2"
model_dir = snapshot_download(repo_id=model_repo)

# Step 2: Load and preprocess the audio
audio_path = "path/to/audio.wav"  # replace with the path to your audio file
audio, _ = librosa.load(audio_path, sr=16000, mono=True)

# Step 3: Use the original Whisper processor for feature extraction
# chunk_length=12 makes the feature extractor pad or truncate every clip to a 12-second window
processor = transformers.WhisperProcessor.from_pretrained("openai/whisper-small", chunk_length=12)
inputs = processor(audio, return_tensors="np", sampling_rate=16000, do_normalize=True)
features = ctranslate2.StorageView.from_array(inputs.input_features)

# Step 4: Load the CTranslate2 model (use device="cuda" to run on a GPU)
model = ctranslate2.models.Whisper(model_dir, device="cpu")

# Step 5: Prepare prompt and language
language = "vi"
prompt = processor.tokenizer.convert_tokens_to_ids(
    [
        "<|startoftranscript|>",
        f"<|{language}|>",
        "<|transcribe|>",
        "<|notimestamps|>",
    ]
)

# Step 6: Transcribe
results = model.generate(features, [prompt])
transcription = processor.decode(results[0].sequences_ids[0], skip_special_tokens=True)

print("Transcription:", transcription)
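Step 3 passes `chunk_length=12` to the processor. In the Hugging Face `WhisperFeatureExtractor`, this sets the input window to `chunk_length * sampling_rate` samples, and by default every clip is zero-padded or truncated to that length before the log-mel spectrogram is computed with a hop of 160 samples. The sketch below mirrors that arithmetic in NumPy. `pad_or_trim` and `expected_feature_shape` are illustrative helpers defined here, not transformers functions, and the 80 mel bins are whisper-small's setting.

```python
import numpy as np

SAMPLING_RATE = 16000  # Whisper expects 16 kHz mono audio
HOP_LENGTH = 160       # STFT hop of Whisper's feature extractor (10 ms)
N_MELS = 80            # mel bins used by whisper-small


def pad_or_trim(audio: np.ndarray, chunk_length: int = 12) -> np.ndarray:
    """Zero-pad or truncate a mono clip to exactly chunk_length seconds."""
    n_samples = chunk_length * SAMPLING_RATE
    if audio.shape[0] >= n_samples:
        return audio[:n_samples]  # everything past the window is dropped
    return np.pad(audio, (0, n_samples - audio.shape[0]))


def expected_feature_shape(chunk_length: int = 12, batch_size: int = 1) -> tuple[int, int, int]:
    """Shape of input_features for a window of chunk_length seconds."""
    n_frames = chunk_length * SAMPLING_RATE // HOP_LENGTH
    return (batch_size, N_MELS, n_frames)


clip = np.zeros(20 * SAMPLING_RATE, dtype=np.float32)  # a 20 s clip
print(pad_or_trim(clip).shape)   # (192000,)
print(expected_feature_shape())  # (1, 80, 1200)
```

With this window, `inputs.input_features` in Step 3 should have shape (1, 80, 1200), and the last 8 seconds of a 20-second clip would never reach the model.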

⚠️ Important:
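This model is currently optimized for audio chunks shorter than 12 seconds. Because the processor is loaded with chunk_length=12, anything beyond the first 12 seconds of a clip is silently truncated rather than rejected.
For longer audio, pre-segment it with a VAD (Voice Activity Detection) model such as Silero VAD and transcribe each segment separately.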
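As a sketch of that pre-segmentation step, the hypothetical helper below takes speech segments as sorted `(start, end)` pairs in seconds, the kind of timestamps a VAD produces once converted to seconds, and greedily packs neighboring segments into chunks no longer than 12 seconds, splitting any single segment that is itself too long. The greedy packing and the input format are assumptions, and running the VAD itself is not shown.

```python
def pack_segments(
    segments: list[tuple[float, float]], max_len: float = 12.0
) -> list[tuple[float, float]]:
    """Greedily merge (start, end) speech segments into chunks of at most max_len seconds.

    Hypothetical helper: segments are assumed sorted, non-overlapping and in
    seconds. A chunk spans from its first segment's start to its last
    segment's end, so silence between merged segments counts toward max_len.
    """
    # Split any segment that is longer than max_len on its own.
    pieces: list[tuple[float, float]] = []
    for start, end in segments:
        while end - start > max_len:
            pieces.append((start, start + max_len))
            start += max_len
        pieces.append((start, end))

    # Extend the current chunk while it stays within max_len, else start a new one.
    chunks: list[tuple[float, float]] = []
    for start, end in pieces:
        if chunks and end - chunks[-1][0] <= max_len:
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks


print(pack_segments([(0.5, 3.0), (4.0, 9.0), (10.0, 14.0), (15.0, 30.0)]))
# [(0.5, 9.0), (10.0, 14.0), (15.0, 27.0), (27.0, 30.0)]
```

Each chunk can then be cut from the 16 kHz array as `audio[int(start * 16000):int(end * 16000)]` and transcribed as in the example above.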