---
library_name: transformers
language:
- ht
license: apache-2.0
base_model: openai/whisper-medium
tags:
- generated_from_trainer
datasets:
- jsbeaudry/creole-text-voice
model-index:
- name: whisper small creole oswald
  results: []
---

# whisper-medium-creole-oswald

This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the **creole-text-voice** dataset.

The main objective is to create a **99% accurate Haitian Creole Speech-to-Text model**, capable of transcribing diverse Haitian voices across accents, regions, and speaking styles.

---

## 🧠 Model description

**whisper-medium-creole-oswald** is optimized for Haitian Creole automatic speech recognition (ASR). It builds on OpenAI's Whisper architecture and adapts it to Haitian Creole through transfer learning and fine-tuning on a high-quality curated dataset containing hours of Haitian Creole audio-text pairs.

- **Architecture**: Whisper Medium
- **Fine-tuned for**: Haitian Creole (Kreyòl Ayisyen)
- **Vocabulary**: Based on Latin script (Creole orthography), preserving diacritics and linguistic nuances
- **Voice types**: Trained on synthetic female voices
- **Sampling rate**: 16 kHz
- **Training objective**: Maximize transcription accuracy for everyday Creole speech

---

### ✅ Intended uses

- Transcribe Haitian Creole speech from:
  - Voice notes
  - Radio shows
  - Interviews
  - Public speeches
  - Educational content
  - Synthetic voices
- Enable Creole voice interfaces in:
  - Voice assistants
  - Transcription services
  - Language-learning tools
  - Chatbots and accessibility platforms

### ⚠️ Limitations

- May struggle with:
  - Heavily code-switched speech (Creole mixed with French or English)
  - Extremely poor audio quality (e.g., heavy background noise)
  - Very fast or mumbled speech in some dialects
  - Long audio files
- Not optimized for **real-time transcription** on low-resource devices
- Fine-tuned on a specific dataset of synthetic voices, so it may generalize less well to completely unseen voice types or rare accents

---

## 📊 Training and evaluation data

The model was trained on the **creole-text-voice** dataset, which includes:

- **5 hours** of synthetic Haitian Creole speech
- Annotated, time-aligned text transcripts following standard Creole orthography

### Sources for next steps:

- Public domain radio and podcast archives
- Open-access interviews and spoken-word audio
- Community-submitted voice samples

### Preprocessing steps:

- Voice Activity Detection (VAD)
- Noise filtering and audio normalization
- Manual transcript review and correction

## Model usage script

```python
# Load the model and processor directly
import librosa
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("jsbeaudry/whisper-medium-oswald")
model = AutoModelForSpeechSeq2Seq.from_pretrained("jsbeaudry/whisper-medium-oswald")


def transcript(audio_file_path):
    # 1. Load the audio and resample it to the 16 kHz rate the model expects
    speech_array, sampling_rate = librosa.load(audio_file_path, sr=16000)

    # Convert the waveform into the input features used by the model
    input_features = processor(
        speech_array, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features

    # 2. Generate predictions (no gradients are needed for inference)
    with torch.no_grad():
        predicted_ids = model.generate(input_features)

    # 3. Decode the predictions; this returns a list with one string per input
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    return transcription


text = transcript("/path_audio")
print(text)
```

## Model usage with gradio (UI)

```python
import gradio as gr
from transformers import pipeline

# Load Whisper model
print("Loading model...")
pipe = pipeline("automatic-speech-recognition", model="jsbeaudry/whisper-medium-oswald")
print("Model loaded successfully.")


# Transcription function: use the uploaded file if present, otherwise the recording
def transcribe(uploaded_path, recorded_path):
    audio_path = uploaded_path or recorded_path
    if audio_path is None:
        return "Please upload or record an audio file first."
    result = pipe(audio_path)
    return result["text"]


# Build Gradio interface
def create_interface():
    with gr.Blocks(title="Whisper Medium - Haitian Creole") as demo:
        gr.Markdown("# 🎙️ Whisper Medium Creole ASR")
        gr.Markdown(
            "Upload an audio file or record your voice in Haitian Creole. "
            "Then click **Transcribe** to see the result."
        )
        with gr.Row():
            with gr.Column():
                # `source=` is the Gradio 3.x argument; Gradio 4+ expects `sources=["upload"]` instead
                audio_input = gr.Audio(source="upload", type="filepath", label="🎧 Upload Audio")
                audio_input2 = gr.Audio(source="microphone", type="filepath", label="🎀 Record Audio")
            with gr.Column():
                transcribe_button = gr.Button("🔍 Transcribe")
                output_text = gr.Textbox(label="📝 Transcribed Text", lines=4)
        # A single handler reads both inputs, so one source cannot overwrite the other's result
        transcribe_button.click(
            fn=transcribe, inputs=[audio_input, audio_input2], outputs=output_text
        )
    return demo


if __name__ == "__main__":
    interface = create_interface()
    interface.launch()
```

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 5
- mixed_precision_training: Native AMP

### Framework versions

- Transformers 4.46.1
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.20.3

## 📌 Citation

If you use this model, please cite:

```bibtex
@misc{whispermediumcreoleoswald2025,
  title={Whisper Medium Creole - Oswald},
  author={Jean Sauvenel Beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/jsbeaudry}}
}
```
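
## Measuring transcription accuracy

This card states a 99% accuracy objective but does not report evaluation results or name a metric. A common choice for ASR is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is one minimal way to compute it in plain Python. The function name `word_error_rate`, the lowercase-and-split normalization, and the example sentences are assumptions made for illustration; they are not taken from the training or evaluation setup of this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # reference word missing from hypothesis
                    curr[j - 1] + 1,     # extra word in hypothesis
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one of four reference words is missing.
print(word_error_rate("mwen renmen ou anpil", "mwen renmen anpil"))  # 0.25
```

Under this definition, a 99% word-level accuracy target would correspond roughly to a WER of 0.01 on a held-out test set. Because the training data is synthetic speech, evaluating on recordings of real speakers would give a more informative number.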