---
license: mit
datasets:
- facebook/multilingual_librispeech
language:
- fr
base_model:
- openai/whisper-small
pipeline_tag: automatic-speech-recognition
---

# Fine-Tuned Whisper-small Model for French ASR

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small), trained on french version of [CV17 dataset](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)

# Live demo
Click [here](https://huggingface.co/spaces/nambn0321/ASR_french) (press restart to run the space)

- Then you have two options: Either upload a French audio or record yourself speaking French by clicking on the mic and then the orange dot. 

- Hit submit and the model will output the transcription.

# Performance and Evaluation
- **WER (Word Error Rate):** Measures the percentage of words incorrectly predicted.
- **CER (Character Error Rate):** Measures the percentage of characters incorrectly predicted.

### **Test Set: CV17(16k samples)**
| Model                      | WER (lower is better)     | CER (lower is better)   |
|----------------------------|---------------------------|-------------------------|
| [**Whisper Small** (baseline)](https://huggingface.co/openai/whisper-small)  | 0.3405    | 0.1680    |
| [**Whisper Medium** (baseline)](https://huggingface.co/openai/whisper-medium) | 0.2597    | 0.1264    |
| **My Model**                  | 0.1648    | 0.0676    |

### **Test Set: MLS (2426 samples)**
| Model                      | WER (lower is better)     | CER (lower is better)   |
|----------------------------|---------------------------|-------------------------|
| [**Whisper Small** (baseline)](https://huggingface.co/openai/whisper-small)  | 0.3271    | 0.1066    |
| [**Whisper Medium** (baseline)](https://huggingface.co/openai/whisper-medium) | 0.2974    | 0.0919    |
| **My Model**                 | 0.3269    | 0.1013    |


# Usage 
```python
import torch

from datasets import load_dataset
from transformers import pipeline

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load pipeline
pipe = pipeline("automatic-speech-recognition", model="nambn0321/ASR_french_3", device=device)


pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="fr", task="transcribe")

# Load data (this is an example but when you load your own data, make sure to use torchaudio or librosa to load the audio into the dataset)
ds_mcv_test = load_dataset("mozilla-foundation/common_voice_11_0", "fr", split="test", streaming=True)
test_segment = next(iter(ds_mcv_test))
waveform = test_segment["audio"]

# Run
generated_sentences = pipe(waveform, max_new_tokens=225)["text"]  # greedy
# generated_sentences = pipe(waveform, max_new_tokens=225, generate_kwargs={"num_beams": 5})["text"]  # beam search
```
**NOM**