Update README.md
README.md
CHANGED
@@ -6,7 +6,6 @@ language:
 base_model:
 - openai/whisper-large-v3
 pipeline_tag: automatic-speech-recognition
-library_name: transformers
 tags:
 - bsc
 - projecte-aina
@@ -14,7 +13,6 @@ tags:
 - automatic-speech-recognition
 - whisper-large-v3
 - code-switching
-- spanish-catalan
 - spanish
 - catalan
 ---
@@ -46,13 +44,9 @@ The "whisper-timestamped-cs" is an acoustic model suitable for Automatic Speech
 
 This model can be used for Automatic Speech Recognition (ASR) in code-switching conditions between Spanish and Catalan. The model is intended to transcribe audio files to plain text.
 
-## How to Get Started with the Model
-
-To see an updated and functional version of this code, please see our [Notebook](https://colab.research.google.com/drive/1MHiPrffNTwiyWeUyMQvSdSbfkef_8aJC?usp=sharing)
-
 ### Installation
 
-To use this model, you may install [
+To use this model, you may install [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped):
 
 Create a virtual environment:
 ```bash
@@ -64,66 +58,20 @@ source /path/to/venv/bin/activate
 ```
 Install the modules:
 ```bash
-pip install
+pip install git+https://github.com/linto-ai/whisper-timestamped
 ```
 
 ### For Inference
-
-
-```bash
-#Install Prerequisites
-pip install torch
-pip install datasets
-pip install 'transformers[torch]'
-pip install evaluate
-pip install jiwer
-```
+To transcribe audio in code-switching using this model, you can follow this example:
 
 ```python
-
-
-
-
-
-
-
-from transformers import WhisperForConditionalGeneration, WhisperProcessor
-
-#Load the processor and model.
-MODEL_NAME="langtech-veu/whisper-timestamped-cs"
-processor = WhisperProcessor.from_pretrained(MODEL_NAME)
-model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")
-
-#Load the dataset
-from datasets import load_dataset, load_metric, Audio
-ds=load_dataset("projecte-aina/parlament_parla",split='test')
-
-#Downsample to 16kHz
-ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
-
-#Process the dataset
-def map_to_pred(batch):
-    audio = batch["audio"]
-    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
-    batch["reference"] = processor.tokenizer._normalize(batch['normalized_text'])
-
-    with torch.no_grad():
-        predicted_ids = model.generate(input_features.to("cuda"))[0]
-
-    transcription = processor.decode(predicted_ids)
-    batch["prediction"] = processor.tokenizer._normalize(transcription)
-
-    return batch
-
-#Do the evaluation
-result = ds.map(map_to_pred)
-
-#Compute the overall WER now.
-from evaluate import load
-
-wer = load("wer")
-WER=100 * wer.compute(references=result["reference"], predictions=result["prediction"])
-print(WER)
+import whisper_timestamped as whisper
+
+model = whisper.load_model("langtech-veu/whisper-timestamped-cs", device="cpu")
+result = whisper.transcribe(model, "/path/to/the/audio.wav")
+
+import json
+print(json.dumps(result, indent = 2, ensure_ascii = False))
 ```
 
 ## Training Details
@@ -132,12 +80,6 @@ print(WER)
 
 The specific dataset used to create the model is a corpus called CAESAR-tiny, which has not been released at the moment.
 
-### Training procedure
-
-This model is the result of finetuning the model ["openai/whisper-large-v3"](https://huggingface.co/openai/whisper-large-v3) by following this [tutorial](https://huggingface.co/blog/fine-tune-whisper) provided by Hugging Face.
-
-### Training Hyperparameters
-
 ## Citation
 If this model contributes to your research, please cite the work:
 ```bibtex
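The removed example scored transcriptions with the `evaluate` library's word error rate (WER). WER itself is just the word-level edit distance (substitutions, insertions, and deletions) divided by the number of reference words; the dependency-free sketch below (a hypothetical `wer` helper, not part of `evaluate` or `jiwer`) shows that computation:

```python
def wer(reference: str, prediction: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), prediction.split()
    # prev[j] holds the edit distance between the processed prefix of ref
    # and hyp[:j]; the DP table is rolled one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution (0 if equal)
        prev = cur
    return prev[-1] / max(len(ref), 1)

# One substitution (com -> como) and one deletion (avui) over 4 reference words.
print(100 * wer("hola com estas avui", "hola como estas"))  # 50.0
```

Multiplying by 100, as the removed snippet did, expresses the score as a percentage.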
The updated sections of README.md:

base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
tags:
- bsc
- projecte-aina
- automatic-speech-recognition
- whisper-large-v3
- code-switching
- spanish
- catalan
---

This model can be used for Automatic Speech Recognition (ASR) in code-switching conditions between Spanish and Catalan. The model is intended to transcribe audio files to plain text.

### Installation

To use this model, you may install [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped):

Create a virtual environment:
```bash
# ...
source /path/to/venv/bin/activate
```
Install the modules:
```bash
pip install git+https://github.com/linto-ai/whisper-timestamped
```

### For Inference

To transcribe audio in code-switching using this model, you can follow this example:

```python
import whisper_timestamped as whisper

model = whisper.load_model("langtech-veu/whisper-timestamped-cs", device="cpu")
result = whisper.transcribe(model, "/path/to/the/audio.wav")

import json
print(json.dumps(result, indent = 2, ensure_ascii = False))
```
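The `result` printed above is a plain dictionary: whisper-timestamped documents a layout with the full `text` plus a list of `segments`, each carrying word entries with `start`/`end` times and a `confidence` score. Assuming that layout (verify it against your installed version), the word-level timestamps can be flattened into tuples; the `result` stub below is hand-made for illustration:

```python
# Hand-made stub mimicking the shape whisper-timestamped returns (assumed layout).
result = {
    "text": " hola bon dia",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " hola bon dia",
         "words": [
             {"text": "hola", "start": 0.0, "end": 0.4, "confidence": 0.98},
             {"text": "bon",  "start": 0.5, "end": 0.7, "confidence": 0.95},
             {"text": "dia",  "start": 0.8, "end": 1.2, "confidence": 0.97},
         ]},
    ],
}

def word_timestamps(result):
    """Flatten all segments into (word, start, end) tuples."""
    return [(w["text"], w["start"], w["end"])
            for seg in result["segments"]
            for w in seg.get("words", [])]

for word, start, end in word_timestamps(result):
    print(f"{start:6.2f}-{end:6.2f}  {word}")
```

Each tuple gives a word and its time span in seconds, which is the information the timestamped variant adds over plain Whisper transcription.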

## Training Details

The specific dataset used to create the model is a corpus called CAESAR-tiny, which has not been publicly released yet.

## Citation

If this model contributes to your research, please cite the work:
```bibtex