AbirMessaoudi committed on
Commit ac1421f · verified · 1 Parent(s): f4cee5b

Update README.md

Files changed (1):
  1. README.md (+10 -68)

README.md CHANGED
@@ -6,7 +6,6 @@ language:
 base_model:
 - openai/whisper-large-v3
 pipeline_tag: automatic-speech-recognition
-library_name: transformers
 tags:
 - bsc
 - projecte-aina
@@ -14,7 +13,6 @@ tags:
 - automatic-speech-recognition
 - whisper-large-v3
 - code-switching
-- spanish-catalan
 - spanish
 - catalan
 ---
@@ -46,13 +44,9 @@ The "whisper-timestamped-cs" is an acoustic model suitable for Automatic Speech
 
 This model can be used for Automatic Speech Recognition (ASR) in code-switching conditions between Spanish and Catalan. The model is intended to transcribe audio files to plain text.
 
-## How to Get Started with the Model
-
-To see an updated and functional version of this code, please see our [Notebook](https://colab.research.google.com/drive/1MHiPrffNTwiyWeUyMQvSdSbfkef_8aJC?usp=sharing)
-
 ### Installation
 
-To use this model, you may install [datasets](https://huggingface.co/docs/datasets/installation) and [transformers](https://huggingface.co/docs/transformers/installation):
+To use this model, you may install [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped):
 
 Create a virtual environment:
 ```bash
@@ -64,66 +58,20 @@ source /path/to/venv/bin/activate
 ```
 Install the modules:
 ```bash
-pip install datasets transformers
+pip install git+https://github.com/linto-ai/whisper-timestamped
 ```
 
 ### For Inference
-In order to transcribe audio in Catalan using this model, you can follow this example:
-
-```bash
-#Install Prerequisites
-pip install torch
-pip install datasets
-pip install 'transformers[torch]'
-pip install evaluate
-pip install jiwer
-```
+To transcribe audio in code-switching using this model, you can follow this example:
 
 ```python
-#This code works with GPU
-
-#Notice that: load_metric is no longer part of datasets.
-#you have to remove it and use evaluate's load instead.
-#(Note from November 2024)
-
-import torch
-from transformers import WhisperForConditionalGeneration, WhisperProcessor
-
-#Load the processor and model.
-MODEL_NAME="langtech-veu/whisper-timestamped-cs"
-processor = WhisperProcessor.from_pretrained(MODEL_NAME)
-model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")
-
-#Load the dataset
-from datasets import load_dataset, load_metric, Audio
-ds=load_dataset("projecte-aina/parlament_parla",split='test')
-
-#Downsample to 16kHz
-ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
-
-#Process the dataset
-def map_to_pred(batch):
-    audio = batch["audio"]
-    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
-    batch["reference"] = processor.tokenizer._normalize(batch['normalized_text'])
-
-    with torch.no_grad():
-        predicted_ids = model.generate(input_features.to("cuda"))[0]
-
-    transcription = processor.decode(predicted_ids)
-    batch["prediction"] = processor.tokenizer._normalize(transcription)
-
-    return batch
-
-#Do the evaluation
-result = ds.map(map_to_pred)
-
-#Compute the overall WER now.
-from evaluate import load
-
-wer = load("wer")
-WER=100 * wer.compute(references=result["reference"], predictions=result["prediction"])
-print(WER)
+import whisper_timestamped as whisper
+
+model = whisper.load_model("langtech-veu/whisper-timestamped-cs", device="cpu")
+result = whisper.transcribe(model, "/path/to/the/audio.wav")
+
+import json
+print(json.dumps(result, indent = 2, ensure_ascii = False))
 ```
 
 ## Training Details
@@ -132,12 +80,6 @@ print(WER)
 
 The specific dataset used to create the model is a corpus called CAESAR-tiny, which has not been released at the moment.
 
-### Training procedure
-
-This model is the result of finetuning the model ["openai/whisper-large-v3"](https://huggingface.co/openai/whisper-large-v3) by following this [tutorial](https://huggingface.co/blog/fine-tune-whisper) provided by Hugging Face.
-
-### Training Hyperparameters
-
 ## Citation
 If this model contributes to your research, please cite the work:
 ```bibtex
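
The new inference snippet in this commit prints the full JSON result from whisper-timestamped. As a minimal sketch of how that JSON can be consumed downstream, assuming the output shape whisper-timestamped documents (`"segments"`, each carrying `"words"` with `"text"`, `"start"`, `"end"`, `"confidence"`), the word-level timestamps can be flattened like this; the `result` dict below is a hand-written stand-in, not a real transcription:

```python
# Sketch: flattening word-level timestamps from a whisper-timestamped result.
# With a real model, `result` would come from:
#   result = whisper.transcribe(model, "/path/to/the/audio.wav")

def word_timings(result):
    """Return (word, start, end) tuples from a whisper-timestamped result dict."""
    return [
        (w["text"], w["start"], w["end"])
        for seg in result.get("segments", [])
        for w in seg.get("words", [])
    ]

# Hypothetical stand-in mimicking the documented output structure.
result = {
    "text": " Bon dia, buenos días.",
    "segments": [
        {
            "text": " Bon dia, buenos días.",
            "start": 0.0,
            "end": 2.1,
            "words": [
                {"text": "Bon", "start": 0.0, "end": 0.4, "confidence": 0.98},
                {"text": "dia,", "start": 0.4, "end": 0.9, "confidence": 0.95},
                {"text": "buenos", "start": 1.1, "end": 1.6, "confidence": 0.97},
                {"text": "días.", "start": 1.6, "end": 2.1, "confidence": 0.96},
            ],
        }
    ],
}

for word, start, end in word_timings(result):
    print(f"{start:5.2f}-{end:5.2f}  {word}")
```

Word timings are useful for code-switching analysis, e.g. aligning language-switch points against the audio.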