Update README.md
README.md
CHANGED
@@ -6,7 +6,6 @@ language:
 base_model:
 - openai/whisper-large-v3
 pipeline_tag: automatic-speech-recognition
-library_name: transformers
 tags:
 - bsc
 - projecte-aina
@@ -14,7 +13,6 @@ tags:
 - automatic-speech-recognition
 - whisper-large-v3
 - code-switching
-- spanish-catalan
 - spanish
 - catalan
 ---
@@ -46,13 +44,9 @@ The "whisper-timestamped-cs" is an acoustic model suitable for Automatic Speech
 
 This model can be used for Automatic Speech Recognition (ASR) in code-switching conditions between Spanish and Catalan. The model is intended to transcribe audio files to plain text.
 
-## How to Get Started with the Model
-
-To see an updated and functional version of this code, please see our [Notebook](https://colab.research.google.com/drive/1MHiPrffNTwiyWeUyMQvSdSbfkef_8aJC?usp=sharing)
-
 ### Installation
 
-To use this model, you may install [
+To use this model, you may install [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped):
 
 Create a virtual environment:
 ```bash
@@ -64,66 +58,20 @@ source /path/to/venv/bin/activate
 ```
 Install the modules:
 ```bash
-pip install
+pip install git+https://github.com/linto-ai/whisper-timestamped
 ```
 
 ### For Inference
-
-
-```bash
-#Install Prerequisites
-pip install torch
-pip install datasets
-pip install 'transformers[torch]'
-pip install evaluate
-pip install jiwer
-```
+To transcribe audio in code-switching using this model, you can follow this example:
 
 ```python
-
-
-
-
-
-
-
-from transformers import WhisperForConditionalGeneration, WhisperProcessor
-
-#Load the processor and model.
-MODEL_NAME="langtech-veu/whisper-timestamped-cs"
-processor = WhisperProcessor.from_pretrained(MODEL_NAME)
-model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")
-
-#Load the dataset
-from datasets import load_dataset, load_metric, Audio
-ds=load_dataset("projecte-aina/parlament_parla",split='test')
-
-#Downsample to 16kHz
-ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
-
-#Process the dataset
-def map_to_pred(batch):
-    audio = batch["audio"]
-    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
-    batch["reference"] = processor.tokenizer._normalize(batch['normalized_text'])
-
-    with torch.no_grad():
-        predicted_ids = model.generate(input_features.to("cuda"))[0]
-
-    transcription = processor.decode(predicted_ids)
-    batch["prediction"] = processor.tokenizer._normalize(transcription)
-
-    return batch
-
-#Do the evaluation
-result = ds.map(map_to_pred)
-
-#Compute the overall WER now.
-from evaluate import load
-
-wer = load("wer")
-WER=100 * wer.compute(references=result["reference"], predictions=result["prediction"])
-print(WER)
+import whisper_timestamped as whisper
+
+model = whisper.load_model("langtech-veu/whisper-timestamped-cs", device="cpu")
+result = whisper.transcribe(model, "/path/to/the/audio.wav")
+
+import json
+print(json.dumps(result, indent = 2, ensure_ascii = False))
 ```
 
 ## Training Details
@@ -132,12 +80,6 @@ print(WER)
 
 The specific dataset used to create the model is a corpus called CAESAR-tiny, which has not been released at the moment.
 
-### Training procedure
-
-This model is the result of finetuning the model ["openai/whisper-large-v3"](https://huggingface.co/openai/whisper-large-v3) by following this [tutorial](https://huggingface.co/blog/fine-tune-whisper) provided by Hugging Face.
-
-### Training Hyperparameters
-
 ## Citation
 If this model contributes to your research, please cite the work:
 ```bibtex
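The removed example scored transcriptions with the `evaluate` library's word error rate (WER). WER itself is just the word-level edit distance (substitutions, insertions, and deletions) divided by the number of reference words; the dependency-free sketch below (a hypothetical `wer` helper, not part of `evaluate` or `jiwer`) shows that computation:

```python
def wer(reference: str, prediction: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), prediction.split()
    # prev[j] holds the edit distance between the processed prefix of ref
    # and hyp[:j]; the DP table is rolled one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution (0 if equal)
        prev = cur
    return prev[-1] / max(len(ref), 1)

# One substitution (com -> como) and one deletion (avui) over 4 reference words.
print(100 * wer("hola com estas avui", "hola como estas"))  # 50.0
```

Multiplying by 100, as the removed snippet did, expresses the score as a percentage.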
The updated sections of README.md:

base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
tags:
- bsc
- projecte-aina
- automatic-speech-recognition
- whisper-large-v3
- code-switching
- spanish
- catalan
---

This model can be used for Automatic Speech Recognition (ASR) in code-switching conditions between Spanish and Catalan. The model is intended to transcribe audio files to plain text.

### Installation

To use this model, you may install [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped):

Create a virtual environment:
```bash
# ...
source /path/to/venv/bin/activate
```
Install the modules:
```bash
pip install git+https://github.com/linto-ai/whisper-timestamped
```

### For Inference

To transcribe audio in code-switching using this model, you can follow this example:

```python
import whisper_timestamped as whisper

model = whisper.load_model("langtech-veu/whisper-timestamped-cs", device="cpu")
result = whisper.transcribe(model, "/path/to/the/audio.wav")

import json
print(json.dumps(result, indent = 2, ensure_ascii = False))
```
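The `result` printed above is a plain dictionary: whisper-timestamped documents a layout with the full `text` plus a list of `segments`, each carrying word entries with `start`/`end` times and a `confidence` score. Assuming that layout (verify it against your installed version), the word-level timestamps can be flattened into tuples; the `result` stub below is hand-made for illustration:

```python
# Hand-made stub mimicking the shape whisper-timestamped returns (assumed layout).
result = {
    "text": " hola bon dia",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " hola bon dia",
         "words": [
             {"text": "hola", "start": 0.0, "end": 0.4, "confidence": 0.98},
             {"text": "bon",  "start": 0.5, "end": 0.7, "confidence": 0.95},
             {"text": "dia",  "start": 0.8, "end": 1.2, "confidence": 0.97},
         ]},
    ],
}

def word_timestamps(result):
    """Flatten all segments into (word, start, end) tuples."""
    return [(w["text"], w["start"], w["end"])
            for seg in result["segments"]
            for w in seg.get("words", [])]

for word, start, end in word_timestamps(result):
    print(f"{start:6.2f}-{end:6.2f}  {word}")
```

Each tuple gives a word and its time span in seconds, which is the information the timestamped variant adds over plain Whisper transcription.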

## Training Details

The specific dataset used to create the model is a corpus called CAESAR-tiny, which has not been publicly released yet.

## Citation

If this model contributes to your research, please cite the work:
```bibtex