Transducens/error-preserving-whisper

Gabi00 committed on Sep 27, 2024

Commit 3139cc0 · verified · 1 Parent(s): e888c3f

Update README.md

Files changed (1)

README.md +51 -1

README.md CHANGED

@@ -16,4 +16,54 @@ This fine-tuned version of OpenAI’s Whisper model is specifically trained to h
 It is designed to transcribe and process non-standard or erroneous English input, including mispronunciations,
 grammatical mistakes, slang, and non-native speaker errors. This model helps improve transcription accuracy
 in scenarios where speakers use incorrect or informal English, making it useful in language learning,
-transcription of casual conversations, or analyzing spoken communication from non-native English speakers.

+transcription of casual conversations, or analyzing spoken communication from non-native English speakers.
+## Usage Guide
+This project was executed on an Ubuntu 22.04.3 system running Linux kernel 6.8.0-40-generic.
+Whisper large-v3 is supported in Hugging Face Transformers. To run the model, first install the Transformers library.
+For this example, we'll also install Hugging Face Datasets to load a toy audio dataset from
+the Hugging Face Hub, and Hugging Face Accelerate to reduce the model loading time:
+```bash
+pip install --upgrade pip
+pip install --upgrade transformers "datasets[audio]" accelerate
+```
+The model can be used with the pipeline class to transcribe audio of arbitrary length:
+```python
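+import torch
+from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+from datasets import load_dataset
+# Use the first GPU with half precision when available; otherwise fall back to CPU and float32.
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+# Base Whisper checkpoint, as in the original example. To run the fine-tuned model instead,
+# substitute this repository's ID (assumed to be "Transducens/error-preserving-whisper").
+model_id = "openai/whisper-large-v3"
+# Load the weights from safetensors; low_cpu_mem_usage=True relies on Accelerate.
+model = AutoModelForSpeechSeq2Seq.from_pretrained(
+    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
+)
+model.to(device)
+# The processor bundles the tokenizer and the audio feature extractor.
+processor = AutoProcessor.from_pretrained(model_id)
+# Build the speech-recognition pipeline from the model and processor components.
+pipe = pipeline(
+    "automatic-speech-recognition",
+    model=model,
+    tokenizer=processor.tokenizer,
+    feature_extractor=processor.feature_extractor,
+    torch_dtype=torch_dtype,
+    device=device,
+)
+# Load one long-form LibriSpeech sample from the Hub and transcribe it.
+dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
+sample = dataset[0]["audio"]
+result = pipe(sample)
+print(result["text"])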
+```
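
The usage guide says the pipeline can transcribe audio of arbitrary length, while Whisper itself processes audio in 30-second windows. One common way to bridge that gap is to split a long recording into overlapping windows and transcribe each one. The sketch below only illustrates that windowing arithmetic in plain Python. It is not the Transformers implementation, and the function name `chunk_windows` and the 5-second overlap are assumptions chosen for illustration.

```python
def chunk_windows(total_s, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) times in seconds that cover [0, total_s].

    Consecutive windows overlap by `overlap_s` seconds so that words cut at
    a boundary appear whole in at least one window. Hypothetical helper for
    illustration; not part of any library.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s  # how far each new window advances
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)  # clip the last window
        windows.append((start, end))
        if end >= total_s:
            break
        start += step
    return windows


if __name__ == "__main__":
    # A 70-second recording split into 30-second windows with 5 seconds of overlap.
    for start, end in chunk_windows(70.0):
        print(f"{start:5.1f} s -> {end:5.1f} s")
```

In the Transformers pipeline, chunked long-form transcription is requested through the `chunk_length_s` argument; the sketch above is only meant to make the idea concrete.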