Commit 3139cc0 by Gabi00 (verified) · Parent: e888c3f

Update README.md

Files changed: README.md (+51 −1)
It is designed to transcribe and process non-standard or erroneous English input, including mispronunciations,
grammatical mistakes, slang, and non-native speaker errors. This model helps improve transcription accuracy
in scenarios where speakers use incorrect or informal English, making it useful in language learning,
transcription of casual conversations, or analyzing spoken communication from non-native English speakers.

## Usage Guide

This project was executed on an Ubuntu 22.04.3 system running Linux kernel 6.8.0-40-generic.
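
If you want to check that your own machine is comparable before following along, the kernel and distribution versions can be printed like this (a small aside, not part of the original card):

```shell
# Print the running kernel version (this guide used 6.8.0-40-generic)
uname -r
# Print the OS description where available (this guide used Ubuntu 22.04.3)
grep PRETTY_NAME /etc/os-release 2>/dev/null || true
```

The guide should work on other recent Linux distributions as well; the versions above are simply what it was tested on.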

Whisper large-v3 is supported in Hugging Face Transformers. To run the model, first install the Transformers library.
For this example, we'll also install Hugging Face Datasets to load a toy audio dataset from
the Hugging Face Hub, and Hugging Face Accelerate to reduce the model loading time:

```bash
pip install --upgrade pip
pip install --upgrade transformers datasets[audio] accelerate
```

The model can be used with the `pipeline` class to transcribe audio of arbitrary length:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
```
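
Besides dataset samples, the pipeline also accepts a file path or a plain dict with `array` and `sampling_rate` keys. As a minimal sketch of that dict format (a synthetic one-second tone built with NumPy stands in for real speech here; it is illustrative only and not from the original card):

```python
import numpy as np

sampling_rate = 16000  # Whisper operates on 16 kHz audio
t = np.linspace(0, 1, sampling_rate, endpoint=False)

# A dict shaped like dataset[0]["audio"] from the example above
sample = {
    "array": (0.1 * np.sin(2 * np.pi * 440 * t)).astype(np.float32),
    "sampling_rate": sampling_rate,
}

# It could then be transcribed the same way: result = pipe(sample)
print(sample["array"].shape, sample["sampling_rate"])  # → (16000,) 16000
```

This is the same structure Hugging Face Datasets produces for its audio columns, which is why `pipe(sample)` works directly in the snippet above.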