eolang committed on
Commit 3e22be3 · verified
1 Parent(s): 7ba8c82

Upload README (1).md

Files changed (1)
  1. README (1).md +80 -0
README (1).md ADDED
@@ -0,0 +1,80 @@
+ ---
+ license: apache-2.0
+ base_model: Jacaranda-Health/ASR-STT
+ tags:
+ - speech-to-text
+ - automatic-speech-recognition
+ - quantized
+ - 8bit
+ language:
+ - en
+ pipeline_tag: automatic-speech-recognition
+ ---
+
+ # ASR-STT 8-bit Quantized
+
+ This is an 8-bit quantized version of [Jacaranda-Health/ASR-STT](https://huggingface.co/Jacaranda-Health/ASR-STT).
+
+ ## Model Details
+ - **Base Model**: Jacaranda-Health/ASR-STT
+ - **Quantization**: 8-bit
+ - **Size Reduction**: 73.1% smaller than the original
+ - **Original Size**: 2913.89 MB
+ - **Quantized Size**: 784.94 MB
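
The size reduction figure follows directly from the two sizes listed above. A minimal sketch of the arithmetic; the function name `size_reduction_percent` is illustrative and not part of the model repository.

```python
def size_reduction_percent(original_mb: float, quantized_mb: float) -> float:
    """Return how much smaller the quantized model is, as a percentage of the original."""
    return (1.0 - quantized_mb / original_mb) * 100.0


# Sizes taken from the Model Details list.
reduction = size_reduction_percent(2913.89, 784.94)
print(f"{reduction:.1f}% smaller")  # 73.1% smaller
```
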
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, BitsAndBytesConfig
+ import torch
+ import librosa
+
+ # Load processor
+ processor = AutoProcessor.from_pretrained("eolang/ASR-STT-8bit")
+
+ # Configure 8-bit quantization (requires bitsandbytes)
+ quantization_config = BitsAndBytesConfig(
+     load_in_8bit=True,
+     llm_int8_threshold=6.0,
+     llm_int8_has_fp16_weight=False,
+ )
+
+ # Load quantized model
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     "eolang/ASR-STT-8bit",
+     quantization_config=quantization_config,
+     device_map="auto",
+ )
+
+ # Transcription function
+ def transcribe(filepath):
+     # Load the audio and resample it to 16 kHz
+     audio, sr = librosa.load(filepath, sr=16000)
+     inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
+
+     # Cast the audio features to half precision and move them to the model's device
+     input_features = inputs["input_features"].to(model.device, dtype=torch.float16)
+
+     with torch.no_grad():
+         generated_ids = model.generate(input_features)
+
+     return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+
+ # Example usage
+ transcription = transcribe("path/to/audio.wav")
+ print(transcription)
+ ```
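+
+ ## Performance
+ - Potentially faster inference due to reduced precision (depends on hardware and the bitsandbytes kernels)
+ - Lower memory usage (model size 784.94 MB vs. 2913.89 MB)
+ - Transcription quality intended to stay close to the base model (no benchmark figures reported)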
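
The points above are given without measurements, so it may be worth checking them on your own hardware. The following is a minimal sketch of a timing helper; `time_call` is a hypothetical name introduced here for illustration and is not part of the model repository or of transformers.

```python
import time
from statistics import mean


def time_call(fn, *args, repeats=3, **kwargs):
    """Return the mean wall-clock time in seconds of fn(*args, **kwargs) over `repeats` runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Stand-in workload so the sketch runs on its own.
print(f"{time_call(sum, range(1000)):.6f} s per call")
```

With the model loaded as in the Usage section, `time_call(transcribe, "path/to/audio.wav")` would give a rough latency figure, and `model.get_memory_footprint()` reports the model's memory use in bytes. Running both against the base model as well gives a like-for-like comparison.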
+
+ ## Requirements
+ - transformers
+ - torch
+ - bitsandbytes
+ - librosa
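
Before running the usage example, it can help to confirm that the listed packages are importable. This is a small sketch using only the standard library; the helper name `missing_packages` is an assumption made for illustration.

```python
from importlib.util import find_spec

# Package names taken from the Requirements list.
REQUIRED = ("transformers", "torch", "bitsandbytes", "librosa")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```
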