---
library_name: transformers
language:
- en
- sw
base_model:
- Jacaranda-Health/ASR-STT
pipeline_tag: automatic-speech-recognition
---

# ASR-STT 8-bit Quantized

This is an 8-bit quantized version of [Jacaranda-Health/ASR-STT](https://huggingface.co/Jacaranda-Health/ASR-STT).

## Model Details

- **Base Model**: Jacaranda-Health/ASR-STT
- **Quantization**: 8-bit (bitsandbytes)
- **Size Reduction**: 73.1% smaller than the original
- **Original Size**: 2913.89 MB
- **Quantized Size**: 784.94 MB
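
The size-reduction figure follows directly from the two sizes above:

```python
original_mb = 2913.89   # size of the original checkpoint
quantized_mb = 784.94   # size of the 8-bit checkpoint

# Percentage reduction relative to the original
reduction = (1 - quantized_mb / original_mb) * 100
print(f"{reduction:.1f}% smaller")  # 73.1% smaller
```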

## Usage

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, BitsAndBytesConfig
import torch
import librosa

# Load processor
processor = AutoProcessor.from_pretrained("Jacaranda-Health/ASR-STT-8bit")

# Configure 8-bit quantization (requires a CUDA GPU with bitsandbytes installed)
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False
)

# Load quantized model
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "Jacaranda-Health/ASR-STT-8bit",
    quantization_config=quantization_config,
    device_map="auto"
)

# Transcription function
def transcribe(filepath):
    # Load audio and resample to the 16 kHz rate the model expects
    audio, sr = librosa.load(filepath, sr=16000)
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

    # Move the features to the model's device and match the fp16 compute
    # dtype used by the 8-bit layers
    input_features = inputs["input_features"].to(model.device, dtype=torch.float16)

    with torch.no_grad():
        generated_ids = model.generate(input_features)

    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Example usage
transcription = transcribe("path/to/audio.wav")
print(transcription)
```

## Performance

- Faster inference due to reduced precision
- Lower memory usage
- Transcription quality close to the original model

## Requirements

- transformers
- torch
- bitsandbytes
- librosa
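
The dependencies above can be installed with pip (unpinned here; pin versions to match your environment):

```shell
pip install transformers torch bitsandbytes librosa
```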