Tags: Automatic Speech Recognition · Transformers · Safetensors · English · Swahili · whisper · 8-bit precision · bitsandbytes
Use Jacaranda-Health/ASR-STT-8bit with the Transformers library:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="Jacaranda-Health/ASR-STT-8bit")
```

```python
# Load the processor and model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("Jacaranda-Health/ASR-STT-8bit")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Jacaranda-Health/ASR-STT-8bit")
```
# ASR-STT 8-bit Quantized

This is an 8-bit quantized version of Jacaranda-Health/ASR-STT.
## Model Details

- Base Model: Jacaranda-Health/ASR-STT
- Quantization: 8-bit (bitsandbytes)
- Size Reduction: 73.1% smaller than the original
- Original Size: 2913.89 MB
- Quantized Size: 784.94 MB
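The size figures above can be sanity-checked with plain arithmetic (the parameter count below is an estimate derived from the quoted fp32 size, not an official figure):

```python
# Verify the quoted size reduction from the Model Details figures
original_mb = 2913.89   # fp32 checkpoint size
quantized_mb = 784.94   # 8-bit checkpoint size

reduction = (1 - quantized_mb / original_mb) * 100
print(f"{reduction:.1f}% smaller")  # -> 73.1% smaller

# At 4 bytes per fp32 weight, the original size implies roughly
# 764M parameters (an estimate, not an official figure)
approx_params = original_mb * 1024**2 / 4
print(f"~{approx_params / 1e6:.0f}M parameters")
```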
## Usage

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, BitsAndBytesConfig
import torch
import librosa

# Load processor
processor = AutoProcessor.from_pretrained("Jacaranda-Health/ASR-STT-8bit")

# Configure 8-bit quantization
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
)

# Load quantized model
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "Jacaranda-Health/ASR-STT-8bit",
    quantization_config=quantization_config,
    device_map="auto",
)

# Transcription function
def transcribe(filepath):
    # Whisper-style models expect 16 kHz mono audio
    audio, sr = librosa.load(filepath, sr=16000)
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
    # Convert inputs to half precision to match the quantized model's compute dtype
    if torch.cuda.is_available():
        inputs = {k: v.cuda().half() for k, v in inputs.items()}
    else:
        inputs = {k: v.half() for k, v in inputs.items()}
    with torch.no_grad():
        generated_ids = model.generate(inputs["input_features"])
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Example usage
transcription = transcribe("path/to/audio.wav")
print(transcription)
```
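As a rough intuition for what `llm_int8_threshold=6.0` controls in the config above: LLM.int8() keeps outlier feature dimensions (those with magnitudes above the threshold) in higher precision and quantizes the rest to int8 with per-column absmax scaling. A toy numpy sketch of that idea, not the actual bitsandbytes kernel:

```python
import numpy as np

# Toy illustration of LLM.int8()-style outlier handling: columns whose
# values exceed the threshold stay in higher precision; the rest are
# quantized to int8 with a per-column absmax scale.
threshold = 6.0
W = np.array([[0.50, -7.2,  1.0],
              [0.25,  6.5, -1.0]], dtype=np.float32)

outlier_cols = np.abs(W).max(axis=0) > threshold          # column 1 is an outlier
scales = np.abs(W[:, ~outlier_cols]).max(axis=0) / 127.0  # per-column absmax scales
W_int8 = np.round(W[:, ~outlier_cols] / scales).astype(np.int8)

# Dequantize and check the round-trip error on the inlier columns is small
W_deq = W_int8.astype(np.float32) * scales
print(float(np.abs(W_deq - W[:, ~outlier_cols]).max()))
```

In the real kernel the outlier columns go through a separate fp16 matmul while the int8 columns use integer arithmetic; this sketch only shows the split-and-scale step.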
## Performance

- Faster inference due to reduced precision
- Lower memory usage
- Transcription quality largely maintained
## Requirements

- transformers
- torch
- bitsandbytes
- librosa
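The dependencies above can be installed with pip (assuming the standard PyPI package names):

```shell
pip install transformers torch bitsandbytes librosa
```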