---
library_name: transformers
language:
- en
- sw
base_model:
- Jacaranda-Health/ASR-STT
pipeline_tag: automatic-speech-recognition
---

# ASR-STT 8-Bit Quantized

This is an 8-bit quantized version of [Jacaranda-Health/ASR-STT](https://huggingface.co/Jacaranda-Health/ASR-STT).

## Model Details

- **Base Model**: Jacaranda-Health/ASR-STT
- **Quantization**: 8-bit (bitsandbytes)
- **Size Reduction**: 73.1% smaller than the original
- **Original Size**: 2913.89 MB
- **Quantized Size**: 784.94 MB

## Usage

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, BitsAndBytesConfig
import torch
import librosa

# Load the processor (feature extractor + tokenizer)
processor = AutoProcessor.from_pretrained("Jacaranda-Health/ASR-STT-8bit")

# Configure 8-bit quantization
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
)

# Load the quantized model (bitsandbytes 8-bit inference requires a CUDA GPU)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "Jacaranda-Health/ASR-STT-8bit",
    quantization_config=quantization_config,
    device_map="auto",
)

# Transcription function
def transcribe(filepath):
    # Load the audio and resample to the 16 kHz rate the model expects
    audio, sr = librosa.load(filepath, sr=16000)
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

    # Move inputs to the model's device and cast to half precision,
    # matching the fp16 compute dtype used by the 8-bit model
    inputs = {k: v.to(model.device).half() for k, v in inputs.items()}

    with torch.no_grad():
        generated_ids = model.generate(inputs["input_features"])
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Example usage
transcription = transcribe("path/to/audio.wav")
print(transcription)
```

## Performance

- Faster inference due to reduced-precision matrix multiplications
- Lower memory usage (roughly a quarter of the fp32 footprint)
- Transcription quality close to the original model

## Requirements

- transformers
- torch
- bitsandbytes
- librosa
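The reported reduction follows directly from the two sizes above; a quick sanity check:

```python
# Sizes as reported in the Model Details section (MB)
original_mb = 2913.89
quantized_mb = 784.94

# Percentage reduction relative to the original model
reduction = (1 - quantized_mb / original_mb) * 100
print(f"{reduction:.1f}% smaller")  # → 73.1% smaller
```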