---
license: apache-2.0
datasets:
- librispeech_asr
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- automatic-speech-recognition
- int8
- ONNX
- PostTrainingStatic
- Intel® Neural Compressor
- neural-compressor
library_name: transformers
---

## Model Details: INT8 Whisper medium

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.

This INT8 ONNX model was generated with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) using post-training static quantization. The FP32 ONNX model can be exported with the command below. The `-with-past` task also exports a decoder variant that reuses cached key/value states from earlier decoding steps:
```shell
optimum-cli export onnx --model openai/whisper-medium whisper-medium-with-past/ --task automatic-speech-recognition-with-past --opset 13
```
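
Post-training *static* quantization maps FP32 tensors to 8-bit integers using scales that are computed once, ahead of time, from calibration data, instead of at inference time. The sketch below illustrates the idea with a simplified per-tensor affine (scale and zero-point) scheme. This is an assumption made for illustration and is not Neural Compressor's actual implementation, which supports further options such as per-channel scales and accuracy-aware tuning. The calibration values are hypothetical.

```python
# Simplified per-tensor affine INT8 quantization with a calibration pass,
# illustrating post-training *static* quantization.
# Illustrative sketch only; not Intel Neural Compressor's implementation.

QMIN, QMAX = -128, 127  # signed INT8 range


def calibrate(batches):
    """Track the min/max values seen over the calibration batches."""
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    # Widen the range to include 0 so that 0.0 is exactly representable
    # (a common convention).
    return min(lo, 0.0), max(hi, 0.0)


def compute_qparams(lo, hi):
    """Derive scale and zero-point once, before inference."""
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # avoid a zero scale
    zero_point = round(QMIN - lo / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(xs, scale, zero_point):
    """Map floats to clamped INT8 values."""
    return [max(QMIN, min(QMAX, round(x / scale) + zero_point)) for x in xs]


def dequantize(qs, scale, zero_point):
    """Map INT8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in qs]


# Hypothetical calibration data standing in for real activations.
calib = [[-0.5, 0.1, 1.2], [0.3, -1.0, 2.0]]
scale, zp = compute_qparams(*calibrate(calib))

x = [0.0, 1.0, -1.0, 2.0]
q = quantize(x, scale, zp)
x_hat = dequantize(q, scale, zp)
print(q, x_hat)
```

The "static" part is that `scale` and `zp` are fixed after calibration. Dynamic quantization would instead measure activation ranges on every inference call.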

| Model Detail | Description |
| ----------- | ----------- |
| Model Authors - Company | Intel |
| Date | May 15, 2022 |
| Version | 1 |
| Type | Speech Recognition |
| Paper or Other Resources | - |
| License | Apache 2.0 |
| Questions or Comments | [Community Tab](https://huggingface.co/Intel/whisper-medium-int8-static/discussions)|

| Intended Use | Description |
| ----------- | ----------- |
| Primary intended uses | You can use the raw model for automatic speech recognition inference |
| Primary intended users | Anyone doing automatic speech recognition inference |
| Out-of-scope uses | In most cases, this model will need to be fine-tuned for your particular task. The model should not be used to intentionally create hostile or alienating environments for people.|

### How to use

Download the model by cloning the repository. The ONNX files are stored with Git LFS, so install it before cloning:
```shell
git lfs install
git clone https://huggingface.co/Intel/whisper-medium-int8-static
```
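
If Git LFS is missing, `git clone` still succeeds but fetches small text pointer files in place of the ONNX models, and loading them then fails with a confusing error. The sketch below checks the three ONNX files used by the evaluation code. `check_onnx_files` is a hypothetical helper, not part of any library. It assumes the standard LFS pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`.

```python
import os

# File names taken from the evaluation code in this card.
ONNX_FILES = ("encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx")
# First bytes of a Git LFS pointer file (standard LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_onnx_files(model_path):
    """Return a list of problems; an empty list means the files look usable.

    Hypothetical helper for illustration only.
    """
    problems = []
    for name in ONNX_FILES:
        path = os.path.join(model_path, name)
        if not os.path.isfile(path):
            problems.append(f"missing: {name}")
            continue
        with open(path, "rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            problems.append(f"Git LFS pointer, not model data: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_onnx_files("whisper-medium-int8-static"):
        print(problem)
```

If any pointer files are reported, running `git lfs pull` inside the cloned directory should download the actual model data.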

Evaluate the model on the LibriSpeech `test-clean` split with the code below:
```python
import os

from datasets import load_dataset
from evaluate import load
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import AutoConfig, WhisperProcessor

model_name = 'openai/whisper-medium'        # source of the processor and config
model_path = 'whisper-medium-int8-static'   # directory created by `git clone`
processor = WhisperProcessor.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
wer = load("wer")
librispeech_test_clean = load_dataset("librispeech_asr", "clean", split="test")

# Load the three INT8 ONNX graphs and wrap them in an ONNX Runtime seq2seq model.
# These low-level optimum calls are version-dependent and may differ in newer releases.
sessions = ORTModelForSpeechSeq2Seq.load_model(
    os.path.join(model_path, 'encoder_model.onnx'),
    os.path.join(model_path, 'decoder_model.onnx'),
    os.path.join(model_path, 'decoder_with_past_model.onnx'))
model = ORTModelForSpeechSeq2Seq(sessions[0], sessions[1], config, model_path, sessions[2])

predictions = []
references = []
for batch in librispeech_test_clean:
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    # Normalise references and predictions identically (Whisper's English text normaliser).
    references.append(processor.tokenizer._normalize(batch['text']))
    predicted_ids = model.generate(input_features)[0]
    transcription = processor.decode(predicted_ids, skip_special_tokens=True)
    predictions.append(processor.tokenizer._normalize(transcription))

wer_result = wer.compute(references=references, predictions=predictions)
print(f"Result WER: {wer_result * 100:.2f}%")
# "Accuracy" here is simply 1 - WER; note that WER can exceed 1 in principle.
accuracy = 1 - wer_result
print("Accuracy: %.5f" % accuracy)
```
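
The `wer` metric reports the word error rate: the word-level edit distance between each prediction and its reference, counting substitutions S, deletions D and insertions I, summed over the whole test set and divided by the total number of reference words N, so WER = (S + D + I) / N. The pure-Python sketch below reproduces that definition for illustration. The `evaluate` metric itself delegates to the `jiwer` package, and the function names here are hypothetical.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # At the start of row i, prev[j] is the distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def corpus_wer(references, predictions):
    """Total word errors divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), p.split())
                 for r, p in zip(references, predictions))
    n_ref_words = sum(len(r.split()) for r in references)
    return errors / n_ref_words


refs = ["the cat sat on the mat", "hello world"]
preds = ["the cat sat on mat", "hello word"]
print(f"WER: {corpus_wer(refs, preds) * 100:.2f}%")  # 1 deletion + 1 substitution over 8 words
```

Because insertions count as errors, WER can exceed 100% when a model emits many extra words. This is why `1 - WER` is only a loose notion of accuracy.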

## Metrics (Model Performance)

Word error rate on the LibriSpeech `test-clean` split (lower is better):

| Model | Model Size (GB) | WER (%) |
|---|:---:|:---:|
| FP32 |4.9|2.88|
| INT8 |1.6|3.31|
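
The size/accuracy trade-off in the table can be summarised as a compression ratio and a WER change. The snippet below simply computes both from the table values, with sizes in GB and WER in percent.

```python
# Values copied from the metrics table in this card.
fp32 = {"size_gb": 4.9, "wer": 2.88}
int8 = {"size_gb": 1.6, "wer": 3.31}

compression = fp32["size_gb"] / int8["size_gb"]          # how many times smaller
abs_wer_increase = int8["wer"] - fp32["wer"]             # in percentage points
rel_wer_increase = abs_wer_increase / fp32["wer"] * 100  # relative to FP32 WER, in %

print(f"Size reduction: {compression:.2f}x")
print(f"WER increase: {abs_wer_increase:.2f} points ({rel_wer_increase:.1f}% relative)")
```

This prints a size reduction of about 3.06x for an absolute WER increase of 0.43 percentage points (about 14.9% relative).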