---
library_name: transformers.js
tags:
- transformers.js
- onnx
- whisper
pipeline_tag: automatic-speech-recognition
---

# Whisper Base ONNX

This is an ONNX conversion of OpenAI's [whisper-base](https://huggingface.co/openai/whisper-base) model, optimized for use with [Transformers.js](https://huggingface.co/docs/transformers.js).

## Model Details

- **Model Type:** Whisper (Encoder-Decoder)
- **Task:** Automatic Speech Recognition
- **Format:** ONNX (INT8 Quantized)
- **Size:** ~75MB (quantized from ~300MB)

## Usage

```javascript
import { pipeline } from '@huggingface/transformers';

// Load the speech-recognition pipeline with this repository's ONNX weights.
const transcriber = await pipeline('automatic-speech-recognition', 'markusingvarsson/whisper-test');

// In the browser, the string is fetched as a URL and decoded automatically.
const result = await transcriber('audio.wav');
console.log(result.text);
```

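In Node.js there is typically no `AudioContext`, so the audio has to be decoded first and passed to the pipeline as a `Float32Array` of 16 kHz mono samples instead of a path. The sketch below covers only the sample conversion. The helper names `int16ToFloat32` and `downmixToMono` are hypothetical, and reading the WAV file itself (for example with a package such as `wavefile`) is left out.

```javascript
// Hypothetical helpers: convert decoded PCM into the Float32Array input
// that the pipeline accepts when it is not given a URL.

/** Scale 16-bit signed PCM samples to floats in [-1, 1). */
function int16ToFloat32(pcm) {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768;
  }
  return out;
}

/** Average two equally long channels into a single mono channel. */
function downmixToMono(left, right) {
  if (left.length !== right.length) {
    throw new Error('Channels must have the same length');
  }
  const mono = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    mono[i] = (left[i] + right[i]) / 2;
  }
  return mono;
}

// Example: two short stereo channels become one mono Float32Array.
const left = int16ToFloat32(new Int16Array([0, 16384, -32768]));
const right = int16ToFloat32(new Int16Array([0, -16384, -32768]));
const mono = downmixToMono(left, right);
// `mono` can then be passed to the pipeline, e.g. `await transcriber(mono)`,
// assuming the samples are already at 16 kHz.
```

Resampling is not handled here; audio recorded at another rate would need to be converted to 16 kHz before transcription.
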
## Conversion Details

This model was converted using a custom conversion pipeline that:
1. Downloads the original Hugging Face model
2. Exports it to ONNX format with KV caching
3. Applies INT8 quantization to reduce file size
4. Adds Whisper-specific alignment heads for timestamp support

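Step 4 matters because word-level timestamps in Transformers.js depend on alignment heads being present in the model's generation config. The sketch below assumes this repository exposes them as described. It requests word-level timestamps and formats the returned chunks. `formatChunks` and `transcribeWithTimestamps` are hypothetical helper names, while `return_timestamps` and `chunk_length_s` are options of the Transformers.js speech-recognition pipeline.

```javascript
/** Format pipeline chunks as "[start - end] text" lines. */
function formatChunks(chunks) {
  return chunks.map(({ timestamp: [start, end], text }) => {
    // Defensive: treat a missing end time as unknown rather than failing.
    const endLabel = end == null ? '?' : end.toFixed(2);
    return `[${start.toFixed(2)} - ${endLabel}] ${text.trim()}`;
  });
}

async function transcribeWithTimestamps(audio) {
  // Loaded lazily so that formatChunks can be used without the library.
  const { pipeline } = await import('@huggingface/transformers');
  const transcriber = await pipeline(
    'automatic-speech-recognition',
    'markusingvarsson/whisper-test',
  );
  const result = await transcriber(audio, {
    return_timestamps: 'word', // use `true` for segment-level timestamps
    chunk_length_s: 30,        // process long audio in 30-second windows
  });
  return formatChunks(result.chunks);
}

// Example with hand-written chunks in the shape the pipeline returns.
const lines = formatChunks([
  { timestamp: [0, 0.78], text: ' And' },
  { timestamp: [0.78, 1.06], text: ' so' },
]);
console.log(lines.join('\n'));
```

Calling `transcribeWithTimestamps` with a URL (in the browser) or a `Float32Array` of samples should return one formatted line per word, provided the alignment heads were exported correctly.
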
The quantized models are approximately 4x smaller than the original (~75MB versus ~300MB) while largely preserving accuracy.

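The 4x figure follows from storage width: FP32 weights take 4 bytes each and INT8 weights take 1. As a rough check, the snippet below assumes the roughly 74 million parameters OpenAI lists for Whisper base and ignores non-weight overhead such as quantization scales. The constant and function names are illustrative only.

```javascript
// Back-of-the-envelope size estimate; the parameter count is approximate.
const PARAM_COUNT = 74e6;
const BYTES_PER_PARAM = { fp32: 4, int8: 1 };

/** Estimate the weight storage in megabytes (1 MB = 1e6 bytes). */
function approxSizeMB(paramCount, bytesPerParam) {
  return (paramCount * bytesPerParam) / 1e6;
}

const fp32MB = approxSizeMB(PARAM_COUNT, BYTES_PER_PARAM.fp32);
const int8MB = approxSizeMB(PARAM_COUNT, BYTES_PER_PARAM.int8);
console.log(`fp32 ~${fp32MB} MB, int8 ~${int8MB} MB, ratio ${fp32MB / int8MB}x`);
// prints "fp32 ~296 MB, int8 ~74 MB, ratio 4x"
```

These estimates line up with the ~300MB and ~75MB figures under Model Details. Actual file sizes may differ slightly, since some tensors may be left unquantized.
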