| --- |
| license: mit |
| tags: |
| - automatic-speech-recognition |
| - audio |
| - whisper |
| - ferrotorch |
| --- |
| |
| # `ferrotorch/whisper-tiny-encoder` |
|
|
| Encoder-only mirror of `openai/whisper-tiny`: a 4-layer, 6-head Transformer audio encoder with d_model=384, encoder_ffn_dim=1536, num_mel_bins=80, and max_source_positions=1500. MIT-licensed. The mirror is pinned encoder-only: the decoder and `proj_out` weights are dropped. It is the real-artifact baseline for audio encoder parity against `transformers` (#1149). |
|
|
| ## Provenance |
|
|
| * Upstream: `openai/whisper-tiny` (MIT). |
| * Conversion script: |
| [`ferrotorch/scripts/pin_pretrained_whisper_weights.py`](https://github.com/dollspace-gay/ferrotorch/blob/main/scripts/pin_pretrained_whisper_weights.py). |
| * Ferrotorch issue: <https://github.com/dollspace-gay/ferrotorch/issues/1149>. |
| * SHA-256 of `model.safetensors` (this file is pinned in |
| `ferrotorch-hub/src/registry.rs`): `4ce29194b87ef05385203f8b09914f5c3b060200c2b503d6d420459ffb80a294`. |
| * Number of trainable parameters in the encoder slice: |
| **8,208,384**. |
| * Config snapshot: d_model=384, |
| encoder_layers=4, |
| encoder_attention_heads=6, |
| encoder_ffn_dim=1536, |
| num_mel_bins=80, |
| max_source_positions=1500, |
| activation_function='gelu'. |
| * Non-encoder keys dropped from the upstream checkpoint (this |
| mirror is encoder-only): 100 total, first few: |
| `['model.decoder.embed_positions.weight', 'model.decoder.embed_tokens.weight', 'model.decoder.layer_norm.bias']`. |
|
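The parameter count above can be cross-checked from the config snapshot alone. The sketch below assumes the layer layout of the `transformers` Whisper encoder (two kernel-3 convolutions, a positional embedding table, `k_proj` without bias, a final LayerNorm); that layout is an assumption here, not something read from the checkpoint. `EncoderConfig` and `encoder_param_count` are illustrative names, not ferrotorch APIs.

```rust
/// Encoder hyperparameters copied from the config snapshot above.
/// (Illustrative struct, not a ferrotorch type.)
struct EncoderConfig {
    d_model: u64,
    encoder_layers: u64,
    encoder_ffn_dim: u64,
    num_mel_bins: u64,
    max_source_positions: u64,
}

/// Counts encoder parameters under the assumed layer layout.
fn encoder_param_count(c: &EncoderConfig) -> u64 {
    let d = c.d_model;
    // Two 1-D convolutions with kernel size 3, each with a bias.
    let conv1 = c.num_mel_bins * d * 3 + d;
    let conv2 = d * d * 3 + d;
    // Positional embedding table.
    let positions = c.max_source_positions * d;
    // Self-attention: q, k, v, out projections; k_proj has no bias.
    let attention = 4 * d * d + 3 * d;
    // Feed-forward: fc1 and fc2, both with a bias.
    let ffn = d * c.encoder_ffn_dim + c.encoder_ffn_dim + c.encoder_ffn_dim * d + d;
    // Two LayerNorms per layer, each with weight and bias.
    let norms = 2 * 2 * d;
    let per_layer = attention + ffn + norms;
    // Final LayerNorm after the last layer.
    let final_norm = 2 * d;
    conv1 + conv2 + positions + c.encoder_layers * per_layer + final_norm
}

fn main() {
    let cfg = EncoderConfig {
        d_model: 384,
        encoder_layers: 4,
        encoder_ffn_dim: 1536,
        num_mel_bins: 80,
        max_source_positions: 1500,
    };
    let total = encoder_param_count(&cfg);
    println!("{total}");
    assert_eq!(total, 8_208_384);
}
```

Under these assumptions the total matches the figure above only when the `[1500, 384]` positional table is included in the sum. The head count does not enter the calculation, since the heads split `d_model` rather than adding parameters.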
|
| ## Value-parity probe |
|
|
| Three extra files are uploaded so the ferrotorch-side harness can |
| reproduce the parity verdict without re-running the upstream |
| Whisper model: |
|
|
| * `_value_parity_audio.bin`: deterministic synthetic 30-second |
| audio (sum of three sine waves with a slow envelope), |
| 16 kHz mono float32, shape `[1, 480000]`. |
| * `_value_parity_mel.bin`: float32 features `[1, 80, 3000]` |
| produced by the upstream `WhisperFeatureExtractor(audio)`. |
| The output of the Rust-side `ferrotorch_whisper::audio` is |
| compared against this file. |
| * `_value_parity_encoder_output.bin`: float32 encoder hidden |
| states `[1, 1500, 384]` from |
| `WhisperModel.encoder(input_features=mel).last_hidden_state`. |
| Format: `[u32 ndim][u32 × ndim shape][f32 × prod(shape)]`, |
| little-endian (matches every other ferrotorch dump). |
|
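The dump layout described above (`[u32 ndim][u32 × ndim shape][f32 × prod(shape)]`, little-endian) can be read with the standard library alone. The following is a minimal sketch, not the ferrotorch harness: `Dump`, `read_dump`, and the `MAX_NDIM` sanity bound are hypothetical names introduced for illustration. It reads from any `std::io::Read`, so the example exercises it on an in-memory buffer rather than a downloaded file.

```rust
use std::io::{self, Read};

/// A tensor read from a dump: shape plus row-major f32 data.
/// (Illustrative type, not part of ferrotorch.)
#[derive(Debug, PartialEq)]
struct Dump {
    shape: Vec<usize>,
    data: Vec<f32>,
}

/// Arbitrary sanity bound on the rank; an assumption of this sketch.
const MAX_NDIM: usize = 8;

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

fn read_u32_le<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

/// Reads `[u32 ndim][u32 × ndim shape][f32 × prod(shape)]`, little-endian.
fn read_dump<R: Read>(mut r: R) -> io::Result<Dump> {
    let ndim = read_u32_le(&mut r)? as usize;
    if ndim > MAX_NDIM {
        return Err(invalid("implausible ndim"));
    }
    let mut shape = Vec::with_capacity(ndim);
    for _ in 0..ndim {
        shape.push(read_u32_le(&mut r)? as usize);
    }
    // Element count times 4 bytes per f32, with overflow checks.
    let byte_len = shape
        .iter()
        .try_fold(1usize, |acc, &d| acc.checked_mul(d))
        .and_then(|n| n.checked_mul(4))
        .ok_or_else(|| invalid("shape overflows usize"))?;
    let mut bytes = vec![0u8; byte_len];
    r.read_exact(&mut bytes)?;
    let data = bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok(Dump { shape, data })
}

fn main() -> io::Result<()> {
    // Encode a tiny [2, 3] tensor by hand so the example needs no files.
    let values = [0.0f32, 1.0, 2.0, 3.0, 4.0, 5.0];
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&2u32.to_le_bytes()); // ndim
    bytes.extend_from_slice(&2u32.to_le_bytes()); // shape[0]
    bytes.extend_from_slice(&3u32.to_le_bytes()); // shape[1]
    for v in values.iter() {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    let dump = read_dump(bytes.as_slice())?;
    assert_eq!(dump.shape, vec![2, 3]);
    assert_eq!(dump.data, values.to_vec());
    Ok(())
}
```

For the real probe, passing a `BufReader` over `_value_parity_encoder_output.bin` should yield the shape `[1, 1500, 384]` listed above. The sketch does not check for trailing bytes, and a corrupt header could still request a large allocation, so a production reader would likely compare the declared size against the file length first.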
|
| ## How to load |
|
|
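| ```rust |
| use ferrotorch_hub::{hf_download_model, HubCache}; |
| use ferrotorch_whisper::{load_whisper_encoder, HfWhisperConfig, WhisperConfig}; |
| |
| // Assumes every error type returned below implements `std::error::Error`. |
| fn main() -> Result<(), Box<dyn std::error::Error>> { |
|     // Fetch the repository through the hub cache. |
|     let cache = HubCache::with_default_dir(); |
|     let repo_dir = hf_download_model("ferrotorch/whisper-tiny-encoder", "main", &cache)?; |
| |
|     // Parse the Hugging Face config and convert it to the ferrotorch config. |
|     let hf_cfg = HfWhisperConfig::from_file(repo_dir.join("config.json"))?; |
|     let cfg = WhisperConfig::from_hf(&hf_cfg)?; |
| |
|     // Load the encoder weights as f32. |
|     let (encoder, _drop_report) = load_whisper_encoder::<f32>( |
|         &repo_dir.join("model.safetensors"), |
|         cfg, |
|         /* strict = */ false, |
|     )?; |
|     let _ = encoder; // use the encoder here |
|     Ok(()) |
| } |
| ``` |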
|
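Once the encoder has produced hidden states for the probe input, they can be compared against the reference values from `_value_parity_encoder_output.bin`. This card does not state which metric or tolerance the ferrotorch harness uses, so the sketch below is only one plausible check: it reports the maximum and mean absolute difference over two flat `f32` slices. `ParityReport`, `compare`, and the tolerance value are hypothetical.

```rust
/// Summary of element-wise differences between two tensors.
/// (Illustrative type, not part of ferrotorch.)
#[derive(Debug)]
struct ParityReport {
    max_abs_diff: f32,
    mean_abs_diff: f32,
}

/// Compares two flat tensors element by element.
/// Returns `None` if the lengths differ or the inputs are empty.
fn compare(actual: &[f32], reference: &[f32]) -> Option<ParityReport> {
    if actual.len() != reference.len() || actual.is_empty() {
        return None;
    }
    let mut max = 0.0f32;
    let mut sum = 0.0f64;
    for (a, r) in actual.iter().zip(reference) {
        let d = (a - r).abs();
        // Let NaN win so that a NaN anywhere fails a later `<= tolerance` check.
        if d.is_nan() || d > max {
            max = d;
        }
        sum += f64::from(d);
    }
    Some(ParityReport {
        max_abs_diff: max,
        mean_abs_diff: (sum / actual.len() as f64) as f32,
    })
}

fn main() {
    let reference = [1.0f32, 2.0, 3.0, 4.0];
    let actual = [1.0f32, 2.5, 3.0, 4.0];
    let report = compare(&actual, &reference).expect("same non-zero length");
    println!("{report:?}");

    // Hypothetical tolerance; the harness's real threshold is not given here.
    let tolerance = 1e-4f32;
    assert_eq!(report.max_abs_diff, 0.5);
    assert_eq!(report.mean_abs_diff, 0.125);
    assert!(report.max_abs_diff > tolerance);
}
```

Reporting the maximum difference alongside the mean helps separate a single outlier element from a uniform drift, which usually points to different causes.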
|
| ## Upstream license |
|
|
| ``` |
| MIT License |
| |
| Copyright (c) 2022 OpenAI |
| |
| Permission is hereby granted, free of charge, to any person obtaining a |
| copy of this software and associated documentation files (the |
| "Software"), to deal in the Software without restriction, including |
| without limitation the rights to use, copy, modify, merge, publish, |
| distribute, sublicense, and/or sell copies of the Software, and to |
| permit persons to whom the Software is furnished to do so, subject to |
| the following conditions: |
| |
| The above copyright notice and this permission notice shall be included |
| in all copies or substantial portions of the Software. |
| |
| THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS |
| OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF |
| MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. |
| IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY |
| CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, |
| TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE |
| SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
| |
| ``` |
|
|