feat: pin encoder-only artifact for whisper-tiny-encoder (#1149)
README.md
ADDED
---
license: mit
tags:
- automatic-speech-recognition
- audio
- whisper
- ferrotorch
---

# `ferrotorch/whisper-tiny-encoder`
Encoder-only mirror of the `openai/whisper-tiny` audio encoder: a 4-layer, 6-head Transformer with `d_model=384`, `encoder_ffn_dim=1536`, `num_mel_bins=80`, and `max_source_positions=1500`. MIT-licensed. The artifact is pinned and encoder-only; the decoder and `proj_out` weights are dropped from this mirror. It serves as a real-artifact baseline for audio-encoder parity against `transformers` (#1149).

## Provenance

* Upstream: `openai/whisper-tiny` (MIT).
* Conversion script:
  [`ferrotorch/scripts/pin_pretrained_whisper_weights.py`](https://github.com/dollspace-gay/ferrotorch/blob/main/scripts/pin_pretrained_whisper_weights.py).
* Ferrotorch issue: <https://github.com/dollspace-gay/ferrotorch/issues/1149>.
* SHA-256 of `model.safetensors` (this file is pinned in
  `ferrotorch-hub/src/registry.rs`): `4ce29194b87ef05385203f8b09914f5c3b060200c2b503d6d420459ffb80a294`.
* Number of trainable parameters in the encoder slice: **8,208,384**.
* Config snapshot: `d_model=384`, `encoder_layers=4`, `encoder_attention_heads=6`,
  `encoder_ffn_dim=1536`, `num_mel_bins=80`, `max_source_positions=1500`,
  `activation_function='gelu'`.
* Non-encoder keys dropped from the upstream checkpoint (this mirror is
  encoder-only): 100 total, first few:
  `['model.decoder.embed_positions.weight', 'model.decoder.embed_tokens.weight', 'model.decoder.layer_norm.bias']`.
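The quoted parameter count can be cross-checked arithmetically. The sketch below is plain arithmetic, not ferrotorch code, and assumes the standard Hugging Face Whisper encoder layout: two front-end convolutions with bias, learned position embeddings, pre-norm Transformer layers whose `k_proj` carries no bias, and a final layer norm.

```rust
// Cross-check of the encoder parameter count for the config above.
// The function name is illustrative; this is not a ferrotorch API.
fn whisper_tiny_encoder_params() -> u64 {
    let (d, layers, ffn, mels, max_pos) = (384u64, 4u64, 1536u64, 80u64, 1500u64);

    let conv1 = mels * d * 3 + d; // Conv1d(80 -> 384, kernel 3) + bias
    let conv2 = d * d * 3 + d; // Conv1d(384 -> 384, kernel 3) + bias
    let positions = max_pos * d; // learned position embeddings

    // Per layer: q/k/v/out projections (k_proj has no bias in Whisper),
    // two-layer MLP with biases, and two LayerNorms (weight + bias each).
    let attn = 4 * d * d + 3 * d;
    let mlp = (d * ffn + ffn) + (ffn * d + d);
    let norms = 2 * (2 * d);

    // Front-end + positions + layers + final LayerNorm.
    conv1 + conv2 + positions + layers * (attn + mlp + norms) + 2 * d
}

fn main() {
    assert_eq!(whisper_tiny_encoder_params(), 8_208_384);
    println!("whisper-tiny encoder params: {}", whisper_tiny_encoder_params());
}
```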

## Value-parity probe

Three extra files are uploaded so the ferrotorch-side harness can
reproduce the parity verdict without re-running the upstream
Whisper model:

* `_value_parity_audio.bin` — deterministic synthetic 30-second
  audio (a sum of three sine waves under a slow envelope),
  16 kHz mono float32, shape `[1, 480000]`.
* `_value_parity_mel.bin` — `WhisperFeatureExtractor(audio)`
  output, `[1, 80, 3000]` float32, from the upstream feature
  extractor. The Rust-side `ferrotorch_whisper::audio` is
  compared against this.
* `_value_parity_encoder_output.bin` — float32 encoder hidden
  states `[1, 1500, 384]` from
  `WhisperModel.encoder(input_features=mel).last_hidden_state`.
  Format: `[u32 ndim][u32 × ndim shape][f32 × prod(shape)]`,
  little-endian (matches every other ferrotorch dump).
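As an illustration of the dump layout above, here is a self-contained, std-only round-trip sketch; `write_dump` and `read_dump` are hypothetical helper names for this example, not ferrotorch APIs.

```rust
// Round-trip the ferrotorch dump format:
// [u32 ndim][u32 × ndim shape][f32 × prod(shape)], all little-endian.
use std::io::{Read, Write};

fn write_dump<W: Write>(w: &mut W, shape: &[u32], data: &[f32]) -> std::io::Result<()> {
    w.write_all(&(shape.len() as u32).to_le_bytes())?; // ndim header
    for &dim in shape {
        w.write_all(&dim.to_le_bytes())?; // shape entries
    }
    for &x in data {
        w.write_all(&x.to_le_bytes())?; // flat f32 payload
    }
    Ok(())
}

fn read_dump<R: Read>(r: &mut R) -> std::io::Result<(Vec<u32>, Vec<f32>)> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    let ndim = u32::from_le_bytes(buf) as usize;
    let mut shape = Vec::with_capacity(ndim);
    for _ in 0..ndim {
        r.read_exact(&mut buf)?;
        shape.push(u32::from_le_bytes(buf));
    }
    let len: usize = shape.iter().map(|&d| d as usize).product();
    let mut data = Vec::with_capacity(len);
    for _ in 0..len {
        r.read_exact(&mut buf)?;
        data.push(f32::from_le_bytes(buf));
    }
    Ok((shape, data))
}

fn main() {
    let mut buf = Vec::new();
    write_dump(&mut buf, &[1, 2, 3], &[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]).unwrap();
    // 4 bytes ndim + 3 * 4 bytes shape + 6 * 4 bytes payload
    assert_eq!(buf.len(), 4 + 12 + 24);
    let (shape, data) = read_dump(&mut buf.as_slice()).unwrap();
    assert_eq!(shape, vec![1, 2, 3]);
    assert_eq!(data.len(), 6);
    println!("roundtrip ok: shape {:?}, {} values", shape, data.len());
}
```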

## How to load

```rust
use ferrotorch_whisper::{
    HfWhisperConfig, WhisperConfig, load_whisper_encoder,
};
use ferrotorch_hub::{HubCache, hf_download_model};

let cache = HubCache::with_default_dir();
let repo_dir = hf_download_model("ferrotorch/whisper-tiny-encoder", "main", &cache)?;
let hf_cfg = HfWhisperConfig::from_file(repo_dir.join("config.json"))?;
let cfg = WhisperConfig::from_hf(&hf_cfg)?;
let (encoder, _drop_report) = load_whisper_encoder::<f32>(
    &repo_dir.join("model.safetensors"),
    cfg,
    /* strict = */ false,
)?;
```

## Upstream license

```
MIT License

Copyright (c) 2022 OpenAI

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
```