dollspace committed on
Commit 2e18d7c · verified · 1 Parent(s): 668b24d

feat: pin encoder-only artifact for whisper-tiny-encoder (#1149)

Files changed (1)
  1. README.md +99 -0
README.md ADDED
@@ -0,0 +1,99 @@
+ ---
+ license: mit
+ tags:
+ - automatic-speech-recognition
+ - audio
+ - whisper
+ - ferrotorch
+ ---
+
+ # `ferrotorch/whisper-tiny-encoder`
+
+ whisper-tiny encoder (openai/whisper-tiny). 4-layer 6-head Transformer audio encoder, d_model=384, encoder_ffn_dim=1536, num_mel_bins=80, max_source_positions=1500. MIT-licensed. Pinned encoder-only — decoder/proj_out weights are dropped from this mirror. Real-artifact baseline for audio encoder parity vs transformers (#1149).
+
+ ## Provenance
+
+ * Upstream: `openai/whisper-tiny` (mit).
+ * Conversion script:
+ [`ferrotorch/scripts/pin_pretrained_whisper_weights.py`](https://github.com/dollspace-gay/ferrotorch/blob/main/scripts/pin_pretrained_whisper_weights.py).
+ * Ferrotorch issue: <https://github.com/dollspace-gay/ferrotorch/issues/1149>.
+ * SHA-256 of `model.safetensors` (this file is pinned in
+ `ferrotorch-hub/src/registry.rs`): `4ce29194b87ef05385203f8b09914f5c3b060200c2b503d6d420459ffb80a294`.
+ * Number of trainable parameters in the encoder slice:
+ **8,208,384**.
+ * Config snapshot: d_model=384,
+ encoder_layers=4,
+ encoder_attention_heads=6,
+ encoder_ffn_dim=1536,
+ num_mel_bins=80,
+ max_source_positions=1500,
+ activation_function='gelu'.
+ * Non-encoder keys dropped from the upstream checkpoint (this
+ mirror is encoder-only): 100 total, first few:
+ `['model.decoder.embed_positions.weight', 'model.decoder.embed_tokens.weight', 'model.decoder.layer_norm.bias']`.
+
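As a cross-check on the parameter count in the provenance list, the figure can be rebuilt from the config snapshot alone. The sketch below assumes the Hugging Face `WhisperEncoder` layout: two kernel-size-3 convolutions with biases, a positional embedding table that is included in the total, and a `k_proj` without bias. The struct and function names are illustrative and are not part of ferrotorch.

```rust
/// Encoder hyperparameters copied from the config snapshot in the README.
/// (Hypothetical helper type, not a ferrotorch API.)
struct EncoderConfig {
    d_model: u64,
    encoder_layers: u64,
    encoder_ffn_dim: u64,
    num_mel_bins: u64,
    max_source_positions: u64,
}

/// Count encoder parameters, assuming the Hugging Face `WhisperEncoder` layout.
fn encoder_param_count(c: &EncoderConfig) -> u64 {
    let d = c.d_model;
    // Two 1-D convolutions with kernel size 3, each with a bias vector.
    let conv1 = c.num_mel_bins * d * 3 + d;
    let conv2 = d * d * 3 + d;
    // Positional embedding table (counted in the total here).
    let positions = c.max_source_positions * d;
    // Self-attention: q, k, v and out projections; k_proj carries no bias.
    let attn = 4 * d * d + 3 * d;
    // Feed-forward: fc1 (d -> ffn) and fc2 (ffn -> d), both with biases.
    let ffn = d * c.encoder_ffn_dim + c.encoder_ffn_dim + c.encoder_ffn_dim * d + d;
    // Two LayerNorms per block, each with a weight and a bias.
    let norms = 2 * 2 * d;
    let per_layer = attn + ffn + norms;
    // Final LayerNorm after the last block.
    let final_norm = 2 * d;
    conv1 + conv2 + positions + c.encoder_layers * per_layer + final_norm
}

fn main() {
    let cfg = EncoderConfig {
        d_model: 384,
        encoder_layers: 4,
        encoder_ffn_dim: 1536,
        num_mel_bins: 80,
        max_source_positions: 1500,
    };
    let total = encoder_param_count(&cfg);
    assert_eq!(total, 8_208_384);
    println!("encoder parameters: {total}");
}
```

`encoder_attention_heads` does not appear in the calculation because splitting `d_model` across heads changes how the projections are reshaped, not how many weights they hold.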
+ ## Value-parity probe
+
+ Three extra files are uploaded so the ferrotorch-side harness can
+ reproduce the parity verdict without re-running the upstream
+ Whisper model:
+
+ * `_value_parity_audio.bin` — deterministic synthetic 30-second
+ audio (sum of three sine waves with a slow envelope),
+ 16 kHz mono float32, shape `[1, 480000]`.
+ * `_value_parity_mel.bin` — `WhisperFeatureExtractor(audio)`
+ output `[1, 80, 3000]` float32 from the upstream feature
+ extractor. The Rust-side `ferrotorch_whisper::audio` is
+ compared against this.
+ * `_value_parity_encoder_output.bin` — float32 encoder hidden
+ states `[1, 1500, 384]` from
+ `WhisperModel.encoder(input_features=mel).last_hidden_state`.
+ Format: `[u32 ndim][u32 × ndim shape][f32 × prod(shape)]`
+ little-endian (matches every other ferrotorch dump).
+
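The dump layout quoted in the README is simple enough to parse with the standard library alone. The following is a minimal sketch of such a reader, assuming the layout is exactly as written: a `u32` rank, that many `u32` dimensions, then `prod(shape)` `f32` values, all little-endian. `Dump`, `read_u32_le` and `read_dump` are hypothetical names for illustration. The actual ferrotorch harness may read these files differently.

```rust
use std::io::{self, Read};

/// A dense f32 tensor decoded from a dump file (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Dump {
    shape: Vec<usize>,
    data: Vec<f32>,
}

/// Read one little-endian u32 from the stream.
fn read_u32_le<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

/// Parse `[u32 ndim][u32 x ndim shape][f32 x prod(shape)]`, little-endian.
fn read_dump<R: Read>(r: &mut R) -> io::Result<Dump> {
    let ndim = read_u32_le(r)? as usize;
    let mut shape = Vec::new();
    for _ in 0..ndim {
        shape.push(read_u32_le(r)? as usize);
    }
    // Element count is the product of all dimensions.
    let numel: usize = shape.iter().product();
    let mut bytes = vec![0u8; numel * 4];
    // Fails with UnexpectedEof if the payload is shorter than the header claims.
    r.read_exact(&mut bytes)?;
    let data = bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok(Dump { shape, data })
}

fn main() -> io::Result<()> {
    // Tiny in-memory example: rank 2, shape [1, 2], values 1.0 and 2.0.
    let mut bytes = Vec::new();
    for v in [2u32, 1, 2] {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    for v in [1.0f32, 2.0] {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    let dump = read_dump(&mut bytes.as_slice())?;
    assert_eq!(dump.shape, vec![1, 2]);
    assert_eq!(dump.data, vec![1.0, 2.0]);
    Ok(())
}
```

For a real file, the same function can be called on a `std::io::BufReader` wrapping `std::fs::File::open(...)`. A production reader would likely also bound `ndim` and the element count before allocating.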
+ ## How to load
+
+ ```rust
+ use ferrotorch_whisper::{
+ HfWhisperConfig, WhisperConfig, load_whisper_encoder,
+ };
+ use ferrotorch_hub::{HubCache, hf_download_model};
+
+ let cache = HubCache::with_default_dir();
+ let repo_dir = hf_download_model("ferrotorch/whisper-tiny-encoder", "main", &cache)?;
+ let hf_cfg = HfWhisperConfig::from_file(repo_dir.join("config.json"))?;
+ let cfg = WhisperConfig::from_hf(&hf_cfg)?;
+ let (encoder, _drop_report) = load_whisper_encoder::<f32>(
+ &repo_dir.join("model.safetensors"),
+ cfg,
+ /* strict = */ false,
+ )?;
+ ```
+
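The tensor shapes quoted for the parity files are consistent with the config values, which gives a quick sanity check before running the loaded encoder. The sketch below works through that arithmetic. The hop length of 160 samples and the convolution settings (kernel 3, padding 1, strides 1 and 2) are assumptions taken from the upstream Whisper defaults, not values stated in this README, and `conv1d_out_len` is an illustrative helper.

```rust
/// Output length of a 1-D convolution with dilation 1 (standard formula).
fn conv1d_out_len(len: usize, kernel: usize, stride: usize, padding: usize) -> usize {
    (len + 2 * padding - kernel) / stride + 1
}

fn main() {
    // 30 s of 16 kHz audio, matching the `[1, 480000]` audio shape.
    let samples = 30 * 16_000;
    assert_eq!(samples, 480_000);

    // Assumption: the upstream feature extractor uses a hop of 160 samples.
    let hop_length = 160;
    let frames = samples / hop_length;
    assert_eq!(frames, 3000); // time axis of the `[1, 80, 3000]` mel tensor

    // Assumption: conv1 is kernel 3, stride 1, padding 1; conv2 is kernel 3, stride 2, padding 1.
    let after_conv1 = conv1d_out_len(frames, 3, 1, 1);
    let after_conv2 = conv1d_out_len(after_conv1, 3, 2, 1);
    assert_eq!(after_conv1, 3000);
    assert_eq!(after_conv2, 1500); // equals max_source_positions

    // Encoder output is [batch, positions, d_model].
    let d_model = 384;
    println!("expected encoder output shape: [1, {after_conv2}, {d_model}]");
}
```

Under these assumptions the stride-2 convolution halves the 3000 mel frames to 1500 positions, which lines up with `max_source_positions=1500` and the `[1, 1500, 384]` encoder output.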
+ ## Upstream license
+
+ ```
+ MIT License
+
+ Copyright (c) 2022 OpenAI
+
+ Permission is hereby granted, free of charge, to any person obtaining a
+ copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be included
+ in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ ```