Surya committed on
Commit 48fd86d · unverified · 1 Parent(s): 2acb528

all things model

README.md CHANGED
@@ -1,3 +1,102 @@
- ---
- license: gpl-3.0
- ---
+ ## Whisper model files in custom ggml format
+
+ The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L27)
+ are converted to a custom `ggml` format so that they can be loaded in C/C++.
+ Conversion is performed using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
+
+ You can either obtain the original models and generate the `ggml` files yourself using the conversion script,
+ or you can use the [download-ggml-model.sh](download-ggml-model.sh) script to download the already converted models.
+ Currently, they are hosted at the following locations:
+
+ - https://huggingface.co/ggerganov/whisper.cpp
+ - https://ggml.ggerganov.com
+
+ Sample download:
+
+ ```text
+ $ ./download-ggml-model.sh base.en
+ Downloading ggml model base.en ...
+ models/ggml-base.en.bin 100%[=============================================>] 141.11M 5.41MB/s in 22s
+ Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
+ You can now use it like this:
+
+ $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
+ ```
+
+ To convert the files yourself, use the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script. Here is an example usage.
+ The original PyTorch files are assumed to have been downloaded into `~/.cache/whisper`.
+ Change `~/path/to/repo/whisper/` to the location of your copy of the Whisper source:
+
+ ```bash
+ mkdir models/whisper-medium
+ python models/convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
+ mv ./models/whisper-medium/ggml-model.bin models/ggml-medium.bin
+ rmdir models/whisper-medium
+ ```
+
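As a quick sanity check after converting, the start of the resulting file can be inspected. This sketch (the helper is illustrative and not part of the repository) reads back the magic number and the eleven `int32` hyperparameters that the conversion scripts write at the start of every `ggml` model file:

```python
import struct

def read_ggml_header(path: str) -> dict:
    """Read the magic and the 11 int32 hyperparameters written by the conversion scripts."""
    names = ["n_vocab", "n_audio_ctx", "n_audio_state", "n_audio_head",
             "n_audio_layer", "n_text_ctx", "n_text_state", "n_text_head",
             "n_text_layer", "n_mels", "use_f16"]
    with open(path, "rb") as f:
        magic, = struct.unpack("i", f.read(4))
        assert magic == 0x67676d6c, "not a ggml model file"  # b"lmgg" little-endian
        return dict(zip(names, struct.unpack("11i", f.read(44))))
```

If the magic or the hyperparameters look wrong, the conversion step most likely failed.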
+ A third option to obtain the model files is to download them from Hugging Face:
+
+ https://huggingface.co/ggerganov/whisper.cpp/tree/main
+
+ ## Available models
+
+ | Model     | Disk    | SHA-1                                      |
+ | ---       | ---     | ---                                        |
+ | tiny      | 75 MiB  | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
+ | tiny.en   | 75 MiB  | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
+ | base      | 142 MiB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
+ | base.en   | 142 MiB | `137c40403d78fd54d454da0f9bd998f78703390c` |
+ | small     | 466 MiB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
+ | small.en  | 466 MiB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
+ | medium    | 1.5 GiB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
+ | medium.en | 1.5 GiB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
+ | large-v1  | 2.9 GiB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
+ | large-v2  | 2.9 GiB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
+ | large-v3  | 2.9 GiB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |
+
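To check a download against the table, the checksum of the model file can be recomputed. A minimal sketch (the helper name is illustrative, not part of the repository):

```python
import hashlib

def sha1_of_file(path: str) -> str:
    """Compute the SHA-1 checksum of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. sha1_of_file("models/ggml-base.en.bin") should match the base.en row above
```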
+ ## Model files for testing purposes
+
+ The model files prefixed with `for-tests-` are empty (i.e. they do not contain any weights) and are used by the CI for
+ testing purposes. They are included directly in this repository for convenience, and the GitHub Actions CI uses them to
+ run various sanitizer tests.
+
+ ## Fine-tuned models
+
+ There are community efforts for creating fine-tuned Whisper models using extra training data. For example, this
+ [blog post](https://huggingface.co/blog/fine-tune-whisper) describes a method for fine-tuning using the Hugging Face (HF)
+ Transformers implementation of Whisper. The produced models are in a slightly different format compared to the original
+ OpenAI format. To read the HF models you can use the [convert-h5-to-ggml.py](convert-h5-to-ggml.py) script like this:
+
+ ```bash
+ git clone https://github.com/openai/whisper
+ git clone https://github.com/ggerganov/whisper.cpp
+
+ # clone HF fine-tuned model (this is just an example)
+ git clone https://huggingface.co/openai/whisper-medium
+
+ # convert the model to ggml
+ python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium/ ./whisper .
+ ```
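The core of the conversion is renaming the HF Transformers state-dict keys to the original OpenAI naming (see `conv_map` in [convert-h5-to-ggml.py](convert-h5-to-ggml.py)). A toy illustration (not part of the repository) with just a few entries of the full table:

```python
# A few entries from the renaming table used by convert-h5-to-ggml.py
conv_map_sample = {
    "self_attn.q_proj": "attn.query",
    "fc1": "mlp.0",
    "final_layer_norm": "mlp_ln",
}

def rename(key: str) -> str:
    # apply the fragment substitutions, then the HF "layers" -> OpenAI "blocks" rename
    for old, new in conv_map_sample.items():
        key = key.replace(old, new)
    return key.replace("layers", "blocks")

print(rename("encoder.layers.0.fc1.weight"))  # encoder.blocks.0.mlp.0.weight
```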
+
+ ## Distilled models
+
+ Initial support for https://huggingface.co/distil-whisper is available.
+
+ Currently, the chunk-based transcription strategy is not implemented, so transcription quality can be sub-optimal when using the distilled models with `whisper.cpp`.
+
+ ```bash
+ # clone OpenAI whisper and whisper.cpp
+ git clone https://github.com/openai/whisper
+ git clone https://github.com/ggerganov/whisper.cpp
+
+ # get the models
+ cd whisper.cpp/models
+ git clone https://huggingface.co/distil-whisper/distil-medium.en
+ git clone https://huggingface.co/distil-whisper/distil-large-v2
+
+ # convert to ggml
+ python3 ./convert-h5-to-ggml.py ./distil-medium.en/ ../../whisper .
+ mv ggml-model.bin ggml-medium.en-distil.bin
+
+ python3 ./convert-h5-to-ggml.py ./distil-large-v2/ ../../whisper .
+ mv ggml-model.bin ggml-large-v2-distil.bin
+ ```
convert-h5-to-coreml.py ADDED
@@ -0,0 +1,117 @@
+ import argparse
+ import importlib.util
+
+ spec = importlib.util.spec_from_file_location('whisper_to_coreml', 'models/convert-whisper-to-coreml.py')
+ whisper_to_coreml = importlib.util.module_from_spec(spec)
+ spec.loader.exec_module(whisper_to_coreml)
+
+ from whisper import load_model
+
+ from copy import deepcopy
+ import torch
+ from transformers import WhisperForConditionalGeneration
+
+ # https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets/blob/main/src/multiple_datasets/hub_default_utils.py
+ WHISPER_MAPPING = {
+     "layers": "blocks",
+     "fc1": "mlp.0",
+     "fc2": "mlp.2",
+     "final_layer_norm": "mlp_ln",
+     ".self_attn.q_proj": ".attn.query",
+     ".self_attn.k_proj": ".attn.key",
+     ".self_attn.v_proj": ".attn.value",
+     ".self_attn_layer_norm": ".attn_ln",
+     ".self_attn.out_proj": ".attn.out",
+     ".encoder_attn.q_proj": ".cross_attn.query",
+     ".encoder_attn.k_proj": ".cross_attn.key",
+     ".encoder_attn.v_proj": ".cross_attn.value",
+     ".encoder_attn_layer_norm": ".cross_attn_ln",
+     ".encoder_attn.out_proj": ".cross_attn.out",
+     "decoder.layer_norm.": "decoder.ln.",
+     "encoder.layer_norm.": "encoder.ln_post.",
+     "embed_tokens": "token_embedding",
+     "encoder.embed_positions.weight": "encoder.positional_embedding",
+     "decoder.embed_positions.weight": "decoder.positional_embedding",
+     "layer_norm": "ln_post",
+ }
+
+ # https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets/blob/main/src/multiple_datasets/hub_default_utils.py
+ def rename_keys(s_dict):
+     keys = list(s_dict.keys())
+     for key in keys:
+         new_key = key
+         for k, v in WHISPER_MAPPING.items():
+             if k in key:
+                 new_key = new_key.replace(k, v)
+
+         print(f"{key} -> {new_key}")
+
+         s_dict[new_key] = s_dict.pop(key)
+     return s_dict
+
+ # https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets/blob/main/src/multiple_datasets/hub_default_utils.py
+ def convert_hf_whisper(hf_model_name_or_path: str, whisper_state_path: str):
+     transformer_model = WhisperForConditionalGeneration.from_pretrained(hf_model_name_or_path)
+     config = transformer_model.config
+
+     # first build dims
+     dims = {
+         'n_mels': config.num_mel_bins,
+         'n_vocab': config.vocab_size,
+         'n_audio_ctx': config.max_source_positions,
+         'n_audio_state': config.d_model,
+         'n_audio_head': config.encoder_attention_heads,
+         'n_audio_layer': config.encoder_layers,
+         'n_text_ctx': config.max_target_positions,
+         'n_text_state': config.d_model,
+         'n_text_head': config.decoder_attention_heads,
+         'n_text_layer': config.decoder_layers
+     }
+
+     state_dict = deepcopy(transformer_model.model.state_dict())
+     state_dict = rename_keys(state_dict)
+
+     torch.save({"dims": dims, "model_state_dict": state_dict}, whisper_state_path)
+
+ # Ported from models/convert-whisper-to-coreml.py
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--model-name", type=str, help="name of model to convert (e.g. tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3)", required=True)
+     parser.add_argument("--model-path", type=str, help="path to the model (e.g. if published on HuggingFace: Oblivion208/whisper-tiny-cantonese)", required=True)
+     # note: store_true flags instead of `type=bool`, which would treat any
+     # non-empty string (including "False") as True
+     parser.add_argument("--encoder-only", action="store_true", help="only convert encoder")
+     parser.add_argument("--quantize", action="store_true", help="quantize weights to F16")
+     parser.add_argument("--optimize-ane", action="store_true", help="optimize for ANE execution (currently broken)")
+     args = parser.parse_args()
+
+     if args.model_name not in ["tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large-v2", "large-v3"]:
+         raise ValueError("Invalid model name")
+
+     pt_target_path = f"models/hf-{args.model_name}.pt"
+     convert_hf_whisper(args.model_path, pt_target_path)
+
+     whisper = load_model(pt_target_path).cpu()
+     hparams = whisper.dims
+     print(hparams)
+
+     if args.optimize_ane:
+         whisperANE = whisper_to_coreml.WhisperANE(hparams).eval()
+         whisperANE.load_state_dict(whisper.state_dict())
+
+         encoder = whisperANE.encoder
+         decoder = whisperANE.decoder
+     else:
+         encoder = whisper.encoder
+         decoder = whisper.decoder
+
+     # Convert encoder
+     encoder = whisper_to_coreml.convert_encoder(hparams, encoder, quantize=args.quantize)
+     encoder.save(f"models/coreml-encoder-{args.model_name}.mlpackage")
+
+     if not args.encoder_only:
+         # Convert decoder
+         decoder = whisper_to_coreml.convert_decoder(hparams, decoder, quantize=args.quantize)
+         decoder.save(f"models/coreml-decoder-{args.model_name}.mlpackage")
+
+     print("done converting")
convert-h5-to-ggml.py ADDED
@@ -0,0 +1,208 @@
+ # Convert Hugging Face fine-tuned models to ggml format
+ #
+ # Usage:
+ #
+ #   git clone https://github.com/openai/whisper
+ #   git clone https://github.com/ggerganov/whisper.cpp
+ #   git clone https://huggingface.co/openai/whisper-medium
+ #
+ #   python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium/ ./whisper .
+ #
+ # This script is similar to "convert-pt-to-ggml.py"
+ #
+ # For more info:
+ #
+ #   https://github.com/ggerganov/whisper.cpp/issues/157
+ #
+
+ import io
+ import os
+ import sys
+ import struct
+ import json
+ import code
+ import torch
+ import numpy as np
+ from pathlib import Path
+
+ from transformers import WhisperForConditionalGeneration
+
+ conv_map = {
+     'self_attn.k_proj'              : 'attn.key',
+     'self_attn.q_proj'              : 'attn.query',
+     'self_attn.v_proj'              : 'attn.value',
+     'self_attn.out_proj'            : 'attn.out',
+     'self_attn_layer_norm'          : 'attn_ln',
+     'encoder_attn.q_proj'           : 'cross_attn.query',
+     'encoder_attn.v_proj'           : 'cross_attn.value',
+     'encoder_attn.out_proj'         : 'cross_attn.out',
+     'encoder_attn_layer_norm'       : 'cross_attn_ln',
+     'fc1'                           : 'mlp.0',
+     'fc2'                           : 'mlp.2',
+     'final_layer_norm'              : 'mlp_ln',
+     'encoder.layer_norm.bias'       : 'encoder.ln_post.bias',
+     'encoder.layer_norm.weight'     : 'encoder.ln_post.weight',
+     'encoder.embed_positions.weight': 'encoder.positional_embedding',
+     'decoder.layer_norm.bias'       : 'decoder.ln.bias',
+     'decoder.layer_norm.weight'     : 'decoder.ln.weight',
+     'decoder.embed_positions.weight': 'decoder.positional_embedding',
+     'decoder.embed_tokens.weight'   : 'decoder.token_embedding.weight',
+     'proj_out.weight'               : 'decoder.proj.weight',
+ }
+
+ # ref: https://github.com/openai/gpt-2/blob/master/src/encoder.py
+ def bytes_to_unicode():
+     """
+     Returns list of utf-8 byte and a corresponding list of unicode strings.
+     The reversible bpe codes work on unicode strings.
+     This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
+     When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
+     This is a significant percentage of your normal, say, 32K bpe vocab.
+     To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
+     And avoids mapping to whitespace/control characters the bpe code barfs on.
+     """
+     bs = list(range(ord("!"), ord("~")+1))+list(range(ord("¡"), ord("¬")+1))+list(range(ord("®"), ord("ÿ")+1))
+     cs = bs[:]
+     n = 0
+     for b in range(2**8):
+         if b not in bs:
+             bs.append(b)
+             cs.append(2**8+n)
+             n += 1
+     cs = [chr(n) for n in cs]
+     return dict(zip(bs, cs))
+
+ if len(sys.argv) < 4:
+     print("Usage: convert-h5-to-ggml.py dir_model path-to-whisper-repo dir-output [use-f32]\n")
+     sys.exit(1)
+
+ dir_model   = Path(sys.argv[1])
+ dir_whisper = Path(sys.argv[2])
+ dir_out     = Path(sys.argv[3])
+
+ encoder       = json.load((dir_model / "vocab.json").open("r", encoding="utf8"))
+ encoder_added = json.load((dir_model / "added_tokens.json").open("r", encoding="utf8"))
+ hparams       = json.load((dir_model / "config.json").open("r", encoding="utf8"))
+
+ model = WhisperForConditionalGeneration.from_pretrained(dir_model)
+
+ #code.interact(local=locals())
+
+ n_mels = hparams["num_mel_bins"]
+ with np.load(os.path.join(dir_whisper, "whisper/assets", "mel_filters.npz")) as f:
+     filters = torch.from_numpy(f[f"mel_{n_mels}"])
+
+ dir_tokenizer = dir_model
+
+ fname_out = dir_out / "ggml-model.bin"
+
+ tokens = json.load(open(dir_tokenizer / "vocab.json", "r", encoding="utf8"))
+
+ # use 16-bit or 32-bit floats
+ use_f16 = True
+ if len(sys.argv) > 4:
+     use_f16 = False
+     fname_out = dir_out / "ggml-model-f32.bin"
+
+ fout = open(fname_out, "wb")
+
+ fout.write(struct.pack("i", 0x67676d6c)) # magic: ggml in hex
+ fout.write(struct.pack("i", hparams["vocab_size"]))
+ fout.write(struct.pack("i", hparams["max_source_positions"]))
+ fout.write(struct.pack("i", hparams["d_model"]))
+ fout.write(struct.pack("i", hparams["encoder_attention_heads"]))
+ fout.write(struct.pack("i", hparams["encoder_layers"]))
+ fout.write(struct.pack("i", hparams["max_length"]))
+ fout.write(struct.pack("i", hparams["d_model"]))
+ fout.write(struct.pack("i", hparams["decoder_attention_heads"]))
+ fout.write(struct.pack("i", hparams["decoder_layers"]))
+ fout.write(struct.pack("i", hparams["num_mel_bins"]))
+ fout.write(struct.pack("i", use_f16))
+
+ fout.write(struct.pack("i", filters.shape[0]))
+ fout.write(struct.pack("i", filters.shape[1]))
+ for i in range(filters.shape[0]):
+     for j in range(filters.shape[1]):
+         fout.write(struct.pack("f", filters[i][j]))
+
+ byte_encoder = bytes_to_unicode()
+ byte_decoder = {v:k for k, v in byte_encoder.items()}
+
+ fout.write(struct.pack("i", len(tokens)))
+
+ tokens = sorted(tokens.items(), key=lambda x: x[1])
+ for key in tokens:
+     text = bytearray([byte_decoder[c] for c in key[0]])
+     fout.write(struct.pack("i", len(text)))
+     fout.write(text)
+
+ list_vars = model.state_dict()
+ for name in list_vars.keys():
+     # this seems to not be used
+     # ref: https://github.com/huggingface/transformers/blob/9a5b84a0076a04fe9596da72e8668069d4f09ea0/src/transformers/models/whisper/modeling_whisper.py#L1099-L1106
+     if name == "proj_out.weight":
+         print('Skipping', name)
+         continue
+
+     src = name
+
+     nn = name
+     if name != "proj_out.weight":
+         nn = nn.split(".")[1:]
+     else:
+         nn = nn.split(".")
+
+     if nn[1] == "layers":
+         nn[1] = "blocks"
+         if ".".join(nn[3:-1]) == "encoder_attn.k_proj":
+             mapped = "attn.key" if nn[0] == "encoder" else "cross_attn.key"
+         else:
+             mapped = conv_map[".".join(nn[3:-1])]
+         name = ".".join(nn[:3] + [mapped] + nn[-1:])
+     else:
+         name = ".".join(nn)
+         name = conv_map[name] if name in conv_map else name
+
+     print(src, ' -> ', name)
+     data = list_vars[src].squeeze().numpy()
+     data = data.astype(np.float16)
+
+     # reshape conv bias from [n] to [n, 1]
+     if name in ["encoder.conv1.bias", "encoder.conv2.bias"]:
+         data = data.reshape(data.shape[0], 1)
+         print("  Reshaped variable: ", name, " to shape: ", data.shape)
+
+     n_dims = len(data.shape)
+     print(name, n_dims, data.shape)
+
+     # looks like the whisper models are in f16 by default
+     # so we need to convert the small tensors to f32 until we fully support f16 in ggml
+     # ftype == 0 -> float32, ftype == 1 -> float16
+     ftype = 1
+     if use_f16:
+         if n_dims < 2 or \
+                 name == "encoder.conv1.bias" or \
+                 name == "encoder.conv2.bias" or \
+                 name == "encoder.positional_embedding" or \
+                 name == "decoder.positional_embedding":
+             print("  Converting to float32")
+             data = data.astype(np.float32)
+             ftype = 0
+     else:
+         data = data.astype(np.float32)
+         ftype = 0
+
+     # header
+     str_ = name.encode('utf-8')
+     fout.write(struct.pack("iii", n_dims, len(str_), ftype))
+     for i in range(n_dims):
+         fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
+     fout.write(str_)
+
+     # data
+     data.tofile(fout)
+
+ fout.close()
+
+ print("Done. Output file: ", fname_out)
+ print("")
convert-pt-to-ggml.py ADDED
@@ -0,0 +1,342 @@
+ # Convert Whisper transformer model from PyTorch to ggml format
+ #
+ # Usage: python convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
+ #
+ # You need to clone the original repo in ~/path/to/repo/whisper/
+ #
+ #   git clone https://github.com/openai/whisper ~/path/to/repo/whisper/
+ #
+ # It is used to load various assets needed by the algorithm:
+ #
+ #  - tokenizer
+ #  - mel filters
+ #
+ # Also, you need to have the original models in ~/.cache/whisper/
+ # See the original repo for more details.
+ #
+ # This script loads the specified model and whisper assets and saves them in ggml format.
+ # The output is a single binary file containing the following information:
+ #
+ #  - hparams
+ #  - mel filters
+ #  - tokenizer vocab
+ #  - model variables
+ #
+ # For each variable, write the following:
+ #
+ #  - Number of dimensions (int)
+ #  - Name length (int)
+ #  - Dimensions (int[n_dims])
+ #  - Name (char[name_length])
+ #  - Data (float[n_dims])
+ #
+
+ import io
+ import os
+ import sys
+ import struct
+ import json
+ import code
+ import torch
+ import numpy as np
+ import base64
+ from pathlib import Path
+ #from transformers import GPTJForCausalLM
+ #from transformers import GPT2TokenizerFast
+
+ # ref: https://github.com/openai/whisper/blob/8cf36f3508c9acd341a45eb2364239a3d81458b9/whisper/tokenizer.py#L10-L110
+ #LANGUAGES = {
+ #    "en": "english",
+ #    "zh": "chinese",
+ #    "de": "german",
+ #    "es": "spanish",
+ #    "ru": "russian",
+ #    "ko": "korean",
+ #    "fr": "french",
+ #    "ja": "japanese",
+ #    "pt": "portuguese",
+ #    "tr": "turkish",
+ #    "pl": "polish",
+ #    "ca": "catalan",
+ #    "nl": "dutch",
+ #    "ar": "arabic",
+ #    "sv": "swedish",
+ #    "it": "italian",
+ #    "id": "indonesian",
+ #    "hi": "hindi",
+ #    "fi": "finnish",
+ #    "vi": "vietnamese",
+ #    "iw": "hebrew",
+ #    "uk": "ukrainian",
+ #    "el": "greek",
+ #    "ms": "malay",
+ #    "cs": "czech",
+ #    "ro": "romanian",
+ #    "da": "danish",
+ #    "hu": "hungarian",
+ #    "ta": "tamil",
+ #    "no": "norwegian",
+ #    "th": "thai",
+ #    "ur": "urdu",
+ #    "hr": "croatian",
+ #    "bg": "bulgarian",
+ #    "lt": "lithuanian",
+ #    "la": "latin",
+ #    "mi": "maori",
+ #    "ml": "malayalam",
+ #    "cy": "welsh",
+ #    "sk": "slovak",
+ #    "te": "telugu",
+ #    "fa": "persian",
+ #    "lv": "latvian",
+ #    "bn": "bengali",
+ #    "sr": "serbian",
+ #    "az": "azerbaijani",
+ #    "sl": "slovenian",
+ #    "kn": "kannada",
+ #    "et": "estonian",
+ #    "mk": "macedonian",
+ #    "br": "breton",
+ #    "eu": "basque",
+ #    "is": "icelandic",
+ #    "hy": "armenian",
+ #    "ne": "nepali",
+ #    "mn": "mongolian",
+ #    "bs": "bosnian",
+ #    "kk": "kazakh",
+ #    "sq": "albanian",
+ #    "sw": "swahili",
+ #    "gl": "galician",
+ #    "mr": "marathi",
+ #    "pa": "punjabi",
+ #    "si": "sinhala",
+ #    "km": "khmer",
+ #    "sn": "shona",
+ #    "yo": "yoruba",
+ #    "so": "somali",
+ #    "af": "afrikaans",
+ #    "oc": "occitan",
+ #    "ka": "georgian",
+ #    "be": "belarusian",
+ #    "tg": "tajik",
+ #    "sd": "sindhi",
+ #    "gu": "gujarati",
+ #    "am": "amharic",
+ #    "yi": "yiddish",
+ #    "lo": "lao",
+ #    "uz": "uzbek",
+ #    "fo": "faroese",
+ #    "ht": "haitian creole",
+ #    "ps": "pashto",
+ #    "tk": "turkmen",
+ #    "nn": "nynorsk",
+ #    "mt": "maltese",
+ #    "sa": "sanskrit",
+ #    "lb": "luxembourgish",
+ #    "my": "myanmar",
+ #    "bo": "tibetan",
+ #    "tl": "tagalog",
+ #    "mg": "malagasy",
+ #    "as": "assamese",
+ #    "tt": "tatar",
+ #    "haw": "hawaiian",
+ #    "ln": "lingala",
+ #    "ha": "hausa",
+ #    "ba": "bashkir",
+ #    "jw": "javanese",
+ #    "su": "sundanese",
+ #}
+
+ ## ref: https://github.com/openai/whisper/blob/8cf36f3508c9acd341a45eb2364239a3d81458b9/whisper/tokenizer.py#L273-L292
+ #def build_tokenizer(path_to_whisper_repo: str, name: str = "gpt2"):
+ #    os.environ["TOKENIZERS_PARALLELISM"] = "false"
+ #    path = os.path.join(path_to_whisper_repo, "whisper/assets", name)
+ #    tokenizer = GPT2TokenizerFast.from_pretrained(path)
+ #
+ #    specials = [
+ #        "<|startoftranscript|>",
+ #        *[f"<|{lang}|>" for lang in LANGUAGES.keys()],
+ #        "<|translate|>",
+ #        "<|transcribe|>",
+ #        "<|startoflm|>",
+ #        "<|startofprev|>",
+ #        "<|nocaptions|>",
+ #        "<|notimestamps|>",
+ #    ]
+ #
+ #    tokenizer.add_special_tokens(dict(additional_special_tokens=specials))
+ #    return tokenizer
+
+ # ref: https://github.com/openai/gpt-2/blob/master/src/encoder.py
+ def bytes_to_unicode():
+     """
+     Returns list of utf-8 byte and a corresponding list of unicode strings.
+     The reversible bpe codes work on unicode strings.
+     This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
+     When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
+     This is a significant percentage of your normal, say, 32K bpe vocab.
+     To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
+     And avoids mapping to whitespace/control characters the bpe code barfs on.
+     """
+     bs = list(range(ord("!"), ord("~")+1))+list(range(ord("¡"), ord("¬")+1))+list(range(ord("®"), ord("ÿ")+1))
+     cs = bs[:]
+     n = 0
+     for b in range(2**8):
+         if b not in bs:
+             bs.append(b)
+             cs.append(2**8+n)
+             n += 1
+     cs = [chr(n) for n in cs]
+     return dict(zip(bs, cs))
+
+
+ if len(sys.argv) < 4:
+     print("Usage: convert-pt-to-ggml.py model.pt path-to-whisper-repo dir-output [use-f32]\n")
+     sys.exit(1)
+
+ fname_inp   = Path(sys.argv[1])
+ dir_whisper = Path(sys.argv[2])
+ dir_out     = Path(sys.argv[3])
+
+ # try to load PyTorch binary data
+ try:
+     model_bytes = open(fname_inp, "rb").read()
+     with io.BytesIO(model_bytes) as fp:
+         checkpoint = torch.load(fp, map_location="cpu")
+ except Exception:
+     print("Error: failed to load PyTorch model file:", fname_inp)
+     sys.exit(1)
+
+ hparams = checkpoint["dims"]
+ print("hparams:", hparams)
+
+ list_vars = checkpoint["model_state_dict"]
+
+ #print(list_vars['encoder.positional_embedding'])
+ #print(list_vars['encoder.conv1.weight'])
+ #print(list_vars['encoder.conv1.weight'].shape)
+
+ # load mel filters
+ n_mels = hparams["n_mels"]
+ with np.load(dir_whisper / "whisper" / "assets" / "mel_filters.npz") as f:
+     filters = torch.from_numpy(f[f"mel_{n_mels}"])
+     #print (filters)
+
+ #code.interact(local=locals())
+
+ # load tokenizer
+ # for backwards compatibility, also check for older hf_transformers format tokenizer files
+ # old format: dir_whisper/whisper/assets/[multilingual/gpt2]/vocab.json
+ # new format: dir_whisper/whisper/assets/[multilingual/gpt2].tiktoken
+ multilingual = hparams["n_vocab"] >= 51865
+ tokenizer = dir_whisper / "whisper" / "assets" / (multilingual and "multilingual.tiktoken" or "gpt2.tiktoken")
+ tokenizer_type = "tiktoken"
+ if not tokenizer.is_file():
+     tokenizer = dir_whisper / "whisper" / "assets" / (multilingual and "multilingual" or "gpt2") / "vocab.json"
+     tokenizer_type = "hf_transformers"
+     if not tokenizer.is_file():
+         print("Error: failed to find either tiktoken or hf_transformers tokenizer file:", tokenizer)
+         sys.exit(1)
+
+ byte_encoder = bytes_to_unicode()
+ byte_decoder = {v:k for k, v in byte_encoder.items()}
+
+ if tokenizer_type == "tiktoken":
+     with open(tokenizer, "rb") as f:
+         contents = f.read()
+         tokens = {base64.b64decode(token): int(rank) for token, rank in (line.split() for line in contents.splitlines() if line)}
+ elif tokenizer_type == "hf_transformers":
+     with open(tokenizer, "r", encoding="utf8") as f:
+         _tokens_raw = json.load(f)
+         if '<|endoftext|>' in _tokens_raw:
+             # ensures exact same model as tokenizer_type == tiktoken
+             # details: https://github.com/ggerganov/whisper.cpp/pull/725
+             del _tokens_raw['<|endoftext|>']
+         tokens = {bytes([byte_decoder[c] for c in token]): int(idx) for token, idx in _tokens_raw.items()}
+
+ # output in the same directory as the model
+ fname_out = dir_out / "ggml-model.bin"
+
+ # use 16-bit or 32-bit floats
+ use_f16 = True
+ if len(sys.argv) > 4:
+     use_f16 = False
+     fname_out = dir_out / "ggml-model-f32.bin"
+
+ fout = fname_out.open("wb")
+
+ fout.write(struct.pack("i", 0x67676d6c)) # magic: ggml in hex
+ fout.write(struct.pack("i", hparams["n_vocab"]))
+ fout.write(struct.pack("i", hparams["n_audio_ctx"]))
+ fout.write(struct.pack("i", hparams["n_audio_state"]))
+ fout.write(struct.pack("i", hparams["n_audio_head"]))
+ fout.write(struct.pack("i", hparams["n_audio_layer"]))
+ fout.write(struct.pack("i", hparams["n_text_ctx"]))
+ fout.write(struct.pack("i", hparams["n_text_state"]))
+ fout.write(struct.pack("i", hparams["n_text_head"]))
+ fout.write(struct.pack("i", hparams["n_text_layer"]))
+ fout.write(struct.pack("i", hparams["n_mels"]))
+ fout.write(struct.pack("i", use_f16))
+
+ # write mel filters
+ fout.write(struct.pack("i", filters.shape[0]))
+ fout.write(struct.pack("i", filters.shape[1]))
+ for i in range(filters.shape[0]):
+     for j in range(filters.shape[1]):
+         fout.write(struct.pack("f", filters[i][j]))
+
+ # write tokenizer
+ fout.write(struct.pack("i", len(tokens)))
+
+ for key in tokens:
+     fout.write(struct.pack("i", len(key)))
+     fout.write(key)
+
+ for name in list_vars.keys():
+     data = list_vars[name].squeeze().numpy()
+     print("Processing variable: ", name, " with shape: ", data.shape)
+
+     # reshape conv bias from [n] to [n, 1]
+     if name in ["encoder.conv1.bias", "encoder.conv2.bias"]:
+         data = data.reshape(data.shape[0], 1)
+         print(f"  Reshaped variable: {name} to shape: ", data.shape)
+
+     n_dims = len(data.shape)
+
+     # looks like the whisper models are in f16 by default
+     # so we need to convert the small tensors to f32 until we fully support f16 in ggml
+     # ftype == 0 -> float32, ftype == 1 -> float16
+     ftype = 1
+     if use_f16:
+         if n_dims < 2 or \
+                 name == "encoder.conv1.bias" or \
+                 name == "encoder.conv2.bias" or \
+                 name == "encoder.positional_embedding" or \
+                 name == "decoder.positional_embedding":
+             print("  Converting to float32")
+             data = data.astype(np.float32)
+             ftype = 0
+     else:
+         data = data.astype(np.float32)
+         ftype = 0
+
+     #if name.startswith("encoder"):
+     #    if name.endswith("mlp.0.weight") or \
+     #       name.endswith("mlp.2.weight"):
+     #        print("  Transposing")
+     #        data = data.transpose()
+
+     # header
+     str_ = name.encode('utf-8')
+     fout.write(struct.pack("iii", n_dims, len(str_), ftype))
+     for i in range(n_dims):
+         fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
+     fout.write(str_)
+
+     # data
+     data.tofile(fout)
+
+ fout.close()
+
+ print("Done. Output file: ", fname_out)
+ print("")
convert-whisper-to-coreml.py ADDED
@@ -0,0 +1,331 @@
+ import argparse
+ import torch
+ import torch.nn.functional as F
+ import coremltools as ct
+
+ from torch import Tensor
+ from torch import nn
+ from typing import Dict
+ from typing import Optional
+ from ane_transformers.reference.layer_norm import LayerNormANE as LayerNormANEBase
+ from coremltools.models.neural_network.quantization_utils import quantize_weights
+ from whisper.model import Whisper, AudioEncoder, TextDecoder, ResidualAttentionBlock, MultiHeadAttention, ModelDimensions
+ from whisper import load_model
+
+ # Use for changing dim of input in encoder and decoder embeddings
+ def linear_to_conv2d_map(state_dict, prefix, local_metadata, strict,
+ missing_keys, unexpected_keys, error_msgs):
+ """
+ Unsqueeze twice to map nn.Linear weights to nn.Conv2d weights
+ """
+ for k in state_dict:
+ is_attention = all(substr in k for substr in ['attn', '.weight'])
+ is_mlp = any(k.endswith(s) for s in ['mlp.0.weight', 'mlp.2.weight'])
+
+ if (is_attention or is_mlp) and len(state_dict[k].shape) == 2:
+ state_dict[k] = state_dict[k][:, :, None, None]
+
+
+ def correct_for_bias_scale_order_inversion(state_dict, prefix, local_metadata,
+ strict, missing_keys,
+ unexpected_keys, error_msgs):
+ state_dict[prefix + 'bias'] = state_dict[prefix + 'bias'] / state_dict[prefix + 'weight']
+ return state_dict
+
+ class LayerNormANE(LayerNormANEBase):
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+ self._register_load_state_dict_pre_hook(
+ correct_for_bias_scale_order_inversion)
+
+ class MultiHeadAttentionANE(MultiHeadAttention):
+ def __init__(self, n_state: int, n_head: int):
+ super().__init__(n_state, n_head)
+ self.query = nn.Conv2d(n_state, n_state, kernel_size=1)
+ self.key = nn.Conv2d(n_state, n_state, kernel_size=1, bias=False)
+ self.value = nn.Conv2d(n_state, n_state, kernel_size=1)
+ self.out = nn.Conv2d(n_state, n_state, kernel_size=1)
+
+ def forward(self,
+ x: Tensor,
+ xa: Optional[Tensor] = None,
+ mask: Optional[Tensor] = None,
+ kv_cache: Optional[dict] = None):
+
+ q = self.query(x)
+
+ if kv_cache is None or xa is None or self.key not in kv_cache:
+ # hooks, if installed (i.e. kv_cache is not None), will prepend the cached kv tensors;
+ # otherwise, perform key/value projections for self- or cross-attention as usual.
+ k = self.key(x if xa is None else xa)
+ v = self.value(x if xa is None else xa)
+
+ else:
+ # for cross-attention, calculate keys and values once and reuse in subsequent calls.
+ k = kv_cache[self.key]
+ v = kv_cache[self.value]
+
+ wv, qk = self.qkv_attention_ane(q, k, v, mask)
+
+ return self.out(wv), qk
+
+ def qkv_attention_ane(self, q: Tensor, k: Tensor, v: Tensor, mask: Optional[Tensor] = None):
+
+ _, dim, _, seqlen = q.size()
+
+ dim_per_head = dim // self.n_head
+
+ scale = float(dim_per_head)**-0.5
+
+ q = q * scale
+
+ mh_q = q.split(dim_per_head, dim=1)
+ mh_k = k.transpose(1,3).split(dim_per_head, dim=3)
+ mh_v = v.split(dim_per_head, dim=1)
+
+ mh_qk = [
+ torch.einsum('bchq,bkhc->bkhq', [qi, ki])
+ for qi, ki in zip(mh_q, mh_k)
+ ] # (batch_size, max_seq_length, 1, max_seq_length) * n_heads
+
+ if mask is not None:
+ for head_idx in range(self.n_head):
+ mh_qk[head_idx] = mh_qk[head_idx] + mask[:, :seqlen, :, :seqlen]
+
+ attn_weights = [aw.softmax(dim=1) for aw in mh_qk] # (batch_size, max_seq_length, 1, max_seq_length) * n_heads
+ attn = [torch.einsum('bkhq,bchk->bchq', wi, vi) for wi, vi in zip(attn_weights, mh_v)] # (batch_size, dim_per_head, 1, max_seq_length) * n_heads
+ attn = torch.cat(attn, dim=1) # (batch_size, dim, 1, max_seq_length)
+
+ return attn, torch.cat(mh_qk, dim=1).float().detach()
+
+
+ class ResidualAttentionBlockANE(ResidualAttentionBlock):
+ def __init__(self, n_state: int, n_head: int, cross_attention: bool = False):
+ super().__init__(n_state, n_head, cross_attention)
+ self.attn = MultiHeadAttentionANE(n_state, n_head)
+ self.attn_ln = LayerNormANE(n_state)
+ self.cross_attn = MultiHeadAttentionANE(n_state, n_head) if cross_attention else None
+ self.cross_attn_ln = LayerNormANE(n_state) if cross_attention else None
+
+ n_mlp = n_state * 4
+ self.mlp = nn.Sequential(
+ nn.Conv2d(n_state, n_mlp, kernel_size=1),
+ nn.GELU(),
+ nn.Conv2d(n_mlp, n_state, kernel_size=1)
+ )
+ self.mlp_ln = LayerNormANE(n_state)
+
+
+ class AudioEncoderANE(AudioEncoder):
+ def __init__(self, n_mels: int, n_ctx: int, n_state: int, n_head: int, n_layer: int):
+ super().__init__(n_mels, n_ctx, n_state, n_head, n_layer)
+
+ self.blocks = nn.ModuleList(
+ [ResidualAttentionBlockANE(n_state, n_head) for _ in range(n_layer)]
+ )
+ self.ln_post = LayerNormANE(n_state)
+
+ def forward(self, x: Tensor):
+ """
+ x : torch.Tensor, shape = (batch_size, n_mels, n_ctx)
+ the mel spectrogram of the audio
+ """
+ x = F.gelu(self.conv1(x))
+ x = F.gelu(self.conv2(x))
+
+ assert x.shape[1:] == self.positional_embedding.shape[::-1], "incorrect audio shape"
+
+ # Add positional embedding and add dummy dim for ANE
+ x = (x + self.positional_embedding.transpose(0,1)).to(x.dtype).unsqueeze(2)
+
+ for block in self.blocks:
+ x = block(x)
+
+ x = self.ln_post(x)
+
+ # """
+ # TODO:
+ # I think we need to transpose the result here to make it fit whisper.cpp memory order.
+ # However, even doing this, the results are still wrong. Kind of less wrong compared to
+ # not transposing, but still wrong.
+
+ # Also, I don't know why the original OpenAI implementation does not need to transpose
+
+ # transpose to (batch_size, n_ctx, n_state)
+ # x : torch.Tensor, shape = (batch_size, n_state, 1, n_ctx)
+
+ # """
+ # x = x.transpose(1,3)
+
+ return x
+
+ class TextDecoderANE(TextDecoder):
+
+ def __init__(self, n_vocab: int, n_ctx: int, n_state: int, n_head: int, n_layer: int):
+ super().__init__(n_vocab, n_ctx, n_state, n_head, n_layer)
+
+ self.blocks = nn.ModuleList(
+ [ResidualAttentionBlockANE(n_state, n_head, cross_attention=True) for _ in range(n_layer)]
+ )
+ self.ln = LayerNormANE(n_state)
+
+ def forward(self, x: Tensor, xa: Tensor, kv_cache: Optional[dict] = None):
+ """
+ x : torch.LongTensor, shape = (batch_size, <= n_ctx)
+ the text tokens
+ xa : torch.Tensor, shape = (batch_size, n_mels, n_audio_ctx)
+ the encoded audio features to be attended on
+ """
+ offset = next(iter(kv_cache.values())).shape[3] if kv_cache else 0
+ x = self.token_embedding(x) + self.positional_embedding[offset : offset + x.shape[-1]]
+ x = x.to(xa.dtype)
+
+ # Reformat for ANE
+ mask = self.mask[None, None, :, :].permute(0,3,1,2)
+ x = x.transpose(1,2).unsqueeze(2)
+
+ for block in self.blocks:
+ x = block(x, xa, mask=mask, kv_cache=kv_cache)
+
+ x = self.ln(x)
+
+ # Reformat back from ANE
+ x = x.permute(0,2,3,1).squeeze(0)
+
+ # ANE can only load tensors with dim size of at most 16,384 - whisper uses 51,864 (en) or 51,865 (multi-lang) tokens so we need to compute in chunks
+ if self.token_embedding.weight.shape[0] >= 51865:
+ # split in 11 chunks - 4715 each
+ splits = self.token_embedding.weight.split(self.token_embedding.weight.shape[0]//11, dim=0)
+ logits = torch.cat([torch.einsum('bid,jd->bij', x, split) for split in splits]).view(*x.shape[:2], -1)
+ else:
+ # split in 12 chunks - 4322 each
+ assert(self.token_embedding.weight.shape[0] == 51864)
+ splits = self.token_embedding.weight.split(self.token_embedding.weight.shape[0]//12, dim=0)
+ logits = torch.cat([torch.einsum('bid,jd->bij', x, split) for split in splits]).view(*x.shape[:2], -1)
+
+ return logits
+
+ class WhisperANE(Whisper):
+ def __init__(self, dims: ModelDimensions):
+ super().__init__(dims)
+
+ self.encoder = AudioEncoderANE(
+ self.dims.n_mels,
+ self.dims.n_audio_ctx,
+ self.dims.n_audio_state,
+ self.dims.n_audio_head,
+ self.dims.n_audio_layer,
+ )
+ self.decoder = TextDecoderANE(
+ self.dims.n_vocab,
+ self.dims.n_text_ctx,
+ self.dims.n_text_state,
+ self.dims.n_text_head,
+ self.dims.n_text_layer,
+ )
+
+ self._register_load_state_dict_pre_hook(linear_to_conv2d_map)
+
+ def forward(self, mel: torch.Tensor, tokens: torch.Tensor) -> Dict[str, torch.Tensor]:
+ return self.decoder(tokens, self.encoder(mel))
+
+ def install_kv_cache_hooks(self, cache: Optional[dict] = None):
+ cache = {**cache} if cache is not None else {}
+ hooks = []
+
+ def save_to_cache(module, _, output):
+ if module not in cache or output.shape[3] > self.decoder.positional_embedding.shape[0]:
+ cache[module] = output # save as-is, for the first token or cross attention
+ else:
+ cache[module] = torch.cat([cache[module], output], dim=3).detach()
+ return cache[module]
+
+ def install_hooks(layer: nn.Module):
+ if isinstance(layer, MultiHeadAttentionANE):
+ hooks.append(layer.key.register_forward_hook(save_to_cache))
+ hooks.append(layer.value.register_forward_hook(save_to_cache))
+
+ self.decoder.apply(install_hooks)
+ return cache, hooks
+
+ def convert_encoder(hparams, model, quantize=False):
+ model.eval()
+
+ input_shape = (1, hparams.n_mels, 3000)
+ input_data = torch.randn(input_shape)
+ traced_model = torch.jit.trace(model, input_data)
+
+ model = ct.convert(
+ traced_model,
+ convert_to=None if quantize else "mlprogram", # convert will fail if weights are quantized, not sure why
+ inputs=[ct.TensorType(name="logmel_data", shape=input_shape)],
+ outputs=[ct.TensorType(name="output")],
+ compute_units=ct.ComputeUnit.ALL
+ )
+
+ if quantize:
+ model = quantize_weights(model, nbits=16)
+
+ return model
+
+ def convert_decoder(hparams, model, quantize=False):
+ model.eval()
+
+ tokens_shape = (1, 1)
+ audio_shape = (1, hparams.n_audio_state, 1, 1500)
+
+ audio_data = torch.randn(audio_shape)
+ token_data = torch.randint(50257, tokens_shape).long()
+ traced_model = torch.jit.trace(model, (token_data, audio_data))
+
+ model = ct.convert(
+ traced_model,
+ convert_to=None if quantize else "mlprogram", # convert will fail if weights are quantized, not sure why
+ inputs=[
+ ct.TensorType(name="token_data", shape=tokens_shape, dtype=int),
+ ct.TensorType(name="audio_data", shape=audio_shape)
+ ]
+ )
+
+ if quantize:
+ model = quantize_weights(model, nbits=16)
+
+ return model
+
+
+ if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--model", type=str, help="model to convert (e.g. tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3)", required=True)
+ parser.add_argument("--encoder-only", type=bool, help="only convert encoder", default=False)
+ parser.add_argument("--quantize", type=bool, help="quantize weights to F16", default=False)
+ parser.add_argument("--optimize-ane", type=bool, help="optimize for ANE execution (currently broken)", default=False)
+ args = parser.parse_args()
+
+ if args.model not in ["tiny", "tiny.en", "base", "base.en", "small", "small.en", "small.en-tdrz", "medium", "medium.en", "large-v1", "large-v2", "large-v3"]:
+ raise ValueError("Invalid model name")
+
+ whisper = load_model(args.model).cpu()
+ hparams = whisper.dims
+ print(hparams)
+
+ if args.optimize_ane:
+ whisperANE = WhisperANE(hparams).eval()
+ whisperANE.load_state_dict(whisper.state_dict())
+
+ encoder = whisperANE.encoder
+ decoder = whisperANE.decoder
+ else:
+ encoder = whisper.encoder
+ decoder = whisper.decoder
+
+ # Convert encoder
+ encoder = convert_encoder(hparams, encoder, quantize=args.quantize)
+ encoder.save(f"models/coreml-encoder-{args.model}.mlpackage")
+
+ if args.encoder_only is False:
+ # Convert decoder
+ decoder = convert_decoder(hparams, decoder, quantize=args.quantize)
+ decoder.save(f"models/coreml-decoder-{args.model}.mlpackage")
+
+ print("done converting")
convert-whisper-to-openvino.py ADDED
@@ -0,0 +1,53 @@
+ import argparse
+ import torch
+ from whisper import load_model
+ import os
+ from openvino.tools import mo
+ from openvino.runtime import serialize
+ import shutil
+
+ def convert_encoder(hparams, encoder, mname):
+ encoder.eval()
+
+ mel = torch.zeros((1, hparams.n_mels, 3000))
+
+ onnx_folder = os.path.join(os.path.dirname(__file__), "onnx_encoder")
+
+ # create a directory to store the onnx model, and other collateral that is saved during onnx export procedure
+ if not os.path.isdir(onnx_folder):
+ os.makedirs(onnx_folder)
+
+ onnx_path = os.path.join(onnx_folder, "whisper_encoder.onnx")
+
+ torch.onnx.export(
+ encoder,
+ mel,
+ onnx_path,
+ input_names=["mel"],
+ output_names=["output_features"]
+ )
+
+ # use model optimizer to convert onnx to OpenVINO IR format
+ encoder_model = mo.convert_model(onnx_path, compress_to_fp16=True)
+ serialize(encoder_model, xml_path=os.path.join(os.path.dirname(__file__), "ggml-" + mname + "-encoder-openvino.xml"))
+
+ # cleanup
+ if os.path.isdir(onnx_folder):
+ shutil.rmtree(onnx_folder)
+
+
+ if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--model", type=str, help="model to convert (e.g. tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3)", required=True)
+ args = parser.parse_args()
+
+ if args.model not in ["tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large-v2", "large-v3"]:
+ raise ValueError("Invalid model name")
+
+ whisper = load_model(args.model).cpu()
+ hparams = whisper.dims
+
+ encoder = whisper.encoder
+
+ # Convert encoder to onnx
+ convert_encoder(hparams, encoder, args.model)
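The OpenVINO script above creates an `onnx_folder` for the export collateral and `rmtree`s it afterwards. The same create/clean-up pattern can be expressed with an automatically managed temporary directory; a sketch where a placeholder file write stands in for the `torch.onnx.export` call:

```python
import os
import tempfile

# temporary folder for ONNX export collateral; removed when the block exits
with tempfile.TemporaryDirectory(prefix="onnx_encoder_") as onnx_folder:
    onnx_path = os.path.join(onnx_folder, "whisper_encoder.onnx")
    with open(onnx_path, "wb") as f:
        f.write(b"placeholder")  # stands in for torch.onnx.export(...)
    exported = os.path.isfile(onnx_path)

# no explicit shutil.rmtree needed: the directory is already gone here
cleaned_up = not os.path.isdir(onnx_folder)
```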
coreml-encoder-base.en.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:598255dff1e5eb81f2c32e6e9c6b3c4916bbbf4d2b39f4749d5dcb438f33f420
+ size 58049
coreml-encoder-base.en.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc998211e55f0972c70e3d29103477cfe8c6dd485cd68438951f83fa3ee3b770
+ size 41188544
coreml-encoder-base.en.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
+ {
+ "fileFormatVersion": "1.0.0",
+ "itemInfoEntries": {
+ "36C90F61-3ED1-4D0A-A009-9C0067D75407": {
+ "author": "com.apple.CoreML",
+ "description": "CoreML Model Specification",
+ "name": "model.mlmodel",
+ "path": "com.apple.CoreML/model.mlmodel"
+ },
+ "945A3445-84F5-4FAA-BCEF-C53E04FA3A47": {
+ "author": "com.apple.CoreML",
+ "description": "CoreML Model Weights",
+ "name": "weights",
+ "path": "com.apple.CoreML/weights"
+ }
+ },
+ "rootModelIdentifier": "36C90F61-3ED1-4D0A-A009-9C0067D75407"
+ }
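`Manifest.json` maps item identifiers to paths inside the `.mlpackage`. A sketch of resolving the root model's path from `rootModelIdentifier`, using an abridged copy of the manifest above:

```python
import json

# abridged copy of the Manifest.json shown above
manifest = json.loads("""
{
  "fileFormatVersion": "1.0.0",
  "itemInfoEntries": {
    "36C90F61-3ED1-4D0A-A009-9C0067D75407": {
      "author": "com.apple.CoreML",
      "name": "model.mlmodel",
      "path": "com.apple.CoreML/model.mlmodel"
    }
  },
  "rootModelIdentifier": "36C90F61-3ED1-4D0A-A009-9C0067D75407"
}
""")

# the root model entry is looked up by its UUID
root_id = manifest["rootModelIdentifier"]
root_path = manifest["itemInfoEntries"][root_id]["path"]
```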
download-coreml-model.sh ADDED
@@ -0,0 +1,82 @@
+ #!/bin/bash
+
+ # This script downloads Whisper model files that have already been converted to Core ML format.
+ # This way you don't have to convert them yourself.
+
+ src="https://huggingface.co/datasets/ggerganov/whisper.cpp-coreml"
+ pfx="resolve/main/ggml"
+
+ # get the path of this script
+ function get_script_path() {
+ if [ -x "$(command -v realpath)" ]; then
+ echo "$(dirname "$(realpath "$0")")"
+ else
+ local ret="$(cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P)"
+ echo "$ret"
+ fi
+ }
+
+ models_path="$(get_script_path)"
+
+ # Whisper models
+ models=( "tiny.en" "tiny" "base.en" "base" "small.en" "small" "medium.en" "medium" "large-v1" "large-v2" "large-v3" )
+
+ # list available models
+ function list_models {
+ printf "\n"
+ printf " Available models:"
+ for model in "${models[@]}"; do
+ printf " $model"
+ done
+ printf "\n\n"
+ }
+
+ if [ "$#" -ne 1 ]; then
+ printf "Usage: $0 <model>\n"
+ list_models
+
+ exit 1
+ fi
+
+ model=$1
+
+ if [[ ! " ${models[@]} " =~ " ${model} " ]]; then
+ printf "Invalid model: $model\n"
+ list_models
+
+ exit 1
+ fi
+
+ # download Core ML model
+
+ printf "Downloading Core ML model $model from '$src' ...\n"
+
+ cd "$models_path"
+
+ if [ -f "ggml-$model.mlmodel" ]; then
+ printf "Model $model already exists. Skipping download.\n"
+ exit 0
+ fi
+
+ if [ -x "$(command -v wget)" ]; then
+ wget --quiet --show-progress -O "ggml-$model.mlmodel" "$src/$pfx-$model.mlmodel"
+ elif [ -x "$(command -v curl)" ]; then
+ curl -L --output "ggml-$model.mlmodel" "$src/$pfx-$model.mlmodel"
+ else
+ printf "Either wget or curl is required to download models.\n"
+ exit 1
+ fi
+
+
+ if [ $? -ne 0 ]; then
+ printf "Failed to download Core ML model $model \n"
+ printf "Please try again later or download the original Whisper model files and convert them yourself.\n"
+ exit 1
+ fi
+
+ printf "Done! Model '$model' saved in 'models/ggml-$model.mlmodel'\n"
+ printf "Run the following command to compile it:\n\n"
+ printf " $ xcrun coremlc compile ./models/ggml-$model.mlmodel ./models\n\n"
+ printf "You can now use it like this:\n\n"
+ printf " $ ./main -m models/ggml-$model.bin -f samples/jfk.wav\n"
+ printf "\n"
download-ggml-model.cmd ADDED
@@ -0,0 +1,64 @@
+ @echo off
+
+ pushd %~dp0
+ set models_path=%CD%
+ for %%d in (%~dp0..) do set root_path=%%~fd
+ popd
+
+ set argc=0
+ for %%x in (%*) do set /A argc+=1
+
+ set models=tiny.en tiny base.en base small.en small medium.en medium large-v1 large-v2 large-v3
+
+ if %argc% neq 1 (
+ echo.
+ echo Usage: download-ggml-model.cmd model
+ CALL :list_models
+ goto :eof
+ )
+
+ set model=%1
+
+ for %%b in (%models%) do (
+ if "%%b"=="%model%" (
+ CALL :download_model
+ goto :eof
+ )
+ )
+
+ echo Invalid model: %model%
+ CALL :list_models
+ goto :eof
+
+ :download_model
+ echo Downloading ggml model %model%...
+
+ cd "%models_path%"
+
+ if exist "ggml-%model%.bin" (
+ echo Model %model% already exists. Skipping download.
+ goto :eof
+ )
+
+ PowerShell -NoProfile -ExecutionPolicy Bypass -Command "Start-BitsTransfer -Source https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%model%.bin -Destination ggml-%model%.bin"
+
+ if %ERRORLEVEL% neq 0 (
+ echo Failed to download ggml model %model%
+ echo Please try again later or download the original Whisper model files and convert them yourself.
+ goto :eof
+ )
+
+ echo Done! Model %model% saved in %root_path%\models\ggml-%model%.bin
+ echo You can now use it like this:
+ echo main.exe -m %root_path%\models\ggml-%model%.bin -f %root_path%\samples\jfk.wav
+
+ goto :eof
+
+ :list_models
+ echo.
+ echo Available models:
+ (for %%a in (%models%) do (
+ echo %%a
+ ))
+ echo.
+ exit /b
download-ggml-model.sh ADDED
@@ -0,0 +1,111 @@
+ #!/bin/bash
+
+ # This script downloads Whisper model files that have already been converted to ggml format.
+ # This way you don't have to convert them yourself.
+
+ #src="https://ggml.ggerganov.com"
+ #pfx="ggml-model-whisper"
+
+ src="https://huggingface.co/ggerganov/whisper.cpp"
+ pfx="resolve/main/ggml"
+
+ # get the path of this script
+ function get_script_path() {
+ if [ -x "$(command -v realpath)" ]; then
+ echo "$(dirname "$(realpath "$0")")"
+ else
+ local ret="$(cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P)"
+ echo "$ret"
+ fi
+ }
+
+ models_path="$(get_script_path)"
+
+ # Whisper models
+ models=(
+ "tiny.en"
+ "tiny"
+ "tiny-q5_1"
+ "tiny.en-q5_1"
+ "base.en"
+ "base"
+ "base-q5_1"
+ "base.en-q5_1"
+ "small.en"
+ "small.en-tdrz"
+ "small"
+ "small-q5_1"
+ "small.en-q5_1"
+ "medium"
+ "medium.en"
+ "medium-q5_0"
+ "medium.en-q5_0"
+ "large-v1"
+ "large-v2"
+ "large-v3"
+ "large-q5_0"
+ )
+
+ # list available models
+ function list_models {
+ printf "\n"
+ printf " Available models:"
+ for model in "${models[@]}"; do
+ printf " $model"
+ done
+ printf "\n\n"
+ }
+
+ if [ "$#" -ne 1 ]; then
+ printf "Usage: $0 <model>\n"
+ list_models
+
+ exit 1
+ fi
+
+ model=$1
+
+ if [[ ! " ${models[@]} " =~ " ${model} " ]]; then
+ printf "Invalid model: $model\n"
+ list_models
+
+ exit 1
+ fi
+
+ # check if model contains `tdrz` and update the src and pfx accordingly
+ if [[ $model == *"tdrz"* ]]; then
+ src="https://huggingface.co/akashmjn/tinydiarize-whisper.cpp"
+ pfx="resolve/main/ggml"
+ fi
+
+ # download ggml model
+
+ printf "Downloading ggml model $model from '$src' ...\n"
+
+ cd "$models_path"
+
+ if [ -f "ggml-$model.bin" ]; then
+ printf "Model $model already exists. Skipping download.\n"
+ exit 0
+ fi
+
+ if [ -x "$(command -v wget)" ]; then
+ wget --no-config --quiet --show-progress -O "ggml-$model.bin" "$src/$pfx-$model.bin"
+ elif [ -x "$(command -v curl)" ]; then
+ curl -L --output "ggml-$model.bin" "$src/$pfx-$model.bin"
+ else
+ printf "Either wget or curl is required to download models.\n"
+ exit 1
+ fi
+
+
+ if [ $? -ne 0 ]; then
+ printf "Failed to download ggml model $model \n"
+ printf "Please try again later or download the original Whisper model files and convert them yourself.\n"
+ exit 1
+ fi
+
+ printf "Done! Model '$model' saved in 'models/ggml-$model.bin'\n"
+ printf "You can now use it like this:\n\n"
+ printf " $ ./main -m models/ggml-$model.bin -f samples/jfk.wav\n"
+ printf "\n"
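The URL construction in `download-ggml-model.sh` is simply `$src/$pfx-$model.bin`, with `src` switched for tinydiarize (`tdrz`) models. A Python sketch of the same logic (`ggml_model_url` is a hypothetical helper name):

```python
def ggml_model_url(model: str) -> str:
    """Build the download URL the way download-ggml-model.sh does."""
    src = "https://huggingface.co/ggerganov/whisper.cpp"
    # tinydiarize models live in a different repo
    if "tdrz" in model:
        src = "https://huggingface.co/akashmjn/tinydiarize-whisper.cpp"
    return f"{src}/resolve/main/ggml-{model}.bin"

url = ggml_model_url("base.en")
```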
for-tests-ggml-base.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ddf6ff3e5f9e0da794fee41652559af1efaa6118f3cc699f250991c515b6af2a
+ size 575451
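The `for-tests-*` entries are Git LFS pointer files: plain text with `version`, `oid`, and `size` key/value lines. A sketch of parsing one, using the pointer contents shown above:

```python
# contents of for-tests-ggml-base.bin (a Git LFS pointer, not the model itself)
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ddf6ff3e5f9e0da794fee41652559af1efaa6118f3cc699f250991c515b6af2a
size 575451
"""

# each line is "key value"; oid is "algorithm:hex-digest"
fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())
algo, digest = fields["oid"].split(":", 1)
size = int(fields["size"])
```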
for-tests-ggml-base.en.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1bc042ca584ff1895897e95bffb34ccf357be46c1fca97cf7fbe32f2060aa9e8
+ size 586836
for-tests-ggml-large.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf987facca89f2d75a843d5467d91668fba5c23debf66a0644df53f0accf0cfb
+ size 575451
for-tests-ggml-medium.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f676437ddef445443e95fc77d88d59013e9f6dc05d25ebcbabd89abeefc5565b
+ size 575451
for-tests-ggml-medium.en.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52c051196f9b2737679722239bc7f649f4a3b0a84d418be0adfd7aed72480827
+ size 586836
for-tests-ggml-small.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3cd79f6d818b13aea6427e0c56ca97d6d82274585efb8bd25187a37b944024b
+ size 575451
for-tests-ggml-small.en.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5618c8b3cf34b1fa4493789eb92c9ff68796fb789a58180a8c4b3fb5b28789e2
+ size 586836
for-tests-ggml-tiny.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c486fb9f14a28b1c1dc252741a431646cc573450c900b9d9c406e10294aa01e6
+ size 575451
for-tests-ggml-tiny.en.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd6b7796204a1cdf7164666423034e6e1a7a3e9f5c22327b4b7974c4584bd82d
+ size 586836
generate-coreml-interface.sh ADDED
@@ -0,0 +1,29 @@
+ #!/bin/bash
+ #
+ # This generates:
+ # - coreml/whisper-encoder-impl.h and coreml/whisper-encoder-impl.m
+ # - coreml/whisper-decoder-impl.h and coreml/whisper-decoder-impl.m
+ #
+
+ wd=$(dirname "$0")
+ cd "$wd/../"
+
+ python3 models/convert-whisper-to-coreml.py --model tiny.en
+
+ mv -v models/coreml-encoder-tiny.en.mlpackage models/whisper-encoder-impl.mlpackage
+ xcrun coremlc generate models/whisper-encoder-impl.mlpackage coreml/
+ mv coreml/whisper_encoder_impl.h coreml/whisper-encoder-impl.h
+ mv coreml/whisper_encoder_impl.m coreml/whisper-encoder-impl.m
+ sed -i '' 's/whisper_encoder_impl\.h/whisper-encoder-impl.h/g' coreml/whisper-encoder-impl.m
+ sed -i '' 's/whisper_encoder_impl\.m/whisper-encoder-impl.m/g' coreml/whisper-encoder-impl.m
+ sed -i '' 's/whisper_encoder_impl\.h/whisper-encoder-impl.h/g' coreml/whisper-encoder-impl.h
+
+ mv -v models/coreml-decoder-tiny.en.mlpackage models/whisper-decoder-impl.mlpackage
+ xcrun coremlc generate models/whisper-decoder-impl.mlpackage coreml/
+ mv coreml/whisper_decoder_impl.h coreml/whisper-decoder-impl.h
+ mv coreml/whisper_decoder_impl.m coreml/whisper-decoder-impl.m
+ sed -i '' 's/whisper_decoder_impl\.h/whisper-decoder-impl.h/g' coreml/whisper-decoder-impl.m
+ sed -i '' 's/whisper_decoder_impl\.m/whisper-decoder-impl.m/g' coreml/whisper-decoder-impl.m
+ sed -i '' 's/whisper_decoder_impl\.h/whisper-decoder-impl.h/g' coreml/whisper-decoder-impl.h
+
+ rm -rfv models/whisper-encoder-impl.mlpackage models/whisper-decoder-impl.mlpackage
generate-coreml-model.sh ADDED
@@ -0,0 +1,36 @@
+ #!/bin/bash
+
+ # Usage: ./generate-coreml-model.sh <model-name>
+ if [ $# -eq 0 ]; then
+ echo "No model name supplied"
+ echo "Usage for Whisper models: ./generate-coreml-model.sh <model-name>"
+ echo "Usage for HuggingFace models: ./generate-coreml-model.sh -h5 <model-name> <model-path>"
+ exit 1
+ elif [[ "$1" == "-h5" && $# != 3 ]]; then
+ echo "No model name and model path supplied for a HuggingFace model"
+ echo "Usage for HuggingFace models: ./generate-coreml-model.sh -h5 <model-name> <model-path>"
+ exit 1
+ fi
+
+ mname="$1"
+
+ wd=$(dirname "$0")
+ cd "$wd/../"
+
+ if [[ $mname == "-h5" ]]; then
+ mname="$2"
+ mpath="$3"
+ echo $mpath
+ python3 models/convert-h5-to-coreml.py --model-name $mname --model-path $mpath --encoder-only True
+ else
+ python3 models/convert-whisper-to-coreml.py --model $mname --encoder-only True
+ fi
+
+ xcrun coremlc compile models/coreml-encoder-${mname}.mlpackage models/
+ rm -rf models/ggml-${mname}-encoder.mlmodelc
+ mv -v models/coreml-encoder-${mname}.mlmodelc models/ggml-${mname}-encoder.mlmodelc
+
+ # TODO: decoder (sometime in the future maybe)
+ #xcrun coremlc compile models/whisper-decoder-${mname}.mlpackage models/
+ #rm -rf models/ggml-${mname}-decoder.mlmodelc
+ #mv -v models/coreml_decoder_${mname}.mlmodelc models/ggml-${mname}-decoder.mlmodelc
ggml-base.en-encoder.mlmodelc/analytics/coremldata.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:461c6790016895f31a85af19613c6a21d3b937f5fea6bc52387360a4100947e1
+ size 243
ggml-base.en-encoder.mlmodelc/coremldata.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97d47ff2029aaa5e922ecf427c1e9fccc08d7e7b8226be5c6f482fceaf583dd4
+ size 319
ggml-base.en-encoder.mlmodelc/metadata.json ADDED
@@ -0,0 +1,67 @@
+ [
+ {
+ "metadataOutputVersion" : "3.0",
+ "storagePrecision" : "Float16",
+ "outputSchema" : [
+ {
+ "hasShapeFlexibility" : "0",
+ "isOptional" : "0",
+ "dataType" : "Float32",
+ "formattedType" : "MultiArray (Float32 1 × 1500 × 512)",
+ "shortDescription" : "",
+ "shape" : "[1, 1500, 512]",
+ "name" : "output",
+ "type" : "MultiArray"
+ }
+ ],
+ "modelParameters" : [
+
+ ],
+ "specificationVersion" : 6,
+ "mlProgramOperationTypeHistogram" : {
+ "Linear" : 36,
+ "Matmul" : 12,
+ "Cast" : 2,
+ "Conv" : 2,
+ "Softmax" : 6,
+ "Add" : 13,
+ "LayerNorm" : 13,
+ "Mul" : 12,
+ "Transpose" : 25,
+ "Gelu" : 8,
+ "Reshape" : 24
+ },
+ "computePrecision" : "Mixed (Float16, Float32, Int32)",
+ "isUpdatable" : "0",
+ "availability" : {
+ "macOS" : "12.0",
+ "tvOS" : "15.0",
+ "visionOS" : "1.0",
+ "watchOS" : "8.0",
+ "iOS" : "15.0",
+ "macCatalyst" : "15.0"
+ },
+ "modelType" : {
+ "name" : "MLModelType_mlProgram"
+ },
+ "userDefinedMetadata" : {
+ "com.github.apple.coremltools.source_dialect" : "TorchScript",
+ "com.github.apple.coremltools.source" : "torch==1.11.0",
+ "com.github.apple.coremltools.version" : "7.1"
+ },
+ "inputSchema" : [
+ {
+ "hasShapeFlexibility" : "0",
+ "isOptional" : "0",
+ "dataType" : "Float32",
+ "formattedType" : "MultiArray (Float32 1 × 80 × 3000)",
+ "shortDescription" : "",
+ "shape" : "[1, 80, 3000]",
+ "name" : "logmel_data",
+ "type" : "MultiArray"
+ }
+ ],
+ "generatedClassName" : "coreml_encoder_base_en",
+ "method" : "predict"
+ }
+ ]
@@ -0,0 +1,388 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ program(1.0)
2
+ [buildInfo = dict<tensor<string, []>, tensor<string, []>>({{"coremlc-component-MIL", "5.33.5"}, {"coremlc-version", "1877.40.3"}, {"coremltools-component-torch", "1.11.0"}, {"coremltools-source-dialect", "TorchScript"}, {"coremltools-version", "7.1"}})]
3
+ {
4
+ func main<ios15>(tensor<fp32, [1, 80, 3000]> logmel_data) {
5
+ tensor<int32, []> var_20 = const()[name = tensor<string, []>("op_20"), val = tensor<int32, []>(1)];
6
+ tensor<int32, [1]> var_28 = const()[name = tensor<string, []>("op_28"), val = tensor<int32, [1]>([1])];
7
+ tensor<int32, [1]> var_30 = const()[name = tensor<string, []>("op_30"), val = tensor<int32, [1]>([1])];
8
+ tensor<string, []> var_32_pad_type_0 = const()[name = tensor<string, []>("op_32_pad_type_0"), val = tensor<string, []>("custom")];
9
+ tensor<int32, [2]> var_32_pad_0 = const()[name = tensor<string, []>("op_32_pad_0"), val = tensor<int32, [2]>([1, 1])];
+ tensor<string, []> logmel_data_to_fp16_dtype_0 = const()[name = tensor<string, []>("logmel_data_to_fp16_dtype_0"), val = tensor<string, []>("fp16")];
+ tensor<fp16, [512, 80, 3]> weight_3_to_fp16 = const()[name = tensor<string, []>("weight_3_to_fp16"), val = tensor<fp16, [512, 80, 3]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(64)))];
+ tensor<fp16, [512]> bias_3_to_fp16 = const()[name = tensor<string, []>("bias_3_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(245888)))];
+ tensor<fp16, [1, 80, 3000]> cast_37 = cast(dtype = logmel_data_to_fp16_dtype_0, x = logmel_data)[name = tensor<string, []>("cast_37")];
+ tensor<fp16, [1, 512, 3000]> var_32_cast_fp16 = conv(bias = bias_3_to_fp16, dilations = var_30, groups = var_20, pad = var_32_pad_0, pad_type = var_32_pad_type_0, strides = var_28, weight = weight_3_to_fp16, x = cast_37)[name = tensor<string, []>("op_32_cast_fp16")];
+ tensor<string, []> input_1_mode_0 = const()[name = tensor<string, []>("input_1_mode_0"), val = tensor<string, []>("EXACT")];
+ tensor<fp16, [1, 512, 3000]> input_1_cast_fp16 = gelu(mode = input_1_mode_0, x = var_32_cast_fp16)[name = tensor<string, []>("input_1_cast_fp16")];
+ tensor<int32, []> var_36 = const()[name = tensor<string, []>("op_36"), val = tensor<int32, []>(1)];
+ tensor<int32, [1]> var_45 = const()[name = tensor<string, []>("op_45"), val = tensor<int32, [1]>([2])];
+ tensor<int32, [1]> var_47 = const()[name = tensor<string, []>("op_47"), val = tensor<int32, [1]>([1])];
+ tensor<string, []> var_49_pad_type_0 = const()[name = tensor<string, []>("op_49_pad_type_0"), val = tensor<string, []>("custom")];
+ tensor<int32, [2]> var_49_pad_0 = const()[name = tensor<string, []>("op_49_pad_0"), val = tensor<int32, [2]>([1, 1])];
+ tensor<fp16, [512, 512, 3]> weight_7_to_fp16 = const()[name = tensor<string, []>("weight_7_to_fp16"), val = tensor<fp16, [512, 512, 3]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(246976)))];
+ tensor<fp16, [512]> bias_7_to_fp16 = const()[name = tensor<string, []>("bias_7_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(1819904)))];
+ tensor<fp16, [1, 512, 1500]> var_49_cast_fp16 = conv(bias = bias_7_to_fp16, dilations = var_47, groups = var_36, pad = var_49_pad_0, pad_type = var_49_pad_type_0, strides = var_45, weight = weight_7_to_fp16, x = input_1_cast_fp16)[name = tensor<string, []>("op_49_cast_fp16")];
+ tensor<string, []> x_3_mode_0 = const()[name = tensor<string, []>("x_3_mode_0"), val = tensor<string, []>("EXACT")];
+ tensor<fp16, [1, 512, 1500]> x_3_cast_fp16 = gelu(mode = x_3_mode_0, x = var_49_cast_fp16)[name = tensor<string, []>("x_3_cast_fp16")];
+ tensor<int32, [3]> var_54 = const()[name = tensor<string, []>("op_54"), val = tensor<int32, [3]>([0, 2, 1])];
+ tensor<fp16, [1500, 512]> positional_embedding_to_fp16 = const()[name = tensor<string, []>("positional_embedding_to_fp16"), val = tensor<fp16, [1500, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(1820992)))];
+ tensor<fp16, [1, 1500, 512]> transpose_60 = transpose(perm = var_54, x = x_3_cast_fp16)[name = tensor<string, []>("transpose_60")];
+ tensor<fp16, [1, 1500, 512]> var_57_cast_fp16 = add(x = transpose_60, y = positional_embedding_to_fp16)[name = tensor<string, []>("op_57_cast_fp16")];
+ tensor<int32, []> var_70 = const()[name = tensor<string, []>("op_70"), val = tensor<int32, []>(-1)];
+ tensor<int32, [1]> var_87_axes_0 = const()[name = tensor<string, []>("op_87_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_0_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_0_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3357056)))];
+ tensor<fp16, [512]> blocks_0_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_0_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3358144)))];
+ tensor<fp16, []> var_76_to_fp16 = const()[name = tensor<string, []>("op_76_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
+ tensor<fp16, [1, 1500, 512]> var_87_cast_fp16 = layer_norm(axes = var_87_axes_0, beta = blocks_0_attn_ln_bias_to_fp16, epsilon = var_76_to_fp16, gamma = blocks_0_attn_ln_weight_to_fp16, x = var_57_cast_fp16)[name = tensor<string, []>("op_87_cast_fp16")];
+ tensor<fp16, [512, 512]> var_98_to_fp16 = const()[name = tensor<string, []>("op_98_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3359232)))];
+ tensor<fp16, [512]> var_99_to_fp16 = const()[name = tensor<string, []>("op_99_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3883584)))];
+ tensor<fp16, [1, 1500, 512]> linear_0_cast_fp16 = linear(bias = var_99_to_fp16, weight = var_98_to_fp16, x = var_87_cast_fp16)[name = tensor<string, []>("linear_0_cast_fp16")];
+ tensor<fp16, [512, 512]> var_102_to_fp16 = const()[name = tensor<string, []>("op_102_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3884672)))];
+ tensor<fp16, [512]> linear_1_bias_0_to_fp16 = const()[name = tensor<string, []>("linear_1_bias_0_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(4409024)))];
+ tensor<fp16, [1, 1500, 512]> linear_1_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_102_to_fp16, x = var_87_cast_fp16)[name = tensor<string, []>("linear_1_cast_fp16")];
+ tensor<fp16, [512, 512]> var_106_to_fp16 = const()[name = tensor<string, []>("op_106_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(4410112)))];
+ tensor<fp16, [512]> var_107_to_fp16 = const()[name = tensor<string, []>("op_107_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(4934464)))];
+ tensor<fp16, [1, 1500, 512]> linear_2_cast_fp16 = linear(bias = var_107_to_fp16, weight = var_106_to_fp16, x = var_87_cast_fp16)[name = tensor<string, []>("linear_2_cast_fp16")];
+ tensor<int32, [4]> var_115 = const()[name = tensor<string, []>("op_115"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_116_cast_fp16 = reshape(shape = var_115, x = linear_0_cast_fp16)[name = tensor<string, []>("op_116_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_42_to_fp16 = const()[name = tensor<string, []>("const_42_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> q_3_cast_fp16 = mul(x = var_116_cast_fp16, y = const_42_to_fp16)[name = tensor<string, []>("q_3_cast_fp16")];
+ tensor<int32, [4]> var_122 = const()[name = tensor<string, []>("op_122"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_123_cast_fp16 = reshape(shape = var_122, x = linear_1_cast_fp16)[name = tensor<string, []>("op_123_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_43_to_fp16 = const()[name = tensor<string, []>("const_43_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> k_3_cast_fp16 = mul(x = var_123_cast_fp16, y = const_43_to_fp16)[name = tensor<string, []>("k_3_cast_fp16")];
+ tensor<int32, [4]> var_129 = const()[name = tensor<string, []>("op_129"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_130_cast_fp16 = reshape(shape = var_129, x = linear_2_cast_fp16)[name = tensor<string, []>("op_130_cast_fp16")];
+ tensor<int32, [4]> var_131 = const()[name = tensor<string, []>("op_131"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<bool, []> qk_1_transpose_x_0 = const()[name = tensor<string, []>("qk_1_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> qk_1_transpose_y_0 = const()[name = tensor<string, []>("qk_1_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<int32, [4]> transpose_24_perm_0 = const()[name = tensor<string, []>("transpose_24_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [4]> transpose_25_perm_0 = const()[name = tensor<string, []>("transpose_25_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
+ tensor<fp16, [1, 8, 64, 1500]> transpose_57 = transpose(perm = transpose_25_perm_0, x = k_3_cast_fp16)[name = tensor<string, []>("transpose_57")];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_58 = transpose(perm = transpose_24_perm_0, x = q_3_cast_fp16)[name = tensor<string, []>("transpose_58")];
+ tensor<fp16, [1, 8, 1500, 1500]> qk_1_cast_fp16 = matmul(transpose_x = qk_1_transpose_x_0, transpose_y = qk_1_transpose_y_0, x = transpose_58, y = transpose_57)[name = tensor<string, []>("qk_1_cast_fp16")];
+ tensor<fp16, [1, 8, 1500, 1500]> var_135_cast_fp16 = softmax(axis = var_70, x = qk_1_cast_fp16)[name = tensor<string, []>("op_135_cast_fp16")];
+ tensor<bool, []> var_137_transpose_x_0 = const()[name = tensor<string, []>("op_137_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> var_137_transpose_y_0 = const()[name = tensor<string, []>("op_137_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_59 = transpose(perm = var_131, x = var_130_cast_fp16)[name = tensor<string, []>("transpose_59")];
+ tensor<fp16, [1, 8, 1500, 64]> var_137_cast_fp16 = matmul(transpose_x = var_137_transpose_x_0, transpose_y = var_137_transpose_y_0, x = var_135_cast_fp16, y = transpose_59)[name = tensor<string, []>("op_137_cast_fp16")];
+ tensor<int32, [4]> var_138 = const()[name = tensor<string, []>("op_138"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [3]> concat_0 = const()[name = tensor<string, []>("concat_0"), val = tensor<int32, [3]>([1, 1500, 512])];
+ tensor<fp16, [1, 1500, 8, 64]> transpose_56 = transpose(perm = var_138, x = var_137_cast_fp16)[name = tensor<string, []>("transpose_56")];
+ tensor<fp16, [1, 1500, 512]> x_11_cast_fp16 = reshape(shape = concat_0, x = transpose_56)[name = tensor<string, []>("x_11_cast_fp16")];
+ tensor<fp16, [512, 512]> var_143_to_fp16 = const()[name = tensor<string, []>("op_143_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(4935552)))];
+ tensor<fp16, [512]> var_144_to_fp16 = const()[name = tensor<string, []>("op_144_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(5459904)))];
+ tensor<fp16, [1, 1500, 512]> linear_3_cast_fp16 = linear(bias = var_144_to_fp16, weight = var_143_to_fp16, x = x_11_cast_fp16)[name = tensor<string, []>("linear_3_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_13_cast_fp16 = add(x = var_57_cast_fp16, y = linear_3_cast_fp16)[name = tensor<string, []>("x_13_cast_fp16")];
+ tensor<int32, [1]> var_151_axes_0 = const()[name = tensor<string, []>("op_151_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_0_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_0_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(5460992)))];
+ tensor<fp16, [512]> blocks_0_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_0_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(5462080)))];
+ tensor<fp16, [1, 1500, 512]> var_151_cast_fp16 = layer_norm(axes = var_151_axes_0, beta = blocks_0_mlp_ln_bias_to_fp16, epsilon = var_76_to_fp16, gamma = blocks_0_mlp_ln_weight_to_fp16, x = x_13_cast_fp16)[name = tensor<string, []>("op_151_cast_fp16")];
+ tensor<fp16, [2048, 512]> var_160_to_fp16 = const()[name = tensor<string, []>("op_160_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(5463168)))];
+ tensor<fp16, [2048]> var_161_to_fp16 = const()[name = tensor<string, []>("op_161_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(7560384)))];
+ tensor<fp16, [1, 1500, 2048]> linear_4_cast_fp16 = linear(bias = var_161_to_fp16, weight = var_160_to_fp16, x = var_151_cast_fp16)[name = tensor<string, []>("linear_4_cast_fp16")];
+ tensor<string, []> x_17_mode_0 = const()[name = tensor<string, []>("x_17_mode_0"), val = tensor<string, []>("EXACT")];
+ tensor<fp16, [1, 1500, 2048]> x_17_cast_fp16 = gelu(mode = x_17_mode_0, x = linear_4_cast_fp16)[name = tensor<string, []>("x_17_cast_fp16")];
+ tensor<fp16, [512, 2048]> var_166_to_fp16 = const()[name = tensor<string, []>("op_166_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(7564544)))];
+ tensor<fp16, [512]> var_167_to_fp16 = const()[name = tensor<string, []>("op_167_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(9661760)))];
+ tensor<fp16, [1, 1500, 512]> linear_5_cast_fp16 = linear(bias = var_167_to_fp16, weight = var_166_to_fp16, x = x_17_cast_fp16)[name = tensor<string, []>("linear_5_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_19_cast_fp16 = add(x = x_13_cast_fp16, y = linear_5_cast_fp16)[name = tensor<string, []>("x_19_cast_fp16")];
+ tensor<int32, []> var_177 = const()[name = tensor<string, []>("op_177"), val = tensor<int32, []>(-1)];
+ tensor<int32, [1]> var_194_axes_0 = const()[name = tensor<string, []>("op_194_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_1_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_1_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(9662848)))];
+ tensor<fp16, [512]> blocks_1_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_1_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(9663936)))];
+ tensor<fp16, []> var_183_to_fp16 = const()[name = tensor<string, []>("op_183_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
+ tensor<fp16, [1, 1500, 512]> var_194_cast_fp16 = layer_norm(axes = var_194_axes_0, beta = blocks_1_attn_ln_bias_to_fp16, epsilon = var_183_to_fp16, gamma = blocks_1_attn_ln_weight_to_fp16, x = x_19_cast_fp16)[name = tensor<string, []>("op_194_cast_fp16")];
+ tensor<fp16, [512, 512]> var_205_to_fp16 = const()[name = tensor<string, []>("op_205_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(9665024)))];
+ tensor<fp16, [512]> var_206_to_fp16 = const()[name = tensor<string, []>("op_206_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(10189376)))];
+ tensor<fp16, [1, 1500, 512]> linear_6_cast_fp16 = linear(bias = var_206_to_fp16, weight = var_205_to_fp16, x = var_194_cast_fp16)[name = tensor<string, []>("linear_6_cast_fp16")];
+ tensor<fp16, [512, 512]> var_209_to_fp16 = const()[name = tensor<string, []>("op_209_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(10190464)))];
+ tensor<fp16, [1, 1500, 512]> linear_7_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_209_to_fp16, x = var_194_cast_fp16)[name = tensor<string, []>("linear_7_cast_fp16")];
+ tensor<fp16, [512, 512]> var_213_to_fp16 = const()[name = tensor<string, []>("op_213_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(10714816)))];
+ tensor<fp16, [512]> var_214_to_fp16 = const()[name = tensor<string, []>("op_214_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11239168)))];
+ tensor<fp16, [1, 1500, 512]> linear_8_cast_fp16 = linear(bias = var_214_to_fp16, weight = var_213_to_fp16, x = var_194_cast_fp16)[name = tensor<string, []>("linear_8_cast_fp16")];
+ tensor<int32, [4]> var_222 = const()[name = tensor<string, []>("op_222"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_223_cast_fp16 = reshape(shape = var_222, x = linear_6_cast_fp16)[name = tensor<string, []>("op_223_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_44_to_fp16 = const()[name = tensor<string, []>("const_44_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> q_7_cast_fp16 = mul(x = var_223_cast_fp16, y = const_44_to_fp16)[name = tensor<string, []>("q_7_cast_fp16")];
+ tensor<int32, [4]> var_229 = const()[name = tensor<string, []>("op_229"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_230_cast_fp16 = reshape(shape = var_229, x = linear_7_cast_fp16)[name = tensor<string, []>("op_230_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_45_to_fp16 = const()[name = tensor<string, []>("const_45_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> k_7_cast_fp16 = mul(x = var_230_cast_fp16, y = const_45_to_fp16)[name = tensor<string, []>("k_7_cast_fp16")];
+ tensor<int32, [4]> var_236 = const()[name = tensor<string, []>("op_236"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_237_cast_fp16 = reshape(shape = var_236, x = linear_8_cast_fp16)[name = tensor<string, []>("op_237_cast_fp16")];
+ tensor<int32, [4]> var_238 = const()[name = tensor<string, []>("op_238"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<bool, []> qk_3_transpose_x_0 = const()[name = tensor<string, []>("qk_3_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> qk_3_transpose_y_0 = const()[name = tensor<string, []>("qk_3_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<int32, [4]> transpose_26_perm_0 = const()[name = tensor<string, []>("transpose_26_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [4]> transpose_27_perm_0 = const()[name = tensor<string, []>("transpose_27_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
+ tensor<fp16, [1, 8, 64, 1500]> transpose_53 = transpose(perm = transpose_27_perm_0, x = k_7_cast_fp16)[name = tensor<string, []>("transpose_53")];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_54 = transpose(perm = transpose_26_perm_0, x = q_7_cast_fp16)[name = tensor<string, []>("transpose_54")];
+ tensor<fp16, [1, 8, 1500, 1500]> qk_3_cast_fp16 = matmul(transpose_x = qk_3_transpose_x_0, transpose_y = qk_3_transpose_y_0, x = transpose_54, y = transpose_53)[name = tensor<string, []>("qk_3_cast_fp16")];
+ tensor<fp16, [1, 8, 1500, 1500]> var_242_cast_fp16 = softmax(axis = var_177, x = qk_3_cast_fp16)[name = tensor<string, []>("op_242_cast_fp16")];
+ tensor<bool, []> var_244_transpose_x_0 = const()[name = tensor<string, []>("op_244_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> var_244_transpose_y_0 = const()[name = tensor<string, []>("op_244_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_55 = transpose(perm = var_238, x = var_237_cast_fp16)[name = tensor<string, []>("transpose_55")];
+ tensor<fp16, [1, 8, 1500, 64]> var_244_cast_fp16 = matmul(transpose_x = var_244_transpose_x_0, transpose_y = var_244_transpose_y_0, x = var_242_cast_fp16, y = transpose_55)[name = tensor<string, []>("op_244_cast_fp16")];
+ tensor<int32, [4]> var_245 = const()[name = tensor<string, []>("op_245"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [3]> concat_1 = const()[name = tensor<string, []>("concat_1"), val = tensor<int32, [3]>([1, 1500, 512])];
+ tensor<fp16, [1, 1500, 8, 64]> transpose_52 = transpose(perm = var_245, x = var_244_cast_fp16)[name = tensor<string, []>("transpose_52")];
+ tensor<fp16, [1, 1500, 512]> x_23_cast_fp16 = reshape(shape = concat_1, x = transpose_52)[name = tensor<string, []>("x_23_cast_fp16")];
+ tensor<fp16, [512, 512]> var_250_to_fp16 = const()[name = tensor<string, []>("op_250_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11240256)))];
+ tensor<fp16, [512]> var_251_to_fp16 = const()[name = tensor<string, []>("op_251_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11764608)))];
+ tensor<fp16, [1, 1500, 512]> linear_9_cast_fp16 = linear(bias = var_251_to_fp16, weight = var_250_to_fp16, x = x_23_cast_fp16)[name = tensor<string, []>("linear_9_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_25_cast_fp16 = add(x = x_19_cast_fp16, y = linear_9_cast_fp16)[name = tensor<string, []>("x_25_cast_fp16")];
+ tensor<int32, [1]> var_258_axes_0 = const()[name = tensor<string, []>("op_258_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_1_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_1_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11765696)))];
+ tensor<fp16, [512]> blocks_1_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_1_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11766784)))];
+ tensor<fp16, [1, 1500, 512]> var_258_cast_fp16 = layer_norm(axes = var_258_axes_0, beta = blocks_1_mlp_ln_bias_to_fp16, epsilon = var_183_to_fp16, gamma = blocks_1_mlp_ln_weight_to_fp16, x = x_25_cast_fp16)[name = tensor<string, []>("op_258_cast_fp16")];
+ tensor<fp16, [2048, 512]> var_267_to_fp16 = const()[name = tensor<string, []>("op_267_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11767872)))];
+ tensor<fp16, [2048]> var_268_to_fp16 = const()[name = tensor<string, []>("op_268_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(13865088)))];
+ tensor<fp16, [1, 1500, 2048]> linear_10_cast_fp16 = linear(bias = var_268_to_fp16, weight = var_267_to_fp16, x = var_258_cast_fp16)[name = tensor<string, []>("linear_10_cast_fp16")];
+ tensor<string, []> x_29_mode_0 = const()[name = tensor<string, []>("x_29_mode_0"), val = tensor<string, []>("EXACT")];
+ tensor<fp16, [1, 1500, 2048]> x_29_cast_fp16 = gelu(mode = x_29_mode_0, x = linear_10_cast_fp16)[name = tensor<string, []>("x_29_cast_fp16")];
+ tensor<fp16, [512, 2048]> var_273_to_fp16 = const()[name = tensor<string, []>("op_273_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(13869248)))];
+ tensor<fp16, [512]> var_274_to_fp16 = const()[name = tensor<string, []>("op_274_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(15966464)))];
+ tensor<fp16, [1, 1500, 512]> linear_11_cast_fp16 = linear(bias = var_274_to_fp16, weight = var_273_to_fp16, x = x_29_cast_fp16)[name = tensor<string, []>("linear_11_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_31_cast_fp16 = add(x = x_25_cast_fp16, y = linear_11_cast_fp16)[name = tensor<string, []>("x_31_cast_fp16")];
+ tensor<int32, []> var_284 = const()[name = tensor<string, []>("op_284"), val = tensor<int32, []>(-1)];
+ tensor<int32, [1]> var_301_axes_0 = const()[name = tensor<string, []>("op_301_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_2_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_2_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(15967552)))];
+ tensor<fp16, [512]> blocks_2_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_2_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(15968640)))];
+ tensor<fp16, []> var_290_to_fp16 = const()[name = tensor<string, []>("op_290_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
+ tensor<fp16, [1, 1500, 512]> var_301_cast_fp16 = layer_norm(axes = var_301_axes_0, beta = blocks_2_attn_ln_bias_to_fp16, epsilon = var_290_to_fp16, gamma = blocks_2_attn_ln_weight_to_fp16, x = x_31_cast_fp16)[name = tensor<string, []>("op_301_cast_fp16")];
+ tensor<fp16, [512, 512]> var_312_to_fp16 = const()[name = tensor<string, []>("op_312_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(15969728)))];
+ tensor<fp16, [512]> var_313_to_fp16 = const()[name = tensor<string, []>("op_313_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(16494080)))];
+ tensor<fp16, [1, 1500, 512]> linear_12_cast_fp16 = linear(bias = var_313_to_fp16, weight = var_312_to_fp16, x = var_301_cast_fp16)[name = tensor<string, []>("linear_12_cast_fp16")];
+ tensor<fp16, [512, 512]> var_316_to_fp16 = const()[name = tensor<string, []>("op_316_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(16495168)))];
+ tensor<fp16, [1, 1500, 512]> linear_13_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_316_to_fp16, x = var_301_cast_fp16)[name = tensor<string, []>("linear_13_cast_fp16")];
+ tensor<fp16, [512, 512]> var_320_to_fp16 = const()[name = tensor<string, []>("op_320_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(17019520)))];
+ tensor<fp16, [512]> var_321_to_fp16 = const()[name = tensor<string, []>("op_321_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(17543872)))];
+ tensor<fp16, [1, 1500, 512]> linear_14_cast_fp16 = linear(bias = var_321_to_fp16, weight = var_320_to_fp16, x = var_301_cast_fp16)[name = tensor<string, []>("linear_14_cast_fp16")];
+ tensor<int32, [4]> var_329 = const()[name = tensor<string, []>("op_329"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_330_cast_fp16 = reshape(shape = var_329, x = linear_12_cast_fp16)[name = tensor<string, []>("op_330_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_46_to_fp16 = const()[name = tensor<string, []>("const_46_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> q_11_cast_fp16 = mul(x = var_330_cast_fp16, y = const_46_to_fp16)[name = tensor<string, []>("q_11_cast_fp16")];
+ tensor<int32, [4]> var_336 = const()[name = tensor<string, []>("op_336"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_337_cast_fp16 = reshape(shape = var_336, x = linear_13_cast_fp16)[name = tensor<string, []>("op_337_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_47_to_fp16 = const()[name = tensor<string, []>("const_47_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> k_11_cast_fp16 = mul(x = var_337_cast_fp16, y = const_47_to_fp16)[name = tensor<string, []>("k_11_cast_fp16")];
+ tensor<int32, [4]> var_343 = const()[name = tensor<string, []>("op_343"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_344_cast_fp16 = reshape(shape = var_343, x = linear_14_cast_fp16)[name = tensor<string, []>("op_344_cast_fp16")];
+ tensor<int32, [4]> var_345 = const()[name = tensor<string, []>("op_345"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<bool, []> qk_5_transpose_x_0 = const()[name = tensor<string, []>("qk_5_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> qk_5_transpose_y_0 = const()[name = tensor<string, []>("qk_5_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<int32, [4]> transpose_28_perm_0 = const()[name = tensor<string, []>("transpose_28_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [4]> transpose_29_perm_0 = const()[name = tensor<string, []>("transpose_29_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
+ tensor<fp16, [1, 8, 64, 1500]> transpose_49 = transpose(perm = transpose_29_perm_0, x = k_11_cast_fp16)[name = tensor<string, []>("transpose_49")];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_50 = transpose(perm = transpose_28_perm_0, x = q_11_cast_fp16)[name = tensor<string, []>("transpose_50")];
+ tensor<fp16, [1, 8, 1500, 1500]> qk_5_cast_fp16 = matmul(transpose_x = qk_5_transpose_x_0, transpose_y = qk_5_transpose_y_0, x = transpose_50, y = transpose_49)[name = tensor<string, []>("qk_5_cast_fp16")];
+ tensor<fp16, [1, 8, 1500, 1500]> var_349_cast_fp16 = softmax(axis = var_284, x = qk_5_cast_fp16)[name = tensor<string, []>("op_349_cast_fp16")];
+ tensor<bool, []> var_351_transpose_x_0 = const()[name = tensor<string, []>("op_351_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> var_351_transpose_y_0 = const()[name = tensor<string, []>("op_351_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_51 = transpose(perm = var_345, x = var_344_cast_fp16)[name = tensor<string, []>("transpose_51")];
+ tensor<fp16, [1, 8, 1500, 64]> var_351_cast_fp16 = matmul(transpose_x = var_351_transpose_x_0, transpose_y = var_351_transpose_y_0, x = var_349_cast_fp16, y = transpose_51)[name = tensor<string, []>("op_351_cast_fp16")];
+ tensor<int32, [4]> var_352 = const()[name = tensor<string, []>("op_352"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [3]> concat_2 = const()[name = tensor<string, []>("concat_2"), val = tensor<int32, [3]>([1, 1500, 512])];
+ tensor<fp16, [1, 1500, 8, 64]> transpose_48 = transpose(perm = var_352, x = var_351_cast_fp16)[name = tensor<string, []>("transpose_48")];
+ tensor<fp16, [1, 1500, 512]> x_35_cast_fp16 = reshape(shape = concat_2, x = transpose_48)[name = tensor<string, []>("x_35_cast_fp16")];
+ tensor<fp16, [512, 512]> var_357_to_fp16 = const()[name = tensor<string, []>("op_357_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(17544960)))];
+ tensor<fp16, [512]> var_358_to_fp16 = const()[name = tensor<string, []>("op_358_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(18069312)))];
+ tensor<fp16, [1, 1500, 512]> linear_15_cast_fp16 = linear(bias = var_358_to_fp16, weight = var_357_to_fp16, x = x_35_cast_fp16)[name = tensor<string, []>("linear_15_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_37_cast_fp16 = add(x = x_31_cast_fp16, y = linear_15_cast_fp16)[name = tensor<string, []>("x_37_cast_fp16")];
+ tensor<int32, [1]> var_365_axes_0 = const()[name = tensor<string, []>("op_365_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_2_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_2_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(18070400)))];
+ tensor<fp16, [512]> blocks_2_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_2_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(18071488)))];
+ tensor<fp16, [1, 1500, 512]> var_365_cast_fp16 = layer_norm(axes = var_365_axes_0, beta = blocks_2_mlp_ln_bias_to_fp16, epsilon = var_290_to_fp16, gamma = blocks_2_mlp_ln_weight_to_fp16, x = x_37_cast_fp16)[name = tensor<string, []>("op_365_cast_fp16")];
+ tensor<fp16, [2048, 512]> var_374_to_fp16 = const()[name = tensor<string, []>("op_374_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(18072576)))];
+ tensor<fp16, [2048]> var_375_to_fp16 = const()[name = tensor<string, []>("op_375_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(20169792)))];
+ tensor<fp16, [1, 1500, 2048]> linear_16_cast_fp16 = linear(bias = var_375_to_fp16, weight = var_374_to_fp16, x = var_365_cast_fp16)[name = tensor<string, []>("linear_16_cast_fp16")];
+ tensor<string, []> x_41_mode_0 = const()[name = tensor<string, []>("x_41_mode_0"), val = tensor<string, []>("EXACT")];
+ tensor<fp16, [1, 1500, 2048]> x_41_cast_fp16 = gelu(mode = x_41_mode_0, x = linear_16_cast_fp16)[name = tensor<string, []>("x_41_cast_fp16")];
+ tensor<fp16, [512, 2048]> var_380_to_fp16 = const()[name = tensor<string, []>("op_380_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(20173952)))];
+ tensor<fp16, [512]> var_381_to_fp16 = const()[name = tensor<string, []>("op_381_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22271168)))];
+ tensor<fp16, [1, 1500, 512]> linear_17_cast_fp16 = linear(bias = var_381_to_fp16, weight = var_380_to_fp16, x = x_41_cast_fp16)[name = tensor<string, []>("linear_17_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_43_cast_fp16 = add(x = x_37_cast_fp16, y = linear_17_cast_fp16)[name = tensor<string, []>("x_43_cast_fp16")];
+ tensor<int32, []> var_391 = const()[name = tensor<string, []>("op_391"), val = tensor<int32, []>(-1)];
+ tensor<int32, [1]> var_408_axes_0 = const()[name = tensor<string, []>("op_408_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_3_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_3_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22272256)))];
+ tensor<fp16, [512]> blocks_3_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_3_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22273344)))];
+ tensor<fp16, []> var_397_to_fp16 = const()[name = tensor<string, []>("op_397_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
+ tensor<fp16, [1, 1500, 512]> var_408_cast_fp16 = layer_norm(axes = var_408_axes_0, beta = blocks_3_attn_ln_bias_to_fp16, epsilon = var_397_to_fp16, gamma = blocks_3_attn_ln_weight_to_fp16, x = x_43_cast_fp16)[name = tensor<string, []>("op_408_cast_fp16")];
+ tensor<fp16, [512, 512]> var_419_to_fp16 = const()[name = tensor<string, []>("op_419_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22274432)))];
+ tensor<fp16, [512]> var_420_to_fp16 = const()[name = tensor<string, []>("op_420_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22798784)))];
+ tensor<fp16, [1, 1500, 512]> linear_18_cast_fp16 = linear(bias = var_420_to_fp16, weight = var_419_to_fp16, x = var_408_cast_fp16)[name = tensor<string, []>("linear_18_cast_fp16")];
+ tensor<fp16, [512, 512]> var_423_to_fp16 = const()[name = tensor<string, []>("op_423_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22799872)))];
+ tensor<fp16, [1, 1500, 512]> linear_19_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_423_to_fp16, x = var_408_cast_fp16)[name = tensor<string, []>("linear_19_cast_fp16")];
+ tensor<fp16, [512, 512]> var_427_to_fp16 = const()[name = tensor<string, []>("op_427_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(23324224)))];
+ tensor<fp16, [512]> var_428_to_fp16 = const()[name = tensor<string, []>("op_428_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(23848576)))];
+ tensor<fp16, [1, 1500, 512]> linear_20_cast_fp16 = linear(bias = var_428_to_fp16, weight = var_427_to_fp16, x = var_408_cast_fp16)[name = tensor<string, []>("linear_20_cast_fp16")];
+ tensor<int32, [4]> var_436 = const()[name = tensor<string, []>("op_436"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_437_cast_fp16 = reshape(shape = var_436, x = linear_18_cast_fp16)[name = tensor<string, []>("op_437_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_48_to_fp16 = const()[name = tensor<string, []>("const_48_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> q_15_cast_fp16 = mul(x = var_437_cast_fp16, y = const_48_to_fp16)[name = tensor<string, []>("q_15_cast_fp16")];
+ tensor<int32, [4]> var_443 = const()[name = tensor<string, []>("op_443"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_444_cast_fp16 = reshape(shape = var_443, x = linear_19_cast_fp16)[name = tensor<string, []>("op_444_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_49_to_fp16 = const()[name = tensor<string, []>("const_49_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> k_15_cast_fp16 = mul(x = var_444_cast_fp16, y = const_49_to_fp16)[name = tensor<string, []>("k_15_cast_fp16")];
+ tensor<int32, [4]> var_450 = const()[name = tensor<string, []>("op_450"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_451_cast_fp16 = reshape(shape = var_450, x = linear_20_cast_fp16)[name = tensor<string, []>("op_451_cast_fp16")];
+ tensor<int32, [4]> var_452 = const()[name = tensor<string, []>("op_452"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<bool, []> qk_7_transpose_x_0 = const()[name = tensor<string, []>("qk_7_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> qk_7_transpose_y_0 = const()[name = tensor<string, []>("qk_7_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<int32, [4]> transpose_30_perm_0 = const()[name = tensor<string, []>("transpose_30_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [4]> transpose_31_perm_0 = const()[name = tensor<string, []>("transpose_31_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
+ tensor<fp16, [1, 8, 64, 1500]> transpose_45 = transpose(perm = transpose_31_perm_0, x = k_15_cast_fp16)[name = tensor<string, []>("transpose_45")];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_46 = transpose(perm = transpose_30_perm_0, x = q_15_cast_fp16)[name = tensor<string, []>("transpose_46")];
+ tensor<fp16, [1, 8, 1500, 1500]> qk_7_cast_fp16 = matmul(transpose_x = qk_7_transpose_x_0, transpose_y = qk_7_transpose_y_0, x = transpose_46, y = transpose_45)[name = tensor<string, []>("qk_7_cast_fp16")];
+ tensor<fp16, [1, 8, 1500, 1500]> var_456_cast_fp16 = softmax(axis = var_391, x = qk_7_cast_fp16)[name = tensor<string, []>("op_456_cast_fp16")];
+ tensor<bool, []> var_458_transpose_x_0 = const()[name = tensor<string, []>("op_458_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> var_458_transpose_y_0 = const()[name = tensor<string, []>("op_458_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_47 = transpose(perm = var_452, x = var_451_cast_fp16)[name = tensor<string, []>("transpose_47")];
+ tensor<fp16, [1, 8, 1500, 64]> var_458_cast_fp16 = matmul(transpose_x = var_458_transpose_x_0, transpose_y = var_458_transpose_y_0, x = var_456_cast_fp16, y = transpose_47)[name = tensor<string, []>("op_458_cast_fp16")];
+ tensor<int32, [4]> var_459 = const()[name = tensor<string, []>("op_459"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [3]> concat_3 = const()[name = tensor<string, []>("concat_3"), val = tensor<int32, [3]>([1, 1500, 512])];
+ tensor<fp16, [1, 1500, 8, 64]> transpose_44 = transpose(perm = var_459, x = var_458_cast_fp16)[name = tensor<string, []>("transpose_44")];
+ tensor<fp16, [1, 1500, 512]> x_47_cast_fp16 = reshape(shape = concat_3, x = transpose_44)[name = tensor<string, []>("x_47_cast_fp16")];
+ tensor<fp16, [512, 512]> var_464_to_fp16 = const()[name = tensor<string, []>("op_464_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(23849664)))];
+ tensor<fp16, [512]> var_465_to_fp16 = const()[name = tensor<string, []>("op_465_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(24374016)))];
+ tensor<fp16, [1, 1500, 512]> linear_21_cast_fp16 = linear(bias = var_465_to_fp16, weight = var_464_to_fp16, x = x_47_cast_fp16)[name = tensor<string, []>("linear_21_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_49_cast_fp16 = add(x = x_43_cast_fp16, y = linear_21_cast_fp16)[name = tensor<string, []>("x_49_cast_fp16")];
+ tensor<int32, [1]> var_472_axes_0 = const()[name = tensor<string, []>("op_472_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_3_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_3_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(24375104)))];
+ tensor<fp16, [512]> blocks_3_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_3_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(24376192)))];
+ tensor<fp16, [1, 1500, 512]> var_472_cast_fp16 = layer_norm(axes = var_472_axes_0, beta = blocks_3_mlp_ln_bias_to_fp16, epsilon = var_397_to_fp16, gamma = blocks_3_mlp_ln_weight_to_fp16, x = x_49_cast_fp16)[name = tensor<string, []>("op_472_cast_fp16")];
+ tensor<fp16, [2048, 512]> var_481_to_fp16 = const()[name = tensor<string, []>("op_481_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(24377280)))];
+ tensor<fp16, [2048]> var_482_to_fp16 = const()[name = tensor<string, []>("op_482_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(26474496)))];
+ tensor<fp16, [1, 1500, 2048]> linear_22_cast_fp16 = linear(bias = var_482_to_fp16, weight = var_481_to_fp16, x = var_472_cast_fp16)[name = tensor<string, []>("linear_22_cast_fp16")];
+ tensor<string, []> x_53_mode_0 = const()[name = tensor<string, []>("x_53_mode_0"), val = tensor<string, []>("EXACT")];
+ tensor<fp16, [1, 1500, 2048]> x_53_cast_fp16 = gelu(mode = x_53_mode_0, x = linear_22_cast_fp16)[name = tensor<string, []>("x_53_cast_fp16")];
+ tensor<fp16, [512, 2048]> var_487_to_fp16 = const()[name = tensor<string, []>("op_487_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(26478656)))];
+ tensor<fp16, [512]> var_488_to_fp16 = const()[name = tensor<string, []>("op_488_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(28575872)))];
+ tensor<fp16, [1, 1500, 512]> linear_23_cast_fp16 = linear(bias = var_488_to_fp16, weight = var_487_to_fp16, x = x_53_cast_fp16)[name = tensor<string, []>("linear_23_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_55_cast_fp16 = add(x = x_49_cast_fp16, y = linear_23_cast_fp16)[name = tensor<string, []>("x_55_cast_fp16")];
+ tensor<int32, []> var_498 = const()[name = tensor<string, []>("op_498"), val = tensor<int32, []>(-1)];
+ tensor<int32, [1]> var_515_axes_0 = const()[name = tensor<string, []>("op_515_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_4_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_4_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(28576960)))];
+ tensor<fp16, [512]> blocks_4_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_4_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(28578048)))];
+ tensor<fp16, []> var_504_to_fp16 = const()[name = tensor<string, []>("op_504_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
+ tensor<fp16, [1, 1500, 512]> var_515_cast_fp16 = layer_norm(axes = var_515_axes_0, beta = blocks_4_attn_ln_bias_to_fp16, epsilon = var_504_to_fp16, gamma = blocks_4_attn_ln_weight_to_fp16, x = x_55_cast_fp16)[name = tensor<string, []>("op_515_cast_fp16")];
+ tensor<fp16, [512, 512]> var_526_to_fp16 = const()[name = tensor<string, []>("op_526_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(28579136)))];
+ tensor<fp16, [512]> var_527_to_fp16 = const()[name = tensor<string, []>("op_527_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(29103488)))];
+ tensor<fp16, [1, 1500, 512]> linear_24_cast_fp16 = linear(bias = var_527_to_fp16, weight = var_526_to_fp16, x = var_515_cast_fp16)[name = tensor<string, []>("linear_24_cast_fp16")];
+ tensor<fp16, [512, 512]> var_530_to_fp16 = const()[name = tensor<string, []>("op_530_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(29104576)))];
+ tensor<fp16, [1, 1500, 512]> linear_25_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_530_to_fp16, x = var_515_cast_fp16)[name = tensor<string, []>("linear_25_cast_fp16")];
+ tensor<fp16, [512, 512]> var_534_to_fp16 = const()[name = tensor<string, []>("op_534_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(29628928)))];
+ tensor<fp16, [512]> var_535_to_fp16 = const()[name = tensor<string, []>("op_535_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30153280)))];
+ tensor<fp16, [1, 1500, 512]> linear_26_cast_fp16 = linear(bias = var_535_to_fp16, weight = var_534_to_fp16, x = var_515_cast_fp16)[name = tensor<string, []>("linear_26_cast_fp16")];
+ tensor<int32, [4]> var_543 = const()[name = tensor<string, []>("op_543"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_544_cast_fp16 = reshape(shape = var_543, x = linear_24_cast_fp16)[name = tensor<string, []>("op_544_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_50_to_fp16 = const()[name = tensor<string, []>("const_50_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> q_19_cast_fp16 = mul(x = var_544_cast_fp16, y = const_50_to_fp16)[name = tensor<string, []>("q_19_cast_fp16")];
+ tensor<int32, [4]> var_550 = const()[name = tensor<string, []>("op_550"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_551_cast_fp16 = reshape(shape = var_550, x = linear_25_cast_fp16)[name = tensor<string, []>("op_551_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_51_to_fp16 = const()[name = tensor<string, []>("const_51_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> k_19_cast_fp16 = mul(x = var_551_cast_fp16, y = const_51_to_fp16)[name = tensor<string, []>("k_19_cast_fp16")];
+ tensor<int32, [4]> var_557 = const()[name = tensor<string, []>("op_557"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_558_cast_fp16 = reshape(shape = var_557, x = linear_26_cast_fp16)[name = tensor<string, []>("op_558_cast_fp16")];
+ tensor<int32, [4]> var_559 = const()[name = tensor<string, []>("op_559"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<bool, []> qk_9_transpose_x_0 = const()[name = tensor<string, []>("qk_9_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> qk_9_transpose_y_0 = const()[name = tensor<string, []>("qk_9_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<int32, [4]> transpose_32_perm_0 = const()[name = tensor<string, []>("transpose_32_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [4]> transpose_33_perm_0 = const()[name = tensor<string, []>("transpose_33_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
+ tensor<fp16, [1, 8, 64, 1500]> transpose_41 = transpose(perm = transpose_33_perm_0, x = k_19_cast_fp16)[name = tensor<string, []>("transpose_41")];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_42 = transpose(perm = transpose_32_perm_0, x = q_19_cast_fp16)[name = tensor<string, []>("transpose_42")];
+ tensor<fp16, [1, 8, 1500, 1500]> qk_9_cast_fp16 = matmul(transpose_x = qk_9_transpose_x_0, transpose_y = qk_9_transpose_y_0, x = transpose_42, y = transpose_41)[name = tensor<string, []>("qk_9_cast_fp16")];
+ tensor<fp16, [1, 8, 1500, 1500]> var_563_cast_fp16 = softmax(axis = var_498, x = qk_9_cast_fp16)[name = tensor<string, []>("op_563_cast_fp16")];
+ tensor<bool, []> var_565_transpose_x_0 = const()[name = tensor<string, []>("op_565_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> var_565_transpose_y_0 = const()[name = tensor<string, []>("op_565_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_43 = transpose(perm = var_559, x = var_558_cast_fp16)[name = tensor<string, []>("transpose_43")];
+ tensor<fp16, [1, 8, 1500, 64]> var_565_cast_fp16 = matmul(transpose_x = var_565_transpose_x_0, transpose_y = var_565_transpose_y_0, x = var_563_cast_fp16, y = transpose_43)[name = tensor<string, []>("op_565_cast_fp16")];
+ tensor<int32, [4]> var_566 = const()[name = tensor<string, []>("op_566"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [3]> concat_4 = const()[name = tensor<string, []>("concat_4"), val = tensor<int32, [3]>([1, 1500, 512])];
+ tensor<fp16, [1, 1500, 8, 64]> transpose_40 = transpose(perm = var_566, x = var_565_cast_fp16)[name = tensor<string, []>("transpose_40")];
+ tensor<fp16, [1, 1500, 512]> x_59_cast_fp16 = reshape(shape = concat_4, x = transpose_40)[name = tensor<string, []>("x_59_cast_fp16")];
+ tensor<fp16, [512, 512]> var_571_to_fp16 = const()[name = tensor<string, []>("op_571_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30154368)))];
+ tensor<fp16, [512]> var_572_to_fp16 = const()[name = tensor<string, []>("op_572_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30678720)))];
+ tensor<fp16, [1, 1500, 512]> linear_27_cast_fp16 = linear(bias = var_572_to_fp16, weight = var_571_to_fp16, x = x_59_cast_fp16)[name = tensor<string, []>("linear_27_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_61_cast_fp16 = add(x = x_55_cast_fp16, y = linear_27_cast_fp16)[name = tensor<string, []>("x_61_cast_fp16")];
+ tensor<int32, [1]> var_579_axes_0 = const()[name = tensor<string, []>("op_579_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_4_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_4_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30679808)))];
+ tensor<fp16, [512]> blocks_4_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_4_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30680896)))];
+ tensor<fp16, [1, 1500, 512]> var_579_cast_fp16 = layer_norm(axes = var_579_axes_0, beta = blocks_4_mlp_ln_bias_to_fp16, epsilon = var_504_to_fp16, gamma = blocks_4_mlp_ln_weight_to_fp16, x = x_61_cast_fp16)[name = tensor<string, []>("op_579_cast_fp16")];
+ tensor<fp16, [2048, 512]> var_588_to_fp16 = const()[name = tensor<string, []>("op_588_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30681984)))];
+ tensor<fp16, [2048]> var_589_to_fp16 = const()[name = tensor<string, []>("op_589_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(32779200)))];
+ tensor<fp16, [1, 1500, 2048]> linear_28_cast_fp16 = linear(bias = var_589_to_fp16, weight = var_588_to_fp16, x = var_579_cast_fp16)[name = tensor<string, []>("linear_28_cast_fp16")];
+ tensor<string, []> x_65_mode_0 = const()[name = tensor<string, []>("x_65_mode_0"), val = tensor<string, []>("EXACT")];
+ tensor<fp16, [1, 1500, 2048]> x_65_cast_fp16 = gelu(mode = x_65_mode_0, x = linear_28_cast_fp16)[name = tensor<string, []>("x_65_cast_fp16")];
+ tensor<fp16, [512, 2048]> var_594_to_fp16 = const()[name = tensor<string, []>("op_594_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(32783360)))];
+ tensor<fp16, [512]> var_595_to_fp16 = const()[name = tensor<string, []>("op_595_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(34880576)))];
+ tensor<fp16, [1, 1500, 512]> linear_29_cast_fp16 = linear(bias = var_595_to_fp16, weight = var_594_to_fp16, x = x_65_cast_fp16)[name = tensor<string, []>("linear_29_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_67_cast_fp16 = add(x = x_61_cast_fp16, y = linear_29_cast_fp16)[name = tensor<string, []>("x_67_cast_fp16")];
+ tensor<int32, []> var_605 = const()[name = tensor<string, []>("op_605"), val = tensor<int32, []>(-1)];
+ tensor<int32, [1]> var_622_axes_0 = const()[name = tensor<string, []>("op_622_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_5_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_5_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(34881664)))];
+ tensor<fp16, [512]> blocks_5_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_5_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(34882752)))];
+ tensor<fp16, []> var_611_to_fp16 = const()[name = tensor<string, []>("op_611_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
+ tensor<fp16, [1, 1500, 512]> var_622_cast_fp16 = layer_norm(axes = var_622_axes_0, beta = blocks_5_attn_ln_bias_to_fp16, epsilon = var_611_to_fp16, gamma = blocks_5_attn_ln_weight_to_fp16, x = x_67_cast_fp16)[name = tensor<string, []>("op_622_cast_fp16")];
+ tensor<fp16, [512, 512]> var_633_to_fp16 = const()[name = tensor<string, []>("op_633_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(34883840)))];
+ tensor<fp16, [512]> var_634_to_fp16 = const()[name = tensor<string, []>("op_634_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(35408192)))];
+ tensor<fp16, [1, 1500, 512]> linear_30_cast_fp16 = linear(bias = var_634_to_fp16, weight = var_633_to_fp16, x = var_622_cast_fp16)[name = tensor<string, []>("linear_30_cast_fp16")];
+ tensor<fp16, [512, 512]> var_637_to_fp16 = const()[name = tensor<string, []>("op_637_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(35409280)))];
+ tensor<fp16, [1, 1500, 512]> linear_31_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_637_to_fp16, x = var_622_cast_fp16)[name = tensor<string, []>("linear_31_cast_fp16")];
+ tensor<fp16, [512, 512]> var_641_to_fp16 = const()[name = tensor<string, []>("op_641_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(35933632)))];
+ tensor<fp16, [512]> var_642_to_fp16 = const()[name = tensor<string, []>("op_642_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36457984)))];
+ tensor<fp16, [1, 1500, 512]> linear_32_cast_fp16 = linear(bias = var_642_to_fp16, weight = var_641_to_fp16, x = var_622_cast_fp16)[name = tensor<string, []>("linear_32_cast_fp16")];
+ tensor<int32, [4]> var_650 = const()[name = tensor<string, []>("op_650"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_651_cast_fp16 = reshape(shape = var_650, x = linear_30_cast_fp16)[name = tensor<string, []>("op_651_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_52_to_fp16 = const()[name = tensor<string, []>("const_52_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> q_cast_fp16 = mul(x = var_651_cast_fp16, y = const_52_to_fp16)[name = tensor<string, []>("q_cast_fp16")];
+ tensor<int32, [4]> var_657 = const()[name = tensor<string, []>("op_657"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_658_cast_fp16 = reshape(shape = var_657, x = linear_31_cast_fp16)[name = tensor<string, []>("op_658_cast_fp16")];
+ tensor<fp16, [1, 1, 1, 1]> const_53_to_fp16 = const()[name = tensor<string, []>("const_53_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
+ tensor<fp16, [1, 1500, 8, 64]> k_cast_fp16 = mul(x = var_658_cast_fp16, y = const_53_to_fp16)[name = tensor<string, []>("k_cast_fp16")];
+ tensor<int32, [4]> var_664 = const()[name = tensor<string, []>("op_664"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
+ tensor<fp16, [1, 1500, 8, 64]> var_665_cast_fp16 = reshape(shape = var_664, x = linear_32_cast_fp16)[name = tensor<string, []>("op_665_cast_fp16")];
+ tensor<int32, [4]> var_666 = const()[name = tensor<string, []>("op_666"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<bool, []> qk_transpose_x_0 = const()[name = tensor<string, []>("qk_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> qk_transpose_y_0 = const()[name = tensor<string, []>("qk_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<int32, [4]> transpose_34_perm_0 = const()[name = tensor<string, []>("transpose_34_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [4]> transpose_35_perm_0 = const()[name = tensor<string, []>("transpose_35_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
+ tensor<fp16, [1, 8, 64, 1500]> transpose_37 = transpose(perm = transpose_35_perm_0, x = k_cast_fp16)[name = tensor<string, []>("transpose_37")];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_38 = transpose(perm = transpose_34_perm_0, x = q_cast_fp16)[name = tensor<string, []>("transpose_38")];
+ tensor<fp16, [1, 8, 1500, 1500]> qk_cast_fp16 = matmul(transpose_x = qk_transpose_x_0, transpose_y = qk_transpose_y_0, x = transpose_38, y = transpose_37)[name = tensor<string, []>("qk_cast_fp16")];
+ tensor<fp16, [1, 8, 1500, 1500]> var_670_cast_fp16 = softmax(axis = var_605, x = qk_cast_fp16)[name = tensor<string, []>("op_670_cast_fp16")];
+ tensor<bool, []> var_672_transpose_x_0 = const()[name = tensor<string, []>("op_672_transpose_x_0"), val = tensor<bool, []>(false)];
+ tensor<bool, []> var_672_transpose_y_0 = const()[name = tensor<string, []>("op_672_transpose_y_0"), val = tensor<bool, []>(false)];
+ tensor<fp16, [1, 8, 1500, 64]> transpose_39 = transpose(perm = var_666, x = var_665_cast_fp16)[name = tensor<string, []>("transpose_39")];
+ tensor<fp16, [1, 8, 1500, 64]> var_672_cast_fp16 = matmul(transpose_x = var_672_transpose_x_0, transpose_y = var_672_transpose_y_0, x = var_670_cast_fp16, y = transpose_39)[name = tensor<string, []>("op_672_cast_fp16")];
+ tensor<int32, [4]> var_673 = const()[name = tensor<string, []>("op_673"), val = tensor<int32, [4]>([0, 2, 1, 3])];
+ tensor<int32, [3]> concat_5 = const()[name = tensor<string, []>("concat_5"), val = tensor<int32, [3]>([1, 1500, 512])];
+ tensor<fp16, [1, 1500, 8, 64]> transpose_36 = transpose(perm = var_673, x = var_672_cast_fp16)[name = tensor<string, []>("transpose_36")];
+ tensor<fp16, [1, 1500, 512]> x_71_cast_fp16 = reshape(shape = concat_5, x = transpose_36)[name = tensor<string, []>("x_71_cast_fp16")];
+ tensor<fp16, [512, 512]> var_678_to_fp16 = const()[name = tensor<string, []>("op_678_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36459072)))];
+ tensor<fp16, [512]> var_679_to_fp16 = const()[name = tensor<string, []>("op_679_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36983424)))];
+ tensor<fp16, [1, 1500, 512]> linear_33_cast_fp16 = linear(bias = var_679_to_fp16, weight = var_678_to_fp16, x = x_71_cast_fp16)[name = tensor<string, []>("linear_33_cast_fp16")];
+ tensor<fp16, [1, 1500, 512]> x_73_cast_fp16 = add(x = x_67_cast_fp16, y = linear_33_cast_fp16)[name = tensor<string, []>("x_73_cast_fp16")];
+ tensor<int32, [1]> var_686_axes_0 = const()[name = tensor<string, []>("op_686_axes_0"), val = tensor<int32, [1]>([-1])];
+ tensor<fp16, [512]> blocks_5_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_5_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36984512)))];
+ tensor<fp16, [512]> blocks_5_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_5_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36985600)))];
370
+ tensor<fp16, [1, 1500, 512]> var_686_cast_fp16 = layer_norm(axes = var_686_axes_0, beta = blocks_5_mlp_ln_bias_to_fp16, epsilon = var_611_to_fp16, gamma = blocks_5_mlp_ln_weight_to_fp16, x = x_73_cast_fp16)[name = tensor<string, []>("op_686_cast_fp16")];
371
+ tensor<fp16, [2048, 512]> var_695_to_fp16 = const()[name = tensor<string, []>("op_695_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36986688)))];
372
+ tensor<fp16, [2048]> var_696_to_fp16 = const()[name = tensor<string, []>("op_696_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(39083904)))];
373
+ tensor<fp16, [1, 1500, 2048]> linear_34_cast_fp16 = linear(bias = var_696_to_fp16, weight = var_695_to_fp16, x = var_686_cast_fp16)[name = tensor<string, []>("linear_34_cast_fp16")];
374
+ tensor<string, []> x_77_mode_0 = const()[name = tensor<string, []>("x_77_mode_0"), val = tensor<string, []>("EXACT")];
375
+ tensor<fp16, [1, 1500, 2048]> x_77_cast_fp16 = gelu(mode = x_77_mode_0, x = linear_34_cast_fp16)[name = tensor<string, []>("x_77_cast_fp16")];
376
+ tensor<fp16, [512, 2048]> var_701_to_fp16 = const()[name = tensor<string, []>("op_701_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(39088064)))];
377
+ tensor<fp16, [512]> var_702_to_fp16 = const()[name = tensor<string, []>("op_702_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(41185280)))];
378
+ tensor<fp16, [1, 1500, 512]> linear_35_cast_fp16 = linear(bias = var_702_to_fp16, weight = var_701_to_fp16, x = x_77_cast_fp16)[name = tensor<string, []>("linear_35_cast_fp16")];
379
+ tensor<fp16, [1, 1500, 512]> x_cast_fp16 = add(x = x_73_cast_fp16, y = linear_35_cast_fp16)[name = tensor<string, []>("x_cast_fp16")];
380
+ tensor<int32, [1]> var_716_axes_0 = const()[name = tensor<string, []>("op_716_axes_0"), val = tensor<int32, [1]>([-1])];
381
+ tensor<fp16, [512]> ln_post_weight_to_fp16 = const()[name = tensor<string, []>("ln_post_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(41186368)))];
382
+ tensor<fp16, [512]> ln_post_bias_to_fp16 = const()[name = tensor<string, []>("ln_post_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(41187456)))];
383
+ tensor<fp16, []> var_707_to_fp16 = const()[name = tensor<string, []>("op_707_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
384
+ tensor<fp16, [1, 1500, 512]> var_716_cast_fp16 = layer_norm(axes = var_716_axes_0, beta = ln_post_bias_to_fp16, epsilon = var_707_to_fp16, gamma = ln_post_weight_to_fp16, x = x_cast_fp16)[name = tensor<string, []>("op_716_cast_fp16")];
385
+ tensor<string, []> var_716_cast_fp16_to_fp32_dtype_0 = const()[name = tensor<string, []>("op_716_cast_fp16_to_fp32_dtype_0"), val = tensor<string, []>("fp32")];
386
+ tensor<fp32, [1, 1500, 512]> output = cast(dtype = var_716_cast_fp16_to_fp32_dtype_0, x = var_716_cast_fp16)[name = tensor<string, []>("cast_36")];
387
+ } -> (output);
388
+ }
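The tail of the MIL program above is the last encoder block's MLP plus the final `ln_post`: `layer_norm` → `linear` → `gelu(mode="EXACT")` → `linear`, with a residual `add` around the MLP. A minimal pure-Python sketch of those ops (toy dimensions and our own helper names, not the real 512/2048 fp16 weights):

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize over the last axis, then scale by gamma and shift by beta,
    # as the MIL layer_norm op does with axes=[-1].
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

def gelu(x):
    # Exact GELU, matching mode="EXACT" in the MIL program.
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]

def linear(x, weight, bias):
    # weight is [out, in], the layout of the MIL linear op's constants.
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def mlp_block(x, ln_w, ln_b, w1, b1, w2, b2):
    # x + W2 @ gelu(W1 @ layer_norm(x)) -- the residual MLP the ops encode.
    h = layer_norm(x, ln_w, ln_b)
    h = gelu(linear(h, w1, b1))
    h = linear(h, w2, b2)
    return [a + c for a, c in zip(x, h)]
```

In the real program the same structure runs on `[1, 1500, 512]` fp16 tensors with a `[2048, 512]` / `[512, 2048]` weight pair; this sketch only illustrates the op chain.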
ggml-base.en-encoder.mlmodelc/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc998211e55f0972c70e3d29103477cfe8c6dd485cd68438951f83fa3ee3b770
+ size 41188544
ggml-base.en.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a03779c86df3323075f5e796cb2ce5029f00ec8869eee3fdfb897afe36c6d002
+ size 147964211
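The two entries above are Git LFS pointer files, not the weights themselves; the actual payload is fetched separately and can be checked against the recorded `oid sha256:...` and `size` fields. A minimal verification sketch (the function name is ours, not part of the repo):

```python
import hashlib

def verify_lfs_pointer(path, expected_sha256, expected_size):
    """Compare a downloaded file against the oid/size from its LFS pointer."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so multi-GB model files don't need to fit in RAM.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest() == expected_sha256 and size == expected_size
```

For example, a correctly downloaded `ggml-base.en.bin` should verify against the oid and size in its pointer above.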
ggml_to_pt.py ADDED
@@ -0,0 +1,109 @@
+ import struct
+ import sys
+ from collections import OrderedDict
+ from pathlib import Path
+
+ import numpy as np
+ import torch
+
+ if len(sys.argv) < 3:
+     print("Usage: ggml_to_pt.py model.bin dir-output\n")
+     sys.exit(1)
+
+ fname_inp = Path(sys.argv[1])
+ dir_out = Path(sys.argv[2])
+ fname_out = dir_out / "torch-model.pt"
+
+ # Open the ggml file
+ with open(fname_inp, "rb") as f:
+     # Read magic number and hyperparameters
+     (magic_number, n_vocab, n_audio_ctx, n_audio_state, n_audio_head,
+      n_audio_layer, n_text_ctx, n_text_state, n_text_head, n_text_layer,
+      n_mels, use_f16) = struct.unpack("12i", f.read(48))
+     print(f"Magic number: {magic_number}")
+     print(f"Vocab size: {n_vocab}")
+     print(f"Audio context size: {n_audio_ctx}")
+     print(f"Audio state size: {n_audio_state}")
+     print(f"Audio head size: {n_audio_head}")
+     print(f"Audio layer size: {n_audio_layer}")
+     print(f"Text context size: {n_text_ctx}")
+     print(f"Text state size: {n_text_state}")
+     print(f"Text head size: {n_text_head}")
+     print(f"Text layer size: {n_text_layer}")
+     print(f"Mel size: {n_mels}")
+
+     # Read the mel filterbank
+     filters_shape_0 = struct.unpack("i", f.read(4))[0]
+     print(f"Filters shape 0: {filters_shape_0}")
+     filters_shape_1 = struct.unpack("i", f.read(4))[0]
+     print(f"Filters shape 1: {filters_shape_1}")
+
+     mel_filters = np.zeros((filters_shape_0, filters_shape_1))
+     for i in range(filters_shape_0):
+         for j in range(filters_shape_1):
+             mel_filters[i][j] = struct.unpack("f", f.read(4))[0]
+
+     # Read tokenizer tokens
+     num_tokens = struct.unpack("i", f.read(4))[0]
+     tokens = {}
+     for _ in range(num_tokens):
+         token_len = struct.unpack("i", f.read(4))[0]
+         token = f.read(token_len)
+         tokens[token] = {}
+
+     # Read model variables
+     model_state_dict = OrderedDict()
+     while True:
+         try:
+             n_dims, name_length, ftype = struct.unpack("iii", f.read(12))
+         except struct.error:
+             break  # end of file
+         dims = [struct.unpack("i", f.read(4))[0] for _ in range(n_dims)]
+         dims = dims[::-1]
+         name = f.read(name_length).decode("utf-8")
+         if ftype == 1:  # f16
+             data = np.fromfile(f, dtype=np.float16, count=np.prod(dims)).reshape(dims)
+         else:  # f32
+             data = np.fromfile(f, dtype=np.float32, count=np.prod(dims)).reshape(dims)
+
+         # The conv biases are stored with an extra broadcast axis; drop it
+         if name in ["encoder.conv1.bias", "encoder.conv2.bias"]:
+             data = data[:, 0]
+
+         model_state_dict[name] = torch.from_numpy(data)
+
+ # The model's state_dict is now in model_state_dict; load it into a
+ # Whisper model with the same architecture.
+ from whisper import Whisper, ModelDimensions
+
+ dims = ModelDimensions(
+     n_mels=n_mels,
+     n_audio_ctx=n_audio_ctx,
+     n_audio_state=n_audio_state,
+     n_audio_head=n_audio_head,
+     n_audio_layer=n_audio_layer,
+     n_text_ctx=n_text_ctx,
+     n_text_state=n_text_state,
+     n_text_head=n_text_head,
+     n_text_layer=n_text_layer,
+     n_vocab=n_vocab,
+ )
+ model = Whisper(dims)
+ model.load_state_dict(model_state_dict)
+
+ # Save the model in PyTorch format
+ torch.save(model.state_dict(), fname_out)
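The 48-byte header that `ggml_to_pt.py` unpacks first can be exercised in isolation by round-tripping it through `struct`. A sketch with base.en-like hyperparameters (the `pack_header`/`unpack_header` helpers are ours, and the magic constant and values here are illustrative assumptions, not read from a real model file):

```python
import struct

# "ggml" interpreted as a little-endian int32 magic value (assumed).
GGML_MAGIC = 0x67676D6C

def pack_header(n_vocab, n_audio_ctx, n_audio_state, n_audio_head,
                n_audio_layer, n_text_ctx, n_text_state, n_text_head,
                n_text_layer, n_mels, use_f16):
    # Field order mirrors the struct.unpack("12i", f.read(48)) in the script.
    return struct.pack("12i", GGML_MAGIC, n_vocab, n_audio_ctx, n_audio_state,
                       n_audio_head, n_audio_layer, n_text_ctx, n_text_state,
                       n_text_head, n_text_layer, n_mels, use_f16)

def unpack_header(blob):
    # Returns (magic, n_vocab, ..., n_mels, use_f16); rejects non-ggml blobs.
    fields = struct.unpack("12i", blob)
    assert fields[0] == GGML_MAGIC, "not a ggml header"
    return fields
```

Round-tripping a synthetic header this way is a cheap sanity check before pointing the full converter at a multi-hundred-megabyte model file.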
openvino-conversion-requirements.txt ADDED
@@ -0,0 +1,2 @@
+ openvino-dev[pytorch,onnx]
+ openai-whisper