Surya
committed on
all things model
- README.md +102 -3
- convert-h5-to-coreml.py +117 -0
- convert-h5-to-ggml.py +208 -0
- convert-pt-to-ggml.py +342 -0
- convert-whisper-to-coreml.py +331 -0
- convert-whisper-to-openvino.py +53 -0
- coreml-encoder-base.en.mlpackage/Data/com.apple.CoreML/model.mlmodel +3 -0
- coreml-encoder-base.en.mlpackage/Data/com.apple.CoreML/weights/weight.bin +3 -0
- coreml-encoder-base.en.mlpackage/Manifest.json +18 -0
- download-coreml-model.sh +82 -0
- download-ggml-model.cmd +64 -0
- download-ggml-model.sh +111 -0
- for-tests-ggml-base.bin +3 -0
- for-tests-ggml-base.en.bin +3 -0
- for-tests-ggml-large.bin +3 -0
- for-tests-ggml-medium.bin +3 -0
- for-tests-ggml-medium.en.bin +3 -0
- for-tests-ggml-small.bin +3 -0
- for-tests-ggml-small.en.bin +3 -0
- for-tests-ggml-tiny.bin +3 -0
- for-tests-ggml-tiny.en.bin +3 -0
- generate-coreml-interface.sh +29 -0
- generate-coreml-model.sh +36 -0
- ggml-base.en-encoder.mlmodelc/analytics/coremldata.bin +3 -0
- ggml-base.en-encoder.mlmodelc/coremldata.bin +3 -0
- ggml-base.en-encoder.mlmodelc/metadata.json +67 -0
- ggml-base.en-encoder.mlmodelc/model.mil +388 -0
- ggml-base.en-encoder.mlmodelc/weights/weight.bin +3 -0
- ggml-base.en.bin +3 -0
- ggml_to_pt.py +109 -0
- openvino-conversion-requirements.txt +2 -0
README.md
CHANGED
## Whisper model files in custom ggml format

The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L27)
are converted to a custom `ggml` format so that they can be loaded in C/C++.
Conversion is performed using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.

You can either obtain the original models and generate the `ggml` files yourself using the conversion script,
or you can use the [download-ggml-model.sh](download-ggml-model.sh) script to download the already converted models.
Currently, they are hosted at the following locations:

- https://huggingface.co/ggerganov/whisper.cpp
- https://ggml.ggerganov.com

Sample download:

```text
$ ./download-ggml-model.sh base.en
Downloading ggml model base.en ...
models/ggml-base.en.bin          100%[=============================================>] 141.11M  5.41MB/s    in 22s
Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
You can now use it like this:

  $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
```

To convert the files yourself, use the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script. Here is an example usage.
The original PyTorch files are assumed to have been downloaded into `~/.cache/whisper`.
Change `~/path/to/repo/whisper/` to the location of your copy of the Whisper source:

```bash
mkdir models/whisper-medium
python models/convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
mv ./models/whisper-medium/ggml-model.bin models/ggml-medium.bin
rmdir models/whisper-medium
```

A third option to obtain the model files is to download them from Hugging Face:

https://huggingface.co/ggerganov/whisper.cpp/tree/main

## Available models

| Model     | Disk    | SHA                                        |
| ---       | ---     | ---                                        |
| tiny      | 75 MiB  | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
| tiny.en   | 75 MiB  | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
| base      | 142 MiB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
| base.en   | 142 MiB | `137c40403d78fd54d454da0f9bd998f78703390c` |
| small     | 466 MiB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
| small.en  | 466 MiB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
| medium    | 1.5 GiB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
| medium.en | 1.5 GiB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
| large-v1  | 2.9 GiB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
| large-v2  | 2.9 GiB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
| large-v3  | 2.9 GiB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |

## Model files for testing purposes

The model files prefixed with `for-tests-` are empty (i.e. they do not contain any weights) and are used by the CI for
testing purposes. They are included directly in this repository for convenience, and the GitHub Actions CI uses them to
run various sanitizer tests.

## Fine-tuned models

There are community efforts to create fine-tuned Whisper models using extra training data. For example, this
[blog post](https://huggingface.co/blog/fine-tune-whisper) describes a method for fine-tuning using the Hugging Face (HF)
Transformers implementation of Whisper. The produced models are in a slightly different format compared to the original
OpenAI format. To read the HF models you can use the [convert-h5-to-ggml.py](convert-h5-to-ggml.py) script like this:

```bash
git clone https://github.com/openai/whisper
git clone https://github.com/ggerganov/whisper.cpp

# clone HF fine-tuned model (this is just an example)
git clone https://huggingface.co/openai/whisper-medium

# convert the model to ggml
python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium/ ./whisper .
```

## Distilled models

Initial support for https://huggingface.co/distil-whisper is available.

Currently, the chunk-based transcription strategy is not implemented, so there can be sub-optimal quality when using the distilled models with `whisper.cpp`.

```bash
# clone OpenAI whisper and whisper.cpp
git clone https://github.com/openai/whisper
git clone https://github.com/ggerganov/whisper.cpp

# get the models
cd whisper.cpp/models
git clone https://huggingface.co/distil-whisper/distil-medium.en
git clone https://huggingface.co/distil-whisper/distil-large-v2

# convert to ggml
python3 ./convert-h5-to-ggml.py ./distil-medium.en/ ../../whisper .
mv ggml-model.bin ggml-medium.en-distil.bin

python3 ./convert-h5-to-ggml.py ./distil-large-v2/ ../../whisper .
mv ggml-model.bin ggml-large-v2-distil.bin
```
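The `ggml` files produced by the conversion scripts start with the magic number `0x67676d6c` ("ggml" in hex) followed by the integer hyperparameters, so a quick sanity check after a download or conversion is to read the header back. A minimal sketch, assuming the field order written by the conversion scripts (`read_ggml_header` is an illustrative helper, not part of this repo):

```python
import struct

def read_ggml_header(path):
    """Read the magic number and hyperparameters from a ggml Whisper model file.

    Field order matches what the conversion scripts write: magic, n_vocab,
    n_audio_ctx, n_audio_state, n_audio_head, n_audio_layer, n_text_ctx,
    n_text_state, n_text_head, n_text_layer, n_mels, use_f16.
    """
    with open(path, "rb") as f:
        magic = struct.unpack("i", f.read(4))[0]
        if magic != 0x67676D6C:  # "ggml" in hex
            raise ValueError(f"not a ggml file: magic = {magic:#x}")
        names = ["n_vocab", "n_audio_ctx", "n_audio_state", "n_audio_head",
                 "n_audio_layer", "n_text_ctx", "n_text_state", "n_text_head",
                 "n_text_layer", "n_mels", "use_f16"]
        return dict(zip(names, struct.unpack("11i", f.read(44))))
```

For example, a `base.en` file should report `n_mels = 80`; a garbage value here usually means a truncated or corrupted download.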
convert-h5-to-coreml.py
ADDED
import argparse
import importlib.util

spec = importlib.util.spec_from_file_location('whisper_to_coreml', 'models/convert-whisper-to-coreml.py')
whisper_to_coreml = importlib.util.module_from_spec(spec)
spec.loader.exec_module(whisper_to_coreml)

from whisper import load_model

from copy import deepcopy
import torch
from transformers import WhisperForConditionalGeneration
from huggingface_hub import metadata_update

# https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets/blob/main/src/multiple_datasets/hub_default_utils.py
WHISPER_MAPPING = {
    "layers": "blocks",
    "fc1": "mlp.0",
    "fc2": "mlp.2",
    "final_layer_norm": "mlp_ln",
    ".self_attn.q_proj": ".attn.query",
    ".self_attn.k_proj": ".attn.key",
    ".self_attn.v_proj": ".attn.value",
    ".self_attn_layer_norm": ".attn_ln",
    ".self_attn.out_proj": ".attn.out",
    ".encoder_attn.q_proj": ".cross_attn.query",
    ".encoder_attn.k_proj": ".cross_attn.key",
    ".encoder_attn.v_proj": ".cross_attn.value",
    ".encoder_attn_layer_norm": ".cross_attn_ln",
    ".encoder_attn.out_proj": ".cross_attn.out",
    "decoder.layer_norm.": "decoder.ln.",
    "encoder.layer_norm.": "encoder.ln_post.",
    "embed_tokens": "token_embedding",
    "encoder.embed_positions.weight": "encoder.positional_embedding",
    "decoder.embed_positions.weight": "decoder.positional_embedding",
    "layer_norm": "ln_post",
}

# https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets/blob/main/src/multiple_datasets/hub_default_utils.py
def rename_keys(s_dict):
    keys = list(s_dict.keys())
    for key in keys:
        new_key = key
        for k, v in WHISPER_MAPPING.items():
            if k in key:
                new_key = new_key.replace(k, v)

        print(f"{key} -> {new_key}")

        s_dict[new_key] = s_dict.pop(key)
    return s_dict

# https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets/blob/main/src/multiple_datasets/hub_default_utils.py
def convert_hf_whisper(hf_model_name_or_path: str, whisper_state_path: str):
    transformer_model = WhisperForConditionalGeneration.from_pretrained(hf_model_name_or_path)
    config = transformer_model.config

    # first build dims
    dims = {
        'n_mels': config.num_mel_bins,
        'n_vocab': config.vocab_size,
        'n_audio_ctx': config.max_source_positions,
        'n_audio_state': config.d_model,
        'n_audio_head': config.encoder_attention_heads,
        'n_audio_layer': config.encoder_layers,
        'n_text_ctx': config.max_target_positions,
        'n_text_state': config.d_model,
        'n_text_head': config.decoder_attention_heads,
        'n_text_layer': config.decoder_layers
    }

    state_dict = deepcopy(transformer_model.model.state_dict())
    state_dict = rename_keys(state_dict)

    torch.save({"dims": dims, "model_state_dict": state_dict}, whisper_state_path)

# Ported from models/convert-whisper-to-coreml.py
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-name", type=str, help="name of model to convert (e.g. tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3)", required=True)
    parser.add_argument("--model-path", type=str, help="path to the model (e.g. if published on HuggingFace: Oblivion208/whisper-tiny-cantonese)", required=True)
    parser.add_argument("--encoder-only", type=bool, help="only convert encoder", default=False)
    parser.add_argument("--quantize", type=bool, help="quantize weights to F16", default=False)
    parser.add_argument("--optimize-ane", type=bool, help="optimize for ANE execution (currently broken)", default=False)
    args = parser.parse_args()

    if args.model_name not in ["tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large-v2", "large-v3"]:
        raise ValueError("Invalid model name")

    pt_target_path = f"models/hf-{args.model_name}.pt"
    convert_hf_whisper(args.model_path, pt_target_path)

    whisper = load_model(pt_target_path).cpu()
    hparams = whisper.dims
    print(hparams)

    if args.optimize_ane:
        whisperANE = whisper_to_coreml.WhisperANE(hparams).eval()
        whisperANE.load_state_dict(whisper.state_dict())

        encoder = whisperANE.encoder
        decoder = whisperANE.decoder
    else:
        encoder = whisper.encoder
        decoder = whisper.decoder

    # Convert encoder
    encoder = whisper_to_coreml.convert_encoder(hparams, encoder, quantize=args.quantize)
    encoder.save(f"models/coreml-encoder-{args.model_name}.mlpackage")

    if args.encoder_only is False:
        # Convert decoder
        decoder = whisper_to_coreml.convert_decoder(hparams, decoder, quantize=args.quantize)
        decoder.save(f"models/coreml-decoder-{args.model_name}.mlpackage")

    print("done converting")
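The HF-to-OpenAI key renaming above is a plain substring substitution over the state-dict keys. A minimal, self-contained sketch of that logic (the mapping below is a trimmed copy of `WHISPER_MAPPING`, and the dict values are dummy strings standing in for real weight tensors):

```python
# Trimmed copy of the WHISPER_MAPPING table from convert-h5-to-coreml.py.
WHISPER_MAPPING = {
    "layers": "blocks",
    "fc1": "mlp.0",
    "fc2": "mlp.2",
    ".self_attn.q_proj": ".attn.query",
    ".self_attn_layer_norm": ".attn_ln",
}

def rename_keys(s_dict):
    # Rewrite every key by applying each substring substitution in turn.
    for key in list(s_dict.keys()):
        new_key = key
        for k, v in WHISPER_MAPPING.items():
            if k in key:
                new_key = new_key.replace(k, v)
        s_dict[new_key] = s_dict.pop(key)
    return s_dict

renamed = rename_keys({
    "encoder.layers.0.fc1.weight": "w0",
    "decoder.layers.1.self_attn.q_proj.bias": "b1",
})
print(sorted(renamed))
```

For example, `encoder.layers.0.fc1.weight` becomes `encoder.blocks.0.mlp.0.weight`, matching the naming scheme expected by the OpenAI checkpoint loader.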
convert-h5-to-ggml.py
ADDED
# Convert Hugging Face fine-tuned models to ggml format
#
# Usage:
#
#   git clone https://github.com/openai/whisper
#   git clone https://github.com/ggerganov/whisper.cpp
#   git clone https://huggingface.co/openai/whisper-medium
#
#   python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium/ ./whisper .
#
# This script is similar to "convert-pt-to-ggml.py"
#
# For more info:
#
#   https://github.com/ggerganov/whisper.cpp/issues/157
#

import io
import os
import sys
import struct
import json
import code
import torch
import numpy as np
from pathlib import Path

from transformers import WhisperForConditionalGeneration

conv_map = {
    'self_attn.k_proj'              : 'attn.key',
    'self_attn.q_proj'              : 'attn.query',
    'self_attn.v_proj'              : 'attn.value',
    'self_attn.out_proj'            : 'attn.out',
    'self_attn_layer_norm'          : 'attn_ln',
    'encoder_attn.q_proj'           : 'cross_attn.query',
    'encoder_attn.v_proj'           : 'cross_attn.value',
    'encoder_attn.out_proj'         : 'cross_attn.out',
    'encoder_attn_layer_norm'       : 'cross_attn_ln',
    'fc1'                           : 'mlp.0',
    'fc2'                           : 'mlp.2',
    'final_layer_norm'              : 'mlp_ln',
    'encoder.layer_norm.bias'       : 'encoder.ln_post.bias',
    'encoder.layer_norm.weight'     : 'encoder.ln_post.weight',
    'encoder.embed_positions.weight': 'encoder.positional_embedding',
    'decoder.layer_norm.bias'       : 'decoder.ln.bias',
    'decoder.layer_norm.weight'     : 'decoder.ln.weight',
    'decoder.embed_positions.weight': 'decoder.positional_embedding',
    'decoder.embed_tokens.weight'   : 'decoder.token_embedding.weight',
    'proj_out.weight'               : 'decoder.proj.weight',
}

# ref: https://github.com/openai/gpt-2/blob/master/src/encoder.py
def bytes_to_unicode():
    """
    Returns list of utf-8 byte and a corresponding list of unicode strings.
    The reversible bpe codes work on unicode strings.
    This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
    When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
    This is a significant percentage of your normal, say, 32K bpe vocab.
    To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
    And avoids mapping to whitespace/control characters the bpe code barfs on.
    """
    bs = list(range(ord("!"), ord("~")+1))+list(range(ord("¡"), ord("¬")+1))+list(range(ord("®"), ord("ÿ")+1))
    cs = bs[:]
    n = 0
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8+n)
            n += 1
    cs = [chr(n) for n in cs]
    return dict(zip(bs, cs))

if len(sys.argv) < 4:
    print("Usage: convert-h5-to-ggml.py dir_model path-to-whisper-repo dir-output [use-f32]\n")
    sys.exit(1)

dir_model   = Path(sys.argv[1])
dir_whisper = Path(sys.argv[2])
dir_out     = Path(sys.argv[3])

encoder       = json.load((dir_model / "vocab.json").open("r", encoding="utf8"))
encoder_added = json.load((dir_model / "added_tokens.json").open("r", encoding="utf8"))
hparams       = json.load((dir_model / "config.json").open("r", encoding="utf8"))

model = WhisperForConditionalGeneration.from_pretrained(dir_model)

#code.interact(local=locals())

n_mels = hparams["num_mel_bins"]
with np.load(os.path.join(dir_whisper, "whisper/assets", "mel_filters.npz")) as f:
    filters = torch.from_numpy(f[f"mel_{n_mels}"])

dir_tokenizer = dir_model

fname_out = dir_out / "ggml-model.bin"

tokens = json.load(open(dir_tokenizer / "vocab.json", "r", encoding="utf8"))

# use 16-bit or 32-bit floats
use_f16 = True
if len(sys.argv) > 4:
    use_f16 = False
    fname_out = dir_out / "ggml-model-f32.bin"

fout = open(fname_out, "wb")

fout.write(struct.pack("i", 0x67676d6c)) # magic: ggml in hex
fout.write(struct.pack("i", hparams["vocab_size"]))
fout.write(struct.pack("i", hparams["max_source_positions"]))
fout.write(struct.pack("i", hparams["d_model"]))
fout.write(struct.pack("i", hparams["encoder_attention_heads"]))
fout.write(struct.pack("i", hparams["encoder_layers"]))
fout.write(struct.pack("i", hparams["max_length"]))
fout.write(struct.pack("i", hparams["d_model"]))
fout.write(struct.pack("i", hparams["decoder_attention_heads"]))
fout.write(struct.pack("i", hparams["decoder_layers"]))
fout.write(struct.pack("i", hparams["num_mel_bins"]))
fout.write(struct.pack("i", use_f16))

fout.write(struct.pack("i", filters.shape[0]))
fout.write(struct.pack("i", filters.shape[1]))
for i in range(filters.shape[0]):
    for j in range(filters.shape[1]):
        fout.write(struct.pack("f", filters[i][j]))

byte_encoder = bytes_to_unicode()
byte_decoder = {v:k for k, v in byte_encoder.items()}

fout.write(struct.pack("i", len(tokens)))

tokens = sorted(tokens.items(), key=lambda x: x[1])
for key in tokens:
    text = bytearray([byte_decoder[c] for c in key[0]])
    fout.write(struct.pack("i", len(text)))
    fout.write(text)

list_vars = model.state_dict()
for name in list_vars.keys():
    # this seems to not be used
    # ref: https://github.com/huggingface/transformers/blob/9a5b84a0076a04fe9596da72e8668069d4f09ea0/src/transformers/models/whisper/modeling_whisper.py#L1099-L1106
    if name == "proj_out.weight":
        print('Skipping', name)
        continue

    src = name

    nn = name
    if name != "proj_out.weight":
        nn = nn.split(".")[1:]
    else:
        nn = nn.split(".")

    if nn[1] == "layers":
        nn[1] = "blocks"
        if ".".join(nn[3:-1]) == "encoder_attn.k_proj":
            mapped = "attn.key" if nn[0] == "encoder" else "cross_attn.key"
        else:
            mapped = conv_map[".".join(nn[3:-1])]
        name = ".".join(nn[:3] + [mapped] + nn[-1:])
    else:
        name = ".".join(nn)
        name = conv_map[name] if name in conv_map else name

    print(src, ' -> ', name)
    data = list_vars[src].squeeze().numpy()
    data = data.astype(np.float16)

    # reshape conv bias from [n] to [n, 1]
    if name in ["encoder.conv1.bias", "encoder.conv2.bias"]:
        data = data.reshape(data.shape[0], 1)
        print("  Reshaped variable: ", name, " to shape: ", data.shape)

    n_dims = len(data.shape)
    print(name, n_dims, data.shape)

    # looks like the whisper models are in f16 by default
    # so we need to convert the small tensors to f32 until we fully support f16 in ggml
    # ftype == 0 -> float32, ftype == 1 -> float16
    ftype = 1
    if use_f16:
        if n_dims < 2 or \
                name == "encoder.conv1.bias" or \
                name == "encoder.conv2.bias" or \
                name == "encoder.positional_embedding" or \
                name == "decoder.positional_embedding":
            print("  Converting to float32")
            data = data.astype(np.float32)
            ftype = 0
    else:
        data = data.astype(np.float32)
        ftype = 0

    # header
    str_ = name.encode('utf-8')
    fout.write(struct.pack("iii", n_dims, len(str_), ftype))
    for i in range(n_dims):
        fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
    fout.write(str_)

    # data
    data.tofile(fout)

fout.close()

print("Done. Output file: ", fname_out)
print("")
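Each tensor in the output file is serialized as a small header (dimension count, name length, ftype, then the dims innermost-first, then the name bytes) followed by the raw data. A sketch that packs one dummy tensor using the same `struct` layout as the loop above (`pack_tensor` is an illustrative helper, not part of the script):

```python
import struct
import numpy as np

def pack_tensor(name: str, data: np.ndarray, ftype: int) -> bytes:
    # Same per-tensor layout as the conversion loop:
    # n_dims, name length, ftype, then dims (innermost first), name bytes, raw data.
    out = bytearray()
    n_dims = data.ndim
    str_ = name.encode("utf-8")
    out += struct.pack("iii", n_dims, len(str_), ftype)
    for i in range(n_dims):
        out += struct.pack("i", data.shape[n_dims - 1 - i])
    out += str_
    out += data.tobytes()
    return bytes(out)

# Pack a dummy 2x3 float32 tensor and read the header fields back.
blob = pack_tensor("encoder.conv1.weight", np.zeros((2, 3), dtype=np.float32), ftype=0)
n_dims, name_len, ftype = struct.unpack("iii", blob[:12])
print(n_dims, name_len, ftype)
```

Note that the dims are written innermost-first (shape `(2, 3)` is stored as `3, 2`), which is the order the C/C++ loader expects.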
convert-pt-to-ggml.py
ADDED
# Convert Whisper transformer model from PyTorch to ggml format
#
# Usage: python convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
#
# You need to clone the original repo in ~/path/to/repo/whisper/
#
#   git clone https://github.com/openai/whisper ~/path/to/repo/whisper/
#
# It is used to load various assets needed by the algorithm:
#
#   - tokenizer
#   - mel filters
#
# Also, you need to have the original models in ~/.cache/whisper/
# See the original repo for more details.
#
# This script loads the specified model and whisper assets and saves them in ggml format.
# The output is a single binary file containing the following information:
#
#   - hparams
#   - mel filters
#   - tokenizer vocab
#   - model variables
#
# For each variable, write the following:
#
#   - Number of dimensions (int)
#   - Name length (int)
#   - Dimensions (int[n_dims])
#   - Name (char[name_length])
#   - Data (float[n_dims])
#

import io
import os
import sys
import struct
import json
import code
import torch
import numpy as np
import base64
from pathlib import Path
#from transformers import GPTJForCausalLM
#from transformers import GPT2TokenizerFast

# ref: https://github.com/openai/whisper/blob/8cf36f3508c9acd341a45eb2364239a3d81458b9/whisper/tokenizer.py#L10-L110
#LANGUAGES = {
#    "en": "english",
#    "zh": "chinese",
#    "de": "german",
#    "es": "spanish",
#    "ru": "russian",
#    "ko": "korean",
#    "fr": "french",
#    "ja": "japanese",
#    "pt": "portuguese",
#    "tr": "turkish",
#    "pl": "polish",
#    "ca": "catalan",
#    "nl": "dutch",
#    "ar": "arabic",
#    "sv": "swedish",
#    "it": "italian",
#    "id": "indonesian",
#    "hi": "hindi",
#    "fi": "finnish",
#    "vi": "vietnamese",
#    "iw": "hebrew",
#    "uk": "ukrainian",
#    "el": "greek",
#    "ms": "malay",
#    "cs": "czech",
#    "ro": "romanian",
#    "da": "danish",
#    "hu": "hungarian",
#    "ta": "tamil",
#    "no": "norwegian",
#    "th": "thai",
#    "ur": "urdu",
#    "hr": "croatian",
#    "bg": "bulgarian",
#    "lt": "lithuanian",
#    "la": "latin",
#    "mi": "maori",
#    "ml": "malayalam",
#    "cy": "welsh",
#    "sk": "slovak",
#    "te": "telugu",
#    "fa": "persian",
#    "lv": "latvian",
#    "bn": "bengali",
#    "sr": "serbian",
#    "az": "azerbaijani",
#    "sl": "slovenian",
#    "kn": "kannada",
#    "et": "estonian",
#    "mk": "macedonian",
#    "br": "breton",
#    "eu": "basque",
#    "is": "icelandic",
#    "hy": "armenian",
#    "ne": "nepali",
#    "mn": "mongolian",
#    "bs": "bosnian",
#    "kk": "kazakh",
#    "sq": "albanian",
#    "sw": "swahili",
#    "gl": "galician",
#    "mr": "marathi",
#    "pa": "punjabi",
#    "si": "sinhala",
#    "km": "khmer",
#    "sn": "shona",
#    "yo": "yoruba",
#    "so": "somali",
#    "af": "afrikaans",
#    "oc": "occitan",
#    "ka": "georgian",
#    "be": "belarusian",
#    "tg": "tajik",
#    "sd": "sindhi",
#    "gu": "gujarati",
#    "am": "amharic",
#    "yi": "yiddish",
#    "lo": "lao",
#    "uz": "uzbek",
#    "fo": "faroese",
#    "ht": "haitian creole",
#    "ps": "pashto",
#    "tk": "turkmen",
#    "nn": "nynorsk",
#    "mt": "maltese",
#    "sa": "sanskrit",
#    "lb": "luxembourgish",
#    "my": "myanmar",
#    "bo": "tibetan",
#    "tl": "tagalog",
#    "mg": "malagasy",
#    "as": "assamese",
#    "tt": "tatar",
#    "haw": "hawaiian",
#    "ln": "lingala",
#    "ha": "hausa",
#    "ba": "bashkir",
#    "jw": "javanese",
#    "su": "sundanese",
#}

## ref: https://github.com/openai/whisper/blob/8cf36f3508c9acd341a45eb2364239a3d81458b9/whisper/tokenizer.py#L273-L292
#def build_tokenizer(path_to_whisper_repo: str, name: str = "gpt2"):
#    os.environ["TOKENIZERS_PARALLELISM"] = "false"
#    path = os.path.join(path_to_whisper_repo, "whisper/assets", name)
#    tokenizer = GPT2TokenizerFast.from_pretrained(path)
#
#    specials = [
#        "<|startoftranscript|>",
#        *[f"<|{lang}|>" for lang in LANGUAGES.keys()],
#        "<|translate|>",
#        "<|transcribe|>",
#        "<|startoflm|>",
#        "<|startofprev|>",
#        "<|nocaptions|>",
#        "<|notimestamps|>",
#    ]
#
#    tokenizer.add_special_tokens(dict(additional_special_tokens=specials))
#    return tokenizer

# ref: https://github.com/openai/gpt-2/blob/master/src/encoder.py
def bytes_to_unicode():
    """
    Returns list of utf-8 byte and a corresponding list of unicode strings.
    The reversible bpe codes work on unicode strings.
    This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
    When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
    This is a significant percentage of your normal, say, 32K bpe vocab.
    To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
    And avoids mapping to whitespace/control characters the bpe code barfs on.
    """
    bs = list(range(ord("!"), ord("~")+1))+list(range(ord("¡"), ord("¬")+1))+list(range(ord("®"), ord("ÿ")+1))
    cs = bs[:]
    n = 0
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8+n)
            n += 1
    cs = [chr(n) for n in cs]
    return dict(zip(bs, cs))


if len(sys.argv) < 4:
    print("Usage: convert-pt-to-ggml.py model.pt path-to-whisper-repo dir-output [use-f32]\n")
    sys.exit(1)

fname_inp   = Path(sys.argv[1])
dir_whisper = Path(sys.argv[2])
dir_out     = Path(sys.argv[3])

# try to load PyTorch binary data
try:
    model_bytes = open(fname_inp, "rb").read()
    with io.BytesIO(model_bytes) as fp:
        checkpoint = torch.load(fp, map_location="cpu")
except Exception:
    print("Error: failed to load PyTorch model file:", fname_inp)
    sys.exit(1)

hparams = checkpoint["dims"]
print("hparams:", hparams)

list_vars = checkpoint["model_state_dict"]

#print(list_vars['encoder.positional_embedding'])
#print(list_vars['encoder.conv1.weight'])
|
| 217 |
+
#print(list_vars['encoder.conv1.weight'].shape)
|
| 218 |
+
|
| 219 |
+
# load mel filters
|
| 220 |
+
n_mels = hparams["n_mels"]
|
| 221 |
+
with np.load(dir_whisper / "whisper" / "assets" / "mel_filters.npz") as f:
|
| 222 |
+
filters = torch.from_numpy(f[f"mel_{n_mels}"])
|
| 223 |
+
#print (filters)
|
| 224 |
+
|
| 225 |
+
#code.interact(local=locals())
|
| 226 |
+
|
| 227 |
+
# load tokenizer
|
| 228 |
+
# for backwards compatibility, also check for older hf_transformers format tokenizer files
|
| 229 |
+
# old format: dir_whisper/whisper/assets/[multilingual/gpt2]/vocab.json
|
| 230 |
+
# new format: dir_whisper/whisper/assets/[multilingual/gpt2].tiktoken
|
| 231 |
+
multilingual = hparams["n_vocab"] >= 51865
|
| 232 |
+
tokenizer = dir_whisper / "whisper" / "assets" / (multilingual and "multilingual.tiktoken" or "gpt2.tiktoken")
|
| 233 |
+
tokenizer_type = "tiktoken"
|
| 234 |
+
if not tokenizer.is_file():
|
| 235 |
+
tokenizer = dir_whisper / "whisper" / "assets" / (multilingual and "multilingual" or "gpt2") / "vocab.json"
|
| 236 |
+
tokenizer_type = "hf_transformers"
|
| 237 |
+
if not tokenizer.is_file():
|
| 238 |
+
print("Error: failed to find either tiktoken or hf_transformers tokenizer file:", tokenizer)
|
| 239 |
+
sys.exit(1)
|
| 240 |
+
|
| 241 |
+
byte_encoder = bytes_to_unicode()
|
| 242 |
+
byte_decoder = {v:k for k, v in byte_encoder.items()}
|
| 243 |
+
|
| 244 |
+
if tokenizer_type == "tiktoken":
|
| 245 |
+
with open(tokenizer, "rb") as f:
|
| 246 |
+
contents = f.read()
|
| 247 |
+
tokens = {base64.b64decode(token): int(rank) for token, rank in (line.split() for line in contents.splitlines() if line)}
|
| 248 |
+
elif tokenizer_type == "hf_transformers":
|
| 249 |
+
with open(tokenizer, "r", encoding="utf8") as f:
|
| 250 |
+
_tokens_raw = json.load(f)
|
| 251 |
+
if '<|endoftext|>' in _tokens_raw:
|
| 252 |
+
# ensures exact same model as tokenizer_type == tiktoken
|
| 253 |
+
# details: https://github.com/ggerganov/whisper.cpp/pull/725
|
| 254 |
+
del _tokens_raw['<|endoftext|>']
|
| 255 |
+
tokens = {bytes([byte_decoder[c] for c in token]): int(idx) for token, idx in _tokens_raw.items()}
|
| 256 |
+
|
| 257 |
+
# output in the same directory as the model
|
| 258 |
+
fname_out = dir_out / "ggml-model.bin"
|
| 259 |
+
|
| 260 |
+
# use 16-bit or 32-bit floats
|
| 261 |
+
use_f16 = True
|
| 262 |
+
if len(sys.argv) > 4:
|
| 263 |
+
use_f16 = False
|
| 264 |
+
fname_out = dir_out / "ggml-model-f32.bin"
|
| 265 |
+
|
| 266 |
+
fout = fname_out.open("wb")
|
| 267 |
+
|
| 268 |
+
fout.write(struct.pack("i", 0x67676d6c)) # magic: ggml in hex
|
| 269 |
+
fout.write(struct.pack("i", hparams["n_vocab"]))
|
| 270 |
+
fout.write(struct.pack("i", hparams["n_audio_ctx"]))
|
| 271 |
+
fout.write(struct.pack("i", hparams["n_audio_state"]))
|
| 272 |
+
fout.write(struct.pack("i", hparams["n_audio_head"]))
|
| 273 |
+
fout.write(struct.pack("i", hparams["n_audio_layer"]))
|
| 274 |
+
fout.write(struct.pack("i", hparams["n_text_ctx"]))
|
| 275 |
+
fout.write(struct.pack("i", hparams["n_text_state"]))
|
| 276 |
+
fout.write(struct.pack("i", hparams["n_text_head"]))
|
| 277 |
+
fout.write(struct.pack("i", hparams["n_text_layer"]))
|
| 278 |
+
fout.write(struct.pack("i", hparams["n_mels"]))
|
| 279 |
+
fout.write(struct.pack("i", use_f16))
|
| 280 |
+
|
| 281 |
+
# write mel filters
|
| 282 |
+
fout.write(struct.pack("i", filters.shape[0]))
|
| 283 |
+
fout.write(struct.pack("i", filters.shape[1]))
|
| 284 |
+
for i in range(filters.shape[0]):
|
| 285 |
+
for j in range(filters.shape[1]):
|
| 286 |
+
fout.write(struct.pack("f", filters[i][j]))
|
| 287 |
+
|
| 288 |
+
# write tokenizer
|
| 289 |
+
fout.write(struct.pack("i", len(tokens)))
|
| 290 |
+
|
| 291 |
+
for key in tokens:
|
| 292 |
+
fout.write(struct.pack("i", len(key)))
|
| 293 |
+
fout.write(key)
|
| 294 |
+
|
| 295 |
+
for name in list_vars.keys():
|
| 296 |
+
data = list_vars[name].squeeze().numpy()
|
| 297 |
+
print("Processing variable: " , name , " with shape: ", data.shape)
|
| 298 |
+
|
| 299 |
+
# reshape conv bias from [n] to [n, 1]
|
| 300 |
+
if name in ["encoder.conv1.bias", "encoder.conv2.bias"]:
|
| 301 |
+
data = data.reshape(data.shape[0], 1)
|
| 302 |
+
print(f" Reshaped variable: {name} to shape: ", data.shape)
|
| 303 |
+
|
| 304 |
+
n_dims = len(data.shape)
|
| 305 |
+
|
| 306 |
+
# looks like the whisper models are in f16 by default
|
| 307 |
+
# so we need to convert the small tensors to f32 until we fully support f16 in ggml
|
| 308 |
+
# ftype == 0 -> float32, ftype == 1 -> float16
|
| 309 |
+
ftype = 1
|
| 310 |
+
if use_f16:
|
| 311 |
+
if n_dims < 2 or \
|
| 312 |
+
name == "encoder.conv1.bias" or \
|
| 313 |
+
name == "encoder.conv2.bias" or \
|
| 314 |
+
name == "encoder.positional_embedding" or \
|
| 315 |
+
name == "decoder.positional_embedding":
|
| 316 |
+
print(" Converting to float32")
|
| 317 |
+
data = data.astype(np.float32)
|
| 318 |
+
ftype = 0
|
| 319 |
+
else:
|
| 320 |
+
data = data.astype(np.float32)
|
| 321 |
+
ftype = 0
|
| 322 |
+
|
| 323 |
+
#if name.startswith("encoder"):
|
| 324 |
+
# if name.endswith("mlp.0.weight") or \
|
| 325 |
+
# name.endswith("mlp.2.weight"):
|
| 326 |
+
# print(" Transposing")
|
| 327 |
+
# data = data.transpose()
|
| 328 |
+
|
| 329 |
+
# header
|
| 330 |
+
str_ = name.encode('utf-8')
|
| 331 |
+
fout.write(struct.pack("iii", n_dims, len(str_), ftype))
|
| 332 |
+
for i in range(n_dims):
|
| 333 |
+
fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
|
| 334 |
+
fout.write(str_)
|
| 335 |
+
|
| 336 |
+
# data
|
| 337 |
+
data.tofile(fout)
|
| 338 |
+
|
| 339 |
+
fout.close()
|
| 340 |
+
|
| 341 |
+
print("Done. Output file: " , fname_out)
|
| 342 |
+
print("")
|
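For reference, the fixed-size header that convert-pt-to-ggml.py writes is 12 little-endian-packed int32 values and can be round-tripped with the same `struct` layout. A minimal sketch with illustrative values (the hparam numbers here are examples, not taken from a real checkpoint; field order must match the writes above exactly):

```python
import struct

# pack a ggml-style header the way the converter writes it (illustrative values)
fields = dict(magic=0x67676d6c, n_vocab=51864, n_audio_ctx=1500, n_audio_state=512,
              n_audio_head=8, n_audio_layer=6, n_text_ctx=448, n_text_state=512,
              n_text_head=8, n_text_layer=6, n_mels=80, use_f16=1)
blob = b"".join(struct.pack("i", v) for v in fields.values())

# read it back: 12 int32 values, in the same order as they were written
header = dict(zip(fields, struct.unpack("12i", blob)))
assert header["magic"] == 0x67676d6c
```

A reader (such as whisper.cpp) must consume these fields in exactly this order before the mel filters and tokenizer sections that follow.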
convert-whisper-to-coreml.py
ADDED
|
@@ -0,0 +1,331 @@
import argparse
import torch
import torch.nn.functional as F
import coremltools as ct

from torch import Tensor
from torch import nn
from typing import Dict
from typing import Optional
from ane_transformers.reference.layer_norm import LayerNormANE as LayerNormANEBase
from coremltools.models.neural_network.quantization_utils import quantize_weights
from whisper.model import Whisper, AudioEncoder, TextDecoder, ResidualAttentionBlock, MultiHeadAttention, ModelDimensions
from whisper import load_model

# Use for changing dim of input in encoder and decoder embeddings
def linear_to_conv2d_map(state_dict, prefix, local_metadata, strict,
                         missing_keys, unexpected_keys, error_msgs):
    """
    Unsqueeze twice to map nn.Linear weights to nn.Conv2d weights
    """
    for k in state_dict:
        is_attention = all(substr in k for substr in ['attn', '.weight'])
        is_mlp = any(k.endswith(s) for s in ['mlp.0.weight', 'mlp.2.weight'])

        if (is_attention or is_mlp) and len(state_dict[k].shape) == 2:
            state_dict[k] = state_dict[k][:, :, None, None]


def correct_for_bias_scale_order_inversion(state_dict, prefix, local_metadata,
                                           strict, missing_keys,
                                           unexpected_keys, error_msgs):
    state_dict[prefix + 'bias'] = state_dict[prefix + 'bias'] / state_dict[prefix + 'weight']
    return state_dict

class LayerNormANE(LayerNormANEBase):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._register_load_state_dict_pre_hook(
            correct_for_bias_scale_order_inversion)

class MultiHeadAttentionANE(MultiHeadAttention):
    def __init__(self, n_state: int, n_head: int):
        super().__init__(n_state, n_head)
        self.query = nn.Conv2d(n_state, n_state, kernel_size=1)
        self.key = nn.Conv2d(n_state, n_state, kernel_size=1, bias=False)
        self.value = nn.Conv2d(n_state, n_state, kernel_size=1)
        self.out = nn.Conv2d(n_state, n_state, kernel_size=1)

    def forward(self,
                x: Tensor,
                xa: Optional[Tensor] = None,
                mask: Optional[Tensor] = None,
                kv_cache: Optional[dict] = None):

        q = self.query(x)

        if kv_cache is None or xa is None or self.key not in kv_cache:
            # hooks, if installed (i.e. kv_cache is not None), will prepend the cached kv tensors;
            # otherwise, perform key/value projections for self- or cross-attention as usual.
            k = self.key(x if xa is None else xa)
            v = self.value(x if xa is None else xa)

        else:
            # for cross-attention, calculate keys and values once and reuse in subsequent calls.
            k = kv_cache[self.key]
            v = kv_cache[self.value]

        wv, qk = self.qkv_attention_ane(q, k, v, mask)

        return self.out(wv), qk

    def qkv_attention_ane(self, q: Tensor, k: Tensor, v: Tensor, mask: Optional[Tensor] = None):

        _, dim, _, seqlen = q.size()

        dim_per_head = dim // self.n_head

        scale = float(dim_per_head)**-0.5

        q = q * scale

        mh_q = q.split(dim_per_head, dim=1)
        mh_k = k.transpose(1,3).split(dim_per_head, dim=3)
        mh_v = v.split(dim_per_head, dim=1)

        mh_qk = [
            torch.einsum('bchq,bkhc->bkhq', [qi, ki])
            for qi, ki in zip(mh_q, mh_k)
        ]  # (batch_size, max_seq_length, 1, max_seq_length) * n_heads

        if mask is not None:
            for head_idx in range(self.n_head):
                mh_qk[head_idx] = mh_qk[head_idx] + mask[:, :seqlen, :, :seqlen]

        attn_weights = [aw.softmax(dim=1) for aw in mh_qk]  # (batch_size, max_seq_length, 1, max_seq_length) * n_heads
        attn = [torch.einsum('bkhq,bchk->bchq', wi, vi) for wi, vi in zip(attn_weights, mh_v)]  # (batch_size, dim_per_head, 1, max_seq_length) * n_heads
        attn = torch.cat(attn, dim=1)  # (batch_size, dim, 1, max_seq_length)

        return attn, torch.cat(mh_qk, dim=1).float().detach()


class ResidualAttentionBlockANE(ResidualAttentionBlock):
    def __init__(self, n_state: int, n_head: int, cross_attention: bool = False):
        super().__init__(n_state, n_head, cross_attention)
        self.attn = MultiHeadAttentionANE(n_state, n_head)
        self.attn_ln = LayerNormANE(n_state)
        self.cross_attn = MultiHeadAttentionANE(n_state, n_head) if cross_attention else None
        self.cross_attn_ln = LayerNormANE(n_state) if cross_attention else None

        n_mlp = n_state * 4
        self.mlp = nn.Sequential(
            nn.Conv2d(n_state, n_mlp, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(n_mlp, n_state, kernel_size=1)
        )
        self.mlp_ln = LayerNormANE(n_state)


class AudioEncoderANE(AudioEncoder):
    def __init__(self, n_mels: int, n_ctx: int, n_state: int, n_head: int, n_layer: int):
        super().__init__(n_mels, n_ctx, n_state, n_head, n_layer)

        self.blocks = nn.ModuleList(
            [ResidualAttentionBlockANE(n_state, n_head) for _ in range(n_layer)]
        )
        self.ln_post = LayerNormANE(n_state)

    def forward(self, x: Tensor):
        """
        x : torch.Tensor, shape = (batch_size, n_mels, n_ctx)
            the mel spectrogram of the audio
        """
        x = F.gelu(self.conv1(x))
        x = F.gelu(self.conv2(x))

        assert x.shape[1:] == self.positional_embedding.shape[::-1], "incorrect audio shape"

        # Add positional embedding and add dummy dim for ANE
        x = (x + self.positional_embedding.transpose(0,1)).to(x.dtype).unsqueeze(2)

        for block in self.blocks:
            x = block(x)

        x = self.ln_post(x)

        # """
        # TODO:
        # I think we need to transpose the result here to make it fit whisper.cpp memory order.
        # However, even doing this, the results are still wrong. Kind of less wrong compared to
        # not transposing, but still wrong.

        # Also, I don't know why the original OpenAI implementation does not need to transpose

        # transpose to (batch_size, n_ctx, n_state)
        # x : torch.Tensor, shape = (batch_size, n_state, 1, n_ctx)
        # """
        # x = x.transpose(1,3)

        return x

class TextDecoderANE(TextDecoder):

    def __init__(self, n_vocab: int, n_ctx: int, n_state: int, n_head: int, n_layer: int):
        super().__init__(n_vocab, n_ctx, n_state, n_head, n_layer)

        self.blocks = nn.ModuleList(
            [ResidualAttentionBlockANE(n_state, n_head, cross_attention=True) for _ in range(n_layer)]
        )
        self.ln = LayerNormANE(n_state)

    def forward(self, x: Tensor, xa: Tensor, kv_cache: Optional[dict] = None):
        """
        x : torch.LongTensor, shape = (batch_size, <= n_ctx)
            the text tokens
        xa : torch.Tensor, shape = (batch_size, n_mels, n_audio_ctx)
            the encoded audio features to be attended on
        """
        offset = next(iter(kv_cache.values())).shape[3] if kv_cache else 0
        x = self.token_embedding(x) + self.positional_embedding[offset : offset + x.shape[-1]]
        x = x.to(xa.dtype)

        # Reformat for ANE
        mask = self.mask[None, None, :, :].permute(0,3,1,2)
        x = x.transpose(1,2).unsqueeze(2)

        for block in self.blocks:
            x = block(x, xa, mask=mask, kv_cache=kv_cache)

        x = self.ln(x)

        # Reformat back from ANE
        x = x.permute(0,2,3,1).squeeze(0)

        # ANE can only load tensors with dim size of at most 16,384 - whisper uses 51,864 (en) or 51,865 (multi-lang) tokens so we need to compute in chunks
        if self.token_embedding.weight.shape[0] >= 51865:
            # split in 11 chunks - 4715 each
            splits = self.token_embedding.weight.split(self.token_embedding.weight.shape[0]//11, dim=0)
            logits = torch.cat([torch.einsum('bid,jd->bij', x, split) for split in splits]).view(*x.shape[:2], -1)
        else:
            # split in 12 chunks - 4322 each
            assert(self.token_embedding.weight.shape[0] == 51864)
            splits = self.token_embedding.weight.split(self.token_embedding.weight.shape[0]//12, dim=0)
            logits = torch.cat([torch.einsum('bid,jd->bij', x, split) for split in splits]).view(*x.shape[:2], -1)

        return logits

class WhisperANE(Whisper):
    def __init__(self, dims: ModelDimensions):
        super().__init__(dims)

        self.encoder = AudioEncoderANE(
            self.dims.n_mels,
            self.dims.n_audio_ctx,
            self.dims.n_audio_state,
            self.dims.n_audio_head,
            self.dims.n_audio_layer,
        )
        self.decoder = TextDecoderANE(
            self.dims.n_vocab,
            self.dims.n_text_ctx,
            self.dims.n_text_state,
            self.dims.n_text_head,
            self.dims.n_text_layer,
        )

        self._register_load_state_dict_pre_hook(linear_to_conv2d_map)

    def forward(self, mel: torch.Tensor, tokens: torch.Tensor) -> Dict[str, torch.Tensor]:
        return self.decoder(tokens, self.encoder(mel))

    def install_kv_cache_hooks(self, cache: Optional[dict] = None):
        cache = {**cache} if cache is not None else {}
        hooks = []

        def save_to_cache(module, _, output):
            if module not in cache or output.shape[3] > self.decoder.positional_embedding.shape[0]:
                cache[module] = output  # save as-is, for the first token or cross attention
            else:
                cache[module] = torch.cat([cache[module], output], dim=3).detach()
            return cache[module]

        def install_hooks(layer: nn.Module):
            if isinstance(layer, MultiHeadAttentionANE):
                hooks.append(layer.key.register_forward_hook(save_to_cache))
                hooks.append(layer.value.register_forward_hook(save_to_cache))

        self.decoder.apply(install_hooks)
        return cache, hooks

def convert_encoder(hparams, model, quantize=False):
    model.eval()

    input_shape = (1, hparams.n_mels, 3000)
    input_data = torch.randn(input_shape)
    traced_model = torch.jit.trace(model, input_data)

    model = ct.convert(
        traced_model,
        convert_to=None if quantize else "mlprogram",  # convert will fail if weights are quantized, not sure why
        inputs=[ct.TensorType(name="logmel_data", shape=input_shape)],
        outputs=[ct.TensorType(name="output")],
        compute_units=ct.ComputeUnit.ALL
    )

    if quantize:
        model = quantize_weights(model, nbits=16)

    return model

def convert_decoder(hparams, model, quantize=False):
    model.eval()

    tokens_shape = (1, 1)
    audio_shape = (1, hparams.n_audio_state, 1, 1500)

    audio_data = torch.randn(audio_shape)
    token_data = torch.randint(50257, tokens_shape).long()
    traced_model = torch.jit.trace(model, (token_data, audio_data))

    model = ct.convert(
        traced_model,
        convert_to=None if quantize else "mlprogram",  # convert will fail if weights are quantized, not sure why
        inputs=[
            ct.TensorType(name="token_data", shape=tokens_shape, dtype=int),
            ct.TensorType(name="audio_data", shape=audio_shape)
        ]
    )

    if quantize:
        model = quantize_weights(model, nbits=16)

    return model


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, help="model to convert (e.g. tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3)", required=True)
    parser.add_argument("--encoder-only", type=bool, help="only convert encoder", default=False)
    parser.add_argument("--quantize", type=bool, help="quantize weights to F16", default=False)
    parser.add_argument("--optimize-ane", type=bool, help="optimize for ANE execution (currently broken)", default=False)
    args = parser.parse_args()

    if args.model not in ["tiny", "tiny.en", "base", "base.en", "small", "small.en", "small.en-tdrz", "medium", "medium.en", "large-v1", "large-v2", "large-v3"]:
        raise ValueError("Invalid model name")

    whisper = load_model(args.model).cpu()
    hparams = whisper.dims
    print(hparams)

    if args.optimize_ane:
        whisperANE = WhisperANE(hparams).eval()
        whisperANE.load_state_dict(whisper.state_dict())

        encoder = whisperANE.encoder
        decoder = whisperANE.decoder
    else:
        encoder = whisper.encoder
        decoder = whisper.decoder

    # Convert encoder
    encoder = convert_encoder(hparams, encoder, quantize=args.quantize)
    encoder.save(f"models/coreml-encoder-{args.model}.mlpackage")

    if args.encoder_only is False:
        # Convert decoder
        decoder = convert_decoder(hparams, decoder, quantize=args.quantize)
        decoder.save(f"models/coreml-decoder-{args.model}.mlpackage")

    print("done converting")
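The chunked logits computation in `TextDecoderANE.forward` works because both vocab sizes divide exactly into chunks below the ANE's 16,384 dimension limit: 51,865 / 11 = 4,715 and 51,864 / 12 = 4,322. A small sketch that mimics `torch.Tensor.split` arithmetic (pure Python, no torch needed) to check this:

```python
def split_sizes(n_vocab: int, n_chunks: int):
    """Mimic torch.Tensor.split(n_vocab // n_chunks, dim=0): equal chunks plus a remainder chunk."""
    step = n_vocab // n_chunks
    sizes = [step] * (n_vocab // step)
    if n_vocab % step:
        sizes.append(n_vocab % step)
    return sizes

# multilingual (51,865 tokens, 11 chunks) and English-only (51,864 tokens, 12 chunks)
for n_vocab, n_chunks in ((51865, 11), (51864, 12)):
    sizes = split_sizes(n_vocab, n_chunks)
    assert sum(sizes) == n_vocab          # no tokens lost across chunks
    assert max(sizes) <= 16384            # each chunk fits the ANE tensor-dim limit
```

Both cases divide with no remainder, so the split produces exactly `n_chunks` equal pieces.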
convert-whisper-to-openvino.py
ADDED
|
@@ -0,0 +1,53 @@
import argparse
import torch
from whisper import load_model
import os
from openvino.tools import mo
from openvino.runtime import serialize
import shutil

def convert_encoder(hparams, encoder, mname):
    encoder.eval()

    mel = torch.zeros((1, hparams.n_mels, 3000))

    onnx_folder = os.path.join(os.path.dirname(__file__), "onnx_encoder")

    # create a directory to store the onnx model, and other collateral that is saved during onnx export procedure
    if not os.path.isdir(onnx_folder):
        os.makedirs(onnx_folder)

    onnx_path = os.path.join(onnx_folder, "whisper_encoder.onnx")

    torch.onnx.export(
        encoder,
        mel,
        onnx_path,
        input_names=["mel"],
        output_names=["output_features"]
    )

    # use model optimizer to convert onnx to OpenVINO IR format
    encoder_model = mo.convert_model(onnx_path, compress_to_fp16=True)
    serialize(encoder_model, xml_path=os.path.join(os.path.dirname(__file__), "ggml-" + mname + "-encoder-openvino.xml"))

    # cleanup
    if os.path.isdir(onnx_folder):
        shutil.rmtree(onnx_folder)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, help="model to convert (e.g. tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3)", required=True)
    args = parser.parse_args()

    if args.model not in ["tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large-v2", "large-v3"]:
        raise ValueError("Invalid model name")

    whisper = load_model(args.model).cpu()
    hparams = whisper.dims

    encoder = whisper.encoder

    # Convert encoder to onnx
    convert_encoder(hparams, encoder, args.model)
coreml-encoder-base.en.mlpackage/Data/com.apple.CoreML/model.mlmodel
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:598255dff1e5eb81f2c32e6e9c6b3c4916bbbf4d2b39f4749d5dcb438f33f420
size 58049
coreml-encoder-base.en.mlpackage/Data/com.apple.CoreML/weights/weight.bin
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc998211e55f0972c70e3d29103477cfe8c6dd485cd68438951f83fa3ee3b770
size 41188544
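The `model.mlmodel` and `weight.bin` entries above are Git LFS pointer files, not the binaries themselves: three `key value` lines, where `oid` carries the hash algorithm and digest of the real object. A minimal sketch of parsing one (using the `weight.bin` pointer shown above):

```python
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fc998211e55f0972c70e3d29103477cfe8c6dd485cd68438951f83fa3ee3b770
size 41188544
"""

def parse_lfs_pointer(text: str) -> dict:
    # each line is "<key> <value>"; oid is "<algo>:<hex digest>"
    fields = dict(line.split(" ", 1) for line in text.splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

info = parse_lfs_pointer(pointer)
assert info["algo"] == "sha256"
```

`size` is the byte count of the actual weight blob, which is what `git lfs pull` fetches in place of the pointer.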
coreml-encoder-base.en.mlpackage/Manifest.json
ADDED
|
@@ -0,0 +1,18 @@
{
    "fileFormatVersion": "1.0.0",
    "itemInfoEntries": {
        "36C90F61-3ED1-4D0A-A009-9C0067D75407": {
            "author": "com.apple.CoreML",
            "description": "CoreML Model Specification",
            "name": "model.mlmodel",
            "path": "com.apple.CoreML/model.mlmodel"
        },
        "945A3445-84F5-4FAA-BCEF-C53E04FA3A47": {
            "author": "com.apple.CoreML",
            "description": "CoreML Model Weights",
            "name": "weights",
            "path": "com.apple.CoreML/weights"
        }
    },
    "rootModelIdentifier": "36C90F61-3ED1-4D0A-A009-9C0067D75407"
}
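In an `.mlpackage`, `Manifest.json` maps item identifiers to paths inside the package, and `rootModelIdentifier` names the entry that is the model specification itself. A sketch of resolving the root model's path from the manifest above with plain `json`:

```python
import json

manifest = json.loads("""{
  "fileFormatVersion": "1.0.0",
  "itemInfoEntries": {
    "36C90F61-3ED1-4D0A-A009-9C0067D75407": {
      "author": "com.apple.CoreML",
      "name": "model.mlmodel",
      "path": "com.apple.CoreML/model.mlmodel"
    },
    "945A3445-84F5-4FAA-BCEF-C53E04FA3A47": {
      "author": "com.apple.CoreML",
      "name": "weights",
      "path": "com.apple.CoreML/weights"
    }
  },
  "rootModelIdentifier": "36C90F61-3ED1-4D0A-A009-9C0067D75407"
}""")

# look up the entry named by rootModelIdentifier to find the spec file
root = manifest["itemInfoEntries"][manifest["rootModelIdentifier"]]
assert root["path"] == "com.apple.CoreML/model.mlmodel"
```

The second entry points at the `weights` directory holding `weight.bin`, which the spec references by offset.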
download-coreml-model.sh
ADDED
|
@@ -0,0 +1,82 @@
#!/bin/bash

# This script downloads Whisper model files that have already been converted to Core ML format.
# This way you don't have to convert them yourself.

src="https://huggingface.co/datasets/ggerganov/whisper.cpp-coreml"
pfx="resolve/main/ggml"

# get the path of this script
function get_script_path() {
    if [ -x "$(command -v realpath)" ]; then
        echo "$(dirname $(realpath $0))"
    else
        local ret="$(cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P)"
        echo "$ret"
    fi
}

models_path="$(get_script_path)"

# Whisper models
models=( "tiny.en" "tiny" "base.en" "base" "small.en" "small" "medium.en" "medium" "large-v1" "large-v2" "large-v3" )

# list available models
function list_models {
    printf "\n"
    printf "  Available models:"
    for model in "${models[@]}"; do
        printf " $model"
    done
    printf "\n\n"
}

if [ "$#" -ne 1 ]; then
    printf "Usage: $0 <model>\n"
    list_models

    exit 1
fi

model=$1

if [[ ! " ${models[@]} " =~ " ${model} " ]]; then
    printf "Invalid model: $model\n"
    list_models

    exit 1
fi

# download Core ML model

printf "Downloading Core ML model $model from '$src' ...\n"

cd $models_path

if [ -f "ggml-$model.mlmodel" ]; then
    printf "Model $model already exists. Skipping download.\n"
    exit 0
fi

if [ -x "$(command -v wget)" ]; then
    wget --quiet --show-progress -O ggml-$model.mlmodel $src/$pfx-$model.mlmodel
elif [ -x "$(command -v curl)" ]; then
    curl -L --output ggml-$model.mlmodel $src/$pfx-$model.mlmodel
else
    printf "Either wget or curl is required to download models.\n"
    exit 1
fi

if [ $? -ne 0 ]; then
    printf "Failed to download Core ML model $model \n"
    printf "Please try again later or download the original Whisper model files and convert them yourself.\n"
    exit 1
fi

printf "Done! Model '$model' saved in 'models/ggml-$model.mlmodel'\n"
printf "Run the following command to compile it:\n\n"
printf "  $ xcrun coremlc compile ./models/ggml-$model.mlmodel ./models\n\n"
printf "You can now use it like this:\n\n"
printf "  $ ./main -m models/ggml-$model.bin -f samples/jfk.wav\n"
printf "\n"
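download-coreml-model.sh assembles its download URL by simple concatenation as `$src/$pfx-$model.mlmodel`. The same construction, sketched in Python for clarity (`base.en` is just an example model name):

```python
# mirror the shell script's URL construction: "$src/$pfx-$model.mlmodel"
src = "https://huggingface.co/datasets/ggerganov/whisper.cpp-coreml"
pfx = "resolve/main/ggml"
model = "base.en"

url = f"{src}/{pfx}-{model}.mlmodel"
assert url.endswith("ggml-base.en.mlmodel")
```

Note the `-` between `pfx` and `model`: the remote file is named `ggml-<model>.mlmodel`, which the script also uses as the local output name.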
download-ggml-model.cmd
ADDED
|
@@ -0,0 +1,64 @@
@@ -0,0 +1,64 @@
@echo off

pushd %~dp0
set models_path=%CD%
for %%d in (%~dp0..) do set root_path=%%~fd
popd

set argc=0
for %%x in (%*) do set /A argc+=1

set models=tiny.en tiny base.en base small.en small medium.en medium large-v1 large-v2 large-v3

if %argc% neq 1 (
  echo.
  echo Usage: download-ggml-model.cmd model
  CALL :list_models
  goto :eof
)

set model=%1

for %%b in (%models%) do (
  if "%%b"=="%model%" (
    CALL :download_model
    goto :eof
  )
)

echo Invalid model: %model%
CALL :list_models
goto :eof

:download_model
echo Downloading ggml model %model%...

cd "%models_path%"

if exist "ggml-%model%.bin" (
  echo Model %model% already exists. Skipping download.
  goto :eof
)

PowerShell -NoProfile -ExecutionPolicy Bypass -Command "Start-BitsTransfer -Source https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%model%.bin -Destination ggml-%model%.bin"

if %ERRORLEVEL% neq 0 (
  echo Failed to download ggml model %model%
  echo Please try again later or download the original Whisper model files and convert them yourself.
  goto :eof
)

echo Done! Model %model% saved in %root_path%\models\ggml-%model%.bin
echo You can now use it like this:
echo main.exe -m %root_path%\models\ggml-%model%.bin -f %root_path%\samples\jfk.wav

goto :eof

:list_models
echo.
echo Available models:
(for %%a in (%models%) do (
  echo %%a
))
echo.
exit /b
download-ggml-model.sh
ADDED
@@ -0,0 +1,111 @@
#!/bin/bash

# This script downloads Whisper model files that have already been converted to ggml format.
# This way you don't have to convert them yourself.

#src="https://ggml.ggerganov.com"
#pfx="ggml-model-whisper"

src="https://huggingface.co/ggerganov/whisper.cpp"
pfx="resolve/main/ggml"

# get the path of this script
function get_script_path() {
    if [ -x "$(command -v realpath)" ]; then
        echo "$(dirname "$(realpath "$0")")"
    else
        local ret="$(cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P)"
        echo "$ret"
    fi
}

models_path="$(get_script_path)"

# Whisper models
models=(
    "tiny.en"
    "tiny"
    "tiny-q5_1"
    "tiny.en-q5_1"
    "base.en"
    "base"
    "base-q5_1"
    "base.en-q5_1"
    "small.en"
    "small.en-tdrz"
    "small"
    "small-q5_1"
    "small.en-q5_1"
    "medium"
    "medium.en"
    "medium-q5_0"
    "medium.en-q5_0"
    "large-v1"
    "large-v2"
    "large-v3"
    "large-q5_0"
)

# list available models
function list_models {
    printf "\n"
    printf "  Available models:"
    for model in "${models[@]}"; do
        printf " $model"
    done
    printf "\n\n"
}

if [ "$#" -ne 1 ]; then
    printf "Usage: $0 <model>\n"
    list_models

    exit 1
fi

model=$1

if [[ ! " ${models[@]} " =~ " ${model} " ]]; then
    printf "Invalid model: $model\n"
    list_models

    exit 1
fi

# check if model contains `tdrz` and update the src and pfx accordingly
if [[ $model == *"tdrz"* ]]; then
    src="https://huggingface.co/akashmjn/tinydiarize-whisper.cpp"
    pfx="resolve/main/ggml"
fi

# download ggml model

printf "Downloading ggml model $model from '$src' ...\n"

cd "$models_path"

if [ -f "ggml-$model.bin" ]; then
    printf "Model $model already exists. Skipping download.\n"
    exit 0
fi

if [ -x "$(command -v wget)" ]; then
    wget --no-config --quiet --show-progress -O ggml-$model.bin $src/$pfx-$model.bin
elif [ -x "$(command -v curl)" ]; then
    curl -L --output ggml-$model.bin $src/$pfx-$model.bin
else
    printf "Either wget or curl is required to download models.\n"
    exit 1
fi

if [ $? -ne 0 ]; then
    printf "Failed to download ggml model $model \n"
    printf "Please try again later or download the original Whisper model files and convert them yourself.\n"
    exit 1
fi

printf "Done! Model '$model' saved in 'models/ggml-$model.bin'\n"
printf "You can now use it like this:\n\n"
printf "  $ ./main -m models/ggml-$model.bin -f samples/jfk.wav\n"
printf "\n"
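The model validation in the script above, `[[ ! " ${models[@]} " =~ " ${model} " ]]`, pads both the list and the candidate with spaces so that only whole model names match, never substrings. A minimal sketch of the same idea as a standalone function (the function name and shortened model list are illustrative, not part of the script):

```shell
#!/bin/bash
# Whole-word membership test over a bash array: the space padding makes
# "tiny" match while the substring "tin" does not.
models=("tiny.en" "tiny" "base.en" "base")

contains_model() {
    [[ " ${models[*]} " == *" $1 "* ]]
}

contains_model "tiny" && echo "tiny: valid"
contains_model "tin" || echo "tin: invalid"
```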
for-tests-ggml-base.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ddf6ff3e5f9e0da794fee41652559af1efaa6118f3cc699f250991c515b6af2a
size 575451
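The `for-tests-*.bin` entries in this commit are Git LFS pointer files rather than the binaries themselves: three `key value` lines giving the pointer spec version, the SHA-256 digest of the actual blob, and its size in bytes. A quick sketch of pulling the digest and size out of such a pointer with awk (the sample below is the pointer shown above; the filename `pointer.txt` is just for illustration):

```shell
#!/bin/bash
# Extract the oid digest and byte size from a Git LFS pointer file.
cat > pointer.txt <<'EOF'
version https://git-lfs.github.com/spec/v1
oid sha256:ddf6ff3e5f9e0da794fee41652559af1efaa6118f3cc699f250991c515b6af2a
size 575451
EOF

oid=$(awk '$1 == "oid" { sub(/^sha256:/, "", $2); print $2 }' pointer.txt)
size=$(awk '$1 == "size" { print $2 }' pointer.txt)
echo "oid=$oid size=$size"
```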
for-tests-ggml-base.en.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1bc042ca584ff1895897e95bffb34ccf357be46c1fca97cf7fbe32f2060aa9e8
size 586836
for-tests-ggml-large.bin
ADDED
|
@@ -0,0 +1,3 @@
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bf987facca89f2d75a843d5467d91668fba5c23debf66a0644df53f0accf0cfb
size 575451
for-tests-ggml-medium.bin
ADDED
|
@@ -0,0 +1,3 @@
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f676437ddef445443e95fc77d88d59013e9f6dc05d25ebcbabd89abeefc5565b
size 575451
for-tests-ggml-medium.en.bin
ADDED
|
@@ -0,0 +1,3 @@
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:52c051196f9b2737679722239bc7f649f4a3b0a84d418be0adfd7aed72480827
size 586836
for-tests-ggml-small.bin
ADDED
|
@@ -0,0 +1,3 @@
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e3cd79f6d818b13aea6427e0c56ca97d6d82274585efb8bd25187a37b944024b
size 575451
for-tests-ggml-small.en.bin
ADDED
|
@@ -0,0 +1,3 @@
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5618c8b3cf34b1fa4493789eb92c9ff68796fb789a58180a8c4b3fb5b28789e2
size 586836
for-tests-ggml-tiny.bin
ADDED
|
@@ -0,0 +1,3 @@
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c486fb9f14a28b1c1dc252741a431646cc573450c900b9d9c406e10294aa01e6
size 575451
for-tests-ggml-tiny.en.bin
ADDED
|
@@ -0,0 +1,3 @@
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd6b7796204a1cdf7164666423034e6e1a7a3e9f5c22327b4b7974c4584bd82d
size 586836
generate-coreml-interface.sh
ADDED
@@ -0,0 +1,29 @@
#!/bin/bash
#
# This generates:
#  - coreml/whisper-encoder-impl.h and coreml/whisper-encoder-impl.m
#  - coreml/whisper-decoder-impl.h and coreml/whisper-decoder-impl.m
#

wd=$(dirname "$0")
cd "$wd/../"

python3 models/convert-whisper-to-coreml.py --model tiny.en

mv -v models/coreml-encoder-tiny.en.mlpackage models/whisper-encoder-impl.mlpackage
xcrun coremlc generate models/whisper-encoder-impl.mlpackage coreml/
mv coreml/whisper_encoder_impl.h coreml/whisper-encoder-impl.h
mv coreml/whisper_encoder_impl.m coreml/whisper-encoder-impl.m
sed -i '' 's/whisper_encoder_impl\.h/whisper-encoder-impl.h/g' coreml/whisper-encoder-impl.m
sed -i '' 's/whisper_encoder_impl\.m/whisper-encoder-impl.m/g' coreml/whisper-encoder-impl.m
sed -i '' 's/whisper_encoder_impl\.h/whisper-encoder-impl.h/g' coreml/whisper-encoder-impl.h

mv -v models/coreml-decoder-tiny.en.mlpackage models/whisper-decoder-impl.mlpackage
xcrun coremlc generate models/whisper-decoder-impl.mlpackage coreml/
mv coreml/whisper_decoder_impl.h coreml/whisper-decoder-impl.h
mv coreml/whisper_decoder_impl.m coreml/whisper-decoder-impl.m
sed -i '' 's/whisper_decoder_impl\.h/whisper-decoder-impl.h/g' coreml/whisper-decoder-impl.m
sed -i '' 's/whisper_decoder_impl\.m/whisper-decoder-impl.m/g' coreml/whisper-decoder-impl.m
sed -i '' 's/whisper_decoder_impl\.h/whisper-decoder-impl.h/g' coreml/whisper-decoder-impl.h

rm -rfv models/whisper-encoder-impl.mlpackage models/whisper-decoder-impl.mlpackage
generate-coreml-model.sh
ADDED
@@ -0,0 +1,36 @@
#!/bin/bash

# Usage: ./generate-coreml-model.sh <model-name>
if [ $# -eq 0 ]; then
    echo "No model name supplied"
    echo "Usage for Whisper models: ./generate-coreml-model.sh <model-name>"
    echo "Usage for HuggingFace models: ./generate-coreml-model.sh -h5 <model-name> <model-path>"
    exit 1
elif [[ "$1" == "-h5" && $# != 3 ]]; then
    echo "No model name and model path supplied for a HuggingFace model"
    echo "Usage for HuggingFace models: ./generate-coreml-model.sh -h5 <model-name> <model-path>"
    exit 1
fi

mname="$1"

wd=$(dirname "$0")
cd "$wd/../"

if [[ $mname == "-h5" ]]; then
    mname="$2"
    mpath="$3"
    echo $mpath
    python3 models/convert-h5-to-coreml.py --model-name $mname --model-path $mpath --encoder-only True
else
    python3 models/convert-whisper-to-coreml.py --model $mname --encoder-only True
fi

xcrun coremlc compile models/coreml-encoder-${mname}.mlpackage models/
rm -rf models/ggml-${mname}-encoder.mlmodelc
mv -v models/coreml-encoder-${mname}.mlmodelc models/ggml-${mname}-encoder.mlmodelc

# TODO: decoder (sometime in the future maybe)
#xcrun coremlc compile models/whisper-decoder-${mname}.mlpackage models/
#rm -rf models/ggml-${mname}-decoder.mlmodelc
#mv -v models/coreml_decoder_${mname}.mlmodelc models/ggml-${mname}-decoder.mlmodelc
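The argument handling at the top of the script above accepts either a single Whisper model name or the three-argument `-h5 <model-name> <model-path>` form. A minimal sketch that re-creates the same checks as a reusable function so they can be exercised in isolation (the function name `check_args` is illustrative, not part of the script):

```shell
#!/bin/bash
# Mirror of the argument validation in generate-coreml-model.sh:
# either one Whisper model name, or "-h5 <model-name> <model-path>".
check_args() {
    if [ $# -eq 0 ]; then
        echo "No model name supplied" >&2
        return 1
    elif [ "$1" = "-h5" ] && [ $# -ne 3 ]; then
        echo "No model name and model path supplied for a HuggingFace model" >&2
        return 1
    fi
    return 0
}

check_args base.en && echo "ok: base.en"
check_args -h5 my-model ./path/to/model && echo "ok: -h5 form"
```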
ggml-base.en-encoder.mlmodelc/analytics/coremldata.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:461c6790016895f31a85af19613c6a21d3b937f5fea6bc52387360a4100947e1
size 243
ggml-base.en-encoder.mlmodelc/coremldata.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:97d47ff2029aaa5e922ecf427c1e9fccc08d7e7b8226be5c6f482fceaf583dd4
size 319
ggml-base.en-encoder.mlmodelc/metadata.json
ADDED
@@ -0,0 +1,67 @@
[
  {
    "metadataOutputVersion" : "3.0",
    "storagePrecision" : "Float16",
    "outputSchema" : [
      {
        "hasShapeFlexibility" : "0",
        "isOptional" : "0",
        "dataType" : "Float32",
        "formattedType" : "MultiArray (Float32 1 × 1500 × 512)",
        "shortDescription" : "",
        "shape" : "[1, 1500, 512]",
        "name" : "output",
        "type" : "MultiArray"
      }
    ],
    "modelParameters" : [

    ],
    "specificationVersion" : 6,
    "mlProgramOperationTypeHistogram" : {
      "Linear" : 36,
      "Matmul" : 12,
      "Cast" : 2,
      "Conv" : 2,
      "Softmax" : 6,
      "Add" : 13,
      "LayerNorm" : 13,
      "Mul" : 12,
      "Transpose" : 25,
      "Gelu" : 8,
      "Reshape" : 24
    },
    "computePrecision" : "Mixed (Float16, Float32, Int32)",
    "isUpdatable" : "0",
    "availability" : {
      "macOS" : "12.0",
      "tvOS" : "15.0",
      "visionOS" : "1.0",
      "watchOS" : "8.0",
      "iOS" : "15.0",
      "macCatalyst" : "15.0"
    },
    "modelType" : {
      "name" : "MLModelType_mlProgram"
    },
    "userDefinedMetadata" : {
      "com.github.apple.coremltools.source_dialect" : "TorchScript",
      "com.github.apple.coremltools.source" : "torch==1.11.0",
      "com.github.apple.coremltools.version" : "7.1"
    },
    "inputSchema" : [
      {
        "hasShapeFlexibility" : "0",
        "isOptional" : "0",
        "dataType" : "Float32",
        "formattedType" : "MultiArray (Float32 1 × 80 × 3000)",
        "shortDescription" : "",
        "shape" : "[1, 80, 3000]",
        "name" : "logmel_data",
        "type" : "MultiArray"
      }
    ],
    "generatedClassName" : "coreml_encoder_base_en",
    "method" : "predict"
  }
]
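metadata.json above records the compiled encoder's I/O contract: a `[1, 80, 3000]` log-mel input named `logmel_data` and a `[1, 1500, 512]` output. A rough sketch of listing the declared tensor shapes with grep/sed (a JSON-aware tool such as jq would be more robust; the two-line sample file only mirrors the schema entries above):

```shell
#!/bin/bash
# List the tensor shapes declared in a Core ML metadata.json.
cat > metadata.json <<'EOF'
[ { "outputSchema" : [ { "shape" : "[1, 1500, 512]", "name" : "output" } ],
    "inputSchema"  : [ { "shape" : "[1, 80, 3000]", "name" : "logmel_data" } ] } ]
EOF

# Pick out each "shape" : "[...]" pair and keep only the bracketed list.
grep -o '"shape" : "[^"]*"' metadata.json | sed 's/.*"\(\[.*\]\)"/\1/'
```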
ggml-base.en-encoder.mlmodelc/model.mil
ADDED
|
@@ -0,0 +1,388 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
program(1.0)
|
| 2 |
+
[buildInfo = dict<tensor<string, []>, tensor<string, []>>({{"coremlc-component-MIL", "5.33.5"}, {"coremlc-version", "1877.40.3"}, {"coremltools-component-torch", "1.11.0"}, {"coremltools-source-dialect", "TorchScript"}, {"coremltools-version", "7.1"}})]
|
| 3 |
+
{
|
| 4 |
+
func main<ios15>(tensor<fp32, [1, 80, 3000]> logmel_data) {
|
| 5 |
+
tensor<int32, []> var_20 = const()[name = tensor<string, []>("op_20"), val = tensor<int32, []>(1)];
|
| 6 |
+
tensor<int32, [1]> var_28 = const()[name = tensor<string, []>("op_28"), val = tensor<int32, [1]>([1])];
|
| 7 |
+
tensor<int32, [1]> var_30 = const()[name = tensor<string, []>("op_30"), val = tensor<int32, [1]>([1])];
|
| 8 |
+
tensor<string, []> var_32_pad_type_0 = const()[name = tensor<string, []>("op_32_pad_type_0"), val = tensor<string, []>("custom")];
|
| 9 |
+
tensor<int32, [2]> var_32_pad_0 = const()[name = tensor<string, []>("op_32_pad_0"), val = tensor<int32, [2]>([1, 1])];
|
| 10 |
+
tensor<string, []> logmel_data_to_fp16_dtype_0 = const()[name = tensor<string, []>("logmel_data_to_fp16_dtype_0"), val = tensor<string, []>("fp16")];
|
| 11 |
+
tensor<fp16, [512, 80, 3]> weight_3_to_fp16 = const()[name = tensor<string, []>("weight_3_to_fp16"), val = tensor<fp16, [512, 80, 3]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(64)))];
|
| 12 |
+
tensor<fp16, [512]> bias_3_to_fp16 = const()[name = tensor<string, []>("bias_3_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(245888)))];
|
| 13 |
+
tensor<fp16, [1, 80, 3000]> cast_37 = cast(dtype = logmel_data_to_fp16_dtype_0, x = logmel_data)[name = tensor<string, []>("cast_37")];
|
| 14 |
+
tensor<fp16, [1, 512, 3000]> var_32_cast_fp16 = conv(bias = bias_3_to_fp16, dilations = var_30, groups = var_20, pad = var_32_pad_0, pad_type = var_32_pad_type_0, strides = var_28, weight = weight_3_to_fp16, x = cast_37)[name = tensor<string, []>("op_32_cast_fp16")];
|
| 15 |
+
tensor<string, []> input_1_mode_0 = const()[name = tensor<string, []>("input_1_mode_0"), val = tensor<string, []>("EXACT")];
|
| 16 |
+
tensor<fp16, [1, 512, 3000]> input_1_cast_fp16 = gelu(mode = input_1_mode_0, x = var_32_cast_fp16)[name = tensor<string, []>("input_1_cast_fp16")];
|
| 17 |
+
tensor<int32, []> var_36 = const()[name = tensor<string, []>("op_36"), val = tensor<int32, []>(1)];
|
| 18 |
+
tensor<int32, [1]> var_45 = const()[name = tensor<string, []>("op_45"), val = tensor<int32, [1]>([2])];
|
| 19 |
+
tensor<int32, [1]> var_47 = const()[name = tensor<string, []>("op_47"), val = tensor<int32, [1]>([1])];
|
| 20 |
+
tensor<string, []> var_49_pad_type_0 = const()[name = tensor<string, []>("op_49_pad_type_0"), val = tensor<string, []>("custom")];
|
| 21 |
+
tensor<int32, [2]> var_49_pad_0 = const()[name = tensor<string, []>("op_49_pad_0"), val = tensor<int32, [2]>([1, 1])];
|
| 22 |
+
tensor<fp16, [512, 512, 3]> weight_7_to_fp16 = const()[name = tensor<string, []>("weight_7_to_fp16"), val = tensor<fp16, [512, 512, 3]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(246976)))];
|
| 23 |
+
tensor<fp16, [512]> bias_7_to_fp16 = const()[name = tensor<string, []>("bias_7_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(1819904)))];
|
| 24 |
+
tensor<fp16, [1, 512, 1500]> var_49_cast_fp16 = conv(bias = bias_7_to_fp16, dilations = var_47, groups = var_36, pad = var_49_pad_0, pad_type = var_49_pad_type_0, strides = var_45, weight = weight_7_to_fp16, x = input_1_cast_fp16)[name = tensor<string, []>("op_49_cast_fp16")];
|
| 25 |
+
tensor<string, []> x_3_mode_0 = const()[name = tensor<string, []>("x_3_mode_0"), val = tensor<string, []>("EXACT")];
|
| 26 |
+
tensor<fp16, [1, 512, 1500]> x_3_cast_fp16 = gelu(mode = x_3_mode_0, x = var_49_cast_fp16)[name = tensor<string, []>("x_3_cast_fp16")];
|
| 27 |
+
tensor<int32, [3]> var_54 = const()[name = tensor<string, []>("op_54"), val = tensor<int32, [3]>([0, 2, 1])];
|
| 28 |
+
tensor<fp16, [1500, 512]> positional_embedding_to_fp16 = const()[name = tensor<string, []>("positional_embedding_to_fp16"), val = tensor<fp16, [1500, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(1820992)))];
|
| 29 |
+
tensor<fp16, [1, 1500, 512]> transpose_60 = transpose(perm = var_54, x = x_3_cast_fp16)[name = tensor<string, []>("transpose_60")];
|
| 30 |
+
tensor<fp16, [1, 1500, 512]> var_57_cast_fp16 = add(x = transpose_60, y = positional_embedding_to_fp16)[name = tensor<string, []>("op_57_cast_fp16")];
|
| 31 |
+
tensor<int32, []> var_70 = const()[name = tensor<string, []>("op_70"), val = tensor<int32, []>(-1)];
|
| 32 |
+
tensor<int32, [1]> var_87_axes_0 = const()[name = tensor<string, []>("op_87_axes_0"), val = tensor<int32, [1]>([-1])];
|
| 33 |
+
tensor<fp16, [512]> blocks_0_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_0_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3357056)))];
|
| 34 |
+
tensor<fp16, [512]> blocks_0_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_0_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3358144)))];
|
| 35 |
+
tensor<fp16, []> var_76_to_fp16 = const()[name = tensor<string, []>("op_76_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
|
| 36 |
+
tensor<fp16, [1, 1500, 512]> var_87_cast_fp16 = layer_norm(axes = var_87_axes_0, beta = blocks_0_attn_ln_bias_to_fp16, epsilon = var_76_to_fp16, gamma = blocks_0_attn_ln_weight_to_fp16, x = var_57_cast_fp16)[name = tensor<string, []>("op_87_cast_fp16")];
|
| 37 |
+
tensor<fp16, [512, 512]> var_98_to_fp16 = const()[name = tensor<string, []>("op_98_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3359232)))];
|
| 38 |
+
tensor<fp16, [512]> var_99_to_fp16 = const()[name = tensor<string, []>("op_99_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3883584)))];
|
| 39 |
+
tensor<fp16, [1, 1500, 512]> linear_0_cast_fp16 = linear(bias = var_99_to_fp16, weight = var_98_to_fp16, x = var_87_cast_fp16)[name = tensor<string, []>("linear_0_cast_fp16")];
|
| 40 |
+
tensor<fp16, [512, 512]> var_102_to_fp16 = const()[name = tensor<string, []>("op_102_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(3884672)))];
|
| 41 |
+
tensor<fp16, [512]> linear_1_bias_0_to_fp16 = const()[name = tensor<string, []>("linear_1_bias_0_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(4409024)))];
|
| 42 |
+
tensor<fp16, [1, 1500, 512]> linear_1_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_102_to_fp16, x = var_87_cast_fp16)[name = tensor<string, []>("linear_1_cast_fp16")];
|
| 43 |
+
tensor<fp16, [512, 512]> var_106_to_fp16 = const()[name = tensor<string, []>("op_106_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(4410112)))];
|
| 44 |
+
tensor<fp16, [512]> var_107_to_fp16 = const()[name = tensor<string, []>("op_107_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(4934464)))];
|
| 45 |
+
tensor<fp16, [1, 1500, 512]> linear_2_cast_fp16 = linear(bias = var_107_to_fp16, weight = var_106_to_fp16, x = var_87_cast_fp16)[name = tensor<string, []>("linear_2_cast_fp16")];
|
| 46 |
+
tensor<int32, [4]> var_115 = const()[name = tensor<string, []>("op_115"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
|
| 47 |
+
tensor<fp16, [1, 1500, 8, 64]> var_116_cast_fp16 = reshape(shape = var_115, x = linear_0_cast_fp16)[name = tensor<string, []>("op_116_cast_fp16")];
|
| 48 |
+
tensor<fp16, [1, 1, 1, 1]> const_42_to_fp16 = const()[name = tensor<string, []>("const_42_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
|
| 49 |
+
tensor<fp16, [1, 1500, 8, 64]> q_3_cast_fp16 = mul(x = var_116_cast_fp16, y = const_42_to_fp16)[name = tensor<string, []>("q_3_cast_fp16")];
|
| 50 |
+
tensor<int32, [4]> var_122 = const()[name = tensor<string, []>("op_122"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
|
| 51 |
+
tensor<fp16, [1, 1500, 8, 64]> var_123_cast_fp16 = reshape(shape = var_122, x = linear_1_cast_fp16)[name = tensor<string, []>("op_123_cast_fp16")];
|
| 52 |
+
tensor<fp16, [1, 1, 1, 1]> const_43_to_fp16 = const()[name = tensor<string, []>("const_43_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
|
| 53 |
+
tensor<fp16, [1, 1500, 8, 64]> k_3_cast_fp16 = mul(x = var_123_cast_fp16, y = const_43_to_fp16)[name = tensor<string, []>("k_3_cast_fp16")];
|
| 54 |
+
tensor<int32, [4]> var_129 = const()[name = tensor<string, []>("op_129"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
|
| 55 |
+
tensor<fp16, [1, 1500, 8, 64]> var_130_cast_fp16 = reshape(shape = var_129, x = linear_2_cast_fp16)[name = tensor<string, []>("op_130_cast_fp16")];
|
| 56 |
+
tensor<int32, [4]> var_131 = const()[name = tensor<string, []>("op_131"), val = tensor<int32, [4]>([0, 2, 1, 3])];
|
| 57 |
+
tensor<bool, []> qk_1_transpose_x_0 = const()[name = tensor<string, []>("qk_1_transpose_x_0"), val = tensor<bool, []>(false)];
|
| 58 |
+
tensor<bool, []> qk_1_transpose_y_0 = const()[name = tensor<string, []>("qk_1_transpose_y_0"), val = tensor<bool, []>(false)];
|
| 59 |
+
tensor<int32, [4]> transpose_24_perm_0 = const()[name = tensor<string, []>("transpose_24_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
|
| 60 |
+
tensor<int32, [4]> transpose_25_perm_0 = const()[name = tensor<string, []>("transpose_25_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
|
| 61 |
+
tensor<fp16, [1, 8, 64, 1500]> transpose_57 = transpose(perm = transpose_25_perm_0, x = k_3_cast_fp16)[name = tensor<string, []>("transpose_57")];
|
| 62 |
+
tensor<fp16, [1, 8, 1500, 64]> transpose_58 = transpose(perm = transpose_24_perm_0, x = q_3_cast_fp16)[name = tensor<string, []>("transpose_58")];
|
| 63 |
+
tensor<fp16, [1, 8, 1500, 1500]> qk_1_cast_fp16 = matmul(transpose_x = qk_1_transpose_x_0, transpose_y = qk_1_transpose_y_0, x = transpose_58, y = transpose_57)[name = tensor<string, []>("qk_1_cast_fp16")];
|
| 64 |
+
tensor<fp16, [1, 8, 1500, 1500]> var_135_cast_fp16 = softmax(axis = var_70, x = qk_1_cast_fp16)[name = tensor<string, []>("op_135_cast_fp16")];
|
| 65 |
+
tensor<bool, []> var_137_transpose_x_0 = const()[name = tensor<string, []>("op_137_transpose_x_0"), val = tensor<bool, []>(false)];
|
| 66 |
+
tensor<bool, []> var_137_transpose_y_0 = const()[name = tensor<string, []>("op_137_transpose_y_0"), val = tensor<bool, []>(false)];
|
| 67 |
+
tensor<fp16, [1, 8, 1500, 64]> transpose_59 = transpose(perm = var_131, x = var_130_cast_fp16)[name = tensor<string, []>("transpose_59")];
|
| 68 |
+
tensor<fp16, [1, 8, 1500, 64]> var_137_cast_fp16 = matmul(transpose_x = var_137_transpose_x_0, transpose_y = var_137_transpose_y_0, x = var_135_cast_fp16, y = transpose_59)[name = tensor<string, []>("op_137_cast_fp16")];
|
| 69 |
+
tensor<int32, [4]> var_138 = const()[name = tensor<string, []>("op_138"), val = tensor<int32, [4]>([0, 2, 1, 3])];
|
| 70 |
+
tensor<int32, [3]> concat_0 = const()[name = tensor<string, []>("concat_0"), val = tensor<int32, [3]>([1, 1500, 512])];
|
| 71 |
+
tensor<fp16, [1, 1500, 8, 64]> transpose_56 = transpose(perm = var_138, x = var_137_cast_fp16)[name = tensor<string, []>("transpose_56")];
|
| 72 |
+
tensor<fp16, [1, 1500, 512]> x_11_cast_fp16 = reshape(shape = concat_0, x = transpose_56)[name = tensor<string, []>("x_11_cast_fp16")];
|
| 73 |
+
tensor<fp16, [512, 512]> var_143_to_fp16 = const()[name = tensor<string, []>("op_143_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(4935552)))];
|
| 74 |
+
tensor<fp16, [512]> var_144_to_fp16 = const()[name = tensor<string, []>("op_144_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(5459904)))];
|
| 75 |
+
tensor<fp16, [1, 1500, 512]> linear_3_cast_fp16 = linear(bias = var_144_to_fp16, weight = var_143_to_fp16, x = x_11_cast_fp16)[name = tensor<string, []>("linear_3_cast_fp16")];
|
| 76 |
+
tensor<fp16, [1, 1500, 512]> x_13_cast_fp16 = add(x = var_57_cast_fp16, y = linear_3_cast_fp16)[name = tensor<string, []>("x_13_cast_fp16")];
|
| 77 |
+
tensor<int32, [1]> var_151_axes_0 = const()[name = tensor<string, []>("op_151_axes_0"), val = tensor<int32, [1]>([-1])];
|
| 78 |
+
tensor<fp16, [512]> blocks_0_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_0_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(5460992)))];
|
| 79 |
+
tensor<fp16, [512]> blocks_0_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_0_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(5462080)))];
|
| 80 |
+
tensor<fp16, [1, 1500, 512]> var_151_cast_fp16 = layer_norm(axes = var_151_axes_0, beta = blocks_0_mlp_ln_bias_to_fp16, epsilon = var_76_to_fp16, gamma = blocks_0_mlp_ln_weight_to_fp16, x = x_13_cast_fp16)[name = tensor<string, []>("op_151_cast_fp16")];
|
| 81 |
+
tensor<fp16, [2048, 512]> var_160_to_fp16 = const()[name = tensor<string, []>("op_160_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(5463168)))];
tensor<fp16, [2048]> var_161_to_fp16 = const()[name = tensor<string, []>("op_161_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(7560384)))];
tensor<fp16, [1, 1500, 2048]> linear_4_cast_fp16 = linear(bias = var_161_to_fp16, weight = var_160_to_fp16, x = var_151_cast_fp16)[name = tensor<string, []>("linear_4_cast_fp16")];
tensor<string, []> x_17_mode_0 = const()[name = tensor<string, []>("x_17_mode_0"), val = tensor<string, []>("EXACT")];
tensor<fp16, [1, 1500, 2048]> x_17_cast_fp16 = gelu(mode = x_17_mode_0, x = linear_4_cast_fp16)[name = tensor<string, []>("x_17_cast_fp16")];
tensor<fp16, [512, 2048]> var_166_to_fp16 = const()[name = tensor<string, []>("op_166_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(7564544)))];
tensor<fp16, [512]> var_167_to_fp16 = const()[name = tensor<string, []>("op_167_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(9661760)))];
tensor<fp16, [1, 1500, 512]> linear_5_cast_fp16 = linear(bias = var_167_to_fp16, weight = var_166_to_fp16, x = x_17_cast_fp16)[name = tensor<string, []>("linear_5_cast_fp16")];
tensor<fp16, [1, 1500, 512]> x_19_cast_fp16 = add(x = x_13_cast_fp16, y = linear_5_cast_fp16)[name = tensor<string, []>("x_19_cast_fp16")];
tensor<int32, []> var_177 = const()[name = tensor<string, []>("op_177"), val = tensor<int32, []>(-1)];
tensor<int32, [1]> var_194_axes_0 = const()[name = tensor<string, []>("op_194_axes_0"), val = tensor<int32, [1]>([-1])];
tensor<fp16, [512]> blocks_1_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_1_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(9662848)))];
tensor<fp16, [512]> blocks_1_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_1_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(9663936)))];
tensor<fp16, []> var_183_to_fp16 = const()[name = tensor<string, []>("op_183_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
tensor<fp16, [1, 1500, 512]> var_194_cast_fp16 = layer_norm(axes = var_194_axes_0, beta = blocks_1_attn_ln_bias_to_fp16, epsilon = var_183_to_fp16, gamma = blocks_1_attn_ln_weight_to_fp16, x = x_19_cast_fp16)[name = tensor<string, []>("op_194_cast_fp16")];
tensor<fp16, [512, 512]> var_205_to_fp16 = const()[name = tensor<string, []>("op_205_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(9665024)))];
tensor<fp16, [512]> var_206_to_fp16 = const()[name = tensor<string, []>("op_206_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(10189376)))];
tensor<fp16, [1, 1500, 512]> linear_6_cast_fp16 = linear(bias = var_206_to_fp16, weight = var_205_to_fp16, x = var_194_cast_fp16)[name = tensor<string, []>("linear_6_cast_fp16")];
tensor<fp16, [512, 512]> var_209_to_fp16 = const()[name = tensor<string, []>("op_209_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(10190464)))];
tensor<fp16, [1, 1500, 512]> linear_7_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_209_to_fp16, x = var_194_cast_fp16)[name = tensor<string, []>("linear_7_cast_fp16")];
tensor<fp16, [512, 512]> var_213_to_fp16 = const()[name = tensor<string, []>("op_213_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(10714816)))];
tensor<fp16, [512]> var_214_to_fp16 = const()[name = tensor<string, []>("op_214_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11239168)))];
tensor<fp16, [1, 1500, 512]> linear_8_cast_fp16 = linear(bias = var_214_to_fp16, weight = var_213_to_fp16, x = var_194_cast_fp16)[name = tensor<string, []>("linear_8_cast_fp16")];
tensor<int32, [4]> var_222 = const()[name = tensor<string, []>("op_222"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
tensor<fp16, [1, 1500, 8, 64]> var_223_cast_fp16 = reshape(shape = var_222, x = linear_6_cast_fp16)[name = tensor<string, []>("op_223_cast_fp16")];
tensor<fp16, [1, 1, 1, 1]> const_44_to_fp16 = const()[name = tensor<string, []>("const_44_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
tensor<fp16, [1, 1500, 8, 64]> q_7_cast_fp16 = mul(x = var_223_cast_fp16, y = const_44_to_fp16)[name = tensor<string, []>("q_7_cast_fp16")];
tensor<int32, [4]> var_229 = const()[name = tensor<string, []>("op_229"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
tensor<fp16, [1, 1500, 8, 64]> var_230_cast_fp16 = reshape(shape = var_229, x = linear_7_cast_fp16)[name = tensor<string, []>("op_230_cast_fp16")];
tensor<fp16, [1, 1, 1, 1]> const_45_to_fp16 = const()[name = tensor<string, []>("const_45_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
tensor<fp16, [1, 1500, 8, 64]> k_7_cast_fp16 = mul(x = var_230_cast_fp16, y = const_45_to_fp16)[name = tensor<string, []>("k_7_cast_fp16")];
tensor<int32, [4]> var_236 = const()[name = tensor<string, []>("op_236"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
tensor<fp16, [1, 1500, 8, 64]> var_237_cast_fp16 = reshape(shape = var_236, x = linear_8_cast_fp16)[name = tensor<string, []>("op_237_cast_fp16")];
tensor<int32, [4]> var_238 = const()[name = tensor<string, []>("op_238"), val = tensor<int32, [4]>([0, 2, 1, 3])];
tensor<bool, []> qk_3_transpose_x_0 = const()[name = tensor<string, []>("qk_3_transpose_x_0"), val = tensor<bool, []>(false)];
tensor<bool, []> qk_3_transpose_y_0 = const()[name = tensor<string, []>("qk_3_transpose_y_0"), val = tensor<bool, []>(false)];
tensor<int32, [4]> transpose_26_perm_0 = const()[name = tensor<string, []>("transpose_26_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
tensor<int32, [4]> transpose_27_perm_0 = const()[name = tensor<string, []>("transpose_27_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
tensor<fp16, [1, 8, 64, 1500]> transpose_53 = transpose(perm = transpose_27_perm_0, x = k_7_cast_fp16)[name = tensor<string, []>("transpose_53")];
tensor<fp16, [1, 8, 1500, 64]> transpose_54 = transpose(perm = transpose_26_perm_0, x = q_7_cast_fp16)[name = tensor<string, []>("transpose_54")];
tensor<fp16, [1, 8, 1500, 1500]> qk_3_cast_fp16 = matmul(transpose_x = qk_3_transpose_x_0, transpose_y = qk_3_transpose_y_0, x = transpose_54, y = transpose_53)[name = tensor<string, []>("qk_3_cast_fp16")];
tensor<fp16, [1, 8, 1500, 1500]> var_242_cast_fp16 = softmax(axis = var_177, x = qk_3_cast_fp16)[name = tensor<string, []>("op_242_cast_fp16")];
tensor<bool, []> var_244_transpose_x_0 = const()[name = tensor<string, []>("op_244_transpose_x_0"), val = tensor<bool, []>(false)];
tensor<bool, []> var_244_transpose_y_0 = const()[name = tensor<string, []>("op_244_transpose_y_0"), val = tensor<bool, []>(false)];
tensor<fp16, [1, 8, 1500, 64]> transpose_55 = transpose(perm = var_238, x = var_237_cast_fp16)[name = tensor<string, []>("transpose_55")];
tensor<fp16, [1, 8, 1500, 64]> var_244_cast_fp16 = matmul(transpose_x = var_244_transpose_x_0, transpose_y = var_244_transpose_y_0, x = var_242_cast_fp16, y = transpose_55)[name = tensor<string, []>("op_244_cast_fp16")];
tensor<int32, [4]> var_245 = const()[name = tensor<string, []>("op_245"), val = tensor<int32, [4]>([0, 2, 1, 3])];
tensor<int32, [3]> concat_1 = const()[name = tensor<string, []>("concat_1"), val = tensor<int32, [3]>([1, 1500, 512])];
tensor<fp16, [1, 1500, 8, 64]> transpose_52 = transpose(perm = var_245, x = var_244_cast_fp16)[name = tensor<string, []>("transpose_52")];
tensor<fp16, [1, 1500, 512]> x_23_cast_fp16 = reshape(shape = concat_1, x = transpose_52)[name = tensor<string, []>("x_23_cast_fp16")];
tensor<fp16, [512, 512]> var_250_to_fp16 = const()[name = tensor<string, []>("op_250_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11240256)))];
tensor<fp16, [512]> var_251_to_fp16 = const()[name = tensor<string, []>("op_251_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11764608)))];
tensor<fp16, [1, 1500, 512]> linear_9_cast_fp16 = linear(bias = var_251_to_fp16, weight = var_250_to_fp16, x = x_23_cast_fp16)[name = tensor<string, []>("linear_9_cast_fp16")];
tensor<fp16, [1, 1500, 512]> x_25_cast_fp16 = add(x = x_19_cast_fp16, y = linear_9_cast_fp16)[name = tensor<string, []>("x_25_cast_fp16")];
tensor<int32, [1]> var_258_axes_0 = const()[name = tensor<string, []>("op_258_axes_0"), val = tensor<int32, [1]>([-1])];
tensor<fp16, [512]> blocks_1_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_1_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11765696)))];
tensor<fp16, [512]> blocks_1_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_1_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11766784)))];
tensor<fp16, [1, 1500, 512]> var_258_cast_fp16 = layer_norm(axes = var_258_axes_0, beta = blocks_1_mlp_ln_bias_to_fp16, epsilon = var_183_to_fp16, gamma = blocks_1_mlp_ln_weight_to_fp16, x = x_25_cast_fp16)[name = tensor<string, []>("op_258_cast_fp16")];
tensor<fp16, [2048, 512]> var_267_to_fp16 = const()[name = tensor<string, []>("op_267_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(11767872)))];
tensor<fp16, [2048]> var_268_to_fp16 = const()[name = tensor<string, []>("op_268_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(13865088)))];
tensor<fp16, [1, 1500, 2048]> linear_10_cast_fp16 = linear(bias = var_268_to_fp16, weight = var_267_to_fp16, x = var_258_cast_fp16)[name = tensor<string, []>("linear_10_cast_fp16")];
tensor<string, []> x_29_mode_0 = const()[name = tensor<string, []>("x_29_mode_0"), val = tensor<string, []>("EXACT")];
tensor<fp16, [1, 1500, 2048]> x_29_cast_fp16 = gelu(mode = x_29_mode_0, x = linear_10_cast_fp16)[name = tensor<string, []>("x_29_cast_fp16")];
tensor<fp16, [512, 2048]> var_273_to_fp16 = const()[name = tensor<string, []>("op_273_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(13869248)))];
tensor<fp16, [512]> var_274_to_fp16 = const()[name = tensor<string, []>("op_274_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(15966464)))];
tensor<fp16, [1, 1500, 512]> linear_11_cast_fp16 = linear(bias = var_274_to_fp16, weight = var_273_to_fp16, x = x_29_cast_fp16)[name = tensor<string, []>("linear_11_cast_fp16")];
tensor<fp16, [1, 1500, 512]> x_31_cast_fp16 = add(x = x_25_cast_fp16, y = linear_11_cast_fp16)[name = tensor<string, []>("x_31_cast_fp16")];
tensor<int32, []> var_284 = const()[name = tensor<string, []>("op_284"), val = tensor<int32, []>(-1)];
tensor<int32, [1]> var_301_axes_0 = const()[name = tensor<string, []>("op_301_axes_0"), val = tensor<int32, [1]>([-1])];
tensor<fp16, [512]> blocks_2_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_2_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(15967552)))];
tensor<fp16, [512]> blocks_2_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_2_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(15968640)))];
tensor<fp16, []> var_290_to_fp16 = const()[name = tensor<string, []>("op_290_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
tensor<fp16, [1, 1500, 512]> var_301_cast_fp16 = layer_norm(axes = var_301_axes_0, beta = blocks_2_attn_ln_bias_to_fp16, epsilon = var_290_to_fp16, gamma = blocks_2_attn_ln_weight_to_fp16, x = x_31_cast_fp16)[name = tensor<string, []>("op_301_cast_fp16")];
tensor<fp16, [512, 512]> var_312_to_fp16 = const()[name = tensor<string, []>("op_312_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(15969728)))];
tensor<fp16, [512]> var_313_to_fp16 = const()[name = tensor<string, []>("op_313_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(16494080)))];
tensor<fp16, [1, 1500, 512]> linear_12_cast_fp16 = linear(bias = var_313_to_fp16, weight = var_312_to_fp16, x = var_301_cast_fp16)[name = tensor<string, []>("linear_12_cast_fp16")];
tensor<fp16, [512, 512]> var_316_to_fp16 = const()[name = tensor<string, []>("op_316_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(16495168)))];
tensor<fp16, [1, 1500, 512]> linear_13_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_316_to_fp16, x = var_301_cast_fp16)[name = tensor<string, []>("linear_13_cast_fp16")];
tensor<fp16, [512, 512]> var_320_to_fp16 = const()[name = tensor<string, []>("op_320_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(17019520)))];
tensor<fp16, [512]> var_321_to_fp16 = const()[name = tensor<string, []>("op_321_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(17543872)))];
tensor<fp16, [1, 1500, 512]> linear_14_cast_fp16 = linear(bias = var_321_to_fp16, weight = var_320_to_fp16, x = var_301_cast_fp16)[name = tensor<string, []>("linear_14_cast_fp16")];
tensor<int32, [4]> var_329 = const()[name = tensor<string, []>("op_329"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
tensor<fp16, [1, 1500, 8, 64]> var_330_cast_fp16 = reshape(shape = var_329, x = linear_12_cast_fp16)[name = tensor<string, []>("op_330_cast_fp16")];
tensor<fp16, [1, 1, 1, 1]> const_46_to_fp16 = const()[name = tensor<string, []>("const_46_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
tensor<fp16, [1, 1500, 8, 64]> q_11_cast_fp16 = mul(x = var_330_cast_fp16, y = const_46_to_fp16)[name = tensor<string, []>("q_11_cast_fp16")];
tensor<int32, [4]> var_336 = const()[name = tensor<string, []>("op_336"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
tensor<fp16, [1, 1500, 8, 64]> var_337_cast_fp16 = reshape(shape = var_336, x = linear_13_cast_fp16)[name = tensor<string, []>("op_337_cast_fp16")];
tensor<fp16, [1, 1, 1, 1]> const_47_to_fp16 = const()[name = tensor<string, []>("const_47_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
tensor<fp16, [1, 1500, 8, 64]> k_11_cast_fp16 = mul(x = var_337_cast_fp16, y = const_47_to_fp16)[name = tensor<string, []>("k_11_cast_fp16")];
tensor<int32, [4]> var_343 = const()[name = tensor<string, []>("op_343"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
tensor<fp16, [1, 1500, 8, 64]> var_344_cast_fp16 = reshape(shape = var_343, x = linear_14_cast_fp16)[name = tensor<string, []>("op_344_cast_fp16")];
tensor<int32, [4]> var_345 = const()[name = tensor<string, []>("op_345"), val = tensor<int32, [4]>([0, 2, 1, 3])];
tensor<bool, []> qk_5_transpose_x_0 = const()[name = tensor<string, []>("qk_5_transpose_x_0"), val = tensor<bool, []>(false)];
tensor<bool, []> qk_5_transpose_y_0 = const()[name = tensor<string, []>("qk_5_transpose_y_0"), val = tensor<bool, []>(false)];
tensor<int32, [4]> transpose_28_perm_0 = const()[name = tensor<string, []>("transpose_28_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
tensor<int32, [4]> transpose_29_perm_0 = const()[name = tensor<string, []>("transpose_29_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
tensor<fp16, [1, 8, 64, 1500]> transpose_49 = transpose(perm = transpose_29_perm_0, x = k_11_cast_fp16)[name = tensor<string, []>("transpose_49")];
tensor<fp16, [1, 8, 1500, 64]> transpose_50 = transpose(perm = transpose_28_perm_0, x = q_11_cast_fp16)[name = tensor<string, []>("transpose_50")];
tensor<fp16, [1, 8, 1500, 1500]> qk_5_cast_fp16 = matmul(transpose_x = qk_5_transpose_x_0, transpose_y = qk_5_transpose_y_0, x = transpose_50, y = transpose_49)[name = tensor<string, []>("qk_5_cast_fp16")];
tensor<fp16, [1, 8, 1500, 1500]> var_349_cast_fp16 = softmax(axis = var_284, x = qk_5_cast_fp16)[name = tensor<string, []>("op_349_cast_fp16")];
tensor<bool, []> var_351_transpose_x_0 = const()[name = tensor<string, []>("op_351_transpose_x_0"), val = tensor<bool, []>(false)];
tensor<bool, []> var_351_transpose_y_0 = const()[name = tensor<string, []>("op_351_transpose_y_0"), val = tensor<bool, []>(false)];
tensor<fp16, [1, 8, 1500, 64]> transpose_51 = transpose(perm = var_345, x = var_344_cast_fp16)[name = tensor<string, []>("transpose_51")];
tensor<fp16, [1, 8, 1500, 64]> var_351_cast_fp16 = matmul(transpose_x = var_351_transpose_x_0, transpose_y = var_351_transpose_y_0, x = var_349_cast_fp16, y = transpose_51)[name = tensor<string, []>("op_351_cast_fp16")];
tensor<int32, [4]> var_352 = const()[name = tensor<string, []>("op_352"), val = tensor<int32, [4]>([0, 2, 1, 3])];
tensor<int32, [3]> concat_2 = const()[name = tensor<string, []>("concat_2"), val = tensor<int32, [3]>([1, 1500, 512])];
tensor<fp16, [1, 1500, 8, 64]> transpose_48 = transpose(perm = var_352, x = var_351_cast_fp16)[name = tensor<string, []>("transpose_48")];
tensor<fp16, [1, 1500, 512]> x_35_cast_fp16 = reshape(shape = concat_2, x = transpose_48)[name = tensor<string, []>("x_35_cast_fp16")];
tensor<fp16, [512, 512]> var_357_to_fp16 = const()[name = tensor<string, []>("op_357_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(17544960)))];
tensor<fp16, [512]> var_358_to_fp16 = const()[name = tensor<string, []>("op_358_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(18069312)))];
tensor<fp16, [1, 1500, 512]> linear_15_cast_fp16 = linear(bias = var_358_to_fp16, weight = var_357_to_fp16, x = x_35_cast_fp16)[name = tensor<string, []>("linear_15_cast_fp16")];
tensor<fp16, [1, 1500, 512]> x_37_cast_fp16 = add(x = x_31_cast_fp16, y = linear_15_cast_fp16)[name = tensor<string, []>("x_37_cast_fp16")];
tensor<int32, [1]> var_365_axes_0 = const()[name = tensor<string, []>("op_365_axes_0"), val = tensor<int32, [1]>([-1])];
tensor<fp16, [512]> blocks_2_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_2_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(18070400)))];
tensor<fp16, [512]> blocks_2_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_2_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(18071488)))];
tensor<fp16, [1, 1500, 512]> var_365_cast_fp16 = layer_norm(axes = var_365_axes_0, beta = blocks_2_mlp_ln_bias_to_fp16, epsilon = var_290_to_fp16, gamma = blocks_2_mlp_ln_weight_to_fp16, x = x_37_cast_fp16)[name = tensor<string, []>("op_365_cast_fp16")];
tensor<fp16, [2048, 512]> var_374_to_fp16 = const()[name = tensor<string, []>("op_374_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(18072576)))];
tensor<fp16, [2048]> var_375_to_fp16 = const()[name = tensor<string, []>("op_375_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(20169792)))];
tensor<fp16, [1, 1500, 2048]> linear_16_cast_fp16 = linear(bias = var_375_to_fp16, weight = var_374_to_fp16, x = var_365_cast_fp16)[name = tensor<string, []>("linear_16_cast_fp16")];
tensor<string, []> x_41_mode_0 = const()[name = tensor<string, []>("x_41_mode_0"), val = tensor<string, []>("EXACT")];
tensor<fp16, [1, 1500, 2048]> x_41_cast_fp16 = gelu(mode = x_41_mode_0, x = linear_16_cast_fp16)[name = tensor<string, []>("x_41_cast_fp16")];
tensor<fp16, [512, 2048]> var_380_to_fp16 = const()[name = tensor<string, []>("op_380_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(20173952)))];
tensor<fp16, [512]> var_381_to_fp16 = const()[name = tensor<string, []>("op_381_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22271168)))];
tensor<fp16, [1, 1500, 512]> linear_17_cast_fp16 = linear(bias = var_381_to_fp16, weight = var_380_to_fp16, x = x_41_cast_fp16)[name = tensor<string, []>("linear_17_cast_fp16")];
tensor<fp16, [1, 1500, 512]> x_43_cast_fp16 = add(x = x_37_cast_fp16, y = linear_17_cast_fp16)[name = tensor<string, []>("x_43_cast_fp16")];
tensor<int32, []> var_391 = const()[name = tensor<string, []>("op_391"), val = tensor<int32, []>(-1)];
tensor<int32, [1]> var_408_axes_0 = const()[name = tensor<string, []>("op_408_axes_0"), val = tensor<int32, [1]>([-1])];
tensor<fp16, [512]> blocks_3_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_3_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22272256)))];
tensor<fp16, [512]> blocks_3_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_3_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22273344)))];
tensor<fp16, []> var_397_to_fp16 = const()[name = tensor<string, []>("op_397_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
tensor<fp16, [1, 1500, 512]> var_408_cast_fp16 = layer_norm(axes = var_408_axes_0, beta = blocks_3_attn_ln_bias_to_fp16, epsilon = var_397_to_fp16, gamma = blocks_3_attn_ln_weight_to_fp16, x = x_43_cast_fp16)[name = tensor<string, []>("op_408_cast_fp16")];
tensor<fp16, [512, 512]> var_419_to_fp16 = const()[name = tensor<string, []>("op_419_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22274432)))];
tensor<fp16, [512]> var_420_to_fp16 = const()[name = tensor<string, []>("op_420_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22798784)))];
tensor<fp16, [1, 1500, 512]> linear_18_cast_fp16 = linear(bias = var_420_to_fp16, weight = var_419_to_fp16, x = var_408_cast_fp16)[name = tensor<string, []>("linear_18_cast_fp16")];
tensor<fp16, [512, 512]> var_423_to_fp16 = const()[name = tensor<string, []>("op_423_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(22799872)))];
tensor<fp16, [1, 1500, 512]> linear_19_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_423_to_fp16, x = var_408_cast_fp16)[name = tensor<string, []>("linear_19_cast_fp16")];
tensor<fp16, [512, 512]> var_427_to_fp16 = const()[name = tensor<string, []>("op_427_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(23324224)))];
tensor<fp16, [512]> var_428_to_fp16 = const()[name = tensor<string, []>("op_428_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(23848576)))];
tensor<fp16, [1, 1500, 512]> linear_20_cast_fp16 = linear(bias = var_428_to_fp16, weight = var_427_to_fp16, x = var_408_cast_fp16)[name = tensor<string, []>("linear_20_cast_fp16")];
tensor<int32, [4]> var_436 = const()[name = tensor<string, []>("op_436"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
tensor<fp16, [1, 1500, 8, 64]> var_437_cast_fp16 = reshape(shape = var_436, x = linear_18_cast_fp16)[name = tensor<string, []>("op_437_cast_fp16")];
tensor<fp16, [1, 1, 1, 1]> const_48_to_fp16 = const()[name = tensor<string, []>("const_48_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
tensor<fp16, [1, 1500, 8, 64]> q_15_cast_fp16 = mul(x = var_437_cast_fp16, y = const_48_to_fp16)[name = tensor<string, []>("q_15_cast_fp16")];
tensor<int32, [4]> var_443 = const()[name = tensor<string, []>("op_443"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
tensor<fp16, [1, 1500, 8, 64]> var_444_cast_fp16 = reshape(shape = var_443, x = linear_19_cast_fp16)[name = tensor<string, []>("op_444_cast_fp16")];
tensor<fp16, [1, 1, 1, 1]> const_49_to_fp16 = const()[name = tensor<string, []>("const_49_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
tensor<fp16, [1, 1500, 8, 64]> k_15_cast_fp16 = mul(x = var_444_cast_fp16, y = const_49_to_fp16)[name = tensor<string, []>("k_15_cast_fp16")];
tensor<int32, [4]> var_450 = const()[name = tensor<string, []>("op_450"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
tensor<fp16, [1, 1500, 8, 64]> var_451_cast_fp16 = reshape(shape = var_450, x = linear_20_cast_fp16)[name = tensor<string, []>("op_451_cast_fp16")];
tensor<int32, [4]> var_452 = const()[name = tensor<string, []>("op_452"), val = tensor<int32, [4]>([0, 2, 1, 3])];
tensor<bool, []> qk_7_transpose_x_0 = const()[name = tensor<string, []>("qk_7_transpose_x_0"), val = tensor<bool, []>(false)];
tensor<bool, []> qk_7_transpose_y_0 = const()[name = tensor<string, []>("qk_7_transpose_y_0"), val = tensor<bool, []>(false)];
tensor<int32, [4]> transpose_30_perm_0 = const()[name = tensor<string, []>("transpose_30_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
tensor<int32, [4]> transpose_31_perm_0 = const()[name = tensor<string, []>("transpose_31_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
tensor<fp16, [1, 8, 64, 1500]> transpose_45 = transpose(perm = transpose_31_perm_0, x = k_15_cast_fp16)[name = tensor<string, []>("transpose_45")];
tensor<fp16, [1, 8, 1500, 64]> transpose_46 = transpose(perm = transpose_30_perm_0, x = q_15_cast_fp16)[name = tensor<string, []>("transpose_46")];
tensor<fp16, [1, 8, 1500, 1500]> qk_7_cast_fp16 = matmul(transpose_x = qk_7_transpose_x_0, transpose_y = qk_7_transpose_y_0, x = transpose_46, y = transpose_45)[name = tensor<string, []>("qk_7_cast_fp16")];
tensor<fp16, [1, 8, 1500, 1500]> var_456_cast_fp16 = softmax(axis = var_391, x = qk_7_cast_fp16)[name = tensor<string, []>("op_456_cast_fp16")];
tensor<bool, []> var_458_transpose_x_0 = const()[name = tensor<string, []>("op_458_transpose_x_0"), val = tensor<bool, []>(false)];
tensor<bool, []> var_458_transpose_y_0 = const()[name = tensor<string, []>("op_458_transpose_y_0"), val = tensor<bool, []>(false)];
tensor<fp16, [1, 8, 1500, 64]> transpose_47 = transpose(perm = var_452, x = var_451_cast_fp16)[name = tensor<string, []>("transpose_47")];
tensor<fp16, [1, 8, 1500, 64]> var_458_cast_fp16 = matmul(transpose_x = var_458_transpose_x_0, transpose_y = var_458_transpose_y_0, x = var_456_cast_fp16, y = transpose_47)[name = tensor<string, []>("op_458_cast_fp16")];
tensor<int32, [4]> var_459 = const()[name = tensor<string, []>("op_459"), val = tensor<int32, [4]>([0, 2, 1, 3])];
tensor<int32, [3]> concat_3 = const()[name = tensor<string, []>("concat_3"), val = tensor<int32, [3]>([1, 1500, 512])];
tensor<fp16, [1, 1500, 8, 64]> transpose_44 = transpose(perm = var_459, x = var_458_cast_fp16)[name = tensor<string, []>("transpose_44")];
tensor<fp16, [1, 1500, 512]> x_47_cast_fp16 = reshape(shape = concat_3, x = transpose_44)[name = tensor<string, []>("x_47_cast_fp16")];
tensor<fp16, [512, 512]> var_464_to_fp16 = const()[name = tensor<string, []>("op_464_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(23849664)))];
tensor<fp16, [512]> var_465_to_fp16 = const()[name = tensor<string, []>("op_465_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(24374016)))];
tensor<fp16, [1, 1500, 512]> linear_21_cast_fp16 = linear(bias = var_465_to_fp16, weight = var_464_to_fp16, x = x_47_cast_fp16)[name = tensor<string, []>("linear_21_cast_fp16")];
tensor<fp16, [1, 1500, 512]> x_49_cast_fp16 = add(x = x_43_cast_fp16, y = linear_21_cast_fp16)[name = tensor<string, []>("x_49_cast_fp16")];
tensor<int32, [1]> var_472_axes_0 = const()[name = tensor<string, []>("op_472_axes_0"), val = tensor<int32, [1]>([-1])];
tensor<fp16, [512]> blocks_3_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_3_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(24375104)))];
tensor<fp16, [512]> blocks_3_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_3_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(24376192)))];
tensor<fp16, [1, 1500, 512]> var_472_cast_fp16 = layer_norm(axes = var_472_axes_0, beta = blocks_3_mlp_ln_bias_to_fp16, epsilon = var_397_to_fp16, gamma = blocks_3_mlp_ln_weight_to_fp16, x = x_49_cast_fp16)[name = tensor<string, []>("op_472_cast_fp16")];
|
| 255 |
+
tensor<fp16, [2048, 512]> var_481_to_fp16 = const()[name = tensor<string, []>("op_481_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(24377280)))];
|
| 256 |
+
tensor<fp16, [2048]> var_482_to_fp16 = const()[name = tensor<string, []>("op_482_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(26474496)))];
|
| 257 |
+
tensor<fp16, [1, 1500, 2048]> linear_22_cast_fp16 = linear(bias = var_482_to_fp16, weight = var_481_to_fp16, x = var_472_cast_fp16)[name = tensor<string, []>("linear_22_cast_fp16")];
|
| 258 |
+
tensor<string, []> x_53_mode_0 = const()[name = tensor<string, []>("x_53_mode_0"), val = tensor<string, []>("EXACT")];
|
| 259 |
+
tensor<fp16, [1, 1500, 2048]> x_53_cast_fp16 = gelu(mode = x_53_mode_0, x = linear_22_cast_fp16)[name = tensor<string, []>("x_53_cast_fp16")];
|
| 260 |
+
tensor<fp16, [512, 2048]> var_487_to_fp16 = const()[name = tensor<string, []>("op_487_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(26478656)))];
|
| 261 |
+
tensor<fp16, [512]> var_488_to_fp16 = const()[name = tensor<string, []>("op_488_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(28575872)))];
|
| 262 |
+
tensor<fp16, [1, 1500, 512]> linear_23_cast_fp16 = linear(bias = var_488_to_fp16, weight = var_487_to_fp16, x = x_53_cast_fp16)[name = tensor<string, []>("linear_23_cast_fp16")];
|
| 263 |
+
tensor<fp16, [1, 1500, 512]> x_55_cast_fp16 = add(x = x_49_cast_fp16, y = linear_23_cast_fp16)[name = tensor<string, []>("x_55_cast_fp16")];
|
| 264 |
+
tensor<int32, []> var_498 = const()[name = tensor<string, []>("op_498"), val = tensor<int32, []>(-1)];
|
| 265 |
+
tensor<int32, [1]> var_515_axes_0 = const()[name = tensor<string, []>("op_515_axes_0"), val = tensor<int32, [1]>([-1])];
|
| 266 |
+
tensor<fp16, [512]> blocks_4_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_4_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(28576960)))];
|
| 267 |
+
tensor<fp16, [512]> blocks_4_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_4_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(28578048)))];
|
| 268 |
+
tensor<fp16, []> var_504_to_fp16 = const()[name = tensor<string, []>("op_504_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
|
| 269 |
+
tensor<fp16, [1, 1500, 512]> var_515_cast_fp16 = layer_norm(axes = var_515_axes_0, beta = blocks_4_attn_ln_bias_to_fp16, epsilon = var_504_to_fp16, gamma = blocks_4_attn_ln_weight_to_fp16, x = x_55_cast_fp16)[name = tensor<string, []>("op_515_cast_fp16")];
|
| 270 |
+
tensor<fp16, [512, 512]> var_526_to_fp16 = const()[name = tensor<string, []>("op_526_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(28579136)))];
|
| 271 |
+
tensor<fp16, [512]> var_527_to_fp16 = const()[name = tensor<string, []>("op_527_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(29103488)))];
|
| 272 |
+
tensor<fp16, [1, 1500, 512]> linear_24_cast_fp16 = linear(bias = var_527_to_fp16, weight = var_526_to_fp16, x = var_515_cast_fp16)[name = tensor<string, []>("linear_24_cast_fp16")];
|
| 273 |
+
tensor<fp16, [512, 512]> var_530_to_fp16 = const()[name = tensor<string, []>("op_530_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(29104576)))];
|
| 274 |
+
tensor<fp16, [1, 1500, 512]> linear_25_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_530_to_fp16, x = var_515_cast_fp16)[name = tensor<string, []>("linear_25_cast_fp16")];
|
| 275 |
+
tensor<fp16, [512, 512]> var_534_to_fp16 = const()[name = tensor<string, []>("op_534_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(29628928)))];
|
| 276 |
+
tensor<fp16, [512]> var_535_to_fp16 = const()[name = tensor<string, []>("op_535_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30153280)))];
|
| 277 |
+
tensor<fp16, [1, 1500, 512]> linear_26_cast_fp16 = linear(bias = var_535_to_fp16, weight = var_534_to_fp16, x = var_515_cast_fp16)[name = tensor<string, []>("linear_26_cast_fp16")];
|
| 278 |
+
tensor<int32, [4]> var_543 = const()[name = tensor<string, []>("op_543"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
|
| 279 |
+
tensor<fp16, [1, 1500, 8, 64]> var_544_cast_fp16 = reshape(shape = var_543, x = linear_24_cast_fp16)[name = tensor<string, []>("op_544_cast_fp16")];
|
| 280 |
+
tensor<fp16, [1, 1, 1, 1]> const_50_to_fp16 = const()[name = tensor<string, []>("const_50_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
|
| 281 |
+
tensor<fp16, [1, 1500, 8, 64]> q_19_cast_fp16 = mul(x = var_544_cast_fp16, y = const_50_to_fp16)[name = tensor<string, []>("q_19_cast_fp16")];
|
| 282 |
+
tensor<int32, [4]> var_550 = const()[name = tensor<string, []>("op_550"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
|
| 283 |
+
tensor<fp16, [1, 1500, 8, 64]> var_551_cast_fp16 = reshape(shape = var_550, x = linear_25_cast_fp16)[name = tensor<string, []>("op_551_cast_fp16")];
|
| 284 |
+
tensor<fp16, [1, 1, 1, 1]> const_51_to_fp16 = const()[name = tensor<string, []>("const_51_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
|
| 285 |
+
tensor<fp16, [1, 1500, 8, 64]> k_19_cast_fp16 = mul(x = var_551_cast_fp16, y = const_51_to_fp16)[name = tensor<string, []>("k_19_cast_fp16")];
|
| 286 |
+
tensor<int32, [4]> var_557 = const()[name = tensor<string, []>("op_557"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
|
| 287 |
+
tensor<fp16, [1, 1500, 8, 64]> var_558_cast_fp16 = reshape(shape = var_557, x = linear_26_cast_fp16)[name = tensor<string, []>("op_558_cast_fp16")];
|
| 288 |
+
tensor<int32, [4]> var_559 = const()[name = tensor<string, []>("op_559"), val = tensor<int32, [4]>([0, 2, 1, 3])];
|
| 289 |
+
tensor<bool, []> qk_9_transpose_x_0 = const()[name = tensor<string, []>("qk_9_transpose_x_0"), val = tensor<bool, []>(false)];
|
| 290 |
+
tensor<bool, []> qk_9_transpose_y_0 = const()[name = tensor<string, []>("qk_9_transpose_y_0"), val = tensor<bool, []>(false)];
|
| 291 |
+
tensor<int32, [4]> transpose_32_perm_0 = const()[name = tensor<string, []>("transpose_32_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
|
| 292 |
+
tensor<int32, [4]> transpose_33_perm_0 = const()[name = tensor<string, []>("transpose_33_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
|
| 293 |
+
tensor<fp16, [1, 8, 64, 1500]> transpose_41 = transpose(perm = transpose_33_perm_0, x = k_19_cast_fp16)[name = tensor<string, []>("transpose_41")];
|
| 294 |
+
tensor<fp16, [1, 8, 1500, 64]> transpose_42 = transpose(perm = transpose_32_perm_0, x = q_19_cast_fp16)[name = tensor<string, []>("transpose_42")];
|
| 295 |
+
tensor<fp16, [1, 8, 1500, 1500]> qk_9_cast_fp16 = matmul(transpose_x = qk_9_transpose_x_0, transpose_y = qk_9_transpose_y_0, x = transpose_42, y = transpose_41)[name = tensor<string, []>("qk_9_cast_fp16")];
|
| 296 |
+
tensor<fp16, [1, 8, 1500, 1500]> var_563_cast_fp16 = softmax(axis = var_498, x = qk_9_cast_fp16)[name = tensor<string, []>("op_563_cast_fp16")];
|
| 297 |
+
tensor<bool, []> var_565_transpose_x_0 = const()[name = tensor<string, []>("op_565_transpose_x_0"), val = tensor<bool, []>(false)];
|
| 298 |
+
tensor<bool, []> var_565_transpose_y_0 = const()[name = tensor<string, []>("op_565_transpose_y_0"), val = tensor<bool, []>(false)];
|
| 299 |
+
tensor<fp16, [1, 8, 1500, 64]> transpose_43 = transpose(perm = var_559, x = var_558_cast_fp16)[name = tensor<string, []>("transpose_43")];
|
| 300 |
+
tensor<fp16, [1, 8, 1500, 64]> var_565_cast_fp16 = matmul(transpose_x = var_565_transpose_x_0, transpose_y = var_565_transpose_y_0, x = var_563_cast_fp16, y = transpose_43)[name = tensor<string, []>("op_565_cast_fp16")];
|
| 301 |
+
tensor<int32, [4]> var_566 = const()[name = tensor<string, []>("op_566"), val = tensor<int32, [4]>([0, 2, 1, 3])];
|
| 302 |
+
tensor<int32, [3]> concat_4 = const()[name = tensor<string, []>("concat_4"), val = tensor<int32, [3]>([1, 1500, 512])];
|
| 303 |
+
tensor<fp16, [1, 1500, 8, 64]> transpose_40 = transpose(perm = var_566, x = var_565_cast_fp16)[name = tensor<string, []>("transpose_40")];
|
| 304 |
+
tensor<fp16, [1, 1500, 512]> x_59_cast_fp16 = reshape(shape = concat_4, x = transpose_40)[name = tensor<string, []>("x_59_cast_fp16")];
|
| 305 |
+
tensor<fp16, [512, 512]> var_571_to_fp16 = const()[name = tensor<string, []>("op_571_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30154368)))];
|
| 306 |
+
tensor<fp16, [512]> var_572_to_fp16 = const()[name = tensor<string, []>("op_572_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30678720)))];
|
| 307 |
+
tensor<fp16, [1, 1500, 512]> linear_27_cast_fp16 = linear(bias = var_572_to_fp16, weight = var_571_to_fp16, x = x_59_cast_fp16)[name = tensor<string, []>("linear_27_cast_fp16")];
|
| 308 |
+
tensor<fp16, [1, 1500, 512]> x_61_cast_fp16 = add(x = x_55_cast_fp16, y = linear_27_cast_fp16)[name = tensor<string, []>("x_61_cast_fp16")];
|
| 309 |
+
tensor<int32, [1]> var_579_axes_0 = const()[name = tensor<string, []>("op_579_axes_0"), val = tensor<int32, [1]>([-1])];
|
| 310 |
+
tensor<fp16, [512]> blocks_4_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_4_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30679808)))];
|
| 311 |
+
tensor<fp16, [512]> blocks_4_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_4_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30680896)))];
|
| 312 |
+
tensor<fp16, [1, 1500, 512]> var_579_cast_fp16 = layer_norm(axes = var_579_axes_0, beta = blocks_4_mlp_ln_bias_to_fp16, epsilon = var_504_to_fp16, gamma = blocks_4_mlp_ln_weight_to_fp16, x = x_61_cast_fp16)[name = tensor<string, []>("op_579_cast_fp16")];
|
| 313 |
+
tensor<fp16, [2048, 512]> var_588_to_fp16 = const()[name = tensor<string, []>("op_588_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(30681984)))];
|
| 314 |
+
tensor<fp16, [2048]> var_589_to_fp16 = const()[name = tensor<string, []>("op_589_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(32779200)))];
|
| 315 |
+
tensor<fp16, [1, 1500, 2048]> linear_28_cast_fp16 = linear(bias = var_589_to_fp16, weight = var_588_to_fp16, x = var_579_cast_fp16)[name = tensor<string, []>("linear_28_cast_fp16")];
|
| 316 |
+
tensor<string, []> x_65_mode_0 = const()[name = tensor<string, []>("x_65_mode_0"), val = tensor<string, []>("EXACT")];
|
| 317 |
+
tensor<fp16, [1, 1500, 2048]> x_65_cast_fp16 = gelu(mode = x_65_mode_0, x = linear_28_cast_fp16)[name = tensor<string, []>("x_65_cast_fp16")];
|
| 318 |
+
tensor<fp16, [512, 2048]> var_594_to_fp16 = const()[name = tensor<string, []>("op_594_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(32783360)))];
|
| 319 |
+
tensor<fp16, [512]> var_595_to_fp16 = const()[name = tensor<string, []>("op_595_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(34880576)))];
|
| 320 |
+
tensor<fp16, [1, 1500, 512]> linear_29_cast_fp16 = linear(bias = var_595_to_fp16, weight = var_594_to_fp16, x = x_65_cast_fp16)[name = tensor<string, []>("linear_29_cast_fp16")];
|
| 321 |
+
tensor<fp16, [1, 1500, 512]> x_67_cast_fp16 = add(x = x_61_cast_fp16, y = linear_29_cast_fp16)[name = tensor<string, []>("x_67_cast_fp16")];
|
| 322 |
+
tensor<int32, []> var_605 = const()[name = tensor<string, []>("op_605"), val = tensor<int32, []>(-1)];
|
| 323 |
+
tensor<int32, [1]> var_622_axes_0 = const()[name = tensor<string, []>("op_622_axes_0"), val = tensor<int32, [1]>([-1])];
|
| 324 |
+
tensor<fp16, [512]> blocks_5_attn_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_5_attn_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(34881664)))];
|
| 325 |
+
tensor<fp16, [512]> blocks_5_attn_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_5_attn_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(34882752)))];
|
| 326 |
+
tensor<fp16, []> var_611_to_fp16 = const()[name = tensor<string, []>("op_611_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
|
| 327 |
+
tensor<fp16, [1, 1500, 512]> var_622_cast_fp16 = layer_norm(axes = var_622_axes_0, beta = blocks_5_attn_ln_bias_to_fp16, epsilon = var_611_to_fp16, gamma = blocks_5_attn_ln_weight_to_fp16, x = x_67_cast_fp16)[name = tensor<string, []>("op_622_cast_fp16")];
|
| 328 |
+
tensor<fp16, [512, 512]> var_633_to_fp16 = const()[name = tensor<string, []>("op_633_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(34883840)))];
|
| 329 |
+
tensor<fp16, [512]> var_634_to_fp16 = const()[name = tensor<string, []>("op_634_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(35408192)))];
|
| 330 |
+
tensor<fp16, [1, 1500, 512]> linear_30_cast_fp16 = linear(bias = var_634_to_fp16, weight = var_633_to_fp16, x = var_622_cast_fp16)[name = tensor<string, []>("linear_30_cast_fp16")];
|
| 331 |
+
tensor<fp16, [512, 512]> var_637_to_fp16 = const()[name = tensor<string, []>("op_637_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(35409280)))];
|
| 332 |
+
tensor<fp16, [1, 1500, 512]> linear_31_cast_fp16 = linear(bias = linear_1_bias_0_to_fp16, weight = var_637_to_fp16, x = var_622_cast_fp16)[name = tensor<string, []>("linear_31_cast_fp16")];
|
| 333 |
+
tensor<fp16, [512, 512]> var_641_to_fp16 = const()[name = tensor<string, []>("op_641_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(35933632)))];
|
| 334 |
+
tensor<fp16, [512]> var_642_to_fp16 = const()[name = tensor<string, []>("op_642_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36457984)))];
|
| 335 |
+
tensor<fp16, [1, 1500, 512]> linear_32_cast_fp16 = linear(bias = var_642_to_fp16, weight = var_641_to_fp16, x = var_622_cast_fp16)[name = tensor<string, []>("linear_32_cast_fp16")];
|
| 336 |
+
tensor<int32, [4]> var_650 = const()[name = tensor<string, []>("op_650"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
|
| 337 |
+
tensor<fp16, [1, 1500, 8, 64]> var_651_cast_fp16 = reshape(shape = var_650, x = linear_30_cast_fp16)[name = tensor<string, []>("op_651_cast_fp16")];
|
| 338 |
+
tensor<fp16, [1, 1, 1, 1]> const_52_to_fp16 = const()[name = tensor<string, []>("const_52_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
|
| 339 |
+
tensor<fp16, [1, 1500, 8, 64]> q_cast_fp16 = mul(x = var_651_cast_fp16, y = const_52_to_fp16)[name = tensor<string, []>("q_cast_fp16")];
|
| 340 |
+
tensor<int32, [4]> var_657 = const()[name = tensor<string, []>("op_657"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
|
| 341 |
+
tensor<fp16, [1, 1500, 8, 64]> var_658_cast_fp16 = reshape(shape = var_657, x = linear_31_cast_fp16)[name = tensor<string, []>("op_658_cast_fp16")];
|
| 342 |
+
tensor<fp16, [1, 1, 1, 1]> const_53_to_fp16 = const()[name = tensor<string, []>("const_53_to_fp16"), val = tensor<fp16, [1, 1, 1, 1]>([[[[0x1.6ap-2]]]])];
|
| 343 |
+
tensor<fp16, [1, 1500, 8, 64]> k_cast_fp16 = mul(x = var_658_cast_fp16, y = const_53_to_fp16)[name = tensor<string, []>("k_cast_fp16")];
|
| 344 |
+
tensor<int32, [4]> var_664 = const()[name = tensor<string, []>("op_664"), val = tensor<int32, [4]>([1, 1500, 8, -1])];
|
| 345 |
+
tensor<fp16, [1, 1500, 8, 64]> var_665_cast_fp16 = reshape(shape = var_664, x = linear_32_cast_fp16)[name = tensor<string, []>("op_665_cast_fp16")];
|
| 346 |
+
tensor<int32, [4]> var_666 = const()[name = tensor<string, []>("op_666"), val = tensor<int32, [4]>([0, 2, 1, 3])];
|
| 347 |
+
tensor<bool, []> qk_transpose_x_0 = const()[name = tensor<string, []>("qk_transpose_x_0"), val = tensor<bool, []>(false)];
|
| 348 |
+
tensor<bool, []> qk_transpose_y_0 = const()[name = tensor<string, []>("qk_transpose_y_0"), val = tensor<bool, []>(false)];
|
| 349 |
+
tensor<int32, [4]> transpose_34_perm_0 = const()[name = tensor<string, []>("transpose_34_perm_0"), val = tensor<int32, [4]>([0, 2, 1, 3])];
|
| 350 |
+
tensor<int32, [4]> transpose_35_perm_0 = const()[name = tensor<string, []>("transpose_35_perm_0"), val = tensor<int32, [4]>([0, 2, 3, 1])];
|
| 351 |
+
tensor<fp16, [1, 8, 64, 1500]> transpose_37 = transpose(perm = transpose_35_perm_0, x = k_cast_fp16)[name = tensor<string, []>("transpose_37")];
|
| 352 |
+
tensor<fp16, [1, 8, 1500, 64]> transpose_38 = transpose(perm = transpose_34_perm_0, x = q_cast_fp16)[name = tensor<string, []>("transpose_38")];
|
| 353 |
+
tensor<fp16, [1, 8, 1500, 1500]> qk_cast_fp16 = matmul(transpose_x = qk_transpose_x_0, transpose_y = qk_transpose_y_0, x = transpose_38, y = transpose_37)[name = tensor<string, []>("qk_cast_fp16")];
|
| 354 |
+
tensor<fp16, [1, 8, 1500, 1500]> var_670_cast_fp16 = softmax(axis = var_605, x = qk_cast_fp16)[name = tensor<string, []>("op_670_cast_fp16")];
|
| 355 |
+
tensor<bool, []> var_672_transpose_x_0 = const()[name = tensor<string, []>("op_672_transpose_x_0"), val = tensor<bool, []>(false)];
|
| 356 |
+
tensor<bool, []> var_672_transpose_y_0 = const()[name = tensor<string, []>("op_672_transpose_y_0"), val = tensor<bool, []>(false)];
|
| 357 |
+
tensor<fp16, [1, 8, 1500, 64]> transpose_39 = transpose(perm = var_666, x = var_665_cast_fp16)[name = tensor<string, []>("transpose_39")];
|
| 358 |
+
tensor<fp16, [1, 8, 1500, 64]> var_672_cast_fp16 = matmul(transpose_x = var_672_transpose_x_0, transpose_y = var_672_transpose_y_0, x = var_670_cast_fp16, y = transpose_39)[name = tensor<string, []>("op_672_cast_fp16")];
|
| 359 |
+
tensor<int32, [4]> var_673 = const()[name = tensor<string, []>("op_673"), val = tensor<int32, [4]>([0, 2, 1, 3])];
|
| 360 |
+
tensor<int32, [3]> concat_5 = const()[name = tensor<string, []>("concat_5"), val = tensor<int32, [3]>([1, 1500, 512])];
|
| 361 |
+
tensor<fp16, [1, 1500, 8, 64]> transpose_36 = transpose(perm = var_673, x = var_672_cast_fp16)[name = tensor<string, []>("transpose_36")];
|
| 362 |
+
tensor<fp16, [1, 1500, 512]> x_71_cast_fp16 = reshape(shape = concat_5, x = transpose_36)[name = tensor<string, []>("x_71_cast_fp16")];
|
| 363 |
+
tensor<fp16, [512, 512]> var_678_to_fp16 = const()[name = tensor<string, []>("op_678_to_fp16"), val = tensor<fp16, [512, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36459072)))];
|
| 364 |
+
tensor<fp16, [512]> var_679_to_fp16 = const()[name = tensor<string, []>("op_679_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36983424)))];
|
| 365 |
+
tensor<fp16, [1, 1500, 512]> linear_33_cast_fp16 = linear(bias = var_679_to_fp16, weight = var_678_to_fp16, x = x_71_cast_fp16)[name = tensor<string, []>("linear_33_cast_fp16")];
|
| 366 |
+
tensor<fp16, [1, 1500, 512]> x_73_cast_fp16 = add(x = x_67_cast_fp16, y = linear_33_cast_fp16)[name = tensor<string, []>("x_73_cast_fp16")];
|
| 367 |
+
tensor<int32, [1]> var_686_axes_0 = const()[name = tensor<string, []>("op_686_axes_0"), val = tensor<int32, [1]>([-1])];
|
| 368 |
+
tensor<fp16, [512]> blocks_5_mlp_ln_weight_to_fp16 = const()[name = tensor<string, []>("blocks_5_mlp_ln_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36984512)))];
|
| 369 |
+
tensor<fp16, [512]> blocks_5_mlp_ln_bias_to_fp16 = const()[name = tensor<string, []>("blocks_5_mlp_ln_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36985600)))];
|
| 370 |
+
tensor<fp16, [1, 1500, 512]> var_686_cast_fp16 = layer_norm(axes = var_686_axes_0, beta = blocks_5_mlp_ln_bias_to_fp16, epsilon = var_611_to_fp16, gamma = blocks_5_mlp_ln_weight_to_fp16, x = x_73_cast_fp16)[name = tensor<string, []>("op_686_cast_fp16")];
|
| 371 |
+
tensor<fp16, [2048, 512]> var_695_to_fp16 = const()[name = tensor<string, []>("op_695_to_fp16"), val = tensor<fp16, [2048, 512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(36986688)))];
|
| 372 |
+
tensor<fp16, [2048]> var_696_to_fp16 = const()[name = tensor<string, []>("op_696_to_fp16"), val = tensor<fp16, [2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(39083904)))];
|
| 373 |
+
tensor<fp16, [1, 1500, 2048]> linear_34_cast_fp16 = linear(bias = var_696_to_fp16, weight = var_695_to_fp16, x = var_686_cast_fp16)[name = tensor<string, []>("linear_34_cast_fp16")];
|
| 374 |
+
tensor<string, []> x_77_mode_0 = const()[name = tensor<string, []>("x_77_mode_0"), val = tensor<string, []>("EXACT")];
|
| 375 |
+
tensor<fp16, [1, 1500, 2048]> x_77_cast_fp16 = gelu(mode = x_77_mode_0, x = linear_34_cast_fp16)[name = tensor<string, []>("x_77_cast_fp16")];
|
| 376 |
+
tensor<fp16, [512, 2048]> var_701_to_fp16 = const()[name = tensor<string, []>("op_701_to_fp16"), val = tensor<fp16, [512, 2048]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(39088064)))];
|
| 377 |
+
tensor<fp16, [512]> var_702_to_fp16 = const()[name = tensor<string, []>("op_702_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(41185280)))];
|
| 378 |
+
tensor<fp16, [1, 1500, 512]> linear_35_cast_fp16 = linear(bias = var_702_to_fp16, weight = var_701_to_fp16, x = x_77_cast_fp16)[name = tensor<string, []>("linear_35_cast_fp16")];
|
| 379 |
+
tensor<fp16, [1, 1500, 512]> x_cast_fp16 = add(x = x_73_cast_fp16, y = linear_35_cast_fp16)[name = tensor<string, []>("x_cast_fp16")];
|
| 380 |
+
tensor<int32, [1]> var_716_axes_0 = const()[name = tensor<string, []>("op_716_axes_0"), val = tensor<int32, [1]>([-1])];
|
| 381 |
+
tensor<fp16, [512]> ln_post_weight_to_fp16 = const()[name = tensor<string, []>("ln_post_weight_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(41186368)))];
|
| 382 |
+
tensor<fp16, [512]> ln_post_bias_to_fp16 = const()[name = tensor<string, []>("ln_post_bias_to_fp16"), val = tensor<fp16, [512]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(41187456)))];
|
| 383 |
+
tensor<fp16, []> var_707_to_fp16 = const()[name = tensor<string, []>("op_707_to_fp16"), val = tensor<fp16, []>(0x1.5p-17)];
|
| 384 |
+
tensor<fp16, [1, 1500, 512]> var_716_cast_fp16 = layer_norm(axes = var_716_axes_0, beta = ln_post_bias_to_fp16, epsilon = var_707_to_fp16, gamma = ln_post_weight_to_fp16, x = x_cast_fp16)[name = tensor<string, []>("op_716_cast_fp16")];
|
| 385 |
+
tensor<string, []> var_716_cast_fp16_to_fp32_dtype_0 = const()[name = tensor<string, []>("op_716_cast_fp16_to_fp32_dtype_0"), val = tensor<string, []>("fp32")];
|
| 386 |
+
tensor<fp32, [1, 1500, 512]> output = cast(dtype = var_716_cast_fp16_to_fp32_dtype_0, x = var_716_cast_fp16)[name = tensor<string, []>("cast_36")];
|
| 387 |
+
} -> (output);
|
| 388 |
+
}
|
ggml-base.en-encoder.mlmodelc/weights/weight.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc998211e55f0972c70e3d29103477cfe8c6dd485cd68438951f83fa3ee3b770
size 41188544
ggml-base.en.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a03779c86df3323075f5e796cb2ce5029f00ec8869eee3fdfb897afe36c6d002
size 147964211
ggml_to_pt.py
ADDED
@@ -0,0 +1,109 @@
import struct
import sys
from collections import OrderedDict
from pathlib import Path

import numpy as np
import torch
from whisper import ModelDimensions, Whisper

if len(sys.argv) < 3:
    print("Usage: ggml_to_pt.py model.bin dir-output\n")
    sys.exit(1)

fname_inp = Path(sys.argv[1])
dir_out = Path(sys.argv[2])
fname_out = dir_out / "torch-model.pt"

# Open the ggml file
with open(fname_inp, "rb") as f:
    # Read the magic number and hyperparameters (12 little-endian int32 values)
    (magic_number, n_vocab, n_audio_ctx, n_audio_state, n_audio_head,
     n_audio_layer, n_text_ctx, n_text_state, n_text_head, n_text_layer,
     n_mels, use_f16) = struct.unpack("12i", f.read(48))
    print(f"Magic number: {magic_number}")
    print(f"Vocab size: {n_vocab}")
    print(f"Audio context size: {n_audio_ctx}")
    print(f"Audio state size: {n_audio_state}")
    print(f"Audio head size: {n_audio_head}")
    print(f"Audio layer size: {n_audio_layer}")
    print(f"Text context size: {n_text_ctx}")
    print(f"Text state size: {n_text_state}")
    print(f"Text head size: {n_text_head}")
    print(f"Text layer size: {n_text_layer}")
    print(f"Mel size: {n_mels}")

    # Read the mel filterbank: two int32 shape values, then float32 entries
    filters_shape_0 = struct.unpack("i", f.read(4))[0]
    print(f"Filters shape 0: {filters_shape_0}")
    filters_shape_1 = struct.unpack("i", f.read(4))[0]
    print(f"Filters shape 1: {filters_shape_1}")

    mel_filters = np.zeros((filters_shape_0, filters_shape_1))
    for i in range(filters_shape_0):
        for j in range(filters_shape_1):
            mel_filters[i][j] = struct.unpack("f", f.read(4))[0]

    # Read the tokenizer vocabulary: a token count, then length-prefixed byte strings
    num_tokens = struct.unpack("i", f.read(4))[0]
    tokens = {}
    for _ in range(num_tokens):
        token_len = struct.unpack("i", f.read(4))[0]
        token = f.read(token_len)
        tokens[token] = {}

    # Read model variables until end of file
    model_state_dict = OrderedDict()
    while True:
        try:
            n_dims, name_length, ftype = struct.unpack("iii", f.read(12))
        except struct.error:
            break  # End of file
        dims = [struct.unpack("i", f.read(4))[0] for _ in range(n_dims)]
        dims = dims[::-1]  # ggml stores dimensions in reverse order
        name = f.read(name_length).decode("utf-8")
        if ftype == 1:  # f16
            data = np.fromfile(f, dtype=np.float16, count=np.prod(dims)).reshape(dims)
        else:  # f32
            data = np.fromfile(f, dtype=np.float32, count=np.prod(dims)).reshape(dims)

        # ggml stores the conv biases with a trailing broadcast axis; PyTorch expects 1-D
        if name in ["encoder.conv1.bias", "encoder.conv2.bias"]:
            data = data[:, 0]

        model_state_dict[name] = torch.from_numpy(data)

# Now the model's state_dict is stored in model_state_dict and can be
# loaded into a model with the same architecture
dims = ModelDimensions(
    n_mels=n_mels,
    n_audio_ctx=n_audio_ctx,
    n_audio_state=n_audio_state,
    n_audio_head=n_audio_head,
    n_audio_layer=n_audio_layer,
    n_text_ctx=n_text_ctx,
    n_text_state=n_text_state,
    n_text_head=n_text_head,
    n_text_layer=n_text_layer,
    n_vocab=n_vocab,
)
model = Whisper(dims)
model.load_state_dict(model_state_dict)

# Save the model in PyTorch format
torch.save(model.state_dict(), fname_out)
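The 48-byte header that `ggml_to_pt.py` unpacks with `struct.unpack("12i", ...)` can be sketched in isolation. This is a minimal illustration of the field layout only; the hyperparameter values below are representative of a base.en-sized model, not read from a real file, and the magic constant is assumed to be the ASCII bytes "ggml".

```python
import struct

# Field order of the 12 little-endian int32s at the start of a ggml Whisper file
fields = ("magic", "n_vocab", "n_audio_ctx", "n_audio_state", "n_audio_head",
          "n_audio_layer", "n_text_ctx", "n_text_state", "n_text_head",
          "n_text_layer", "n_mels", "use_f16")

# Illustrative values (roughly base.en); a converter would read these from disk
values = (0x67676D6C, 51864, 1500, 512, 8, 6, 448, 512, 8, 6, 80, 1)

blob = struct.pack("12i", *values)      # what the file's first 48 bytes look like
header = dict(zip(fields, struct.unpack("12i", blob)))

print(len(blob), header["n_audio_state"])
```

Reading the header into a named mapping like this makes it easy to cross-check a file against the expected layout before parsing the mel filterbank and tensor records that follow.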
openvino-conversion-requirements.txt
ADDED
@@ -0,0 +1,2 @@
openvino-dev[pytorch,onnx]
openai-whisper