Instructions to use Matir/granite-speech-4.1-2b-plus with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Matir/granite-speech-4.1-2b-plus with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="Matir/granite-speech-4.1-2b-plus")

# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("Matir/granite-speech-4.1-2b-plus")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Matir/granite-speech-4.1-2b-plus")
- Notebooks
- Google Colab
- Kaggle
Commit ·
d383cb8
0
Parent(s):
Duplicate from ibm-granite/granite-speech-4.1-2b-plus
Co-authored-by: Madison Lee <kristunlee@users.noreply.huggingface.co>
- .gitattributes +35 -0
- README.md +307 -0
- chat_template.jinja +121 -0
- config.json +87 -0
- generation_config.json +10 -0
- model-00001-of-00003.safetensors +3 -0
- model-00002-of-00003.safetensors +3 -0
- model-00003-of-00003.safetensors +3 -0
- model.safetensors.index.json +961 -0
- model.sig +1 -0
- processor_config.json +17 -0
- tokenizer.json +0 -0
- tokenizer_config.json +20 -0
.gitattributes
ADDED
|
@@ -0,0 +1,35 @@
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,307 @@
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- multilingual
|
| 5 |
+
- en
|
| 6 |
+
- fr
|
| 7 |
+
- de
|
| 8 |
+
- es
|
| 9 |
+
- pt
|
| 10 |
+
base_model:
|
| 11 |
+
- ibm-granite/granite-4.0-1b-base
|
| 12 |
+
library_name: transformers
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
# Granite-Speech-4.1-2B-Plus
|
| 16 |
+
|
| 17 |
+
## Model Summary
|
| 18 |
+
|
| 19 |
+
Granite-Speech-4.1-2B-Plus has similar capabilities to the [Granite-Speech-4.1-2B](https://huggingface.co/ibm-granite/granite-speech-4.1-2b) model. The plus model adds two new community-requested rich transcription features that can be activated with a simple prompt change: speaker-attributed ASR (speaker labels and word transcripts) and word-level timing information. Unlike the base model, the plus model does not provide punctuation and capitalization.
|
| 20 |
+
|
| 21 |
+
The model was trained on corpora similar to those used for the [Granite-Speech-4.1-2B](https://huggingface.co/ibm-granite/granite-speech-4.1-2b) model, augmented with speaker turns and word-level timestamp tags. This allows the model to provide different modes of functionality, each selected by a different prompt.
|
| 22 |
+
|
| 23 |
+
Two additional model variants explore different capabilities and inference optimization:
|
| 24 |
+
|
| 25 |
+
- [Granite-Speech-4.1-2B](https://huggingface.co/ibm-granite/granite-speech-4.1-2b) for applications where accuracy is the primary concern. It supports punctuated and capitalized transcripts, AST, and keyword-biased recognition, and it also covers Japanese.
|
| 26 |
+
- [Granite-Speech-4.1-2B-NAR](https://huggingface.co/ibm-granite/granite-speech-4.1-2b-nar) introduces a novel non-autoregressive architecture for higher throughput.
|
| 27 |
+
|
| 28 |
+
### ASR only mode
|
| 29 |
+
|
| 30 |
+
In this mode the model generates only the text transcript similar to the [Granite-Speech-4.1-2B](https://huggingface.co/ibm-granite/granite-speech-4.1-2b) model.
|
| 31 |
+
|
| 32 |
+
### Speaker attributed ASR (SAA)
|
| 33 |
+
|
| 34 |
+
In this mode, the model adds speaker tags in the format of `[Speaker N]:` where $N$ is the speaker number, before each speaker turn. The speakers are numbered by their order of appearance so the first speaker will always be marked with `[Speaker 1]:` and the second with `[Speaker 2]:`, etc. For example: `"[Speaker 1]: Hello how are you [Speaker 2]: I'm fine and how are you feeling [Speaker 1]: I feel wonderful"`.
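
The tag format described above is regular enough to split with a regular expression. The following is a minimal sketch, not part of the model's tooling: the helper name `split_speaker_turns` is hypothetical, and it assumes the output contains only `[Speaker N]:` tags in exactly the format shown.

```python
import re

# Matches a speaker tag such as "[Speaker 2]:" and captures the number.
SPEAKER_TAG = re.compile(r"\[Speaker (\d+)\]:")


def split_speaker_turns(saa_text):
    """Return a list of (speaker_number, text) tuples, one per speaker turn.

    Hypothetical helper for illustration only.
    """
    # With one capturing group, split() yields
    # [text_before_first_tag, N1, text1, N2, text2, ...].
    parts = SPEAKER_TAG.split(saa_text)
    turns = []
    for number, text in zip(parts[1::2], parts[2::2]):
        turns.append((int(number), text.strip()))
    return turns


example = (
    "[Speaker 1]: Hello how are you "
    "[Speaker 2]: I'm fine and how are you feeling "
    "[Speaker 1]: I feel wonderful"
)
turns = split_speaker_turns(example)
for speaker, text in turns:
    print(f"Speaker {speaker}: {text}")
```

Any text that precedes the first tag is dropped by this sketch; keep `parts[0]` if that text matters for your use case.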
|
| 35 |
+
|
| 36 |
+
See [Resources](#resources) for more information about SAA.
|
| 37 |
+
|
| 38 |
+
### Word-level timestamps
|
| 39 |
+
|
| 40 |
+
In this mode, the model adds a timestamp tag after each word, marking where the word ends in the audio. Silences are transcribed as `_`, and a timestamp tag marks their end as well. The tag format is `[T:N]`, where $N$ is an integer time in centiseconds (1/100 of a second). To reduce the number of generated tokens, only the last three digits of $N$ are emitted, so the value rolls over every 10 seconds.
|
| 41 |
+
|
| 42 |
+
The conversion from time $t$ in seconds to a timestamp is $N = \mathrm{round}(100t) \bmod 1000$. To convert back to seconds, use $t = N/100 + 10R$, where $R$ is the rollover counter (the number of times the timestamp has wrapped around so far). See the code below for an example implementation in Python.
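
The two formulas can be checked with a small round trip. This is a sketch under stated assumptions rather than the reference implementation: the function names `seconds_to_tag` and `unwrap_timestamps` are hypothetical, and the unwrapping assumes timestamps never decrease and that consecutive tags are less than 10 seconds apart, so each apparent decrease is read as one rollover.

```python
import re


def seconds_to_tag(t):
    """Encode a time in seconds as a tag: N = round(100 t) mod 1000."""
    n = round(t * 100) % 1000
    return f"[T:{n}]"


def unwrap_timestamps(tagged_text):
    """Decode tagged text into (word, seconds) pairs using t = N/100 + 10 R."""
    # With one capturing group, split() yields [word1, N1, word2, N2, ..., tail].
    pieces = re.split(r"\[T:(\d+)\]", tagged_text)
    rollovers = 0      # R in the formula above
    previous = 0.0     # end time of the previous word, in seconds
    words = []
    for word, n in zip(pieces[::2], pieces[1::2]):
        t = int(n) / 100 + 10 * rollovers
        # A decrease means the three-digit counter wrapped around.
        while t < previous:
            rollovers += 1
            t = int(n) / 100 + 10 * rollovers
        previous = t
        words.append((word.strip(), t))
    return words


tagged = "hello [T:950] _ [T:990] world [T:30] again [T:80]"
decoded = unwrap_timestamps(tagged)
for word, end in decoded:
    print(f"{word}\t{end:.2f}s")
```

In this example `world` carries the tag `[T:30]` after `[T:990]`, so the decoder increments the rollover counter and reports its end time in the second 10-second window.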
|
| 43 |
+
|
| 44 |
+
See [Resources](#resources) for more information about timestamps.
|
| 45 |
+
|
| 46 |
+
### Incremental decoding
|
| 47 |
+
|
| 48 |
+
There are cases where we want to transcribe a new audio segment along with previous segments that we've already transcribed. This can be useful for providing longer context for the model in order to improve transcription accuracy or to maintain the speaker numbering in SAA mode. To avoid re-decoding the previous segments, we can provide the previous transcription in the `prefix_text` field of the conversation template. The model will decode the parts after that. See the code below for examples.
|
| 49 |
+
|
| 50 |
+
### Keyword list biasing (KWB)
|
| 51 |
+
|
| 52 |
+
Keyword list biasing capability is available to enhance the recognition of keywords, such as names and technical terms.
|
| 53 |
+
This is particularly useful in tasks where complex terms may otherwise be misrecognized.
|
| 54 |
+
Keyword biasing can be applied by including the keywords directly in the prompt; for example, in ASR mode: `Can you transcribe the speech into a written format? Keywords: …`
|
| 55 |
+
|
| 56 |
+
Users may provide either a single keyword or a list of keywords, which may also include terms that do not appear in the input audio, making them well suited for batch processing or recurring domain-specific use cases.
|
| 57 |
+
|
| 58 |
+
See [Resources](#resources) for more information about keyword list biasing.
|
| 59 |
+
|
| 60 |
+
## Evaluations
|
| 61 |
+
|
| 62 |
+
Our evaluations showed that this model works well with audio segments up to 9 minutes long for ASR and SAA, and up to 5 minutes for timestamps.
|
| 63 |
+
|
| 64 |
+
### ASR
|
| 65 |
+
|
| 66 |
+
**Performance on** [**HuggingFace Open ASR leaderboard**](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard)**:**
|
| 67 |
+
| **model** | **Average WER** | **AMI** | **Earnings22** | **Gigaspeech** | **LS Clean** | **LS Other** | **SPGISpeech** | **Tedlium** | **Voxpopuli** |
|
| 68 |
+
| :----------------------------------------- | :-------------: | :-----: | :------------: | :------------: | :----------: | :----------: | :------------: | :---------: | :-----------: |
|
| 69 |
+
| **ibm-granite/granite-speech-4.1-2b-plus** | 5.71 | 8.63 | 8.68 | 10.38 | 1.44 | 3.06 | 3.72 | 3.89 | 5.9 |
|
| 70 |
+
| ibm-granite/granite-speech-4.1-2b | 5.33 | 8.09 | 8.37 | 9.8 | 1.33 | 2.5 | 3.78 | 3.07 | 5.7 |
|
| 71 |
+
| ibm-granite/granite-speech-4.1-2b-nar | 5.44 | 8.03 | 8.44 | 10.16 | 1.28 | 2.77 | 3.33 | 3.62 | 5.86 |
|
| 72 |
+
|
| 73 |
+
|
| 74 |
+
(Using [speculative decoding](https://github.com/huggingface/open_asr_leaderboard/blob/main/granite/run_eval_speculative.py))
|
| 75 |
+
|
| 76 |
+
**Keyword list biasing accuracy - Keyword F1 score (%, ↑ higher is better):**
|
| 77 |
+
|
| 78 |
+
| Mode | Gigaspeech | LS-C | LS-O | SPGISpeech | VOX | TED_LIUM | Earnings22 | CV-en | CV-de | CV-es | CV-fr | CV-pt |
|
| 79 |
+
| ----------- | ---------- | -------- | -------- | ---------- | -------- | -------- | ---------- | -------- | -------- | -------- | -------- | -------- |
|
| 80 |
+
| Without KWB | 74.2 | 89.1 | 78.2 | 80.8 | 93.9 | 87.9 | 68.8 | 74.6 | 78.5 | 83.1 | 74.5 | 90.0 |
|
| 81 |
+
| With KWB | **84.1** | **96.1** | **93.0** | **92.5** | **96.3** | **94.9** | **81.5** | **91.5** | **92.9** | **93.9** | **90.6** | **95.0** |
|
| 82 |
+
|
| 83 |
+
### Speaker Attributed ASR
|
| 84 |
+
|
| 85 |
+
**Speaker Attributed ASR performance - WDER (%, ↓ lower is better):**
|
| 86 |
+
|
| 87 |
+
| **Model** | **FISHER** | **CALLHOME English** | **AMI-SDM** | **GALE** |
|
| 88 |
+
| :----------------------------- | :--------: | :------------------: | :---------: | :------: |
|
| 89 |
+
| VibeVoice ASR [1] | 2.8 | 7.1 | 27.4 | 44.8 |
|
| 90 |
+
| **Granite-speech-4.1-2b-plus** | **0.9** | **2.2** | **14.6** | **30.2** |
|
| 91 |
+
|
| 92 |
+
The results are averaged over 2-5 minute speech segments.
|
| 93 |
+
|
| 94 |
+
(The evaluation metric: Word Diarization Error Rate [WDER] is the percentage of words attributed to the wrong speaker)
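
To make the metric concrete, here is a simplified sketch of the computation. It assumes the reference and hypothesis words have already been aligned one to one and that hypothesis speaker labels have already been mapped onto reference labels; the evaluation behind the table above may handle alignment and label mapping differently. The function name is hypothetical.

```python
def word_diarization_error_rate(ref_speakers, hyp_speakers):
    """Percentage of aligned words whose speaker label is wrong.

    Both arguments are equal-length sequences of speaker labels,
    one label per aligned word (an assumption of this sketch).
    """
    if len(ref_speakers) != len(hyp_speakers):
        raise ValueError("speaker label sequences must have the same length")
    if not ref_speakers:
        return 0.0
    wrong = sum(r != h for r, h in zip(ref_speakers, hyp_speakers))
    return 100.0 * wrong / len(ref_speakers)


# Five aligned words; the fourth is attributed to the wrong speaker.
reference = [1, 1, 2, 2, 1]
hypothesis = [1, 1, 2, 1, 1]
wder = word_diarization_error_rate(reference, hypothesis)
print(f"WDER: {wder:.1f}%")
```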
|
| 95 |
+
|
| 96 |
+
### Timestamps
|
| 97 |
+
|
| 98 |
+
**Word-level timestamp accuracy - AAS (ms, ↓ lower is better):**
|
| 99 |
+
|
| 100 |
+
| **Model** | **AMI-I** | **AMI-S** | **LS-C** | **LS-O** | **VOX** | **CV** | **MLS** | **TMT** | **En Avg** | **MLS-fr** | **MLS-es** | **MLS-de** | **MLS-pt** | **CV-fr** | **CV-es** | **CV-de** | **CV-pt** | **ML Avg** |
|
| 101 |
+
| :----------------------------- | :-------: | :-------: | :------: | :------: | :------: | :------: | :------: | :------: | :--------: | :--------: | :--------: | :--------: | :--------: | :-------: | :-------: | :-------: | :-------: | :--------: |
|
| 102 |
+
| Qwen3-FA [2] | 48.1 | 82.5 | 27.8 | 29.3 | **41.0** | 48.4 | 34.3 | 29.9 | 42.7 | **38.1** | 27.0 | **31.2** | **26.3** | 30.3 | 40.0 | 29.4 | 34.2 | 33.3 |
|
| 103 |
+
| CrisperWhisper [3] | 55.7 | **64.3** | 35.9 | 40.1 | 47.2 | 97.4 | 46.4 | 42.7 | 53.7 | 35.6 | 28.0 | **31.2** | 36.8 | 62.9 | 58.9 | 60.9 | 83.8 | 50.1 |
|
| 104 |
+
| Canary-v2 [4] | 127.8 | 129.7 | 92.5 | 89.2 | 109.9 | 110.3 | 94.3 | 86.1 | 105.0 | 85.0 | 81.1 | 80.2 | – | 86.8 | 88.5 | 91.5 | – | – |
|
| 105 |
+
| WhisperX [5] | 107.1 | 150.2 | 71.7 | 72.0 | 78.8 | 91.2 | 79.2 | 63.6 | 89.2 | 117.3 | 84.7 | 132.2 | 75.0 | 104.2 | 88.1 | 126.8 | 79.5 | 101.0 |
|
| 106 |
+
| **Granite-speech-4.1-2b-plus** | **43.4** | 69.0 | **11.4** | **14.6** | 80.2 | **43.3** | **24.3** | **24.5** | **38.8** | 45.4 | **23.0** | 41.3 | 47.1 | **18.6** | **19.3** | **19.5** | **24.2** | **29.8** |
|
| 107 |
+
|
| 108 |
+
(Evaluation metric: Accumulated Averaging Shift [AAS], which measures the average time shift per word.)
|
| 109 |
+
|
| 110 |
+
## Release Date
|
| 111 |
+
|
| 112 |
+
April 28, 2026
|
| 113 |
+
|
| 114 |
+
## License
|
| 115 |
+
|
| 116 |
+
[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
|
| 117 |
+
|
| 118 |
+
## Supported Languages
|
| 119 |
+
|
| 120 |
+
English, French, German, Spanish, Portuguese
|
| 121 |
+
|
| 122 |
+
## Intended Use
|
| 123 |
+
|
| 124 |
+
The model is intended for enterprise applications that process speech input, especially when a rich transcript with speaker turns and timestamps is desired. In particular, the model is well suited to English, French, German, Spanish, and Portuguese speech-to-text.
|
| 125 |
+
|
| 126 |
+
## Usage
|
| 127 |
+
|
| 128 |
+
The Granite Speech model is supported natively in `transformers>=5.8`. Below is a simple example of how to use the different modes of the model.
|
| 129 |
+
|
| 130 |
+
### Usage with `transformers`
|
| 131 |
+
|
| 132 |
+
First [install pytorch](https://pytorch.org/get-started/locally/).
|
| 133 |
+
|
| 134 |
+
Install [transformers](https://huggingface.co/docs/transformers/installation). The code for the granite-speech-plus model was added recently, so you might need to install from source until the PyPI package is updated.
|
| 135 |
+
|
| 136 |
+
```shell
|
| 137 |
+
pip install torchaudio datasets accelerate torchcodec
|
| 138 |
+
```
|
| 139 |
+
|
| 140 |
+
**Setup** — load the model and a test audio clip:
|
| 141 |
+
|
| 142 |
+
```python
|
| 143 |
+
import re
|
| 144 |
+
import torch
|
| 145 |
+
from datasets import Audio, load_dataset
|
| 146 |
+
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
Load the model and define a general function for decoding the audio:
|
| 150 |
+
|
| 151 |
+
```python
MODEL_NAME = "ibm-granite/granite-speech-4.1-2b-plus"

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(MODEL_NAME)
tokenizer = processor.tokenizer
model = AutoModelForSpeechSeq2Seq.from_pretrained(MODEL_NAME, device_map=device, dtype=torch.bfloat16)
model.eval()

SYSTEM_PROMPT = "Knowledge Cutoff Date: April 2024.\nToday's Date: December 19, 2024.\nYou are Granite, developed by IBM. You are a helpful AI assistant"

@torch.inference_mode()
def transcribe(audio, prompt, max_new_tokens=2000, prefix_text=None):
    chat = [{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": prompt}]
    # prefix_text (optional) carries an earlier transcript for incremental decoding
    extra = {"prefix_text": prefix_text} if prefix_text is not None else {}
    prompt_text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True, **extra)
    inputs = processor(prompt_text, audio, device=device, return_tensors="pt").to(device)
    # Greedy decoding: no sampling, single beam
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False, num_beams=1)
    # Keep only the newly generated tokens, dropping the prompt
    new_tokens = outputs[0, inputs["input_ids"].shape[-1]:]
    output_text = tokenizer.decode(new_tokens, add_special_tokens=False, skip_special_tokens=True)
    return output_text
```
|
| 173 |
+
|
| 174 |
+
Load some example audio data from the AMI dataset:
|
| 175 |
+
|
| 176 |
+
```python
|
| 177 |
+
SAMPLE_RATE = 16000
|
| 178 |
+
|
| 179 |
+
ds = load_dataset("diarizers-community/ami", "ihm", split="test")
|
| 180 |
+
ds = ds.cast_column("audio", Audio(sampling_rate=SAMPLE_RATE, num_channels=1))
|
| 181 |
+
|
| 182 |
+
TEST_SAMPLE = 0
|
| 183 |
+
START_TIME, END_TIME = 5 * 60, 6 * 60
|
| 184 |
+
audio = ds["audio"][TEST_SAMPLE].get_samples_played_in_range(START_TIME, END_TIME)
|
| 185 |
+
```
|
| 186 |
+
|
| 187 |
+
**Task 1: ASR** — plain speech-to-text transcription:
|
| 188 |
+
|
| 189 |
+
```python
|
| 190 |
+
ASR_PROMPT = "<|audio|> can you transcribe the speech into a written format?"
|
| 191 |
+
|
| 192 |
+
asr_text = transcribe(audio.data, ASR_PROMPT)
|
| 193 |
+
print(asr_text)
|
| 194 |
+
```
|
| 195 |
+
|
| 196 |
+
**Task 2: Speaker Attributed ASR** — transcription with speaker labels:
|
| 197 |
+
|
| 198 |
+
```python
SAA_PROMPT = "<|audio|> Speaker attribution: Transcribe and denote who is speaking by adding [Speaker 1]: and [Speaker 2]: tags before speaker turns."

saa_text = transcribe(audio.data, SAA_PROMPT)
# The capturing group keeps the speaker tags in the split result,
# so tags and their text are printed on alternating lines.
for segment in re.split(r"(\[Speaker \d+\]:)", saa_text):
    print(segment.strip())
```
|
| 205 |
+
|
| 206 |
+
**Task 3: Word-level timestamps** — transcription with per-word timing:
|
| 207 |
+
|
| 208 |
+
The timestamps are given in centiseconds and are modulo 1000 (=10 seconds)
|
| 209 |
+
so we need to unwrap them by adding multiples of 10 seconds.
|
| 210 |
+
|
| 211 |
+
```python
TS_PROMPT = "<|audio|> Timestamps: Transcribe the speech. After each word, add a timestamp tag showing the end time in centiseconds, e.g. hello [T:45] world [T:82]"

ts_text = transcribe(audio.data, TS_PROMPT, max_new_tokens=10000)
# Splitting on the tag yields [word1, N1, word2, N2, ...]
ts_words = re.split(r"\[T:(\d+)\]", ts_text)
last_word_end_time = 0
offset_time = 0  # accumulated rollover offset, in seconds
for word, ts in zip(ts_words[::2], ts_words[1::2]):
    word_end_time = float(ts) / 100
    # A timestamp that goes backwards means the counter rolled over
    while word_end_time + offset_time < last_word_end_time:
        offset_time += 10
    last_word_end_time = word_end_time + offset_time
    print(f"{word}\t{last_word_end_time:.2f}s")
```
|
| 225 |
+
|
| 226 |
+
**Task 4: Incremental decoding** — transcribe segments while accumulating audio context:
|
| 227 |
+
|
| 228 |
+
```python
NUM_SEGMENTS = 3
previous_transcript = ""
all_audio = None

for k in range(NUM_SEGMENTS):
    # Boundaries of the k-th segment within [START_TIME, END_TIME]
    t1 = START_TIME + (END_TIME - START_TIME) * k / NUM_SEGMENTS
    t2 = START_TIME + (END_TIME - START_TIME) * (k + 1) / NUM_SEGMENTS
    new_audio = ds["audio"][TEST_SAMPLE].get_samples_played_in_range(t1, t2)
    # Append the new segment to all audio seen so far
    all_audio = new_audio.data if all_audio is None else torch.cat([all_audio, new_audio.data], dim=-1)
    # Pass the earlier transcript as a prefix so only the new part is decoded
    saa_text = transcribe(all_audio, SAA_PROMPT, prefix_text=previous_transcript)
    print(f"{t1:06.2f}-{t2:06.2f}:\t{saa_text}")
    previous_transcript = (previous_transcript + " " + saa_text).strip()
```
|
| 242 |
+
|
| 243 |
+
## Model Architecture
|
| 244 |
+
|
| 245 |
+
The model shares the same architecture as the [Granite-Speech-4.1-2B](https://huggingface.co/ibm-granite/granite-speech-4.1-2b) model.
|
| 246 |
+
|
| 247 |
+
## Training Data
|
| 248 |
+
|
| 249 |
+
The model was trained on the same datasets as [Granite-Speech-4.1-2B](https://huggingface.co/ibm-granite/granite-speech-4.1-2b).
|
| 250 |
+
|
| 251 |
+
Additional training data for SAA was created using audio segments from datasets that have speaker identification (e.g. Multilingual-Librispeech). Segments with alternating speakers were concatenated to create long multi-speaker samples.
|
| 252 |
+
|
| 253 |
+
|
| 254 |
+
### Training Data for Timestamps
|
| 255 |
+
|
| 256 |
+
Word-level timestamping capabilities are achieved by using a combination of publicly available speech corpora: LibriSpeech, MLS (en, fr, de, pt, es), CommonVoice (en, fr, de, pt, es), VoxPopuli (en, fr, de, es), AMI-IHM, Switchboard, TIMIT and YODAS. For AMI-IHM, Switchboard and TIMIT, we use the available timestamp annotations. For all other datasets, we obtain word-level alignments using the Montreal Forced Aligner (MFA), a GMM-HMM based forced alignment tool. We also use MFA to insert silence boundaries into the manually annotated datasets.
|
| 257 |
+
|
| 258 |
+
To ensure high-quality training data, we validate the MFA-derived alignments using forced alignments with our CTC-based speech encoder. We compute the Accumulated Average Shift (AAS), the mean absolute error between timestamps in milliseconds, for the CTC and MFA alignments and retain only samples with the lowest alignment error: the top 95% for English and top 70% for non-English data. For the larger datasets (YODAS and MLS-en), we cap the training data at 4M and 5M samples, respectively.
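
The filtering step can be sketched as follows. This is an illustration under assumptions, not the actual data pipeline: the function names and the `(sample_id, aas_ms)` tuple layout are hypothetical, timestamps are taken to be in seconds with one entry per word in both alignments, and "top 95%" is read as keeping the 95% of samples with the lowest AAS.

```python
def accumulated_average_shift_ms(times_a, times_b):
    """Mean absolute difference between two alignments, in milliseconds.

    times_a and times_b are per-word timestamps in seconds for the same
    word sequence (for example, CTC-based and MFA-based alignments).
    """
    if len(times_a) != len(times_b):
        raise ValueError("alignments must cover the same number of words")
    if not times_a:
        return 0.0
    total = sum(abs(a - b) for a, b in zip(times_a, times_b))
    return 1000.0 * total / len(times_a)


def keep_lowest_error(samples, keep_fraction):
    """Keep the keep_fraction of (sample_id, aas_ms) pairs with the lowest AAS."""
    ranked = sorted(samples, key=lambda s: s[1])
    n_keep = int(len(ranked) * keep_fraction)
    return ranked[:n_keep]


aas = accumulated_average_shift_ms([0.50, 1.00], [0.52, 0.97])
print(f"AAS: {aas:.1f} ms")

scored = [("a", 10.0), ("b", 50.0), ("c", 20.0), ("d", 90.0)]
kept = keep_lowest_error(scored, keep_fraction=0.5)
print([sample_id for sample_id, _ in kept])
```

With the fractions quoted above, `keep_fraction` would be 0.95 for English data and 0.70 for non-English data.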
|
| 259 |
+
|
| 260 |
+
Additional training data containing long audio samples with timestamps were generated by concatenation of short segments.
|
| 261 |
+
|
| 262 |
+
The model was trained on audio samples up to 10 minutes for ASR and SAA, and up to 5 minutes for timestamps.
|
| 263 |
+
|
| 264 |
+
## Infrastructure
|
| 265 |
+
|
| 266 |
+
We train Granite Speech using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable
|
| 267 |
+
and efficient infrastructure for training our models over thousands of GPUs. The training of this particular model was completed in about 5 days on 32
|
| 268 |
+
H100 GPUs.
|
| 269 |
+
|
| 270 |
+
## Ethical Considerations and Limitations
|
| 271 |
+
|
| 272 |
+
The use of Large Speech and Language Models can trigger certain risks and ethical considerations. Although our alignment processes include safety considerations,
|
| 273 |
+
the model may in some cases produce inaccurate, biased, offensive or unwanted responses to user prompts. Additionally, whether smaller models may exhibit increased
|
| 274 |
+
susceptibility to hallucination in generation scenarios due to their reduced sizes, which could limit their ability to generate coherent and contextually accurate responses, remains uncertain.
|
| 275 |
+
This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain.
|
| 276 |
+
|
| 277 |
+
IBM recommends using this model for automatic speech recognition and translation tasks. The model's design improves safety by limiting how audio inputs can influence the system.
|
| 278 |
+
If an unfamiliar or malformed prompt is received, the model simply ignores it and performs transcription, which is the default fallback mode.
|
| 279 |
+
This minimizes the risk of adversarial inputs, unlike integrated models that directly interpret audio and may be more exposed to such attacks. Note that more general speech tasks may pose higher inherent risks of triggering unwanted outputs.
|
| 280 |
+
|
| 281 |
+
To enhance safety, we recommend using Granite-Speech-4.1-2B-Plus alongside Granite Guardian. Granite Guardian is a fine-tuned instruct model designed to detect and flag risks in prompts and responses across key dimensions outlined in the IBM AI Risk Atlas.
|
| 282 |
+
|
| 283 |
+
## Resources
|
| 284 |
+
|
| 285 |
+
- 📄 Read the papers:
|
| 286 |
+
- [Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS](https://arxiv.org/abs/2604.11269)
|
| 287 |
+
- [In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions](https://arxiv.org/abs/2604.22817)
|
| 288 |
+
- [Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction](https://arxiv.org/abs/2604.12398)
|
| 289 |
+
- [Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities](https://arxiv.org/abs/2505.08699)
|
| 290 |
+
- [Self-Speculative Decoding for LLM-based ASR with CTC Encoder Drafts](https://arxiv.org/abs/2603.11243)
|
| 291 |
+
- [NLE: Non-autoregressive LLM-based ASR by Transcript Editing](https://arxiv.org/abs/2603.08397)
|
| 292 |
+
- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
|
| 293 |
+
- 🚀 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
|
| 294 |
+
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
|
| 295 |
+
|
| 296 |
+
## References
|
| 297 |
+
|
| 298 |
+
[1] VibeVoice-ASR (Transformers-compatible version). Available online: https://huggingface.co/microsoft/VibeVoice-ASR-HF.
|
| 299 |
+
|
| 300 |
+
[2] X. Shi et al., "Qwen3-ASR technical report," 2026. arXiv
|
| 301 |
+
|
| 302 |
+
[3] M. Zusag, L. Wagner, and B. Thallinger, "CrisperWhisper: Accurate timestamps on verbatim speech transcriptions," in Proc. Interspeech, 2024.
|
| 303 |
+
|
| 304 |
+
[4] M. Sekoyan, N. R. Koluguri, N. Tadevosyan, P. Zelasko, T. Bartley, N. Karpov, J. Balam, and B. Ginsburg, "Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and high-performance models for multilingual ASR and AST," 2025. arXiv
|
| 305 |
+
|
| 306 |
+
[5] M. Bain, J. Huh, T. Han, and A. Zisserman, "WhisperX: Time-accurate speech transcription of long-form audio," 2023. arXiv
|
| 307 |
+
|
chat_template.jinja
ADDED
|
@@ -0,0 +1,121 @@
| 1 |
+
{%- set tools_system_message_prefix = 'You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>' %}
|
| 2 |
+
{%- set tools_system_message_suffix = '\n</tools>\n\nFor each tool call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.' %}
|
| 3 |
+
{%- set documents_system_message_prefix = 'You are a helpful assistant with access to the following documents. You may use one or more documents to assist with the user query.\n\nYou are given a list of documents within <documents></documents> XML tags:\n<documents>' %}
|
| 4 |
+
{%- set documents_system_message_suffix = '\n</documents>\n\nWrite the response to the user\'s input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data.' %}
|
| 5 |
+
{%- set g4_default_system_message = 'You are a helpful assistant. Please ensure responses are professional, accurate, and safe.' %}
|
| 6 |
+
{%- if available_tools is defined and available_tools %}
|
| 7 |
+
{%- set tools = available_tools %}
|
| 8 |
+
{%- endif %}
|
| 9 |
+
{%- set ns = namespace(tools_system_message=tools_system_message_prefix,
|
| 10 |
+
documents_system_message=documents_system_message_prefix,
|
| 11 |
+
default_system_message=g4_default_system_message,
|
| 12 |
+
system_message=''
|
| 13 |
+
) %}
|
| 14 |
+
{%- if tools %}
|
| 15 |
+
{%- for tool in tools %}
|
| 16 |
+
{%- set ns.tools_system_message = ns.tools_system_message + '\n' + (tool | tojson) %}
|
| 17 |
+
{%- endfor %}
|
| 18 |
+
{%- set ns.tools_system_message = ns.tools_system_message + tools_system_message_suffix %}
|
| 19 |
+
{%- else %}
|
| 20 |
+
{%- set ns.tools_system_message = '' %}
|
| 21 |
+
{%- endif %}
|
| 22 |
+
{%- if documents %}
    {%- for document in documents %}
        {%- set ns.documents_system_message = ns.documents_system_message + '\n' + (document | tojson) %}
    {%- endfor %}
    {%- set ns.documents_system_message = ns.documents_system_message + documents_system_message_suffix %}
{%- else %}
    {%- set ns.documents_system_message = '' %}
{%- endif %}
{%- if messages[0].role == 'system' %}
    {%- if messages[0].content is string %}
        {%- set ns.system_message = messages[0].content %}
    {%- elif messages[0].content is iterable %}
        {%- for entry in messages[0].content %}
            {%- if entry.type== 'text' %}
                {%- if ns.system_message != '' %}
                    {%- set ns.system_message = ns.system_message + '\n' %}
                {%- endif %}
                {%- set ns.system_message = ns.system_message + entry.text %}
            {%- endif %}
        {%- endfor %}
    {%- endif %}
    {%- if tools and documents %}
        {%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message + '\n\n' + ns.documents_system_message %}
    {%- elif tools %}
        {%- set ns.system_message = ns.system_message + '\n\n' + ns.tools_system_message %}
    {%- elif documents %}
        {%- set ns.system_message = ns.system_message + '\n\n' + ns.documents_system_message %}
    {%- endif %}
{%- else %}
    {%- if tools and documents %}
        {%- set ns.system_message = ns.tools_system_message + '\n\n' + ns.documents_system_message %}
    {%- elif tools %}
        {%- set ns.system_message = ns.tools_system_message %}
    {%- elif documents %}
        {%- set ns.system_message = ns.documents_system_message %}
    {%- endif %}
{%- endif %}
{%- if ns.system_message %}
    {{- '<|start_of_role|>system<|end_of_role|>' + ns.system_message + '<|end_of_text|>\n' }}
{%- else %}
    {{- '<|start_of_role|>system<|end_of_role|>' + ns.default_system_message + '<|end_of_text|>\n' }}
{%- endif %}
{%- for message in messages %}
    {%- set content = namespace(val='') %}
    {%- if message.content is string %}
        {%- set content.val = message.content %}
    {%- else %}
        {%- if message.content is iterable %}
            {%- for entry in message.content %}
                {%- if entry.type== 'text' %}
                    {%- if content.val != '' %}
                        {%- set content.val = content.val + '\n' %}
                    {%- endif %}
                    {%- set content.val = content.val + entry.text %}
                {%- endif %}
            {%- endfor %}
        {%- endif %}
    {%- endif %}
    {%- if (message.role == 'user') or (message.role == 'system' and not loop.first) %}
        {{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val + '<|end_of_text|>\n' }}
    {%- elif message.role == 'assistant' %}
        {{- '<|start_of_role|>' + message.role + '<|end_of_role|>' + content.val }}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content.val) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|end_of_text|>\n' }}
    {%- elif message.role == 'tool' %}
        {%- if loop.first or (messages[loop.index0 - 1].role != 'tool') %}
            {{- '<|start_of_role|>user<|end_of_role|>' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content.val }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != 'tool') %}
            {{- '<|end_of_text|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_of_role|>assistant<|end_of_role|>' }}
    {%- if prefix_text is defined and prefix_text %}
        {{- prefix_text }}
    {%- endif %}
{%- endif %}
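The closing part of chat_template.jinja fixes the turn layout: one leading system turn, one `<|start_of_role|>role<|end_of_role|>content<|end_of_text|>` block per message, consecutive tool results folded into a single user turn, and an optional open assistant turn. The following is a minimal Python sketch of that layout, assuming plain-string message contents and no tools, documents, or tool calls. `render_prompt` and its `default_system` parameter are hypothetical names used only for illustration (the real default system message is defined earlier in the template), and this is not the rendering code Transformers uses.

```python
START, END, EOT = "<|start_of_role|>", "<|end_of_role|>", "<|end_of_text|>\n"


def render_prompt(messages, default_system="", add_generation_prompt=True):
    """Approximate the template's turn layout for plain-string messages."""
    # A leading system message replaces the default system text.
    has_system = bool(messages) and messages[0]["role"] == "system"
    system = messages[0]["content"] if has_system else ""
    out = [f"{START}system{END}{system or default_system}{EOT}"]

    for i, msg in enumerate(messages):
        role, content = msg["role"], msg["content"]
        if role == "system" and i == 0:
            continue  # already emitted as the leading system turn
        if role in ("user", "system", "assistant"):
            out.append(f"{START}{role}{END}{content}{EOT}")
        elif role == "tool":
            prev_is_tool = i > 0 and messages[i - 1]["role"] == "tool"
            next_is_tool = i + 1 < len(messages) and messages[i + 1]["role"] == "tool"
            if not prev_is_tool:
                # The first tool result in a run opens a user turn.
                out.append(f"{START}user{END}")
            out.append(f"\n<tool_response>\n{content}\n</tool_response>")
            if not next_is_tool:
                # The last tool result in a run closes that turn.
                out.append(EOT)

    if add_generation_prompt:
        out.append(f"{START}assistant{END}")
    return "".join(out)


if __name__ == "__main__":
    demo = [{"role": "user", "content": "Transcribe the audio."}]
    print(render_prompt(demo, default_system="(default system text)"))
```

Passing `add_generation_prompt=False` yields only the completed turns, which is the shape one would expect for training-style formatting rather than inference.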
config.json
ADDED
@@ -0,0 +1,87 @@
{
  "architectures": [
    "GraniteSpeechPlusForConditionalGeneration"
  ],
  "audio_token_index": 100352,
  "downsample_rate": 5,
  "dtype": "bfloat16",
  "encoder_config": {
    "cat_hidden_layers": [
      3
    ],
    "context_size": 200,
    "conv_expansion_factor": 2,
    "conv_kernel_size": 15,
    "dim_head": 128,
    "dropout": 0.1,
    "feedforward_mult": 4,
    "hidden_dim": 1024,
    "input_dim": 160,
    "max_pos_emb": 512,
    "model_type": "granite_speech_plus_encoder",
    "num_heads": 8,
    "num_layers": 16,
    "output_dim": 348
  },
  "has_lora_adapter": false,
  "initializer_range": 0.02,
  "model_type": "granite_speech_plus",
  "projector_config": {
    "_attn_implementation_autoset": true,
    "attention_probs_dropout_prob": 0.1,
    "cross_attention_frequency": 1,
    "encoder_hidden_size": 2048,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 1024,
    "initializer_range": 0.02,
    "intermediate_size": 4096,
    "layer_norm_eps": 1e-12,
    "max_position_embeddings": 2048,
    "model_type": "blip_2_qformer",
    "num_attention_heads": 16,
    "num_hidden_layers": 2,
    "pad_token_id": 0,
    "position_embedding_type": "absolute",
    "use_qformer_text_input": false,
    "vocab_size": 30522
  },
  "text_config": {
    "_name_or_path": "/proj/speech/saon/slam-llm/29.2-c/granite-4.0-1b-base",
    "architectures": [
      "GraniteForCausalLM"
    ],
    "attention_bias": false,
    "attention_dropout": 0.0,
    "attention_multiplier": 0.0078125,
    "bos_token_id": 100257,
    "dtype": "float32",
    "embedding_multiplier": 12,
    "eos_token_id": 100257,
    "hidden_act": "silu",
    "hidden_size": 2048,
    "initializer_range": 0.1,
    "intermediate_size": 4096,
    "logits_scaling": 8,
    "max_position_embeddings": 4096,
    "mlp_bias": false,
    "model_type": "granite",
    "num_attention_heads": 16,
    "num_hidden_layers": 40,
    "num_key_value_heads": 4,
    "pad_token_id": 100256,
    "residual_multiplier": 0.22,
    "rms_norm_eps": 1e-05,
    "rope_parameters": {
      "rope_theta": 10000,
      "rope_type": "default"
    },
    "tie_word_embeddings": true,
    "use_cache": true,
    "vocab_size": 100353,
    "rope_theta": 10000,
    "rope_type": "default"
  },
  "transformers_version": "5.6.0.dev0",
  "window_size": 15
}
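A few of the values in config.json are related by simple arithmetic, which can be a useful sanity check when editing the file. The sketch below copies the relevant fields by hand and derives some shapes from them. It assumes the text model's per-head width is `hidden_size / num_attention_heads` (the config has no explicit head-dimension field), and `derived_shapes` is a hypothetical helper, not part of Transformers.

```python
# Values copied from the config.json shown above.
text_config = {
    "hidden_size": 2048,
    "num_attention_heads": 16,
    "num_key_value_heads": 4,
    "attention_multiplier": 0.0078125,
}
encoder_config = {"hidden_dim": 1024, "num_heads": 8, "dim_head": 128}


def derived_shapes(text_cfg, enc_cfg):
    """Derive a few shape relationships from the copied config fields."""
    # Assumption: the head width is hidden_size split evenly across heads.
    head_dim = text_cfg["hidden_size"] // text_cfg["num_attention_heads"]
    return {
        "text_head_dim": head_dim,
        # Grouped-query attention: query heads sharing each key/value head.
        "queries_per_kv_head": text_cfg["num_attention_heads"]
        // text_cfg["num_key_value_heads"],
        # Equals 1.0 when the multiplier is exactly 1 / head_dim.
        "attn_multiplier_times_head_dim": text_cfg["attention_multiplier"] * head_dim,
        # Total attention width of the speech encoder.
        "encoder_attn_width": enc_cfg["num_heads"] * enc_cfg["dim_head"],
    }


if __name__ == "__main__":
    for name, value in derived_shapes(text_config, encoder_config).items():
        print(f"{name}: {value}")
```

Under that assumption the text model has a head width of 128 with four query heads per key/value head, `attention_multiplier` works out to 1/128, and the encoder's `num_heads * dim_head` matches its `hidden_dim` of 1024.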
generation_config.json
ADDED
@@ -0,0 +1,10 @@
{
  "_from_model_config": true,
  "bos_token_id": 100257,
  "eos_token_id": 100257,
  "output_attentions": false,
  "output_hidden_states": false,
  "pad_token_id": 100256,
  "transformers_version": "5.6.0.dev0",
  "use_cache": true
}
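generation_config.json repeats the special-token ids that also appear under `text_config` in config.json, so the two files can drift apart if one is edited by hand. A small consistency check might look like the sketch below. The dictionaries are copied by hand from the two files, and `check_special_tokens` is a hypothetical helper written for this illustration.

```python
# Values copied from config.json (text_config, audio_token_index)
# and generation_config.json as shown in this commit.
text_config = {
    "bos_token_id": 100257,
    "eos_token_id": 100257,
    "pad_token_id": 100256,
    "vocab_size": 100353,
}
generation_config = {
    "bos_token_id": 100257,
    "eos_token_id": 100257,
    "pad_token_id": 100256,
}
audio_token_index = 100352


def check_special_tokens(text_cfg, gen_cfg, audio_idx):
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    vocab = text_cfg["vocab_size"]
    for key in ("bos_token_id", "eos_token_id", "pad_token_id"):
        if text_cfg[key] != gen_cfg[key]:
            problems.append(f"{key} differs between the two files")
        if not 0 <= gen_cfg[key] < vocab:
            problems.append(f"{key} is outside the vocabulary")
    if not 0 <= audio_idx < vocab:
        problems.append("audio_token_index is outside the vocabulary")
    return problems


if __name__ == "__main__":
    print(check_special_tokens(text_config, generation_config, audio_token_index))
```

With the values in this commit the check reports no problems, and `audio_token_index` (100352) is the last id in the 100353-entry vocabulary.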
model-00001-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:af45105ba955e3a796f39c3cddc6feae9fb4696b46e99f18355df9d7c8bdb0ba
size 1992505016
model-00002-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:172bddcb0b9fe4e59b4302eecc478bbe5fb477759b80a52e476b43b55c9493a7
size 1993777408
model-00003-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:12994776f7c9e24cda3339ee8a6ca6a07600f5ae4a4c38d66703dcefb8ff4624
size 237587992
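The three safetensors entries in this commit are Git LFS pointer files rather than the weights themselves: each holds a spec version, a SHA-256 object id, and the size in bytes of the real file. A minimal sketch of reading those pointers and totalling the shard sizes follows. The pointer text is copied from the diff above, and `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
POINTERS = {
    "model-00001-of-00003.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:af45105ba955e3a796f39c3cddc6feae9fb4696b46e99f18355df9d7c8bdb0ba
size 1992505016
""",
    "model-00002-of-00003.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:172bddcb0b9fe4e59b4302eecc478bbe5fb477759b80a52e476b43b55c9493a7
size 1993777408
""",
    "model-00003-of-00003.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:12994776f7c9e24cda3339ee8a6ca6a07600f5ae4a4c38d66703dcefb8ff4624
size 237587992
""",
}


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    # Each line is "<key> <value>", separated by a single space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    parsed = {name: parse_lfs_pointer(text) for name, text in POINTERS.items()}
    for name, info in parsed.items():
        print(f"{name}: {info['size']} bytes ({info['algo']})")
    print("total:", sum(info["size"] for info in parsed.values()))
```

The shard sizes add up to 4223870416 bytes, which is 113304 bytes more than the `total_size` of 4223757112 recorded in model.safetensors.index.json. A plausible explanation, stated here as an assumption, is that each shard file also stores a header in addition to the tensor data.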
model.safetensors.index.json
ADDED
@@ -0,0 +1,961 @@
{
  "metadata": {
    "total_parameters": 2111812956,
    "total_size": 4223757112
  },
  "weight_map": {
| 7 |
+
"encoder.input_linear.bias": "model-00002-of-00003.safetensors",
|
| 8 |
+
"encoder.input_linear.weight": "model-00002-of-00003.safetensors",
|
| 9 |
+
"encoder.layers.0.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 10 |
+
"encoder.layers.0.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 11 |
+
"encoder.layers.0.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
|
| 12 |
+
"encoder.layers.0.attn.to_kv.weight": "model-00002-of-00003.safetensors",
|
| 13 |
+
"encoder.layers.0.attn.to_out.bias": "model-00002-of-00003.safetensors",
|
| 14 |
+
"encoder.layers.0.attn.to_out.weight": "model-00002-of-00003.safetensors",
|
| 15 |
+
"encoder.layers.0.attn.to_q.weight": "model-00002-of-00003.safetensors",
|
| 16 |
+
"encoder.layers.0.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
|
| 17 |
+
"encoder.layers.0.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
|
| 18 |
+
"encoder.layers.0.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
|
| 19 |
+
"encoder.layers.0.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
|
| 20 |
+
"encoder.layers.0.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
|
| 21 |
+
"encoder.layers.0.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
|
| 22 |
+
"encoder.layers.0.conv.down_conv.bias": "model-00002-of-00003.safetensors",
|
| 23 |
+
"encoder.layers.0.conv.down_conv.weight": "model-00002-of-00003.safetensors",
|
| 24 |
+
"encoder.layers.0.conv.norm.bias": "model-00002-of-00003.safetensors",
|
| 25 |
+
"encoder.layers.0.conv.norm.weight": "model-00002-of-00003.safetensors",
|
| 26 |
+
"encoder.layers.0.conv.up_conv.bias": "model-00002-of-00003.safetensors",
|
| 27 |
+
"encoder.layers.0.conv.up_conv.weight": "model-00002-of-00003.safetensors",
|
| 28 |
+
"encoder.layers.0.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 29 |
+
"encoder.layers.0.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 30 |
+
"encoder.layers.0.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 31 |
+
"encoder.layers.0.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 32 |
+
"encoder.layers.0.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 33 |
+
"encoder.layers.0.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 34 |
+
"encoder.layers.0.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 35 |
+
"encoder.layers.0.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 36 |
+
"encoder.layers.0.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 37 |
+
"encoder.layers.0.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 38 |
+
"encoder.layers.0.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 39 |
+
"encoder.layers.0.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 40 |
+
"encoder.layers.0.post_norm.bias": "model-00002-of-00003.safetensors",
|
| 41 |
+
"encoder.layers.0.post_norm.weight": "model-00002-of-00003.safetensors",
|
| 42 |
+
"encoder.layers.1.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 43 |
+
"encoder.layers.1.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 44 |
+
"encoder.layers.1.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
|
| 45 |
+
"encoder.layers.1.attn.to_kv.weight": "model-00002-of-00003.safetensors",
|
| 46 |
+
"encoder.layers.1.attn.to_out.bias": "model-00002-of-00003.safetensors",
|
| 47 |
+
"encoder.layers.1.attn.to_out.weight": "model-00002-of-00003.safetensors",
|
| 48 |
+
"encoder.layers.1.attn.to_q.weight": "model-00002-of-00003.safetensors",
|
| 49 |
+
"encoder.layers.1.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
|
| 50 |
+
"encoder.layers.1.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
|
| 51 |
+
"encoder.layers.1.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
|
| 52 |
+
"encoder.layers.1.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
|
| 53 |
+
"encoder.layers.1.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
|
| 54 |
+
"encoder.layers.1.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
|
| 55 |
+
"encoder.layers.1.conv.down_conv.bias": "model-00002-of-00003.safetensors",
|
| 56 |
+
"encoder.layers.1.conv.down_conv.weight": "model-00002-of-00003.safetensors",
|
| 57 |
+
"encoder.layers.1.conv.norm.bias": "model-00002-of-00003.safetensors",
|
| 58 |
+
"encoder.layers.1.conv.norm.weight": "model-00002-of-00003.safetensors",
|
| 59 |
+
"encoder.layers.1.conv.up_conv.bias": "model-00002-of-00003.safetensors",
|
| 60 |
+
"encoder.layers.1.conv.up_conv.weight": "model-00002-of-00003.safetensors",
|
| 61 |
+
"encoder.layers.1.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 62 |
+
"encoder.layers.1.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 63 |
+
"encoder.layers.1.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 64 |
+
"encoder.layers.1.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 65 |
+
"encoder.layers.1.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 66 |
+
"encoder.layers.1.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 67 |
+
"encoder.layers.1.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 68 |
+
"encoder.layers.1.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 69 |
+
"encoder.layers.1.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 70 |
+
"encoder.layers.1.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 71 |
+
"encoder.layers.1.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 72 |
+
"encoder.layers.1.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 73 |
+
"encoder.layers.1.post_norm.bias": "model-00002-of-00003.safetensors",
|
| 74 |
+
"encoder.layers.1.post_norm.weight": "model-00002-of-00003.safetensors",
|
| 75 |
+
"encoder.layers.10.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 76 |
+
"encoder.layers.10.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 77 |
+
"encoder.layers.10.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
|
| 78 |
+
"encoder.layers.10.attn.to_kv.weight": "model-00002-of-00003.safetensors",
|
| 79 |
+
"encoder.layers.10.attn.to_out.bias": "model-00002-of-00003.safetensors",
|
| 80 |
+
"encoder.layers.10.attn.to_out.weight": "model-00002-of-00003.safetensors",
|
| 81 |
+
"encoder.layers.10.attn.to_q.weight": "model-00002-of-00003.safetensors",
|
| 82 |
+
"encoder.layers.10.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
|
| 83 |
+
"encoder.layers.10.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
|
| 84 |
+
"encoder.layers.10.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
|
| 85 |
+
"encoder.layers.10.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
|
| 86 |
+
"encoder.layers.10.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
|
| 87 |
+
"encoder.layers.10.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
|
| 88 |
+
"encoder.layers.10.conv.down_conv.bias": "model-00002-of-00003.safetensors",
|
| 89 |
+
"encoder.layers.10.conv.down_conv.weight": "model-00002-of-00003.safetensors",
|
| 90 |
+
"encoder.layers.10.conv.norm.bias": "model-00002-of-00003.safetensors",
|
| 91 |
+
"encoder.layers.10.conv.norm.weight": "model-00002-of-00003.safetensors",
|
| 92 |
+
"encoder.layers.10.conv.up_conv.bias": "model-00002-of-00003.safetensors",
|
| 93 |
+
"encoder.layers.10.conv.up_conv.weight": "model-00002-of-00003.safetensors",
|
| 94 |
+
"encoder.layers.10.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 95 |
+
"encoder.layers.10.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 96 |
+
"encoder.layers.10.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 97 |
+
"encoder.layers.10.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 98 |
+
"encoder.layers.10.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 99 |
+
"encoder.layers.10.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 100 |
+
"encoder.layers.10.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 101 |
+
"encoder.layers.10.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 102 |
+
"encoder.layers.10.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 103 |
+
"encoder.layers.10.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 104 |
+
"encoder.layers.10.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 105 |
+
"encoder.layers.10.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 106 |
+
"encoder.layers.10.post_norm.bias": "model-00002-of-00003.safetensors",
|
| 107 |
+
"encoder.layers.10.post_norm.weight": "model-00002-of-00003.safetensors",
|
| 108 |
+
"encoder.layers.11.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 109 |
+
"encoder.layers.11.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 110 |
+
"encoder.layers.11.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
|
| 111 |
+
"encoder.layers.11.attn.to_kv.weight": "model-00002-of-00003.safetensors",
|
| 112 |
+
"encoder.layers.11.attn.to_out.bias": "model-00002-of-00003.safetensors",
|
| 113 |
+
"encoder.layers.11.attn.to_out.weight": "model-00002-of-00003.safetensors",
|
| 114 |
+
"encoder.layers.11.attn.to_q.weight": "model-00002-of-00003.safetensors",
|
| 115 |
+
"encoder.layers.11.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
|
| 116 |
+
"encoder.layers.11.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
|
| 117 |
+
"encoder.layers.11.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
|
| 118 |
+
"encoder.layers.11.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
|
| 119 |
+
"encoder.layers.11.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
|
| 120 |
+
"encoder.layers.11.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
|
| 121 |
+
"encoder.layers.11.conv.down_conv.bias": "model-00002-of-00003.safetensors",
|
| 122 |
+
"encoder.layers.11.conv.down_conv.weight": "model-00002-of-00003.safetensors",
|
| 123 |
+
"encoder.layers.11.conv.norm.bias": "model-00002-of-00003.safetensors",
|
| 124 |
+
"encoder.layers.11.conv.norm.weight": "model-00002-of-00003.safetensors",
|
| 125 |
+
"encoder.layers.11.conv.up_conv.bias": "model-00002-of-00003.safetensors",
|
| 126 |
+
"encoder.layers.11.conv.up_conv.weight": "model-00002-of-00003.safetensors",
|
| 127 |
+
"encoder.layers.11.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 128 |
+
"encoder.layers.11.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 129 |
+
"encoder.layers.11.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 130 |
+
"encoder.layers.11.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 131 |
+
"encoder.layers.11.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 132 |
+
"encoder.layers.11.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 133 |
+
"encoder.layers.11.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 134 |
+
"encoder.layers.11.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 135 |
+
"encoder.layers.11.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 136 |
+
"encoder.layers.11.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 137 |
+
"encoder.layers.11.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 138 |
+
"encoder.layers.11.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 139 |
+
"encoder.layers.11.post_norm.bias": "model-00002-of-00003.safetensors",
|
| 140 |
+
"encoder.layers.11.post_norm.weight": "model-00002-of-00003.safetensors",
|
| 141 |
+
"encoder.layers.12.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 142 |
+
"encoder.layers.12.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 143 |
+
"encoder.layers.12.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
|
| 144 |
+
"encoder.layers.12.attn.to_kv.weight": "model-00002-of-00003.safetensors",
|
| 145 |
+
"encoder.layers.12.attn.to_out.bias": "model-00002-of-00003.safetensors",
|
| 146 |
+
"encoder.layers.12.attn.to_out.weight": "model-00002-of-00003.safetensors",
|
| 147 |
+
"encoder.layers.12.attn.to_q.weight": "model-00002-of-00003.safetensors",
|
| 148 |
+
"encoder.layers.12.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
|
| 149 |
+
"encoder.layers.12.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
|
| 150 |
+
"encoder.layers.12.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
|
| 151 |
+
"encoder.layers.12.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
|
| 152 |
+
"encoder.layers.12.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
|
| 153 |
+
"encoder.layers.12.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
|
| 154 |
+
"encoder.layers.12.conv.down_conv.bias": "model-00002-of-00003.safetensors",
|
| 155 |
+
"encoder.layers.12.conv.down_conv.weight": "model-00002-of-00003.safetensors",
|
| 156 |
+
"encoder.layers.12.conv.norm.bias": "model-00002-of-00003.safetensors",
|
| 157 |
+
"encoder.layers.12.conv.norm.weight": "model-00002-of-00003.safetensors",
|
| 158 |
+
"encoder.layers.12.conv.up_conv.bias": "model-00002-of-00003.safetensors",
|
| 159 |
+
"encoder.layers.12.conv.up_conv.weight": "model-00002-of-00003.safetensors",
|
| 160 |
+
"encoder.layers.12.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 161 |
+
"encoder.layers.12.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 162 |
+
"encoder.layers.12.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 163 |
+
"encoder.layers.12.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 164 |
+
"encoder.layers.12.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 165 |
+
"encoder.layers.12.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 166 |
+
"encoder.layers.12.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
|
| 167 |
+
"encoder.layers.12.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 168 |
+
"encoder.layers.12.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 169 |
+
"encoder.layers.12.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 170 |
+
"encoder.layers.12.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 171 |
+
"encoder.layers.12.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 172 |
+
"encoder.layers.12.post_norm.bias": "model-00002-of-00003.safetensors",
|
| 173 |
+
"encoder.layers.12.post_norm.weight": "model-00002-of-00003.safetensors",
|
| 174 |
+
"encoder.layers.13.attn.pre_norm.bias": "model-00003-of-00003.safetensors",
|
| 175 |
+
"encoder.layers.13.attn.pre_norm.weight": "model-00003-of-00003.safetensors",
|
| 176 |
+
"encoder.layers.13.attn.rel_pos_emb.weight": "model-00003-of-00003.safetensors",
|
| 177 |
+
"encoder.layers.13.attn.to_kv.weight": "model-00003-of-00003.safetensors",
|
| 178 |
+
"encoder.layers.13.attn.to_out.bias": "model-00003-of-00003.safetensors",
|
| 179 |
+
"encoder.layers.13.attn.to_out.weight": "model-00003-of-00003.safetensors",
|
| 180 |
+
"encoder.layers.13.attn.to_q.weight": "model-00003-of-00003.safetensors",
|
| 181 |
+
"encoder.layers.13.conv.batch_norm.bias": "model-00003-of-00003.safetensors",
|
| 182 |
+
"encoder.layers.13.conv.batch_norm.num_batches_tracked": "model-00003-of-00003.safetensors",
|
| 183 |
+
"encoder.layers.13.conv.batch_norm.running_mean": "model-00003-of-00003.safetensors",
|
| 184 |
+
"encoder.layers.13.conv.batch_norm.running_var": "model-00003-of-00003.safetensors",
|
| 185 |
+
"encoder.layers.13.conv.batch_norm.weight": "model-00003-of-00003.safetensors",
|
| 186 |
+
"encoder.layers.13.conv.depth_conv.conv.weight": "model-00003-of-00003.safetensors",
|
| 187 |
+
"encoder.layers.13.conv.down_conv.bias": "model-00003-of-00003.safetensors",
|
| 188 |
+
"encoder.layers.13.conv.down_conv.weight": "model-00003-of-00003.safetensors",
|
| 189 |
+
"encoder.layers.13.conv.norm.bias": "model-00003-of-00003.safetensors",
|
| 190 |
+
"encoder.layers.13.conv.norm.weight": "model-00003-of-00003.safetensors",
|
| 191 |
+
"encoder.layers.13.conv.up_conv.bias": "model-00003-of-00003.safetensors",
|
| 192 |
+
"encoder.layers.13.conv.up_conv.weight": "model-00003-of-00003.safetensors",
|
| 193 |
+
"encoder.layers.13.ff1.down_proj.bias": "model-00003-of-00003.safetensors",
|
| 194 |
+
"encoder.layers.13.ff1.down_proj.weight": "model-00003-of-00003.safetensors",
|
| 195 |
+
"encoder.layers.13.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
|
| 196 |
+
"encoder.layers.13.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
|
| 197 |
+
"encoder.layers.13.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
|
| 198 |
+
"encoder.layers.13.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 199 |
+
"encoder.layers.13.ff2.down_proj.bias": "model-00003-of-00003.safetensors",
|
| 200 |
+
"encoder.layers.13.ff2.down_proj.weight": "model-00003-of-00003.safetensors",
|
| 201 |
+
"encoder.layers.13.ff2.pre_norm.bias": "model-00003-of-00003.safetensors",
|
| 202 |
+
"encoder.layers.13.ff2.pre_norm.weight": "model-00003-of-00003.safetensors",
|
| 203 |
+
"encoder.layers.13.ff2.up_proj.bias": "model-00003-of-00003.safetensors",
|
| 204 |
+
"encoder.layers.13.ff2.up_proj.weight": "model-00003-of-00003.safetensors",
|
| 205 |
+
"encoder.layers.13.post_norm.bias": "model-00003-of-00003.safetensors",
|
| 206 |
+
"encoder.layers.13.post_norm.weight": "model-00003-of-00003.safetensors",
|
| 207 |
+
"encoder.layers.14.attn.pre_norm.bias": "model-00003-of-00003.safetensors",
|
| 208 |
+
"encoder.layers.14.attn.pre_norm.weight": "model-00003-of-00003.safetensors",
|
| 209 |
+
"encoder.layers.14.attn.rel_pos_emb.weight": "model-00003-of-00003.safetensors",
|
| 210 |
+
"encoder.layers.14.attn.to_kv.weight": "model-00003-of-00003.safetensors",
|
| 211 |
+
"encoder.layers.14.attn.to_out.bias": "model-00003-of-00003.safetensors",
|
| 212 |
+
"encoder.layers.14.attn.to_out.weight": "model-00003-of-00003.safetensors",
|
| 213 |
+
"encoder.layers.14.attn.to_q.weight": "model-00003-of-00003.safetensors",
|
| 214 |
+
"encoder.layers.14.conv.batch_norm.bias": "model-00003-of-00003.safetensors",
|
| 215 |
+
"encoder.layers.14.conv.batch_norm.num_batches_tracked": "model-00003-of-00003.safetensors",
|
| 216 |
+
"encoder.layers.14.conv.batch_norm.running_mean": "model-00003-of-00003.safetensors",
|
| 217 |
+
"encoder.layers.14.conv.batch_norm.running_var": "model-00003-of-00003.safetensors",
"encoder.layers.14.conv.batch_norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.conv.depth_conv.conv.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.conv.down_conv.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.conv.down_conv.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.conv.norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.conv.norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.conv.up_conv.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.conv.up_conv.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff1.down_proj.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff1.down_proj.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff1.pre_norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff1.pre_norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff1.up_proj.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff1.up_proj.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff2.down_proj.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff2.down_proj.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff2.pre_norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff2.pre_norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff2.up_proj.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.ff2.up_proj.weight": "model-00003-of-00003.safetensors",
"encoder.layers.14.post_norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.14.post_norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.attn.pre_norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.attn.pre_norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.attn.rel_pos_emb.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.attn.to_kv.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.attn.to_out.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.attn.to_out.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.attn.to_q.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.batch_norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.batch_norm.num_batches_tracked": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.batch_norm.running_mean": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.batch_norm.running_var": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.batch_norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.depth_conv.conv.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.down_conv.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.down_conv.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.up_conv.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.conv.up_conv.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff1.down_proj.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff1.down_proj.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff1.pre_norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff1.pre_norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff1.up_proj.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff1.up_proj.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff2.down_proj.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff2.down_proj.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff2.pre_norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff2.pre_norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff2.up_proj.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.ff2.up_proj.weight": "model-00003-of-00003.safetensors",
"encoder.layers.15.post_norm.bias": "model-00003-of-00003.safetensors",
"encoder.layers.15.post_norm.weight": "model-00003-of-00003.safetensors",
"encoder.layers.2.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.attn.to_kv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.attn.to_out.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.attn.to_out.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.attn.to_q.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.down_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.down_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.up_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.conv.up_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.2.post_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.2.post_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.attn.to_kv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.attn.to_out.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.attn.to_out.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.attn.to_q.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.down_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.down_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.up_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.conv.up_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.3.post_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.3.post_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.attn.to_kv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.attn.to_out.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.attn.to_out.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.attn.to_q.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.down_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.down_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.up_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.conv.up_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.4.post_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.4.post_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.attn.to_kv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.attn.to_out.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.attn.to_out.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.attn.to_q.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.down_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.down_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.up_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.conv.up_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.5.post_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.5.post_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.attn.to_kv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.attn.to_out.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.attn.to_out.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.attn.to_q.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.down_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.down_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.up_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.conv.up_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.6.post_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.6.post_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.attn.to_kv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.attn.to_out.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.attn.to_out.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.attn.to_q.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.down_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.down_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.up_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.conv.up_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.7.post_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.7.post_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.attn.to_kv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.attn.to_out.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.attn.to_out.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.attn.to_q.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.down_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.down_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.up_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.conv.up_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.8.post_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.8.post_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.attn.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.attn.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.attn.rel_pos_emb.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.attn.to_kv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.attn.to_out.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.attn.to_out.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.attn.to_q.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.batch_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.batch_norm.num_batches_tracked": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.batch_norm.running_mean": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.batch_norm.running_var": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.batch_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.depth_conv.conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.down_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.down_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.up_conv.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.conv.up_conv.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff1.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff1.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff1.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff1.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff1.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff1.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff2.down_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff2.down_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff2.pre_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff2.pre_norm.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff2.up_proj.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.ff2.up_proj.weight": "model-00002-of-00003.safetensors",
"encoder.layers.9.post_norm.bias": "model-00002-of-00003.safetensors",
"encoder.layers.9.post_norm.weight": "model-00002-of-00003.safetensors",
"encoder.out.bias": "model-00003-of-00003.safetensors",
"encoder.out.weight": "model-00003-of-00003.safetensors",
"encoder.out_mid.bias": "model-00003-of-00003.safetensors",
"encoder.out_mid.weight": "model-00003-of-00003.safetensors",
"language_model.model.embed_tokens.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.10.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.11.input_layernorm.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.11.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.11.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.11.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.11.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
"language_model.model.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 575 |
+
"language_model.model.layers.11.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 576 |
+
"language_model.model.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 577 |
+
"language_model.model.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 578 |
+
"language_model.model.layers.12.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 579 |
+
"language_model.model.layers.12.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 580 |
+
"language_model.model.layers.12.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 581 |
+
"language_model.model.layers.12.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 582 |
+
"language_model.model.layers.12.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 583 |
+
"language_model.model.layers.12.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 584 |
+
"language_model.model.layers.12.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 585 |
+
"language_model.model.layers.12.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 586 |
+
"language_model.model.layers.12.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 587 |
+
"language_model.model.layers.13.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 588 |
+
"language_model.model.layers.13.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 589 |
+
"language_model.model.layers.13.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 590 |
+
"language_model.model.layers.13.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 591 |
+
"language_model.model.layers.13.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 592 |
+
"language_model.model.layers.13.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 593 |
+
"language_model.model.layers.13.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 594 |
+
"language_model.model.layers.13.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 595 |
+
"language_model.model.layers.13.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 596 |
+
"language_model.model.layers.14.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 597 |
+
"language_model.model.layers.14.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 598 |
+
"language_model.model.layers.14.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 599 |
+
"language_model.model.layers.14.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 600 |
+
"language_model.model.layers.14.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 601 |
+
"language_model.model.layers.14.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 602 |
+
"language_model.model.layers.14.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 603 |
+
"language_model.model.layers.14.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 604 |
+
"language_model.model.layers.14.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 605 |
+
"language_model.model.layers.15.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 606 |
+
"language_model.model.layers.15.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 607 |
+
"language_model.model.layers.15.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 608 |
+
"language_model.model.layers.15.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 609 |
+
"language_model.model.layers.15.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 610 |
+
"language_model.model.layers.15.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 611 |
+
"language_model.model.layers.15.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 612 |
+
"language_model.model.layers.15.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 613 |
+
"language_model.model.layers.15.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 614 |
+
"language_model.model.layers.16.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 615 |
+
"language_model.model.layers.16.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 616 |
+
"language_model.model.layers.16.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 617 |
+
"language_model.model.layers.16.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 618 |
+
"language_model.model.layers.16.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 619 |
+
"language_model.model.layers.16.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 620 |
+
"language_model.model.layers.16.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 621 |
+
"language_model.model.layers.16.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 622 |
+
"language_model.model.layers.16.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 623 |
+
"language_model.model.layers.17.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 624 |
+
"language_model.model.layers.17.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 625 |
+
"language_model.model.layers.17.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 626 |
+
"language_model.model.layers.17.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 627 |
+
"language_model.model.layers.17.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 628 |
+
"language_model.model.layers.17.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 629 |
+
"language_model.model.layers.17.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 630 |
+
"language_model.model.layers.17.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 631 |
+
"language_model.model.layers.17.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 632 |
+
"language_model.model.layers.18.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 633 |
+
"language_model.model.layers.18.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 634 |
+
"language_model.model.layers.18.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 635 |
+
"language_model.model.layers.18.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 636 |
+
"language_model.model.layers.18.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 637 |
+
"language_model.model.layers.18.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 638 |
+
"language_model.model.layers.18.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 639 |
+
"language_model.model.layers.18.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 640 |
+
"language_model.model.layers.18.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 641 |
+
"language_model.model.layers.19.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 642 |
+
"language_model.model.layers.19.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 643 |
+
"language_model.model.layers.19.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 644 |
+
"language_model.model.layers.19.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 645 |
+
"language_model.model.layers.19.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 646 |
+
"language_model.model.layers.19.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 647 |
+
"language_model.model.layers.19.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 648 |
+
"language_model.model.layers.19.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 649 |
+
"language_model.model.layers.19.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 650 |
+
"language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 651 |
+
"language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 652 |
+
"language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 653 |
+
"language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 654 |
+
"language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 655 |
+
"language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 656 |
+
"language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 657 |
+
"language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 658 |
+
"language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 659 |
+
"language_model.model.layers.20.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 660 |
+
"language_model.model.layers.20.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 661 |
+
"language_model.model.layers.20.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 662 |
+
"language_model.model.layers.20.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 663 |
+
"language_model.model.layers.20.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 664 |
+
"language_model.model.layers.20.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 665 |
+
"language_model.model.layers.20.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 666 |
+
"language_model.model.layers.20.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 667 |
+
"language_model.model.layers.20.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 668 |
+
"language_model.model.layers.21.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 669 |
+
"language_model.model.layers.21.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 670 |
+
"language_model.model.layers.21.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 671 |
+
"language_model.model.layers.21.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 672 |
+
"language_model.model.layers.21.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 673 |
+
"language_model.model.layers.21.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 674 |
+
"language_model.model.layers.21.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 675 |
+
"language_model.model.layers.21.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 676 |
+
"language_model.model.layers.21.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 677 |
+
"language_model.model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 678 |
+
"language_model.model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 679 |
+
"language_model.model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 680 |
+
"language_model.model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 681 |
+
"language_model.model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 682 |
+
"language_model.model.layers.22.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 683 |
+
"language_model.model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 684 |
+
"language_model.model.layers.22.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 685 |
+
"language_model.model.layers.22.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 686 |
+
"language_model.model.layers.23.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 687 |
+
"language_model.model.layers.23.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 688 |
+
"language_model.model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 689 |
+
"language_model.model.layers.23.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 690 |
+
"language_model.model.layers.23.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 691 |
+
"language_model.model.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 692 |
+
"language_model.model.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 693 |
+
"language_model.model.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 694 |
+
"language_model.model.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 695 |
+
"language_model.model.layers.24.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 696 |
+
"language_model.model.layers.24.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 697 |
+
"language_model.model.layers.24.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 698 |
+
"language_model.model.layers.24.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 699 |
+
"language_model.model.layers.24.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 700 |
+
"language_model.model.layers.24.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 701 |
+
"language_model.model.layers.24.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 702 |
+
"language_model.model.layers.24.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 703 |
+
"language_model.model.layers.24.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 704 |
+
"language_model.model.layers.25.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 705 |
+
"language_model.model.layers.25.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 706 |
+
"language_model.model.layers.25.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 707 |
+
"language_model.model.layers.25.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 708 |
+
"language_model.model.layers.25.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 709 |
+
"language_model.model.layers.25.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 710 |
+
"language_model.model.layers.25.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 711 |
+
"language_model.model.layers.25.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 712 |
+
"language_model.model.layers.25.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 713 |
+
"language_model.model.layers.26.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 714 |
+
"language_model.model.layers.26.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 715 |
+
"language_model.model.layers.26.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 716 |
+
"language_model.model.layers.26.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 717 |
+
"language_model.model.layers.26.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 718 |
+
"language_model.model.layers.26.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 719 |
+
"language_model.model.layers.26.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 720 |
+
"language_model.model.layers.26.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 721 |
+
"language_model.model.layers.26.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 722 |
+
"language_model.model.layers.27.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 723 |
+
"language_model.model.layers.27.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 724 |
+
"language_model.model.layers.27.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 725 |
+
"language_model.model.layers.27.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 726 |
+
"language_model.model.layers.27.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 727 |
+
"language_model.model.layers.27.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 728 |
+
"language_model.model.layers.27.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 729 |
+
"language_model.model.layers.27.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 730 |
+
"language_model.model.layers.27.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 731 |
+
"language_model.model.layers.28.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 732 |
+
"language_model.model.layers.28.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 733 |
+
"language_model.model.layers.28.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 734 |
+
"language_model.model.layers.28.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 735 |
+
"language_model.model.layers.28.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 736 |
+
"language_model.model.layers.28.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 737 |
+
"language_model.model.layers.28.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 738 |
+
"language_model.model.layers.28.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 739 |
+
"language_model.model.layers.28.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 740 |
+
"language_model.model.layers.29.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 741 |
+
"language_model.model.layers.29.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 742 |
+
"language_model.model.layers.29.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 743 |
+
"language_model.model.layers.29.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 744 |
+
"language_model.model.layers.29.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 745 |
+
"language_model.model.layers.29.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 746 |
+
"language_model.model.layers.29.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 747 |
+
"language_model.model.layers.29.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 748 |
+
"language_model.model.layers.29.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 749 |
+
"language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 750 |
+
"language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 751 |
+
"language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 752 |
+
"language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 753 |
+
"language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 754 |
+
"language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 755 |
+
"language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 756 |
+
"language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 757 |
+
"language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 758 |
+
"language_model.model.layers.30.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 759 |
+
"language_model.model.layers.30.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 760 |
+
"language_model.model.layers.30.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 761 |
+
"language_model.model.layers.30.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 762 |
+
"language_model.model.layers.30.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 763 |
+
"language_model.model.layers.30.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 764 |
+
"language_model.model.layers.30.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 765 |
+
"language_model.model.layers.30.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 766 |
+
"language_model.model.layers.30.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 767 |
+
"language_model.model.layers.31.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 768 |
+
"language_model.model.layers.31.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 769 |
+
"language_model.model.layers.31.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 770 |
+
"language_model.model.layers.31.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 771 |
+
"language_model.model.layers.31.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 772 |
+
"language_model.model.layers.31.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 773 |
+
"language_model.model.layers.31.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 774 |
+
"language_model.model.layers.31.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 775 |
+
"language_model.model.layers.31.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 776 |
+
"language_model.model.layers.32.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 777 |
+
"language_model.model.layers.32.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 778 |
+
"language_model.model.layers.32.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 779 |
+
"language_model.model.layers.32.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 780 |
+
"language_model.model.layers.32.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 781 |
+
"language_model.model.layers.32.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 782 |
+
"language_model.model.layers.32.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 783 |
+
"language_model.model.layers.32.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 784 |
+
"language_model.model.layers.32.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 785 |
+
"language_model.model.layers.33.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 786 |
+
"language_model.model.layers.33.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 787 |
+
"language_model.model.layers.33.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 788 |
+
"language_model.model.layers.33.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 789 |
+
"language_model.model.layers.33.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 790 |
+
"language_model.model.layers.33.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 791 |
+
"language_model.model.layers.33.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 792 |
+
"language_model.model.layers.33.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 793 |
+
"language_model.model.layers.33.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 794 |
+
"language_model.model.layers.34.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 795 |
+
"language_model.model.layers.34.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 796 |
+
"language_model.model.layers.34.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 797 |
+
"language_model.model.layers.34.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 798 |
+
"language_model.model.layers.34.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 799 |
+
"language_model.model.layers.34.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 800 |
+
"language_model.model.layers.34.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 801 |
+
"language_model.model.layers.34.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 802 |
+
"language_model.model.layers.34.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 803 |
+
"language_model.model.layers.35.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 804 |
+
"language_model.model.layers.35.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 805 |
+
"language_model.model.layers.35.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 806 |
+
"language_model.model.layers.35.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 807 |
+
"language_model.model.layers.35.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 808 |
+
"language_model.model.layers.35.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 809 |
+
"language_model.model.layers.35.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 810 |
+
"language_model.model.layers.35.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 811 |
+
"language_model.model.layers.35.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 812 |
+
"language_model.model.layers.36.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 813 |
+
"language_model.model.layers.36.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 814 |
+
"language_model.model.layers.36.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 815 |
+
"language_model.model.layers.36.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 816 |
+
"language_model.model.layers.36.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 817 |
+
"language_model.model.layers.36.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 818 |
+
"language_model.model.layers.36.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 819 |
+
"language_model.model.layers.36.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 820 |
+
"language_model.model.layers.36.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 821 |
+
"language_model.model.layers.37.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 822 |
+
"language_model.model.layers.37.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 823 |
+
"language_model.model.layers.37.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 824 |
+
"language_model.model.layers.37.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 825 |
+
"language_model.model.layers.37.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 826 |
+
"language_model.model.layers.37.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 827 |
+
"language_model.model.layers.37.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 828 |
+
"language_model.model.layers.37.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 829 |
+
"language_model.model.layers.37.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 830 |
+
"language_model.model.layers.38.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 831 |
+
"language_model.model.layers.38.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 832 |
+
"language_model.model.layers.38.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 833 |
+
"language_model.model.layers.38.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 834 |
+
"language_model.model.layers.38.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 835 |
+
"language_model.model.layers.38.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 836 |
+
"language_model.model.layers.38.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 837 |
+
"language_model.model.layers.38.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 838 |
+
"language_model.model.layers.38.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 839 |
+
"language_model.model.layers.39.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 840 |
+
"language_model.model.layers.39.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 841 |
+
"language_model.model.layers.39.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 842 |
+
"language_model.model.layers.39.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 843 |
+
"language_model.model.layers.39.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 844 |
+
"language_model.model.layers.39.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
| 845 |
+
"language_model.model.layers.39.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 846 |
+
"language_model.model.layers.39.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
| 847 |
+
"language_model.model.layers.39.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
| 848 |
+
"language_model.model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 849 |
+
"language_model.model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 850 |
+
"language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 851 |
+
"language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 852 |
+
"language_model.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 853 |
+
"language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 854 |
+
"language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 855 |
+
"language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 856 |
+
"language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 857 |
+
"language_model.model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 858 |
+
"language_model.model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 859 |
+
"language_model.model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 860 |
+
"language_model.model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 861 |
+
"language_model.model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 862 |
+
"language_model.model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 863 |
+
"language_model.model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 864 |
+
"language_model.model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 865 |
+
"language_model.model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 866 |
+
"language_model.model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 867 |
+
"language_model.model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 868 |
+
"language_model.model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 869 |
+
"language_model.model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 870 |
+
"language_model.model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 871 |
+
"language_model.model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 872 |
+
"language_model.model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 873 |
+
"language_model.model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 874 |
+
"language_model.model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 875 |
+
"language_model.model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 876 |
+
"language_model.model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 877 |
+
"language_model.model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 878 |
+
"language_model.model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 879 |
+
"language_model.model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 880 |
+
"language_model.model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 881 |
+
"language_model.model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 882 |
+
"language_model.model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 883 |
+
"language_model.model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 884 |
+
"language_model.model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 885 |
+
"language_model.model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 886 |
+
"language_model.model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 887 |
+
"language_model.model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 888 |
+
"language_model.model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 889 |
+
"language_model.model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 890 |
+
"language_model.model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 891 |
+
"language_model.model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 892 |
+
"language_model.model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 893 |
+
"language_model.model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 894 |
+
"language_model.model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 895 |
+
"language_model.model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 896 |
+
"language_model.model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 897 |
+
"language_model.model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 898 |
+
"language_model.model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 899 |
+
"language_model.model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 900 |
+
"language_model.model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 901 |
+
"language_model.model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 902 |
+
"language_model.model.norm.weight": "model-00002-of-00003.safetensors",
|
| 903 |
+
"projector.linear.bias": "model-00003-of-00003.safetensors",
|
| 904 |
+
"projector.linear.weight": "model-00003-of-00003.safetensors",
|
| 905 |
+
"projector.qformer.encoder.layer.0.attention.attention.key.bias": "model-00003-of-00003.safetensors",
|
| 906 |
+
"projector.qformer.encoder.layer.0.attention.attention.key.weight": "model-00003-of-00003.safetensors",
|
| 907 |
+
"projector.qformer.encoder.layer.0.attention.attention.query.bias": "model-00003-of-00003.safetensors",
|
| 908 |
+
"projector.qformer.encoder.layer.0.attention.attention.query.weight": "model-00003-of-00003.safetensors",
|
| 909 |
+
"projector.qformer.encoder.layer.0.attention.attention.value.bias": "model-00003-of-00003.safetensors",
|
| 910 |
+
"projector.qformer.encoder.layer.0.attention.attention.value.weight": "model-00003-of-00003.safetensors",
|
| 911 |
+
"projector.qformer.encoder.layer.0.attention.output.LayerNorm.bias": "model-00003-of-00003.safetensors",
|
| 912 |
+
"projector.qformer.encoder.layer.0.attention.output.LayerNorm.weight": "model-00003-of-00003.safetensors",
|
| 913 |
+
"projector.qformer.encoder.layer.0.attention.output.dense.bias": "model-00003-of-00003.safetensors",
|
| 914 |
+
"projector.qformer.encoder.layer.0.attention.output.dense.weight": "model-00003-of-00003.safetensors",
|
| 915 |
+
"projector.qformer.encoder.layer.0.crossattention.attention.key.bias": "model-00003-of-00003.safetensors",
|
| 916 |
+
"projector.qformer.encoder.layer.0.crossattention.attention.key.weight": "model-00003-of-00003.safetensors",
|
| 917 |
+
"projector.qformer.encoder.layer.0.crossattention.attention.query.bias": "model-00003-of-00003.safetensors",
|
| 918 |
+
"projector.qformer.encoder.layer.0.crossattention.attention.query.weight": "model-00003-of-00003.safetensors",
|
| 919 |
+
"projector.qformer.encoder.layer.0.crossattention.attention.value.bias": "model-00003-of-00003.safetensors",
|
| 920 |
+
"projector.qformer.encoder.layer.0.crossattention.attention.value.weight": "model-00003-of-00003.safetensors",
|
| 921 |
+
"projector.qformer.encoder.layer.0.crossattention.output.LayerNorm.bias": "model-00003-of-00003.safetensors",
|
| 922 |
+
"projector.qformer.encoder.layer.0.crossattention.output.LayerNorm.weight": "model-00003-of-00003.safetensors",
|
| 923 |
+
"projector.qformer.encoder.layer.0.crossattention.output.dense.bias": "model-00003-of-00003.safetensors",
|
| 924 |
+
"projector.qformer.encoder.layer.0.crossattention.output.dense.weight": "model-00003-of-00003.safetensors",
|
| 925 |
+
"projector.qformer.encoder.layer.0.intermediate_query.dense.bias": "model-00003-of-00003.safetensors",
|
| 926 |
+
"projector.qformer.encoder.layer.0.intermediate_query.dense.weight": "model-00003-of-00003.safetensors",
|
| 927 |
+
"projector.qformer.encoder.layer.0.output_query.LayerNorm.bias": "model-00003-of-00003.safetensors",
|
| 928 |
+
"projector.qformer.encoder.layer.0.output_query.LayerNorm.weight": "model-00003-of-00003.safetensors",
|
| 929 |
+
"projector.qformer.encoder.layer.0.output_query.dense.bias": "model-00003-of-00003.safetensors",
|
| 930 |
+
"projector.qformer.encoder.layer.0.output_query.dense.weight": "model-00003-of-00003.safetensors",
|
| 931 |
+
"projector.qformer.encoder.layer.1.attention.attention.key.bias": "model-00003-of-00003.safetensors",
|
| 932 |
+
"projector.qformer.encoder.layer.1.attention.attention.key.weight": "model-00003-of-00003.safetensors",
|
| 933 |
+
"projector.qformer.encoder.layer.1.attention.attention.query.bias": "model-00003-of-00003.safetensors",
|
| 934 |
+
"projector.qformer.encoder.layer.1.attention.attention.query.weight": "model-00003-of-00003.safetensors",
|
| 935 |
+
"projector.qformer.encoder.layer.1.attention.attention.value.bias": "model-00003-of-00003.safetensors",
|
| 936 |
+
"projector.qformer.encoder.layer.1.attention.attention.value.weight": "model-00003-of-00003.safetensors",
|
| 937 |
+
"projector.qformer.encoder.layer.1.attention.output.LayerNorm.bias": "model-00003-of-00003.safetensors",
|
| 938 |
+
"projector.qformer.encoder.layer.1.attention.output.LayerNorm.weight": "model-00003-of-00003.safetensors",
|
| 939 |
+
"projector.qformer.encoder.layer.1.attention.output.dense.bias": "model-00003-of-00003.safetensors",
|
| 940 |
+
"projector.qformer.encoder.layer.1.attention.output.dense.weight": "model-00003-of-00003.safetensors",
|
| 941 |
+
"projector.qformer.encoder.layer.1.crossattention.attention.key.bias": "model-00003-of-00003.safetensors",
|
| 942 |
+
"projector.qformer.encoder.layer.1.crossattention.attention.key.weight": "model-00003-of-00003.safetensors",
|
| 943 |
+
"projector.qformer.encoder.layer.1.crossattention.attention.query.bias": "model-00003-of-00003.safetensors",
|
| 944 |
+
"projector.qformer.encoder.layer.1.crossattention.attention.query.weight": "model-00003-of-00003.safetensors",
|
| 945 |
+
"projector.qformer.encoder.layer.1.crossattention.attention.value.bias": "model-00003-of-00003.safetensors",
|
| 946 |
+
"projector.qformer.encoder.layer.1.crossattention.attention.value.weight": "model-00003-of-00003.safetensors",
|
| 947 |
+
"projector.qformer.encoder.layer.1.crossattention.output.LayerNorm.bias": "model-00003-of-00003.safetensors",
|
| 948 |
+
"projector.qformer.encoder.layer.1.crossattention.output.LayerNorm.weight": "model-00003-of-00003.safetensors",
|
| 949 |
+
"projector.qformer.encoder.layer.1.crossattention.output.dense.bias": "model-00003-of-00003.safetensors",
|
| 950 |
+
"projector.qformer.encoder.layer.1.crossattention.output.dense.weight": "model-00003-of-00003.safetensors",
|
| 951 |
+
"projector.qformer.encoder.layer.1.intermediate_query.dense.bias": "model-00003-of-00003.safetensors",
|
| 952 |
+
"projector.qformer.encoder.layer.1.intermediate_query.dense.weight": "model-00003-of-00003.safetensors",
|
| 953 |
+
"projector.qformer.encoder.layer.1.output_query.LayerNorm.bias": "model-00003-of-00003.safetensors",
|
| 954 |
+
"projector.qformer.encoder.layer.1.output_query.LayerNorm.weight": "model-00003-of-00003.safetensors",
|
| 955 |
+
"projector.qformer.encoder.layer.1.output_query.dense.bias": "model-00003-of-00003.safetensors",
|
| 956 |
+
"projector.qformer.encoder.layer.1.output_query.dense.weight": "model-00003-of-00003.safetensors",
|
| 957 |
+
"projector.qformer.layernorm.bias": "model-00003-of-00003.safetensors",
|
| 958 |
+
"projector.qformer.layernorm.weight": "model-00003-of-00003.safetensors",
|
| 959 |
+
"projector.query": "model-00003-of-00003.safetensors"
|
| 960 |
+
}
|
| 961 |
+
}
|
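The index entries above map each tensor name to the shard file that stores it. A minimal sketch of grouping tensors by shard follows; it assumes the mapping sits under a top-level `weight_map` key (the usual layout for a sharded safetensors index, not visible in this part of the diff). The helper name `tensors_by_shard` and the four-entry sample are illustrative only.

```python
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file that holds them."""
    shards = defaultdict(list)
    # Assumption: the name -> shard mapping is stored under "weight_map".
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


# A few entries copied from the diff above, used as a stand-in for the full file.
sample_index = {
    "weight_map": {
        "language_model.model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
        "language_model.model.norm.weight": "model-00002-of-00003.safetensors",
        "projector.linear.bias": "model-00003-of-00003.safetensors",
        "projector.query": "model-00003-of-00003.safetensors",
    }
}

grouped = tensors_by_shard(sample_index)
for shard, names in sorted(grouped.items()):
    print(shard, len(names))
```

With the real file, `sample_index` would be replaced by `json.load(open("model.safetensors.index.json"))`.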
model.sig
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
{"mediaType":"application/vnd.dev.sigstore.bundle.v0.3+json","verificationMaterial":{"certificate":{"rawBytes":"MIIC4zCCAmmgAwIBAgIUdyMDhcTRCJ5nxnx4+D7aSwGX+jQwCgYIKoZIzj0EAwMwNzEVMBMGA1UEChMMc2lnc3RvcmUuZGV2MR4wHAYDVQQDExVzaWdzdG9yZS1pbnRlcm1lZGlhdGUwHhcNMjYwNDI5MTQwNTExWhcNMjYwNDI5MTQxNTExWjAAMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEtL+ibg3TGKZXRrWDCPykjxiS7Tcl8unONnDBjhXlZf/QdJmXcVpzh98Zn33+1tnzfv4VRncInxtyjKlqP/n4nqOCAYgwggGEMA4GA1UdDwEB/wQEAwIHgDATBgNVHSUEDDAKBggrBgEFBQcDAzAdBgNVHQ4EFgQUVBwejKWid2blG7gECmQ8nXdvhSowHwYDVR0jBBgwFoAU39Ppz1YkEZb5qNjpKFWixi4YZD8wIgYDVR0RAQH/BBgwFoEUR3Jhbml0ZS1zaWduQGlibS5jb20wNAYKKwYBBAGDvzABAQQmaHR0cHM6Ly9zaWdzdG9yZS52ZXJpZnkuaWJtLmNvbS9vYXV0aDIwNgYKKwYBBAGDvzABCAQoDCZodHRwczovL3NpZ3N0b3JlLnZlcmlmeS5pYm0uY29tL29hdXRoMjCBigYKKwYBBAHWeQIEAgR8BHoAeAB2AN09MGrGxxEyYxkeHJlnNwKiSl643jyt/4eKcoAvKe6OAAABndmO2k0AAAQDAEcwRQIgJERD5l1/3gZseBUIqAzWalStyLN0dGJtShScgbqxB78CIQDbhzX9WB9gVKwXsUhxtDG7uHuEHMu50ta8Bhd0dj5MvzAKBggqhkjOPQQDAwNoADBlAjB7EmWvftmLv+O/yBDJ4AWC7UXOjazuKe9QHeYhxGNHNUnqIf04oI/8v7fqNr+VDUQCMQCmc3WtR+XE966CwOhSmzfQKML0FhTdU0cVzEZxXQtD4WZ3+IWumMCWHGoOiXfQ4j4="},"tlogEntries":[{"logIndex":"1401712462","logId":{"keyId":"wNI9atQGlz+VWfO6LRygH4QUfY/8W4RFwiT5i5WRgB0="},"kindVersion":{"kind":"dsse","version":"0.0.1"},"integratedTime":"1777471511","inclusionPromise":{"signedEntryTimestamp":"MEUCIQDn7d+bjYJK8X+lHBoROCGii71IERNFfon2YIjZAMlJ0gIgBT/pZSz7mMAECG+30teGdObU6Q3GWVcUNqsRSpv1AP0="},"inclusionProof":{"logIndex":"1279808200","rootHash":"pXcrxvq/zcwGUOjyr1yQzRBj9r83n612+BoBUftiJuM=","treeSize":"1279808207","hashes":["9g7ioKrk3Rp+DCpcUZtGIdygoj+Y6/U1fFtmo4JZpLk=","Me6EdSYwfjNL40IIgq73Obyiua/KLRS+nhoQ/Q4t/NU=","zMWriW3oGgRzAAb74dHqXSEf5JVHwCF9E7mJB7pXJRY=","UShKpOTD6XTAgwxT5Fg/O4i2oNBS8tZ38uLrkSP4/1M=","yy0h4WR2/BxXFEpe7BZRrOlOy/ks7JHGTrDWCPCoj9A=","j+3a2J2BVscXcgnoYo5NbtvVjEdPpAocY0KFcmtnS2U=","yt+wav3mKvzKs2yKc2VwNW6tRIpQ2hyFbR20GFREHzM=","XNEL1Y7Hey1LV0cTUrotQytYHNyqLVydBwYyeO4/3NA=","JdFHhy4beJOIn6UvDpQlK7zuJZRI1JQLnL4eTXzIDMc=","5RkPOw/Um
luMtjuvzF/Gug2fNGcCK6n7DWqjdSgjos8=","d9hA39Ot2M7fkyE+rWh4D5tn70iuQ9bWZMetFQz1ePk=","wa5W79zKcyNncVVFXx8PM8785J+n0U0qxiK2GXKz2Hk=","7y22/OdvnNTJ3gzz57WEW6D/mmmrLXV0dVQyDwenx5A=","DOCeoSMovIvLExkhIvisow9AuNXgeWs4ECkyR6EcqYU="],"checkpoint":{"envelope":"rekor.sigstore.dev - 1193050959916656506\n1279808207\npXcrxvq/zcwGUOjyr1yQzRBj9r83n612+BoBUftiJuM=\n\n— rekor.sigstore.dev wNI9ajBFAiEA9slI/8MUBfXFwQOguZyk3ydIbXxvaGZNLhFJnc+UDosCIAzhMcoZ1yyiStPp2Nm8h1iQVvWw0NCLuwMOfLCZgcnx\n"}},"canonicalizedBody":"eyJhcGlWZXJzaW9uIjoiMC4wLjEiLCJraW5kIjoiZHNzZSIsInNwZWMiOnsiZW52ZWxvcGVIYXNoIjp7ImFsZ29yaXRobSI6InNoYTI1NiIsInZhbHVlIjoiZDBhMmJmZTI0NTVlMzE1ZWVkNWRhYWY3NWZhYzE4NjY2MmFlZmYxODBlOGM4MGIzNzBmOWUzZWMxM2E0MjliNCJ9LCJwYXlsb2FkSGFzaCI6eyJhbGdvcml0aG0iOiJzaGEyNTYiLCJ2YWx1ZSI6ImI0NjM3ZGJjNTk0M2NjMjA2NGU4ZTdhYWUwMmE5MTI4OTNmM2M3MDIzNTg1ZjY3M2Q5MDU0NTBkY2E1OTZlNTgifSwic2lnbmF0dXJlcyI6W3sic2lnbmF0dXJlIjoiTUVZQ0lRRHRBNTJZTklONmQ2c0RMdnZReS9vM3g4blJSMXE4SC8yd0E5bWJRcWFPdUFJaEFOQUZWU2tEcm01UktMeDNZU3VOcmVRdmwrSW43ckt2OHR0aDQ3bUgxU1o1IiwidmVyaWZpZXIiOiJMUzB0TFMxQ1JVZEpUaUJEUlZKVVNVWkpRMEZVUlMwdExTMHRDazFKU1VNMGVrTkRRVzF0WjBGM1NVSkJaMGxWWkhsTlJHaGpWRkpEU2pWdWVHNTROQ3RFTjJGVGQwZFlLMnBSZDBObldVbExiMXBKZW1vd1JVRjNUWGNLVG5wRlZrMUNUVWRCTVZWRlEyaE5UV015Ykc1ak0xSjJZMjFWZFZwSFZqSk5ValIzU0VGWlJGWlJVVVJGZUZaNllWZGtlbVJIT1hsYVV6RndZbTVTYkFwamJURnNXa2RzYUdSSFZYZElhR05PVFdwWmQwNUVTVFZOVkZGM1RsUkZlRmRvWTA1TmFsbDNUa1JKTlUxVVVYaE9WRVY0VjJwQlFVMUdhM2RGZDFsSUNrdHZXa2w2YWpCRFFWRlpTVXR2V2tsNmFqQkVRVkZqUkZGblFVVjBUQ3RwWW1jelZFZExXbGhTY2xkRVExQjVhMnA0YVZNM1ZHTnNPSFZ1VDA1dVJFSUthbWhZYkZwbUwxRmtTbTFZWTFad2VtZzVPRnB1TXpNck1YUnVlbVoyTkZaU2JtTkpibmgwZVdwTGJIRlFMMjQwYm5GUFEwRlpaM2RuWjBkRlRVRTBSd3BCTVZWa1JIZEZRaTkzVVVWQmQwbElaMFJCVkVKblRsWklVMVZGUkVSQlMwSm5aM0pDWjBWR1FsRmpSRUY2UVdSQ1owNVdTRkUwUlVablVWVldRbmRsQ21wTFYybGtNbUpzUnpkblJVTnRVVGh1V0dSMmFGTnZkMGgzV1VSV1VqQnFRa0puZDBadlFWVXpPVkJ3ZWpGWmEwVmFZalZ4VG1wd1MwWlhhWGhwTkZrS1drUTRkMGxuV1VSV1VqQlNRVkZJTDBKQ1ozZEdiMFZWVWpOS2FHSnRiREJhVXpGNllWZGtkVkZIYkdsaVV6VnFZakl3ZDA1Ql
dVdExkMWxDUWtGSFJBcDJla0ZDUVZGUmJXRklVakJqU0UwMlRIazVlbUZYWkhwa1J6bDVXbE0xTWxwWVNuQmFibXQxWVZkS2RFeHRUblppVXpsMldWaFdNR0ZFU1hkT1oxbExDa3QzV1VKQ1FVZEVkbnBCUWtOQlVXOUVRMXB2WkVoU2QyTjZiM1pNTTA1d1dqTk9NR0l6U214TWJscHNZMjFzYldWVE5YQlpiVEIxV1RJNWRFd3lPV2dLWkZoU2IwMXFRMEpwWjFsTFMzZFpRa0pCU0ZkbFVVbEZRV2RTT0VKSWIwRmxRVUl5UVU0d09VMUhja2Q0ZUVWNVdYaHJaVWhLYkc1T2QwdHBVMncyTkFvemFubDBMelJsUzJOdlFYWkxaVFpQUVVGQlFtNWtiVTh5YXpCQlFVRlJSRUZGWTNkU1VVbG5Ta1ZTUkRWc01TOHpaMXB6WlVKVlNYRkJlbGRoYkZOMENubE1UakJrUjBwMFUyaFRZMmRpY1hoQ056aERTVkZFWW1oNldEbFhRamxuVmt0M1dITlZhSGgwUkVjM2RVaDFSVWhOZFRVd2RHRTRRbWhrTUdScU5VMEtkbnBCUzBKblozRm9hMnBQVUZGUlJFRjNUbTlCUkVKc1FXcENOMFZ0VjNabWRHMU1kaXRQTDNsQ1JFbzBRVmRETjFWWVQycGhlblZMWlRsUlNHVlphQXA0UjA1SVRsVnVjVWxtTURSdlNTODRkamRtY1U1eUsxWkVWVkZEVFZGRGJXTXpWM1JTSzFoRk9UWTJRM2RQYUZOdGVtWlJTMDFNTUVab1ZHUlZNR05XQ25wRlduaFlVWFJFTkZkYU15dEpWM1Z0VFVOWFNFZHZUMmxZWmxFMGFqUTlDaTB0TFMwdFJVNUVJRU5GVWxSSlJrbERRVlJGTFMwdExTMEsifV19fQ=="}],"timestampVerificationData":{"rfc3161Timestamps":[{"signedTimestamp":"MIIE6jADAgEAMIIE4QYJKoZIhvcNAQcCoIIE0jCCBM4CAQMxDTALBglghkgBZQMEAgEwgcIGCyqGSIb3DQEJEAEEoIGyBIGvMIGsAgEBBgkrBgEEAYO/MAIwMTANBglghkgBZQMEAgEFAAQgDIdYmqikLId7vUz4P+XXeWBEP8Gq1HPoyTTa3lhDwAECFCZeeuHUEB0uUpKG/PojEUruI0unGA8yMDI2MDQyOTE0MDUxMVowAwIBAQIJAOx9JzKbbVcfoDKkMDAuMRUwEwYDVQQKEwxzaWdzdG9yZS5kZXYxFTATBgNVBAMTDHNpZ3N0b3JlLXRzYaCCAhQwggIQMIIBlqADAgECAhQ6E1QvDJBh7rzBQy/Lio6LKiOLDDAKBggqhkjOPQQDAzA5MRUwEwYDVQQKEwxzaWdzdG9yZS5kZXYxIDAeBgNVBAMTF3NpZ3N0b3JlLXRzYS1zZWxmc2lnbmVkMB4XDTI1MDQwODA2NTk0M1oXDTM1MDQwNjA2NTk0M1owLjEVMBMGA1UEChMMc2lnc3RvcmUuZGV2MRUwEwYDVQQDEwxzaWdzdG9yZS10c2EwdjAQBgcqhkjOPQIBBgUrgQQAIgNiAATitrZnyEo2KDZP2QWMIBOgYbfSOTL5ZC/cHMv6Yq+HVIo1H9TC7Cx80KDiyvKhgB3wTqKyi9UDczhqg12b1AOLnRnydMTK+qB8M+1MjBci1+Jb8AV/VXu7CRuQCiPTHFyjajBoMA4GA1UdDwEB/wQEAwIHgDAdBgNVHQ4EFgQUif15Q4fP0GVGwwJGxyxzW3206wMwHwYDVR0jBBgwFoAUmOwB73+7Uf/UlR5vioiYUweJzr8wFgYDVR0lAQH/BAwwCgYIKwYBBQUHAwgwCgYIKoZIzj0EAwMDaAAwZQIwO2mxX/opo7SrIX9QyxfZpJRcpAV2gZOm1AZzR+2rVyy6Uc8Ybp2ybIw13ckH4bcRAjEA5qO8
FyOkmYpvg2/7ZNqiPxRzn5vqKHoVcIIqtpKq6l7TvOqzAxxclN7VwTG8e++XMYIB2zCCAdcCAQEwUTA5MRUwEwYDVQQKEwxzaWdzdG9yZS5kZXYxIDAeBgNVBAMTF3NpZ3N0b3JlLXRzYS1zZWxmc2lnbmVkAhQ6E1QvDJBh7rzBQy/Lio6LKiOLDDALBglghkgBZQMEAgGggfwwGgYJKoZIhvcNAQkDMQ0GCyqGSIb3DQEJEAEEMBwGCSqGSIb3DQEJBTEPFw0yNjA0MjkxNDA1MTFaMC8GCSqGSIb3DQEJBDEiBCAYLU3UeOTovAYLP6snqgyVvFTtfWwYfY4PKgftTSBVXzCBjgYLKoZIhvcNAQkQAi8xfzB9MHsweQQghfknvAerYsrDtENWwQ78gbLGiD/aernm2HDZ0TrNBbcwVTA9pDswOTEVMBMGA1UEChMMc2lnc3RvcmUuZGV2MSAwHgYDVQQDExdzaWdzdG9yZS10c2Etc2VsZnNpZ25lZAIUOhNULwyQYe68wUMvy4qOiyojiwwwCgYIKoZIzj0EAwIEZzBlAjEAwtBzMR4y3Kq0V601T3cLrORS/nWhmC2BuswpqvudbkQr2UOKja+YGu973r9GGOnGAjBxeZFlirrEGdcs/ZgKaTUH2nXoSlQBCwD6MY/az8h99i14ULNNlr4nNnCqpM4LH0E="}]}},"dsseEnvelope":{"payload":"ewogICJfdHlwZSI6ICJodHRwczovL2luLXRvdG8uaW8vU3RhdGVtZW50L3YxIiwKICAic3ViamVjdCI6IFsKICAgIHsKICAgICAgIm5hbWUiOiAiZ3Jhbml0ZS1zcGVlY2gtNC4xLTJiLXBsdXMiLAogICAgICAiZGlnZXN0IjogewogICAgICAgICJzaGEyNTYiOiAiZWM0YTA5MDdlNjVkZGI3N2JjZjA0MGZhNDJmMzMyNmI3NzdhNGFiYjVmYzFmMDRmMDg0MGFhNjA2OTAxMjNlNyIKICAgICAgfQogICAgfQogIF0sCiAgInByZWRpY2F0ZVR5cGUiOiAiaHR0cHM6Ly9tb2RlbF9zaWduaW5nL3NpZ25hdHVyZS92MS4wIiwKICAicHJlZGljYXRlIjogewogICAgInNlcmlhbGl6YXRpb24iOiB7CiAgICAgICJoYXNoX3R5cGUiOiAic2hhMjU2IiwKICAgICAgImlnbm9yZV9wYXRocyI6IFsKICAgICAgICAiLmdpdGh1YiIsCiAgICAgICAgIi5naXRhdHRyaWJ1dGVzIiwKICAgICAgICAiLmdpdGlnbm9yZSIsCiAgICAgICAgIm1vZGVsLnNpZyIsCiAgICAgICAgIi5jYWNoZSIsCiAgICAgICAgIi5naXQiCiAgICAgIF0sCiAgICAgICJhbGxvd19zeW1saW5rcyI6IGZhbHNlLAogICAgICAibWV0aG9kIjogImZpbGVzIgogICAgfSwKICAgICJyZXNvdXJjZXMiOiBbCiAgICAgIHsKICAgICAgICAiZGlnZXN0IjogIjc5N2JkZmZhOTFlNDdlMDE5ZWI3ZGQ3M2MzODY1NDliN2ZlZWVjOTMwMGQzOWJlZTk3ZGVkNzE1MmVmYWYxOTAiLAogICAgICAgICJuYW1lIjogIlJFQURNRS5tZCIsCiAgICAgICAgImFsZ29yaXRobSI6ICJzaGEyNTYiCiAgICAgIH0sCiAgICAgIHsKICAgICAgICAiZGlnZXN0IjogImI3MmUyMTZmMDZiOTRhZmRjN2NjZTUwOWI3MThhNjMxNGFiYzQ3YzI4MWNiZDkwZDQwNWE1YzU4Nzg2ZGEzMmUiLAogICAgICAgICJuYW1lIjogImNoYXRfdGVtcGxhdGUuamluamEiLAogICAgICAgICJhbGdvcml0aG0iOiAic2hhMjU2IgogICAgICB9LAogICAgICB7CiAgICAgICAgImR
pZ2VzdCI6ICJkM2ZjZmNjNzRkMjIyZjQ1MGQ5MzNjNTk4MjlhODE0ZmQ3ZjA3NjBmNDQzYmYwMjU2YTA2Y2FkODg5MTJjZGI1IiwKICAgICAgICAibmFtZSI6ICJjb25maWcuanNvbiIsCiAgICAgICAgImFsZ29yaXRobSI6ICJzaGEyNTYiCiAgICAgIH0sCiAgICAgIHsKICAgICAgICAiZGlnZXN0IjogImUyNjE0ODY2ZmEyNzY0M2U4YzI2NWFhMmM3NTc5NDAwNzY5YWFlNThmNDkyNTFkNGJlMzZiZTYzMGY5YTFhZDYiLAogICAgICAgICJuYW1lIjogImdlbmVyYXRpb25fY29uZmlnLmpzb24iLAogICAgICAgICJhbGdvcml0aG0iOiAic2hhMjU2IgogICAgICB9LAogICAgICB7CiAgICAgICAgImRpZ2VzdCI6ICJhZjQ1MTA1YmE5NTVlM2E3OTZmMzljM2NkZGM2ZmVhZTlmYjQ2OTZiNDZlOTlmMTgzNTVkZjlkN2M4YmRiMGJhIiwKICAgICAgICAibmFtZSI6ICJtb2RlbC0wMDAwMS1vZi0wMDAwMy5zYWZldGVuc29ycyIsCiAgICAgICAgImFsZ29yaXRobSI6ICJzaGEyNTYiCiAgICAgIH0sCiAgICAgIHsKICAgICAgICAiZGlnZXN0IjogIjE3MmJkZGNiMGI5ZmU0ZTU5YjQzMDJlZWNjNDc4YmJlNWZiNDc3NzU5YjgwYTUyZTQ3NmI0M2I1NWM5NDkzYTciLAogICAgICAgICJuYW1lIjogIm1vZGVsLTAwMDAyLW9mLTAwMDAzLnNhZmV0ZW5zb3JzIiwKICAgICAgICAiYWxnb3JpdGhtIjogInNoYTI1NiIKICAgICAgfSwKICAgICAgewogICAgICAgICJkaWdlc3QiOiAiMTI5OTQ3NzZmN2M5ZTI0Y2RhMzMzOWVlOGE2Y2E2YTA3NjAwZjVhZTRhNGMzOGQ2NjcwM2RjZWZiOGZmNDYyNCIsCiAgICAgICAgIm5hbWUiOiAibW9kZWwtMDAwMDMtb2YtMDAwMDMuc2FmZXRlbnNvcnMiLAogICAgICAgICJhbGdvcml0aG0iOiAic2hhMjU2IgogICAgICB9LAogICAgICB7CiAgICAgICAgImRpZ2VzdCI6ICI1ZGM3YTY1YjBmMDFjM2ZhMDY5OTI4Y2YwMTZhZDhjMmEwNTlkNWQ3YjY2ZjEyNDgwODNkNDIzMTQ1YzMwZDhkIiwKICAgICAgICAibmFtZSI6ICJtb2RlbC5zYWZldGVuc29ycy5pbmRleC5qc29uIiwKICAgICAgICAiYWxnb3JpdGhtIjogInNoYTI1NiIKICAgICAgfSwKICAgICAgewogICAgICAgICJkaWdlc3QiOiAiYTlkYThlZjRmMzAxYzZkMTg5YmI1YWJhN2VjZjFmYTY0ZWVlNGE3NTI4NTNhZTM1MzNkMThkZWY3NDE0OWEyNyIsCiAgICAgICAgIm5hbWUiOiAicHJvY2Vzc29yX2NvbmZpZy5qc29uIiwKICAgICAgICAiYWxnb3JpdGhtIjogInNoYTI1NiIKICAgICAgfSwKICAgICAgewogICAgICAgICJkaWdlc3QiOiAiMmZlYjg1OWJkNzEyM2YwN2YyZmJmYjZlMjFjNjc2OGY1M2E5NDVhMTUyNzQ1ZDA5M2UzZjUyZDkyMGVmNjczNSIsCiAgICAgICAgIm5hbWUiOiAidG9rZW5pemVyLmpzb24iLAogICAgICAgICJhbGdvcml0aG0iOiAic2hhMjU2IgogICAgICB9LAogICAgICB7CiAgICAgICAgImRpZ2VzdCI6ICJiYjNkYWNlMTdkNjI1NmMzM2I4M2Y0YWNlZjRkYjc3NzcxYzIwMjQ5N2M4NDJjMjI3NmU3OTQ0OTZlMTBiYTFmIiwKICAgICAgICA
ibmFtZSI6ICJ0b2tlbml6ZXJfY29uZmlnLmpzb24iLAogICAgICAgICJhbGdvcml0aG0iOiAic2hhMjU2IgogICAgICB9CiAgICBdCiAgfQp9","payloadType":"application/vnd.in-toto+json","signatures":[{"sig":"MEYCIQDtA52YNIN6d6sDLvvQy/o3x8nRR1q8H/2wA9mbQqaOuAIhANAFVSkDrm5RKLx3YSuNreQvl+In7rKv8tth47mH1SZ5"}]}}
|
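The `dsseEnvelope.payload` field in `model.sig` is a base64-encoded in-toto statement whose `predicate.resources` list pairs each file name with a SHA-256 digest. The sketch below only decodes that payload so the listed digests can be inspected; it does not verify the signature, certificate, or transparency-log entry, which would need a Sigstore-aware tool. The helper names and the demo bundle are hypothetical, built here so the block runs without the real file.

```python
import base64
import json


def decode_dsse_payload(bundle: dict) -> dict:
    """Return the JSON statement stored in the bundle's DSSE envelope."""
    payload_b64 = bundle["dsseEnvelope"]["payload"]
    return json.loads(base64.b64decode(payload_b64))


def resource_digests(statement: dict) -> dict:
    """Map each listed file name to its recorded digest."""
    return {r["name"]: r["digest"] for r in statement["predicate"]["resources"]}


# Synthetic stand-in with the same shape as the real payload (placeholder digest).
demo_statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "predicate": {
        "resources": [
            {"name": "config.json", "digest": "00" * 32, "algorithm": "sha256"}
        ]
    },
}
demo_bundle = {
    "dsseEnvelope": {
        "payload": base64.b64encode(json.dumps(demo_statement).encode()).decode(),
        "payloadType": "application/vnd.in-toto+json",
    }
}

decoded = decode_dsse_payload(demo_bundle)
digests = resource_digests(decoded)
print(digests)
```

For the actual bundle, `demo_bundle` would be replaced by the parsed contents of `model.sig`, and each digest could then be compared against a locally computed `hashlib.sha256` of the matching file.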
processor_config.json
ADDED
|
@@ -0,0 +1,17 @@
|
| 1 |
+
{
|
| 2 |
+
"audio_processor": {
|
| 3 |
+
"feature_extractor_type": "GraniteSpeechFeatureExtractor",
|
| 4 |
+
"melspec_kwargs": {
|
| 5 |
+
"hop_length": 160,
|
| 6 |
+
"n_fft": 512,
|
| 7 |
+
"n_mels": 80,
|
| 8 |
+
"sample_rate": 16000,
|
| 9 |
+
"win_length": 400
|
| 10 |
+
},
|
| 11 |
+
"projector_downsample_rate": 5,
|
| 12 |
+
"projector_window_size": 15,
|
| 13 |
+
"sampling_rate": 16000
|
| 14 |
+
},
|
| 15 |
+
"audio_token": "<|audio|>",
|
| 16 |
+
"processor_class": "GraniteSpeechProcessor"
|
| 17 |
+
}
|
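The `melspec_kwargs` values above fix the timing of the mel-spectrogram frames. As a rough sketch, the arithmetic below derives the hop and window durations and the frame rate from those numbers. The function name `frame_timing` is illustrative, and the way `projector_downsample_rate` and `projector_window_size` enter the final audio-token count is not shown in this diff, so it is left out.

```python
# Values copied from processor_config.json above.
melspec_kwargs = {
    "hop_length": 160,
    "n_fft": 512,
    "n_mels": 80,
    "sample_rate": 16000,
    "win_length": 400,
}


def frame_timing(cfg: dict) -> dict:
    """Derive frame timing from STFT settings (plain arithmetic, no audio needed)."""
    sample_rate = cfg["sample_rate"]
    return {
        # Time between successive frames, in milliseconds.
        "hop_ms": 1000 * cfg["hop_length"] / sample_rate,
        # Duration of each analysis window, in milliseconds.
        "window_ms": 1000 * cfg["win_length"] / sample_rate,
        # Number of mel frames produced per second of audio.
        "frames_per_second": sample_rate / cfg["hop_length"],
        # One-sided FFT bins before the mel filterbank reduces them to n_mels.
        "fft_bins": cfg["n_fft"] // 2 + 1,
    }


timing = frame_timing(melspec_kwargs)
print(timing)
```

Under these settings each frame covers 25 ms of audio and a new frame starts every 10 ms, so input audio is expected at 16 kHz.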
tokenizer.json
ADDED
|
The diff for this file is too large to render.
|
tokenizer_config.json
ADDED
|
@@ -0,0 +1,20 @@
|
| 1 |
+
{
|
| 2 |
+
"add_prefix_space": false,
|
| 3 |
+
"audio_token": "<|audio|>",
|
| 4 |
+
"backend": "tokenizers",
|
| 5 |
+
"bos_token": "<|end_of_text|>",
|
| 6 |
+
"clean_up_tokenization_spaces": false,
|
| 7 |
+
"eos_token": "<|end_of_text|>",
|
| 8 |
+
"errors": "replace",
|
| 9 |
+
"is_local": true,
|
| 10 |
+
"local_files_only": false,
|
| 11 |
+
"model_max_length": 1000000000000000019884624838656,
|
| 12 |
+
"model_specific_special_tokens": {
|
| 13 |
+
"audio_token": "<|audio|>"
|
| 14 |
+
},
|
| 15 |
+
"pad_token": "<|pad|>",
|
| 16 |
+
"padding_side": "left",
|
| 17 |
+
"processor_class": "GraniteSpeechProcessor",
|
| 18 |
+
"tokenizer_class": "GPT2Tokenizer",
|
| 19 |
+
"unk_token": "<|unk|>"
|
| 20 |
+
}
|
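A few values in `tokenizer_config.json` are easy to misread. The `model_max_length` value equals `int(1e30)`, which appears to be a placeholder meaning that no explicit limit was set rather than a real context length; `bos_token` and `eos_token` are the same string; and padding is applied on the left. The sketch below checks those three points on a copy of the relevant fields. The names `UNSET_MAX_LENGTH` and `summarize` are illustrative, not part of any library.

```python
# Relevant fields copied from tokenizer_config.json above.
tokenizer_config = {
    "bos_token": "<|end_of_text|>",
    "eos_token": "<|end_of_text|>",
    "pad_token": "<|pad|>",
    "padding_side": "left",
    "model_max_length": 1000000000000000019884624838656,
}

# Assumption: this float-derived integer acts as a "no limit configured" marker.
UNSET_MAX_LENGTH = int(1e30)


def summarize(cfg: dict) -> dict:
    """Report the settings that most often affect batching and generation."""
    return {
        "has_explicit_max_length": cfg["model_max_length"] != UNSET_MAX_LENGTH,
        "bos_equals_eos": cfg["bos_token"] == cfg["eos_token"],
        "pads_on_left": cfg["padding_side"] == "left",
    }


summary = summarize(tokenizer_config)
print(summary)
```

Since no explicit limit is recorded here, any truncation length would have to be passed by the caller when tokenizing.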