auralray committed · Commit acbef3a · verified · 1 Parent(s): 7055085

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
.gitattributes CHANGED
@@ -33,3 +33,16 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ dataset/_train.xlsx filter=lfs diff=lfs merge=lfs -text
+ dataset/test_.xlsx filter=lfs diff=lfs merge=lfs -text
+ dataset/train.xlsx filter=lfs diff=lfs merge=lfs -text
+ gradcam/KKLFKKILKYL-temp.png filter=lfs diff=lfs merge=lfs -text
+ gradcam/KKLFKKiLKYL-diff.png filter=lfs diff=lfs merge=lfs -text
+ gradcam/KKLFKKiLKYL-muta.png filter=lfs diff=lfs merge=lfs -text
+ gradcam/KWKIKWPVKWFKML-temp.png filter=lfs diff=lfs merge=lfs -text
+ gradcam/KWKIKWPVKWfKML-diff.png filter=lfs diff=lfs merge=lfs -text
+ gradcam/KWKIKWPVKWfKML-muta.png filter=lfs diff=lfs merge=lfs -text
+ vis/tsne_highlight.png filter=lfs diff=lfs merge=lfs -text
+ vis/tsne_pointcloud.png filter=lfs diff=lfs merge=lfs -text
+ vis/umap_before.png filter=lfs diff=lfs merge=lfs -text
+ vis/umap_highlight.png filter=lfs diff=lfs merge=lfs -text
ImageMolEncoder.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:85eebbe81192401d0b4337f89e0eea507092c396909ff83bd6b569fd89d49750
+ size 44782591
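The pointer above follows the Git LFS pointer-file format: one `key value` pair per line (`version`, `oid`, `size`). A minimal sketch of parsing such a pointer (the helper name `parse_lfs_pointer` is our own, not part of any library):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:85eebbe81192401d0b4337f89e0eea507092c396909ff83bd6b569fd89d49750
size 44782591
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 44782591
```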
README.md ADDED
@@ -0,0 +1,55 @@
+ # AI-based D-amino acid substitution for optimizing antimicrobial peptides to treat multidrug-resistant bacterial infection
+ This repository contains the code for the paper "AI-based D-amino acid substitution for optimizing antimicrobial peptides to treat multidrug-resistant bacterial infection".
+
+ ## Requirements
+ ```
+ mamba_ssm==2.2.4
+ numpy==1.26.3
+ pandas==2.1.4
+ rdkit==2024.3.5
+ scikit_learn==1.4.1.post1
+ scipy==1.13.0
+ torch==2.2.0
+ torchmetrics==1.3.1
+ torchvision==0.17.0
+ ```
+ You can install them with `pip install -r requirements.txt`.
+
+ Additionally, `mamba_ssm` is optional, since it is not used in our final method.
+ If you don't want to install it, comment out `mamba_ssm==2.2.4` in `requirements.txt` and `from mamba_ssm import Mamba` in `network.py`, and avoid using `--q-encoder mamba`.
+
+ ## Training
+ There are two .py files for training: `main.py` and `main_simple.py`.
+
+ `main.py`: can train models on both classification and regression tasks. Preferred for the regression task.
+
+ `main_simple.py`: can ONLY train models on the classification task. Preferred for the classification task. `simple` refers to a simple dataset class that directly loads pre-processed data.
+
+ Example:
+ ```
+ python main_simple.py \
+     --q-encoder cnn \     # Encoder, can be cnn, lstm, gru, mamba, mha
+     --channels 16 \       # Encoder channels
+     --side-enc lstm \     # Side sequence encoder; only lstm is implemented, only used with the cnn encoder
+     --fusion att \        # Fusion method, can be att, mlp or diff
+     --task cls \          # Task, can be cls or reg
+     --loss ce \           # Loss, can be ce or mse; some other losses can be found in the code
+     --batch-size 32 \     # Batch size
+     --epochs 35 \         # Epochs
+     --gpu 0 \             # GPU index to use, -1 for CPU
+     # ===CNN-only options=== \
+     --pcs \               # Enable protease cleavage site dyeing for input pictures
+     --resize 768 \        # Resize input pictures; can be 1 or 2 numbers, like 768 or 768 512
+     # ===main_simple.py-only options=== \
+     --llm-data            # Use LLM-augmented training data
+ ```
+ The corresponding model weight checkpoints will be saved in a subdirectory of `run-cls` or `run-reg`, e.g. `/run-cls/cnn-att-16-lstm-pcs-simple-llm-768-oneway-ce-32-0.001-35/`
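The checkpoint directory name appears to be a hyphen-joined encoding of the run's hyperparameters. A minimal sketch of that naming scheme (the field order is an assumption inferred from the example path above, not read from the training code):

```python
def run_dir_name(q_encoder, fusion, channels, side_enc, pcs, simple, llm,
                 resize, one_way, loss, batch_size, lr, epochs):
    """Assemble a run directory name by hyphen-joining the active options.

    NOTE: hypothetical reconstruction -- the field order is inferred from
    the README's example path, not taken from main_simple.py.
    """
    parts = [q_encoder, fusion, str(channels)]
    if side_enc:
        parts.append(side_enc)
    if pcs:
        parts.append("pcs")
    if simple:
        parts.append("simple")
    if llm:
        parts.append("llm")
    parts.append(str(resize))
    if one_way:
        parts.append("oneway")
    parts += [loss, str(batch_size), str(lr), str(epochs)]
    return "-".join(parts)

name = run_dir_name("cnn", "att", 16, "lstm", True, True, True,
                    768, True, "ce", 32, 0.001, 35)
print(name)  # cnn-att-16-lstm-pcs-simple-llm-768-oneway-ce-32-0.001-35
```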
+ For more arguments, please refer to the code of `main.py` or `main_simple.py`.
+
+ ## Inference
+ You can simply replace `main.py` with `infer.py` in your training command to run inference. Remember to add `--simple` if you are using checkpoints trained with `main_simple.py`.
+
+ For case-study scanning, please use `infer_case.py` with an additional argument `--case r2` or `--case YOUR_PEPTIDE_SEQUENCE`.
+
+ Inference results will be saved in the weights directory in `csv` format, e.g. `/run-cls/cnn-att-16-lstm-pcs-simple-llm-768-oneway-ce-32-0.001-35/preds_test.csv`
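Case scanning enumerates every D-amino acid substitution variant of the template peptide: lowercase letters denote D-residues, and glycine, having no chiral center, is never substituted. A self-contained sketch of that enumeration, mirroring the logic in `dataset.py`:

```python
import itertools

def d_variants(template):
    """Enumerate all D-amino acid substitution variants of a peptide.

    Lowercase marks a D-residue; 'G' (glycine) is achiral and stays upper.
    The all-L template itself (the first product) is excluded.
    """
    pools = [(ch.upper(), ch.lower()) if ch.upper() != 'G' else (ch.upper(),)
             for ch in template]
    return [''.join(chars) for chars in itertools.product(*pools)][1:]

variants = d_variants('GLLKRIKTLL')  # Anoplin, one of the built-in cases
print(len(variants))  # 2**9 - 1 = 511 variants (9 chiral residues)
```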
__pycache__/dataset.cpython-311.pyc ADDED
Binary file (42.8 kB).
 
__pycache__/infer_case.cpython-311.pyc ADDED
Binary file (16.9 kB).
 
__pycache__/loss.cpython-311.pyc ADDED
Binary file (11.4 kB).
 
__pycache__/network.cpython-311.pyc ADDED
Binary file (31.8 kB).
 
__pycache__/train.cpython-311.pyc ADDED
Binary file (12.9 kB).
 
__pycache__/utils.cpython-311.pyc ADDED
Binary file (18.1 kB).
 
dataset.py ADDED
@@ -0,0 +1,865 @@
+ import pandas as pd
+ import numpy as np
+ import itertools
+ import torch
+ from torch.utils.data import Dataset
+ import re
+ import json
+ from typing import Literal
+ import os
+ # import io
+ from rdkit import Chem
+ from rdkit.Chem import AllChem
+ from rdkit.Chem.Draw import rdMolDraw2D
+ # from PIL import Image
+ import torchvision.io as tvio
+ # import torchvision.transforms as tvt
+ import torchvision.transforms.v2.functional as tvtF
+
+ # --- Helper functions ---
+
+ # The 20 standard amino acid letters (alphabetical order)
+ AMINO_ACIDS = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L',
+                'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y']
+ AA_to_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
+ valid_aa = set(AMINO_ACIDS)
+
+ def is_valid_sequence(seq):
+     """
+     Check whether a sequence contains only standard amino acid characters
+     (upper or lower case; lowercase denotes D-amino acids and is also valid).
+     """
+     for ch in seq:
+         if not ch.isalpha():
+             return False
+         if ch.upper() not in valid_aa:
+             return False
+     return True
+
+ def parse_mic(mic_str):
+     """
+     Parse a MIC value. Supported formats:
+     1. A plain number, e.g. "5" -> 5.0
+     2. ">{number}" or "≥{number}" (e.g. ">4" or "≥ 4") -> value multiplied by 1.5
+     3. Mean ± standard deviation, e.g. "3.2 ± 0.4" -> take the mean, 3.2
+     4. A range, e.g. "2.0 - 4.0" -> (2.0 + 4.0) / 2
+
+     Note: there may be spaces between the symbol and the number, and the
+     greater-or-equal sign is "≥" rather than ">=".
+     """
+     if not isinstance(mic_str, str):
+         return float(mic_str)
+
+     mic_str = mic_str.strip()
+     mic_str = re.sub(r'\s+', '', mic_str)
+
+     # Match a plain number
+     if re.fullmatch(r'\d+(\.\d+)?', mic_str):
+         return float(mic_str)
+
+     # Match >{number} or ≥{number}
+     m = re.fullmatch(r'[>≥](\d+(\.\d+)?)', mic_str)
+     if m:
+         num = float(m.group(1))
+         return num * 1.5
+
+     # Match <{number} or ≤{number}
+     m = re.fullmatch(r'[<≤](\d+(\.\d+)?)', mic_str)
+     if m:
+         num = float(m.group(1))
+         return num * 0.75
+
+     # Match {number}±{number}
+     m = re.fullmatch(r'(\d+(\.\d+)?)[±](\d+(\.\d+)?)', mic_str)
+     if m:
+         return float(m.group(1))
+
+     # Match {number}-{number}
+     m = re.fullmatch(r'(\d+(\.\d+)?)-(\d+(\.\d+)?)', mic_str)
+     if m:
+         num1 = float(m.group(1))
+         num2 = float(m.group(3))
+         return (num1 + num2) / 2.0
+
+     print(f"Warning: could not parse MIC value {mic_str}")
+     return np.nan
+
+ def encode_sequence(seq, pad_length):
+     """
+     Convert a peptide sequence into a fixed-size (pad_length, 21) tensor:
+     - one row per residue;
+     - column 0: whether the residue is a D-amino acid (1 if the character
+       is lowercase, 0 otherwise);
+     - columns 1-20: one-hot encoding of the 20 standard amino acids
+       (matched after uppercasing).
+     Sequences shorter than pad_length are padded with all-zero rows.
+     """
+     n = len(seq)
+     arr = np.zeros((pad_length, 21), dtype=np.float32)
+
+     # Encode the actual sequence
+     for i, char in enumerate(seq):
+         if i >= pad_length:
+             break  # Ignore the excess (over-long sequences are filtered out at dataset construction)
+         if char.islower():
+             d_indicator = 1.0
+             aa = char.upper()
+         else:
+             d_indicator = 0.0
+             aa = char
+         arr[i, 0] = d_indicator
+         if aa in AA_to_index:
+             idx = AA_to_index[aa]
+             arr[i, idx + 1] = 1.0
+         else:
+             print(f"Warning: amino acid {aa} is not in the standard list")
+     return torch.tensor(arr)
+
+ def geometric_mean(values):
+     """
+     Compute the geometric mean of a sequence of values.
+     """
+     log_vals = np.log(np.array(values))
+     return float(np.exp(log_vals.mean()))
+
+ def process_label(ratio, task):
+     """
+     Apply a log2 transform to the ratio and return the final label
+     according to the task:
+     - task="reg": return the log2 ratio as np.float32;
+     - task="cls": classify by the log2 ratio:
+         return 1 if it is < 0,
+         otherwise return 0.
+     Returns np.nan if the ratio is non-positive.
+     """
+     if ratio <= 0:
+         return np.nan
+     ratio_log = np.log2(ratio)
+     if task == "reg":
+         return np.float32(ratio_log)
+     elif task == "cls":
+         if ratio_log < 0.:
+             return 1
+         else:
+             return 0
+     else:
+         raise ValueError("Unknown task type, please use 'reg' or 'cls'")
+
+ # --- Data preprocessing and dataset construction ---
+
+ def load_data(xlsx_file, condition=None):
+     """
+     Read data from an xlsx file, take the geometric mean of the MIC values
+     of each specific variant (same prototype-variant pair), and group by
+     prototype. Rows whose prototype or variant sequence contains
+     non-standard amino acids or non-letter characters are filtered out.
+
+     Returns:
+         groups: dict keyed by prototype sequence; each value is a dict
+         keyed by variant sequence ("SEQUENCE - D-type amino acid
+         substitution") whose value is the geometric mean of all MIC
+         values for that variant.
+     """
+     df = pd.read_excel(xlsx_file)
+     # df = df[df['TARGET ACTIVITY - ACTIVITY MEASURE VALUE'] != 'MBC']
+
+     groups = {}
+     for _, row in df.iterrows():
+         orig = row["SEQUENCE - Original"]
+         variant = row["SEQUENCE - D-type amino acid substitution"]
+         mic_raw = row["TARGET ACTIVITY - CONCENTRATION"]
+
+         # Filter out sequences with non-standard amino acids or non-letter
+         # characters (both prototype and variant are checked)
+         if not (isinstance(orig, str) and is_valid_sequence(orig)):
+             continue
+         if not (isinstance(variant, str) and is_valid_sequence(variant)):
+             continue
+
+         mic_val = parse_mic(mic_raw)
+
+         if orig not in groups:
+             groups[orig] = {}
+         if variant not in groups[orig]:
+             groups[orig][variant] = []
+         groups[orig][variant].append(mic_val)
+
+     # Compute the geometric mean for each variant (filtering out NaN values)
+     groups_avg = {}
+     for orig, var_dict in groups.items():
+         groups_avg[orig] = {}
+         for variant, mic_list in var_dict.items():
+             mic_list = [x for x in mic_list if not np.isnan(x)]
+             if len(mic_list) == 0:
+                 continue
+             groups_avg[orig][variant] = geometric_mean(mic_list)
+     return groups_avg
+
+
+ def load_data_stability(xlsx_file, condition):
+     """
+     Same as load_data, but for the stability dataset: rows are first
+     filtered by the measurement condition, the prototype is recovered by
+     uppercasing the variant sequence, and the MIC values are again grouped
+     by prototype and geometrically averaged per variant.
+     """
+     map_dict = {
+         '125fbs': '12.5% FBS',
+         '25fbs': '25% FBS',
+         'mhb': 'MHB',
+         'nacl': '150mM NaCl'
+     }
+     df = pd.read_excel(xlsx_file)
+     df = df[df['Condition'] == map_dict[condition]]
+
+     groups = {}
+     for _, row in df.iterrows():
+         variant = row["SEQUENCE"]
+         orig = variant.upper()
+         mic_raw = row["Activity"]
+
+         # Filter out sequences with non-standard amino acids or non-letter
+         # characters (both prototype and variant are checked)
+         if not (isinstance(orig, str) and is_valid_sequence(orig)):
+             continue
+         if not (isinstance(variant, str) and is_valid_sequence(variant)):
+             continue
+
+         mic_val = parse_mic(mic_raw)
+
+         if orig not in groups:
+             groups[orig] = {}
+         if variant not in groups[orig]:
+             groups[orig][variant] = []
+         groups[orig][variant].append(mic_val)
+
+     # Compute the geometric mean for each variant (filtering out NaN values)
+     groups_avg = {}
+     for orig, var_dict in groups.items():
+         groups_avg[orig] = {}
+         for variant, mic_list in var_dict.items():
+             mic_list = [x for x in mic_list if not np.isnan(x)]
+             if len(mic_list) == 0:
+                 continue
+             groups_avg[orig][variant] = geometric_mean(mic_list)
+     return groups_avg
+
+ class PeptidePairDataset(Dataset):
+     def __init__(self, mode: Literal['train', 'test', '125fbs', 'nacl', '25fbs', 'mhb'] = 'train',
+                  pad_length=30, task="cls",
+                  include_reverse=False, include_self=False, one_way=False, gf=False):
+         """
+         Build the dataset:
+         - Read the xlsx file and group rows by prototype, filtering out rows
+           with non-standard amino acids or non-letter characters, as well as
+           samples whose variant sequence is longer than pad_length;
+         - Pair up the different variants of the same prototype;
+         - include_reverse: add both (A, B) and (B, A) for each pair;
+         - include_self: add self pairs (A, A), labelled log2(1) = 0;
+         - task: "reg" for regression (32-bit float labels); "cls" for
+           classification, where the log2 ratio is mapped to an integer
+           label by process_label (1 if the ratio decreases, 0 otherwise).
+
+         Each item returns:
+         - the encoded tensor of one variant, shape (pad_length, 21)
+         - the encoded tensor of the other variant, shape (pad_length, 21)
+         - the label: a 32-bit float or an integer depending on the task
+         """
+         if mode == "train":
+             loader = load_data
+             xlsx_file = os.path.join(os.path.dirname(__file__), 'dataset', 'train.xlsx')
+         elif mode in ["test", "r2_case", 'r2_case_', "125fbs", "nacl", "25fbs", "mhb"]:
+             one_way = True
+             if mode in ["test", "r2_case", 'r2_case_']:
+                 loader = load_data
+                 xlsx_file = os.path.join(os.path.dirname(__file__), 'dataset', f'{mode}.xlsx')
+             else:
+                 loader = load_data_stability
+                 xlsx_file = os.path.join(os.path.dirname(__file__), 'dataset', 'stability.xlsx')
+         else:
+             raise ValueError("Unknown mode, please use 'train' or 'test'")
+
+         self.data = []
+         self.pad_length = pad_length
+         self.task = task
+         groups_avg = loader(xlsx_file, mode)
+         if gf:
+             gf_dict = torch.load(os.path.join(os.path.dirname(__file__), 'dataset', 'protbert.pth'))
+
+         # For each prototype, drop variants longer than pad_length
+         for orig, variant_dict in groups_avg.items():
+             # a = len(self.data)
+             filtered_variants = {variant: mic for variant, mic in variant_dict.items()
+                                  if len(variant) <= pad_length}
+             variants = list(filtered_variants.keys())
+             n_variants = len(variants)
+             if n_variants == 0:
+                 continue
+
+             if gf:
+                 glob_feat = gf_dict[orig.upper()]
+
+             # If self pairs are enabled, add (A, A) samples with label
+             # process_label(1, task) -> log2(1) = 0 (also class 0 for cls)
+             if include_self and (not one_way):
+                 for variant in variants:
+                     encoded_seq = encode_sequence(variant, pad_length)
+                     label = process_label(1.0, task)  # log2(1) = 0
+                     if gf:
+                         self.data.append(((encoded_seq, encoded_seq, glob_feat), label))
+                     else:
+                         self.data.append(((encoded_seq, encoded_seq), label))
+
+             # Add samples pairing different variants
+             for i in [0] if one_way else range(n_variants):
+                 for j in range(i + 1, n_variants):
+                     var1 = variants[i]
+                     var2 = variants[j]
+                     mic1 = filtered_variants[var1]
+                     mic2 = filtered_variants[var2]
+
+                     # Forward pair: (var1, var2), labelled log2(mic2 / mic1)
+                     ratio = mic2 / mic1 if mic1 != 0 else np.nan
+                     label = process_label(ratio, task)
+                     if np.isnan(label):
+                         continue
+                     encoded_var1 = encode_sequence(var1, pad_length)
+                     encoded_var2 = encode_sequence(var2, pad_length)
+                     if gf:
+                         self.data.append(((encoded_var1, encoded_var2, glob_feat), label))
+                     else:
+                         self.data.append(((encoded_var1, encoded_var2), label))
+
+                     # If reverse pairs are enabled, also add (var2, var1)
+                     if include_reverse and (not one_way):
+                         rev_ratio = mic1 / mic2 if mic2 != 0 else np.nan
+                         rev_label = process_label(rev_ratio, task)
+                         if gf:
+                             self.data.append(((encoded_var2, encoded_var1, glob_feat), rev_label))
+                         else:
+                             self.data.append(((encoded_var2, encoded_var1), rev_label))
+             # b = len(self.data)
+             # print(f"{orig},{b - a}")
+
+     def reg_sample_weight(self):
+         y = []
+         for _, label in self.data:
+             y.append(label)
+         y = np.array(y)
+         mu = np.mean(y)
+         sigma = np.std(y)
+         p = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-((y - mu) ** 2) / (2 * sigma ** 2))
+
+         # Use the median of p as the baseline constant C
+         C = np.median(p)
+         epsilon = 1e-6
+
+         # Log-transformed sampling weights: the lower p is, the higher the weight
+         weights = np.log(C / (p + epsilon))
+
+         # Optional: normalize the weights so that their mean is 1
+         weights_normalized = weights / np.mean(weights)
+         positive_weights = np.exp(weights_normalized)
+
+         return torch.tensor(positive_weights, dtype=torch.float32)
+
+     def __len__(self):
+         return len(self.data)
+
+     def __getitem__(self, idx):
+         return self.data[idx]
+
+
+ class PeptidePairPicDataset(Dataset):
+     def __init__(self, mode: Literal['train', 'test', '125fbs', 'nacl', '25fbs', 'mhb'] = 'train',
+                  pad_length=30, task="reg",
+                  include_reverse=False, include_self=False, one_way=False, gf=False,
+                  side_enc=None, pcs=False, resize=None):
+         """
+         Same pairing scheme as PeptidePairDataset, but each variant is
+         rendered as a 2D structure picture (see draw_peptide). The encoded
+         (pad_length, 21) sequence tensor is attached to each picture only
+         when a side encoder is used.
+         """
+         if mode == "train":
+             loader = load_data
+             xlsx_file = os.path.join(os.path.dirname(__file__), 'dataset', 'train.xlsx')
+         elif mode in ["test", "r2_case", 'r2_case_', "125fbs", "nacl", "25fbs", "mhb"]:
+             one_way = True
+             if mode in ["test", "r2_case", 'r2_case_']:
+                 loader = load_data
+                 xlsx_file = os.path.join(os.path.dirname(__file__), 'dataset', f'{mode}.xlsx')
+             else:
+                 loader = load_data_stability
+                 xlsx_file = os.path.join(os.path.dirname(__file__), 'dataset', 'stability.xlsx')
+         else:
+             raise ValueError("Unknown mode, please use 'train' or 'test'")
+
+         self.data = []
+         self.pics = {}
+         self.pad_length = pad_length
+         self.task = task
+         self.gf = gf
+         self.side_enc = True if side_enc else False
+         self.pcs = pcs
+         self.resize = resize
+         groups_avg = loader(xlsx_file, mode)
+         if gf:
+             gf_dict = torch.load(os.path.join(os.path.dirname(__file__), 'dataset', 'protbert.pth'))
+
+         # For each prototype, drop variants longer than pad_length
+         for orig, variant_dict in groups_avg.items():
+             # a = len(self.data)
+             filtered_variants = {variant: mic for variant, mic in variant_dict.items()
+                                  if len(variant) <= pad_length}
+             variants = list(filtered_variants.keys())
+             for variant in variants:
+                 if self.pcs == 'mix' and variant == orig:
+                     self.pics[variant] = self.read_img(variant, False)
+                 else:
+                     self.pics[variant] = self.read_img(variant, self.pcs)
+             n_variants = len(variants)
+             if n_variants == 0:
+                 continue
+
+             if gf:
+                 glob_feat = gf_dict[orig.upper()]
+
+             # If self pairs are enabled, add (A, A) samples with label
+             # process_label(1, task) -> log2(1) = 0 (also class 0 for cls)
+             if include_self and (not one_way):
+                 for variant in variants:
+                     label = process_label(1.0, task)  # log2(1) = 0
+                     if gf:
+                         self.data.append((variant, variant, glob_feat, label))
+                     else:
+                         self.data.append((variant, variant, label))
+
+             # Add samples pairing different variants
+             for i in [0] if one_way else range(n_variants):
+                 for j in range(i + 1, n_variants):
+                     var1 = variants[i]
+                     var2 = variants[j]
+                     mic1 = filtered_variants[var1]
+                     mic2 = filtered_variants[var2]
+
+                     # Forward pair: (var1, var2), labelled log2(mic2 / mic1)
+                     ratio = mic2 / mic1 if mic1 != 0 else np.nan
+                     label = process_label(ratio, task)
+                     if np.isnan(label):
+                         continue
+                     if gf:
+                         self.data.append((var1, var2, glob_feat, label))
+                     else:
+                         self.data.append((var1, var2, label))
+
+                     # If reverse pairs are enabled, also add (var2, var1)
+                     if include_reverse and (not one_way):
+                         rev_ratio = mic1 / mic2 if mic2 != 0 else np.nan
+                         rev_label = process_label(rev_ratio, task)
+                         if gf:
+                             self.data.append((var2, var1, glob_feat, rev_label))
+                         else:
+                             self.data.append((var2, var1, rev_label))
+             # b = len(self.data)
+             # print(f"{orig},{b - a}")
+
+     def reg_sample_weight(self):
+         y = []
+         for d in self.data:
+             label = d[-1]
+             y.append(label)
+         y = np.array(y)
+         mu = np.mean(y)
+         sigma = np.std(y)
+         p = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-((y - mu) ** 2) / (2 * sigma ** 2))
+
+         # Use the median of p as the baseline constant C
+         C = np.median(p)
+         epsilon = 1e-6
+
+         # Log-transformed sampling weights: the lower p is, the higher the weight
+         weights = np.log(C / (p + epsilon))
+
+         # Optional: normalize the weights so that their mean is 1
+         weights_normalized = weights / np.mean(weights)
+         positive_weights = np.exp(weights_normalized)
+
+         return torch.tensor(positive_weights, dtype=torch.float32)
+
+     def read_img(self, peptide, pcs):
+         image = draw_peptide(peptide, self.resize, pcs)
+         return image
+
+     def __len__(self):
+         return len(self.data)
+
+     def __getitem__(self, idx):
+         if self.gf:
+             seq1, seq2, glob_feat, label = self.data[idx]
+         else:
+             seq1, seq2, label = self.data[idx]
+         img1 = self.pics[seq1]
+         img2 = self.pics[seq2]
+
+         if self.side_enc:
+             img1 = (img1, encode_sequence(seq1, self.pad_length))
+             img2 = (img2, encode_sequence(seq2, self.pad_length))
+
+         if self.gf:
+             return (img1, img2, glob_feat), label
+         else:
+             return (img1, img2), label
+
+
+ class SimplePairClsDataset(Dataset):
+     def __init__(self, pad_length=30, llm=False, ftr2=False, gf=False,
+                  q_encoder=None, side_enc=None, pcs=False, resize=None):
+         if llm:
+             file_path = os.path.join(os.path.dirname(__file__), 'dataset', 'train_set_llm_aug.json')
+         elif ftr2:
+             file_path = os.path.join(os.path.dirname(__file__), 'dataset', 'finetune_for_r2_llm.json')
+         else:
+             file_path = os.path.join(os.path.dirname(__file__), 'dataset', 'train_set.json')
+         with open(file_path, 'r', encoding='utf-8') as f:
+             dataset = json.load(f)
+
+         self.data = []
+         self.pics = {}
+         self.pad_length = pad_length
+         self.gf = gf
+         self.q_encoder = q_encoder
+         self.side_enc = True if side_enc else False
+         self.pcs = pcs
+         self.resize = resize
+         if gf:
+             self.gf_dict = torch.load(os.path.join(os.path.dirname(__file__), 'dataset', 'protbert.pth'))
+
+         all_seqs = []
+         for orig, variants in dataset.items():
+             if len(orig) > pad_length:
+                 continue
+             all_seqs.append(orig)
+             for label in ["1", "0"]:
+                 for variant in variants[label]:
+                     self.data.append((orig, variant, int(label)))
+                     all_seqs.append(variant)
+         if q_encoder in ['cnn', 'rn18']:
+             for i in all_seqs:
+                 if self.pcs == 'mix' and i.isupper():
+                     self.pics[i] = self.read_img(i, False)
+                 else:
+                     self.pics[i] = self.read_img(i, self.pcs)
+
+     def read_img(self, peptide, pcs):
+         image = draw_peptide(peptide, self.resize, pcs)
+         return image
+
+     def __len__(self):
+         return len(self.data)
+
+     def __getitem__(self, idx):
+         seq1, seq2, label = self.data[idx]
+         if self.q_encoder in ['cnn', 'rn18']:
+             img1 = self.pics[seq1]
+             img2 = self.pics[seq2]
+
+             if self.side_enc:
+                 img1 = (img1, encode_sequence(seq1, self.pad_length))
+                 img2 = (img2, encode_sequence(seq2, self.pad_length))
+
+         else:
+             img1 = encode_sequence(seq1, self.pad_length)
+             img2 = encode_sequence(seq2, self.pad_length)
+
+         if self.gf:
+             return (img1, img2, self.gf_dict[seq1]), label
+         else:
+             return (img1, img2), label
+
+
+ class PeptidePairCaseDataset(Dataset):
+     def __init__(self, case: str = 'r2', pad_length=30, gf=False):
+
+         if case == 'r2':
+             self.template = 'KWKIKWPVKWFKML'
+         elif case == 'Indolicidin':
+             self.template = 'ILPWKWPWWPWRR'
+         elif case == 'Temporin-A':
+             self.template = 'FLPLIGRVLSGIL'
+         elif case == 'Melittin':
+             self.template = 'GIGAVLKVLTTGLPALISWIKRKRQQ'
+         elif case == 'Anoplin':
+             self.template = 'GLLKRIKTLL'
+         else:
+             self.template = case.upper().strip()
+         self.data = []
+         self.pad_length = pad_length
+         self.gf = gf
+
+         if gf:
+             self.glob_feat = torch.load(os.path.join(os.path.dirname(__file__), 'dataset', 'protbert.pth'))[self.template]
+
+         pools = [(ch.upper(), ch.lower()) if ch != 'G' else (ch.upper(),) for ch in self.template]
+         # Cartesian product, i.e. all combinations
+         self.variants = [''.join(chars) for chars in itertools.product(*pools)][1:]
+
+         self.template_seq = encode_sequence(self.template, self.pad_length)
+
+     def __len__(self):
+         return len(self.variants)
+
+     def __getitem__(self, idx):
+         variant = self.variants[idx]
+         seq2, label = variant, variant
+         enc_seq1 = self.template_seq
+         enc_seq2 = encode_sequence(seq2, self.pad_length)
+
+         if self.gf:
+             return (enc_seq1, enc_seq2, self.glob_feat), label
+         else:
+             return (enc_seq1, enc_seq2), label
+
+
+ class PeptidePairPicCaseDataset(Dataset):
+     def __init__(self, case: str = 'r2', pad_length=30, side_enc=None, pcs=False, resize=None, gf=False):
+
+         if case == 'r2':
+             self.template = 'KWKIKWPVKWFKML'
+         elif case == 'Indolicidin':
+             self.template = 'ILPWKWPWWPWRR'
+         elif case == 'Temporin-A':
+             self.template = 'FLPLIGRVLSGIL'
+         elif case == 'Melittin':
+             self.template = 'GIGAVLKVLTTGLPALISWIKRKRQQ'
+         elif case == 'Anoplin':
+             self.template = 'GLLKRIKTLL'
+         else:
+             self.template = case.upper().strip()
+         self.data = []
+         self.pad_length = pad_length
+         self.side_enc = True if side_enc else False
+         self.pcs = pcs
+         self.resize = resize
+         self.gf = gf
+
+         if gf:
+             self.glob_feat = torch.load(os.path.join(os.path.dirname(__file__), 'dataset', 'protbert.pth'))[self.template]
+
+         pools = [(ch.upper(), ch.lower()) if ch != 'G' else (ch.upper(),) for ch in self.template]
+         # Cartesian product, i.e. all combinations
+         self.variants = [''.join(chars) for chars in itertools.product(*pools)][1:]
+
+         self.template_pic = self.read_img(self.template)
+         if self.side_enc:
+             self.template_seq = encode_sequence(self.template, self.pad_length)
+
+     def read_img(self, peptide):
+         image = draw_peptide(peptide, self.resize, self.pcs)
+         return image
+
+     def __len__(self):
+         return len(self.variants)
+
+     def __getitem__(self, idx):
+         variant = self.variants[idx]
+         seq2, label = variant, variant
+         img1 = self.template_pic
+         img2 = self.read_img(variant)
+
+         if self.side_enc:
+             img1 = (img1, self.template_seq)
+             img2 = (img2, encode_sequence(seq2, self.pad_length))
+
+         if self.gf:
+             return (img1, img2, self.glob_feat), label
+         else:
+             return (img1, img2), label
+
+
+ aa_side = {
+     "A": "C", "R": "CCCNC(N)=N", "N": "CC(=O)N", "D": "CC(=O)O", "C": "CS",
+     "E": "CCC(=O)O", "Q": "CCC(=O)N", "G": "", "H": "Cc1cnc[nH]1", "I": "C(C)CC",
+     "L": "CC(C)C", "K": "CCCCN", "M": "CCSC", "F": "Cc1ccccc1", "P": "C1CCN1",
+     "S": "CO", "T": "C(C)O", "W": "Cc1c[nH]c2ccccc12", "Y": "Cc1ccc(O)cc1", "V": "C(C)C"
+ }
+
+ aa_tpl = {}
+ for aa, R in aa_side.items():
+     for stereo, chir in (("L", "@"), ("D", "@@")):
+         if aa == "G":  # Gly has no chirality
+             backbone = "N[C:{idx}]C"  # N-CA (atom-mapped)-C
+         else:
+             backbone = f"N[C{chir}H:{'{idx}'}]({R})C"  # N-[C@H:idx](R)-C
+         aa_tpl[f"{aa}_{stereo}"] = backbone + "(=O)"  # internal residue
+         aa_tpl[f"{aa}_{stereo}_term"] = backbone + "(=O)O"  # C-terminus
+
+ def build_peptide_smiles(seq: str) -> str:
+     """
+     Given a one-letter sequence, return a SMILES string whose backbone
+     carries [atom_map] numbers. Uppercase = L-form, lowercase = D-form.
+     Map number = residue index (1, 2, 3, ...) -> alpha-carbon.
+     """
+     if not seq:
+         return ""
+
+     out = []
+     n = len(seq)
+     for i, aa in enumerate(seq, start=1):
+         key = f"{aa.upper()}_{'L' if aa.isupper() else 'D'}"
+         if i == n:
+             key += "_term"
+         out.append(aa_tpl[key].format(idx=i))
+     return "".join(out)
+
+ protease_patterns = {
+     'trypsin': re.compile(r'(?<=[KR])(?!P)'),
+     'chymotrypsin': re.compile(r'(?<=[FYWL])(?!P)'),
+     'elastase': re.compile(r'(?<=[AVSGT])(?!P)'),
+     'enterokinase': re.compile(r'D{4}K(?=[^P])'),
+     'caspase': re.compile(r'(?<=D)(?=[GSA])'),
+ }
+
+ def draw_peptide(sequence, size=[768], pcs=False):
+     """
+     Render the 2D structure of the input peptide sequence and, based on the
+     recognition patterns of common proteases, highlight cleavable peptide
+     bonds in red. Supported enzymes and their regex patterns (P1--P1'):
+     - trypsin: (?<=[KR])(?!P)
+     - chymotrypsin: (?<=[FYWL])(?!P)
+     - elastase: (?<=[AVSGT])(?!P)
+     - enterokinase: D{4}K(?=[^P])
+     - caspase: (?<=D)(?=[GSA])
+     """
+
+     # 1. Build the atom-mapped SMILES (map numbers sit on the alpha-carbons)
+     smiles = build_peptide_smiles(sequence)
+     mol = Chem.MolFromSmiles(smiles)
+     # if mol is None:
+     #     raise ValueError("SMILES parsing failed; check the input sequence and the side-chain dictionary.")
+     AllChem.Compute2DCoords(mol)
+
+     highlight_bonds = []
+     bond_colors = {}
+
+     # ----------------------------------------------------
+     # 2. First mark D-residues: highlight the bonds attached to the
+     #    alpha-carbon in blue
+     d_positions = {i for i, aa in enumerate(sequence, start=1) if aa.islower()}
+
+     for atom in mol.GetAtoms():
+         if atom.GetAtomMapNum() in d_positions:
+             # This atom is the alpha-carbon; highlight all bonds attached to it
+             for b in atom.GetBonds():
+                 idx = b.GetIdx()
+                 if idx not in highlight_bonds:
+                     highlight_bonds.append(idx)
+                 bond_colors[idx] = (0.0, 0.0, 1.0)
+
+     # ----------------------------------------------------
+     # 3. Then mark cleavable bonds in red (overriding the earlier blue)
+     if pcs:
+         cleavage_sites = set()
+         for pat in protease_patterns.values():
+             for m in pat.finditer(sequence):
+                 cut = m.end()  # the cut is after position `cut`
+                 if 1 <= cut < len(sequence):
+                     cleavage_sites.add(cut)
+
+         for pos in cleavage_sites:
+             # First find the alpha-C of the P1 residue
+             ca = next((a for a in mol.GetAtoms()
+                        if a.GetAtomMapNum() == pos), None)
+             if ca is None:
+                 continue
+
+             # Find the carbonyl carbon of the same residue (sp2, double-bonded O)
+             carbonyl_c = None
+             for nb in ca.GetNeighbors():
+                 if nb.GetSymbol() != "C":
+                     continue
+                 # Check for a "=O"
+                 if any(bond.GetBondType() == Chem.BondType.DOUBLE and
+                        o.GetSymbol() == "O"
+                        for bond in nb.GetBonds()
+                        for o in (bond.GetBeginAtom(), bond.GetEndAtom())):
+                     carbonyl_c = nb
+                     break
+             if carbonyl_c is None:
+                 continue
+
+             # The N bonded to the carbonyl carbon is the next residue's nitrogen
+             peptide_bond = None
+             for b in carbonyl_c.GetBonds():
+                 o_atom = b.GetOtherAtom(carbonyl_c)
+                 if o_atom.GetSymbol() == "N":
+                     peptide_bond = b
+                     break
+             if peptide_bond is None:
+                 continue
+
+             bidx = peptide_bond.GetIdx()
+             if bidx not in highlight_bonds:
+                 highlight_bonds.append(bidx)
+             bond_colors[bidx] = (1.0, 0.0, 0.0)  # red
+
+     # 4. Set the canvas size
+     if len(size) == 1:
+         w = h = size[0]
+     else:
+         w, h = size
+
+     # 5. MolDraw2DCairo accepts highlightBondColors
+     drawer = rdMolDraw2D.MolDraw2DCairo(w, h)
+     # Styles such as bond line width and atom font can also be tweaked
+     # via drawer.drawOptions()
+     drawer.DrawMolecule(
+         mol,
+         highlightAtoms=[],
+         highlightBonds=highlight_bonds,
+         highlightAtomColors={},
+         highlightBondColors=bond_colors
+     )
+     drawer.FinishDrawing()
+
+     # 6. Convert the output PNG bytes into a tensor
+     png_bytes = bytearray(drawer.GetDrawingText())
+     byte_tensor = torch.frombuffer(png_bytes, dtype=torch.uint8)
+     img = tvio.decode_png(byte_tensor, mode=tvio.ImageReadMode.RGB)  # [3, H, W], uint8
+     img = tvtF.to_dtype(img, torch.float32)
+     img = tvtF.normalize(img, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
+     return img
+
833
+ if __name__ == '__main__':
834
+ # 假设 xlsx 文件路径为 "data.xlsx"
835
+ # 设置 pad_length 为 50,同时启用正反组合和自组合
836
+ pad_length = 30
837
+ dataset = PeptidePairDataset('r2_case', pad_length, "cls", include_reverse=False, include_self=False, one_way=True)
838
+
839
+ # 打印第一个数据项
840
+ if len(dataset) > 0:
841
+ (encoded_seq1, encoded_seq2), ratio = dataset[0]
842
+ print("第一个样本:")
843
+ print("变种1的编码张量形状:", encoded_seq1.shape)
844
+ print("变种2的编码张量形状:", encoded_seq2.shape)
845
+ print("标签比值(变种2/变种1):", ratio)
846
+ print(f"数据集大小:{len(dataset)}")
847
+ label_pos = 0
848
+ for (_, _), i in dataset:
849
+ label_pos += i
850
+ print(label_pos)
851
+
852
+ else:
853
+ print("未读入组合数据!")
854
+
855
+ # # 测试 PeptidesDataset
856
+ # pad_length = 30
857
+ # dataset = PeptidesDataset(xlsx_file="./dataset/train.xlsx", pad_length=pad_length)
858
+ # print(f"PeptidesDataset 样本总数: {len(dataset)}")
859
+ # if len(dataset) > 0:
860
+ # encoded_seq, label = dataset[0]
861
+ # print("第一个样本:")
862
+ # print("多肽编码张量形状:", encoded_seq.shape)
863
+ # print("标签浓度值(几何平均后):", label)
864
+ # else:
865
+ # print("未读取到有效数据!")
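The protease patterns in the hunk above are plain zero-width regexes, so they can be sanity-checked outside RDKit. A minimal self-contained sketch (the `protease_patterns` dict is re-declared here, and `cleavage_sites` is an illustrative helper mirroring step 3 of `draw_peptide`, not part of the repo):

```python
import re

# Re-declared P1--P1' recognition patterns from the hunk above
protease_patterns = {
    'trypsin': re.compile(r'(?<=[KR])(?!P)'),
    'chymotrypsin': re.compile(r'(?<=[FYWL])(?!P)'),
    'elastase': re.compile(r'(?<=[AVSGT])(?!P)'),
    'enterokinase': re.compile(r'D{4}K(?=[^P])'),
    'caspase': re.compile(r'(?<=D)(?=[GSA])'),
}

def cleavage_sites(sequence):
    """Return the set of cut positions: the bond after the 1-based index is cleaved."""
    sites = set()
    for pat in protease_patterns.values():
        for m in pat.finditer(sequence):
            cut = m.end()
            if 1 <= cut < len(sequence):  # ignore cuts at the termini
                sites.add(cut)
    return sites

# e.g. for the parent peptide used in the finetuning JSON files:
print(sorted(cleavage_sites("KWKIKWPVKWFKML")))  # → [1, 2, 3, 5, 8, 9, 10, 11, 12]
```

Note that the `(?!P)` lookaheads correctly suppress cuts before proline, e.g. the W–P bond in `...KWPV...` is not reported.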
dataset/_r2_case.xlsx ADDED
Binary file (38.5 kB).
 
dataset/_test.xlsx ADDED
Binary file (94.6 kB).
 
dataset/_train.xlsx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:03bfa373ecd3e21fd68313c0917ba5201985b1453ea8eedcf2e3fe0da8b911eb
3
+ size 150386
dataset/finetune_for_r2_llm copy.json ADDED
@@ -0,0 +1,215 @@
1
+ {
2
+ "KWKIKWPVKWFKML": {
3
+ "1": [
4
+ "kwkikwpvkwfkml",
5
+ "Kwkikwpvkwfkml",
6
+ "kWkikwpvkwfkml",
7
+ "kwKikwpvkwfkml",
8
+ "kwkiKwpvkwfkml",
9
+ "kwkikWpvkwfkml",
10
+ "kwkikwPvkwfkml",
11
+ "kwkikwpVkwfkml",
12
+ "kwkikwpvKwfkml",
13
+ "kwkikwpvkWfkml",
14
+ "kwkikwpvkwFkml",
15
+ "kwkikwpvkwfKml",
16
+ "kwkikwpvkwfkMl",
17
+ "kwkikwpvkwfkmL",
18
+ "KWkikwpvkwfkml",
19
+ "KwkIkwpvkwfkml",
20
+ "KWKikwpvkwfkml",
21
+ "KWKiKwpvkwfkml",
22
+ "KWKikWpvkwfkml",
23
+ "KWKikwPvkwfkml",
24
+ "KWKikwpVkwfkml",
25
+ "KWKikwpvKwfkml",
26
+ "KWKikwpvkWfkml",
27
+ "KWKikwpvkwFkml",
28
+ "KWKikwpvkwfKml",
29
+ "KWKikwpvkwfkMl",
30
+ "KWKikwpvkwfkmL",
31
+ "kwKiKwpvkwFkml",
32
+ "kwKiKwpvkwfKml",
33
+ "kwKiKwpvkwfkMl",
34
+ "kwKiKwpvkwfkmL",
35
+ "kWkIkwpvKwFkml",
36
+ "kWkIkwpvKwfKml",
37
+ "kWkIkwpvKwfkMl",
38
+ "kWkIkwpvKwfkmL",
39
+ "kWKikWpvkwFkml",
40
+ "kWKikWpvkwfKml",
41
+ "kWKikWpvkwfkMl",
42
+ "kWKikWpvkwfkmL",
43
+ "kwKikwPvKwFkml",
44
+ "kwKikwPvKwfKml",
45
+ "kwKikwPvKwfkMl",
46
+ "kwKikwPvKwfkmL",
47
+ "KWKikwpVkwFkml",
48
+ "KWKikwpVkwfKml",
49
+ "KWKikwpVkwfkMl",
50
+ "KWKikwpVkwfkmL",
51
+ "KWkikwpvkWFkml",
52
+ "KWkikwpvkWfKml",
53
+ "KWkikwpvkWfkMl",
54
+ "KWkikwpvkWfkmL",
55
+ "kwkikwpvkwfKML",
56
+ "KWKIKWPVKWFKML",
57
+ "kwKiKWpVkwfkml",
58
+ "kWKiKWpVkwfkml",
59
+ "kwkIKWpVkwfkml",
60
+ "kWkIKWpVkwfkml",
61
+ "kwKiKWpVkwfKml",
62
+ "kWKiKWpVkwfKml",
63
+ "kwkIKWpVkwfKml",
64
+ "kWkIKWpVkwfKml",
65
+ "kwKiKWpVkwfkMl",
66
+ "kWKiKWpVkwfkMl",
67
+ "kwkIKWpVkwfkMl",
68
+ "kWkIKWpVkwfkMl",
69
+ "kwKiKWpVkwfkmL",
70
+ "kWKiKWpVkwfkmL",
71
+ "kwkIKWpVkwfkmL",
72
+ "kWkIKWpVkwfkmL",
73
+ "kwKiKWpVkwfKML",
74
+ "kWKiKWpVkwfKML",
75
+ "kwkIKWpVkwfKML",
76
+ "kWkIKWpVkwfKML",
77
+ "kWKIKWpVkwfkml",
78
+ "kWKIKWpVkwfKml",
79
+ "kWKIKWpVkwfkMl",
80
+ "kWKIKWpVkwfkmL",
81
+ "kWKIKWpVkwfKML",
82
+ "KWKiKWpVkwfkml",
83
+ "KWKiKWpVkwfKml",
84
+ "KWKiKWpVkwfkMl",
85
+ "KWKiKWpVkwfkmL",
86
+ "KWKiKWpVkwfKML",
87
+ "KWKIKWpVkwfkml",
88
+ "KWKIKWpVkwfKml",
89
+ "KWKIKWpVkwfkMl",
90
+ "KWKIKWpVkwfkmL",
91
+ "KWKIKWpVkwfKML",
92
+ "kwkikWPvkwFKML",
93
+ "kWkikWPvkwFKML",
94
+ "kwKikWPvkwFKML",
95
+ "kWKikWPvkwFKML",
96
+ "kwkikWPvkwfKML",
97
+ "kWkikWPvkwfKML",
98
+ "kwKikWPvkwfKML",
99
+ "kWKikWPvkwfKML",
100
+ "kwkikWPvkwfkML",
101
+ "kWkikWPvkwfkML",
102
+ "kwKikWPvkwfkML",
103
+ "kWKikWPvkwfkML",
104
+ "kwkikWPvkwfkmL"
105
+ ],
106
+ "0": [
107
+ "KWKIKWPVKWFKML",
108
+ "kWKIKWPVKWFKML",
109
+ "KwKIKWPVKWFKML",
110
+ "KWkIKWPVKWFKML",
111
+ "KWKIkwPVKWFKML",
112
+ "KWKIKWpVKWFKML",
113
+ "KWKIKWPvKWFKML",
114
+ "KWKIKWPVKWFkML",
115
+ "KWKIKWPVKWfKML",
116
+ "KWKIKWPVKWFkMl",
117
+ "KWKIKWPVKWFkmL",
118
+ "KWKIKWPVKWFKMl",
119
+ "KWKIKWPVKWFKmL",
120
+ "KWKIKWPVKWFKmL",
121
+ "kWKIKWPVKWFKMl",
122
+ "KWkIKWPVKWFKMl",
123
+ "KWKIkwPVKWFKMl",
124
+ "KWKIKWpVKWFKMl",
125
+ "KWKIKWPvKWFKMl",
126
+ "KWKIKWPVKWFkMl",
127
+ "KWKIKWPVKWfKMl",
128
+ "KWKIKWPVKWFkMl",
129
+ "KWKIKWPVKWFkmL",
130
+ "KWKIKWPVKWFKmL",
131
+ "kWKIKWPVKWFKmL",
132
+ "KWkIKWPVKWFKmL",
133
+ "KWKIkwPVKWFKmL",
134
+ "KWKIKWpVKWFKmL",
135
+ "KWKIKWPvKWFKmL",
136
+ "KWKIKWPVKWFkmL",
137
+ "KWKIKWPVKWfKML",
138
+ "kWKIKWPVKWfKML",
139
+ "KWkIKWPVKWfKML",
140
+ "KWKIkwPVKWfKML",
141
+ "KWKIKWpVKWfKML",
142
+ "KWKIKWPvKWfKML",
143
+ "KWKIKWPVKWFkML",
144
+ "KWKIKWPVKWfkML",
145
+ "KWKIKWPVKWfkmL",
146
+ "kWKIKWPVKWfkmL",
147
+ "KWkIKWPVKWfkmL",
148
+ "KWKIkwPVKWfkmL",
149
+ "KWKIKWpVKWfkmL",
150
+ "KWKIKWPvKWfkmL",
151
+ "KWKIKWPVKWFkMl",
152
+ "KWKIKWPVKWFkmL",
153
+ "KWKIKWPVKWfKml",
154
+ "KWKIKWPVKWFkml",
155
+ "KWKIKWPVKWFkMl",
156
+ "KWKIKWPVKWFkmL",
157
+ "KWKIKWPVKWFKml",
158
+ "KWKIKWPVKWFkmL",
159
+ "kWKIKWPVKWFkmL",
160
+ "KWkIKWPVKWFkmL",
161
+ "KWKIkwPVKWFkmL",
162
+ "KWKIKWpVKWFkmL",
163
+ "KWKIKWPvKWFkmL",
164
+ "KWKIKWPVKWFkml",
165
+ "KWKIKWPVKWfkml",
166
+ "KWKIKWPVKWfkmL",
167
+ "kWKIKWPVKWfkml",
168
+ "KWkIKWPVKWfkml",
169
+ "KWKIkwPVKWfkml",
170
+ "KWKIKWpVKWfkml",
171
+ "KWKIKWPvKWfkml",
172
+ "KWKIKWPVKWFkMl",
173
+ "KWKIKWPVKWFkml",
174
+ "KWKIKWPVKWFkmL",
175
+ "KWKIKWPVKWfKml",
176
+ "KWKIKWPVKWFkml",
177
+ "KWKIKWPVKWFkmL",
178
+ "KWKIKWPVKWFkmL",
179
+ "KWKIKWPVKWFKml",
180
+ "KWKIKWPVKWFkmL",
181
+ "KWKIKWPVKWFkml",
182
+ "KWKIKWPVKWFkmL",
183
+ "KWKIKWPVKWfkmL",
184
+ "kWKIKWPVKWFKML",
185
+ "KWkIKWPVKWFKML",
186
+ "KWKIkwPVKWFKML",
187
+ "KWKIKWpVKWFKML",
188
+ "KWKIKWPvKWFKML",
189
+ "KWKIKWPVKWFKmL",
190
+ "KWKIKWPVKWFKMl",
191
+ "KWKIKWPVKWFkML",
192
+ "KWKIKWPVKWfKML",
193
+ "KWKIKWPVKWFkmL",
194
+ "KWKIKWPVKWFkML",
195
+ "KWKIKWPvKWFKmL",
196
+ "kWKIKWPVKWFkmL",
197
+ "KWkIKWPVKWFkML",
198
+ "KWKIKWPVKWfkMl",
199
+ "KWkIKWPVKWFkmL",
200
+ "KWKIKWPVKWFkml",
201
+ "KWKIKWPVKWfkml",
202
+ "KWKIkwPVKWfkMl",
203
+ "KWKIKWPVKWFkmL",
204
+ "KWKIKWPVKWfkmL",
205
+ "KWKIKWPVKWFKml",
206
+ "KWKIKWPVKWFkmL",
207
+ "KWKIKWPVKWFkml",
208
+ "KWKIKWPVKWFkmL",
209
+ "KWKIKWPVKWfkml",
210
+ "KWKIKWPVKWfkmL",
211
+ "KWKIKWPVKWFkmL",
212
+ "KWKIKWPVKWFkml"
213
+ ]
214
+ }
215
+ }
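The two `finetune_for_r2_llm*.json` files share one schema: a parent L-sequence maps to variant lists keyed by `"1"` and `"0"` (presumably positive/negative labels), with lowercase letters marking D-amino-acid substitutions at those positions. A minimal parsing sketch (the inline JSON record and the `d_positions` helper are illustrative, not repo code):

```python
import json

# Inline example mirroring the schema of finetune_for_r2_llm.json:
# parent sequence -> {"1": [...variants...], "0": [...variants...]},
# where lowercase residues denote D-amino-acid substitutions.
record = json.loads('''
{
  "KWKIKWPVKWFKML": {
    "1": ["kwKIKWPVKWFKML", "kWkIKWPVKWFKML"],
    "0": ["KWKiKWPVKWfKML"]
  }
}
''')

def d_positions(variant):
    """Return the 1-based positions that carry a D-amino acid (lowercase)."""
    return [i for i, aa in enumerate(variant, start=1) if aa.islower()]

for parent, groups in record.items():
    for label, variants in groups.items():
        for v in variants:
            # Every variant is the parent sequence with chirality flips only
            assert v.upper() == parent
            print(label, v, d_positions(v))
```

This is the same lowercase-as-D convention that `draw_peptide` relies on when it highlights α-carbon bonds of D-residues in blue.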
dataset/finetune_for_r2_llm.json ADDED
@@ -0,0 +1,197 @@
1
+ {
2
+ "KWKIKWPVKWFKML": {
3
+ "1": [
4
+ "kwKIKWPVKWFKML",
5
+ "kWkIKWPVKWFKML",
6
+ "kWKIkWPVKWFKML",
7
+ "kWKIKwPVKWFKML",
8
+ "kWKIKWPvKWFKML",
9
+ "kWKIKWPVkWFKML",
10
+ "kWKIKWPVKwFKML",
11
+ "kWKIKWPVKWfKML",
12
+ "kWKIKWPVKWFKmL",
13
+ "KwkIKWPVKWFKML",
14
+ "KwKIkWPVKWFKML",
15
+ "KwKIKwPVKWFKML",
16
+ "KwKIKWPvKWFKML",
17
+ "KwKIKWPVkWFKML",
18
+ "KwKIKWPVKwFKML",
19
+ "KwKIKWPVKWfKML",
20
+ "KwKIKWPVKWFKmL",
21
+ "KWkIkWPVKWFKML",
22
+ "KWkIKwPVKWFKML",
23
+ "KWkIKWPvKWFKML",
24
+ "KWkIKWPVkWFKML",
25
+ "KWkIKWPVKwFKML",
26
+ "KWkIKWPVKWfKML",
27
+ "KWkIKWPVKWFKmL",
28
+ "KWKIkwPVKWFKML",
29
+ "KWKIkWPvKWFKML",
30
+ "KWKIkWPVkWFKML",
31
+ "KWKIkWPVKwFKML",
32
+ "KWKIkWPVKWfKML",
33
+ "KWKIkWPVKWFKmL",
34
+ "KWKIKwPvKWFKML",
35
+ "KWKIKwPVkWFKML",
36
+ "KWKIKwPVKwFKML",
37
+ "KWKIKwPVKWfKML",
38
+ "KWKIKwPVKWFKmL",
39
+ "KWKIKWPvkWFKML",
40
+ "KWKIKWPvKwFKML",
41
+ "KWKIKWPvKWFKmL",
42
+ "KWKIKWPVkwFKML",
43
+ "KWKIKWPVkWFKmL",
44
+ "KWKIKWPVKwfKML",
45
+ "KWKIKWPVKwFKmL",
46
+ "KWKIKWPVKWfKmL",
47
+ "kwkIKWPVKWFKML",
48
+ "kwKIkWPVKWFKML",
49
+ "kwKIKwPVKWFKML",
50
+ "kwKIKWPvKWFKML",
51
+ "kwKIKWPVkWFKML",
52
+ "kwKIKWPVKwFKML",
53
+ "kwKIKWPVKWfKML",
54
+ "kwKIKWPVKWFKmL",
55
+ "kWkIkWPVKWFKML",
56
+ "kWkIKwPVKWFKML",
57
+ "kWkIKWPvKWFKML",
58
+ "kWkIKWPVkWFKML",
59
+ "kWkIKWPVKwFKML",
60
+ "kWkIKWPVKWfKML",
61
+ "kWkIKWPVKWFKmL",
62
+ "kWKIkwPVKWFKML",
63
+ "kWKIkWPvKWFKML",
64
+ "kWKIkWPVkWFKML",
65
+ "kWKIkWPVKwFKML",
66
+ "kWKIkWPVKWfKML",
67
+ "kWKIkWPVKWFKmL",
68
+ "kWKIKwPvKWFKML",
69
+ "kWKIKwPVkWFKML",
70
+ "kWKIKwPVKwFKML",
71
+ "kWKIKwPVKWfKML",
72
+ "kWKIKwPVKWFKmL",
73
+ "kWKIKWPvkWFKML",
74
+ "kWKIKWPvKwFKML",
75
+ "kWKIKWPvKWFKmL",
76
+ "kWKIKWPVkwFKML",
77
+ "kWKIKWPVkWFKmL",
78
+ "kWKIKWPVKwfKML",
79
+ "kWKIKWPVKwFKmL",
80
+ "kWKIKWPVKWfKmL",
81
+ "kwkIKwPVKWFKML",
82
+ "kwkIKWPvKWFKML",
83
+ "kwkIKWPVkWFKML",
84
+ "kwkIKWPVKwFKML",
85
+ "kwkIKWPVKWfKML",
86
+ "kwkIKWPVKWFKmL",
87
+ "KwKIkwPVKWFKML",
88
+ "KwKIKwPvKWFKML",
89
+ "KwKIKwPVkWFKML",
90
+ "KwKIKwPVKwFKML",
91
+ "KwKIKwPVKWfKML",
92
+ "KwKIKwPVKWFKmL",
93
+ "KwKIKWPvkWFKML",
94
+ "KwKIKWPvKwFKML",
95
+ "KwKIKWPvKWFKmL",
96
+ "KwKIKWPVkwFKML",
97
+ "KwKIKWPVkWFKmL",
98
+ "KwKIKWPVKwfKML",
99
+ "KwKIKWPVKwFKmL",
100
+ "KwKIKWPVKWfKmL"
101
+ ],
102
+ "0": [
103
+ "KWKiKWPVKWfKML",
104
+ "KWKiKWPVKWFKmL",
105
+ "KWKiKWPVkWFKML",
106
+ "KWKiKWPVKwFKML",
107
+ "KWKiKWPvKWFKML",
108
+ "KWKiKwPVKWFKML",
109
+ "kWKiKWPVKWFKML",
110
+ "kWKiKWPvKWFKML",
111
+ "kWKiKWPVkWFKML",
112
+ "kWKiKWPVKwFKML",
113
+ "KWKIKWpVKWfKML",
114
+ "KWKIKWpVKWFKmL",
115
+ "KWKIKWpVKwFKML",
116
+ "KWKIKWpVkWFKML",
117
+ "KWKIKWpVkwFKML",
118
+ "KWKIKWpVKWFkML",
119
+ "KWKIKWpVKWFkMl",
120
+ "kWKIKWpVKWFKML",
121
+ "kWKIKWpVKWfKML",
122
+ "kWKIKWpVkWFKML",
123
+ "kWKIKWpVKwFKML",
124
+ "kWKIKWpVKWFKmL",
125
+ "KWKIKWPVKWFkML",
126
+ "KWKIKWPVKWFkML",
127
+ "KWKIKWPVKWfkML",
128
+ "KWKIKWPVKWfkMl",
129
+ "KWKIKWPVKWfKMl",
130
+ "KWKIKWPVKWFkMl",
131
+ "KWKIKWPVKwFkML",
132
+ "KWKIKWPVKwFkMl",
133
+ "KWKIKWPVkwFkML",
134
+ "KWKIKWPVkwFkMl",
135
+ "KWKIKWpVkwFkML",
136
+ "kWKIKWpVKWFkML",
137
+ "kWKIKWpVKWFkMl",
138
+ "kWKIKWpVkwFkML",
139
+ "KWKiKWPVKWFkML",
140
+ "KWKiKWPVKWFkMl",
141
+ "KWKiKWPVkwFkML",
142
+ "KWKiKWPVkwFkMl",
143
+ "KWKIKWpVKWfkMl",
144
+ "KWKIKWpVKWfkML",
145
+ "KWKIkWPvKWfkML",
146
+ "kWKIKWPvKWfkML",
147
+ "KWKIKwPvKWfKML",
148
+ "KWKIKWPvkWfKML",
149
+ "KWKIKWPvKwfKML",
150
+ "KWKIKWPvKWfkML",
151
+ "KWKIKWPvKWfkMl",
152
+ "KWKIKWPvkWFkMl",
153
+ "KWKIKWPvKWFkMl",
154
+ "KWKIKWpvKWFkML",
155
+ "KWKIKWpvKWFkMl",
156
+ "KWKiKWPvKWFkMl",
157
+ "kWKIKWPVKWFkMl",
158
+ "KwKiKWPVKWFkML",
159
+ "KwKiKWPVKWFkMl",
160
+ "KwKiKWPVKWfkML",
161
+ "KwKiKWPVKWfkMl",
162
+ "KWkIkWpVKWFkML",
163
+ "KWkIkWpVKWFkMl",
164
+ "KWkIkWpVKWfkML",
165
+ "KWkIkWpVKWfkMl",
166
+ "KWKiKWpVKWFKML",
167
+ "KWKiKWpVKWfKML",
168
+ "KWKiKWpVKwFKML",
169
+ "KWKiKWpVKWFkML",
170
+ "KWKiKWpVKWFKmL",
171
+ "KWKIKWpVKWFKMl",
172
+ "kWKIKWpVKWFKMl",
173
+ "KwKIKWpVKWFKMl",
174
+ "KWkIKWpVKWFKMl",
175
+ "KWKIkWpVKWFKMl",
176
+ "KWKIKwPvKWFKMl",
177
+ "KWKIKWpVkWFKMl",
178
+ "KWKIKWpVKwFKMl",
179
+ "kWKIKWpVKwFKMl",
180
+ "kWKIKWpVKWfkML",
181
+ "kWKIKWpVKWfkMl",
182
+ "KwKiKWPVkwfkML",
183
+ "KwKiKWPVkwfkMl",
184
+ "KWKIKWPvkwfKML",
185
+ "KWKIKWPVkwfkML",
186
+ "kWKIKWPVkwfkML",
187
+ "KWKiKWPVkwfkML",
188
+ "KWKiKWPVkwfkMl",
189
+ "KWKIKWpVkwfkML",
190
+ "KWKIKWpVkwfkMl",
191
+ "kWKIKWpVkwfkML",
192
+ "kWKIKWpVkwfkMl",
193
+ "KWKIKWPVkWFkML",
194
+ "KWKIKWPVkWFkMl"
195
+ ]
196
+ }
197
+ }
dataset/r2_case.xlsx ADDED
Binary file (50.7 kB).
 
dataset/stability.xlsx ADDED
Binary file (97.1 kB).
 
dataset/test.xlsx ADDED
Binary file (11.5 kB).
 
dataset/test_.xlsx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bb17d223ff62391058b5c257977abf929a9cbb6c8cf29c7d7f15aeb6a585b7b9
3
+ size 101949
dataset/test__.xlsx ADDED
Binary file (23.7 kB).
 
dataset/train.xlsx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f2f19bd66bb781214298e586c07500e911fa8d50666a4918d91b656e397632a9
3
+ size 228312
dataset/train_set.json ADDED
@@ -0,0 +1,1736 @@
1
+ {
2
+ "GIMSSLMKKLAAHIAK": {
3
+ "1": [
4
+ "GIMSSLMkKLAAHIAK",
5
+ "GIMSSLMKkLAAHIAK",
6
+ "GIMSSLMKKLAAHIAk",
7
+ "GIMSSLMkkLAAHIAK",
8
+ "GIMSSLMkKLAAHIAk",
9
+ "GIMSSLMKkLAAHIAk",
10
+ "GIMSSLMkkLAAHIAk"
11
+ ],
12
+ "0": []
13
+ },
14
+ "ILGTILGLLKSL": {
15
+ "1": [],
16
+ "0": [
17
+ "ILGTILGLLkSL",
18
+ "ilgtilgllksl"
19
+ ]
20
+ },
21
+ "KRLFKKLLKYLRKF": {
22
+ "1": [
23
+ "KRLFkkLLKYLRkF",
24
+ "krLFkkLLKYLRkF",
25
+ "krLFkkLLkYLrkF"
26
+ ],
27
+ "0": [
28
+ "KRLFKKLLKYLRkF"
29
+ ]
30
+ },
31
+ "ILGTILGLLKGL": {
32
+ "1": [
33
+ "ilgtilgllkgl"
34
+ ],
35
+ "0": [
36
+ "ILGTILGLLkGL"
37
+ ]
38
+ },
39
+ "IDWKKLLDAAKQIL": {
40
+ "1": [
41
+ "idwkklldaakqil"
42
+ ],
43
+ "0": [
44
+ "IDWkkLLDAAkQIL"
45
+ ]
46
+ },
47
+ "VWRRWRRFWRR": {
48
+ "1": [],
49
+ "0": [
50
+ "vwrrwrrfwrr",
51
+ "VWrrWrrFWrr"
52
+ ]
53
+ },
54
+ "FLKLLKKLL": {
55
+ "1": [
56
+ "fLKLLKKLL",
57
+ "FlKLLKKLL",
58
+ "FLkLLKKLL",
59
+ "flkllkkll"
60
+ ],
61
+ "0": [
62
+ "FLKlLKKLL",
63
+ "FLKLlKKLL",
64
+ "FLKLLkKLL",
65
+ "FLKLLKkLL",
66
+ "FLKLLKKlL",
67
+ "FLKLLKKLl"
68
+ ]
69
+ },
70
+ "KKVVFWVKFK": {
71
+ "1": [
72
+ "KKVVFWVKFk"
73
+ ],
74
+ "0": [
75
+ "KKVVFWVKfK",
76
+ "KKVVFWVkFK",
77
+ "KKVVFWvKFK",
78
+ "KKVVFwVKFK",
79
+ "KKVVfWVKFK",
80
+ "KKVvFWVKFK",
81
+ "KKvVFWVKFK",
82
+ "KkVVFWVKFK",
83
+ "kKVVFWVKFK"
84
+ ]
85
+ },
86
+ "KRIVKLILKWLR": {
87
+ "1": [
88
+ "KRIVkLILKWLR",
89
+ "KRIVKlILKWLR"
90
+ ],
91
+ "0": []
92
+ },
93
+ "KKVVFKVKFKK": {
94
+ "1": [
95
+ "kKVVFKVKFKk"
96
+ ],
97
+ "0": [
98
+ "kkVVFKVKFKK",
99
+ "KKVVFKVKFkk",
100
+ "kkVVFKVKFkk",
101
+ "KKVVFkVKFKK",
102
+ "kkVVFkVKFkk",
103
+ "kkvvfkvkfkk"
104
+ ]
105
+ },
106
+ "KWKSFLKTFKSAKKTVLHTALKAISS": {
107
+ "1": [
108
+ "KWKSFLKTFKSAkKTVLHTALKAISS"
109
+ ],
110
+ "0": [
111
+ "KWKSFLKTFKSAKkTVLHTALKAISS",
112
+ "KWKSFLKTFKsAKkTVLHTALKAISS",
113
+ "KWKSFLKTFKSAKktVLHTALKAISS",
114
+ "KWKSFLKTFKsAKktVLHTALKAISS",
115
+ "KWKSFLKTFKSaKKTVLHTALKAISS",
116
+ "KWKSFLKTfKSaKKTVLHTALKAISS",
117
+ "KWKSFLKTFKSaKKTvLHTALKAISS",
118
+ "KWKSFLKTfKSaKKTvLHTALKAISS",
119
+ "kwksflktfksakktvlhtalkaiss"
120
+ ]
121
+ },
122
+ "FLPLIIGALSSLLPKIF": {
123
+ "1": [],
124
+ "0": [
125
+ "FLPLIIGALSSLLPKiF",
126
+ "FLPLiiGALSSLLPKiF"
127
+ ]
128
+ },
129
+ "KLKKLLKKWLKLLKKLLK": {
130
+ "1": [
131
+ "KLKKLlKKWLKlLKKLLk",
132
+ "KLKKlLKKWlKLLKkLLK",
133
+ "KLKkLLKkWLKlLKKlLK",
134
+ "KLkKLlKKwLKlLKkLLk",
135
+ "KlKkLlKkWlKlLkKlLk",
136
+ "KLKKLLKKWlkllkkllk"
137
+ ],
138
+ "0": []
139
+ },
140
+ "KKAAAAAAAAAAAAWAAAAAAKKKK": {
141
+ "1": [
142
+ "kkAAAAAAAAAAAAWAAAAAAKKKK",
143
+ "KKAAAAAAAAAAAAwaAAAAAKKKK",
144
+ "KKAAAAAAAAAAAAWAaaAAAKKKK",
145
+ "KKAAAAAAAAAAAAWAAAaaAKKKK"
146
+ ],
147
+ "0": [
148
+ "KKaaAAAAAAAAAAWAAAAAAKKKK",
149
+ "KKAAaaAAAAAAAAWAAAAAAKKKK",
150
+ "KKAAAAaaAAAAAAWAAAAAAKKKK",
151
+ "KKAAAAAAaaAAAAWAAAAAAKKKK",
152
+ "KKAAAAAAAAaaAAWAAAAAAKKKK",
153
+ "KKAAAAAAAAAAaaWAAAAAAKKKK",
154
+ "KKAAAAAAAAAAAAWAAAAAakKKK",
155
+ "KKAAAAAAAAAAAAWAAAAAAKkkK"
156
+ ]
157
+ },
158
+ "FVPWFSKFLGRIL": {
159
+ "1": [],
160
+ "0": [
161
+ "FVPWFSkFLGRIL",
162
+ "FVPWFSKfLGRIL",
163
+ "FVPWFSKFlGRIL",
164
+ "FVPWFSKFLGrIL",
165
+ "FVPWFSKFLGRiL",
166
+ "FVPWFSKFLGRIl"
167
+ ]
168
+ },
169
+ "IRIKIRIK": {
170
+ "1": [
171
+ "irikirik",
172
+ "IRIkIrIK"
173
+ ],
174
+ "0": [
175
+ "IrIkIrIk"
176
+ ]
177
+ },
178
+ "IIRKIIRK": {
179
+ "1": [
180
+ "iirkiirk",
181
+ "IirKIirK"
182
+ ],
183
+ "0": []
184
+ },
185
+ "KKLFKKILKYL": {
186
+ "1": [
187
+ "KKLfKKILKYL",
188
+ "KKLFKKILkYL",
189
+ "KKLFKKIlKYL",
190
+ "KKLFkKILKYL",
191
+ "KKlFKKILKYL",
192
+ "KkLFKKILKYL",
193
+ "KKLFKkILKYL",
194
+ "kKLFKKILKYL",
195
+ "KKLFKKIlkYL",
196
+ "KKlFKkILkYL",
197
+ "KKLFKKilkYL",
198
+ "kklfkkilkyl",
199
+ "kkLfKKILKYL",
200
+ "KKLFKKilkyl",
201
+ "KKLFkkilkyl",
202
+ "KKLfkkilkyl",
203
+ "KKlfkkilkyl",
204
+ "Kklfkkilkyl",
205
+ "kklfKKILKYL",
206
+ "kklfkKILKYL",
207
+ "kklfkkILKYL",
208
+ "kklfkkiLKYL",
209
+ "kklfkkilKYL",
210
+ "kklfkkilkYL",
211
+ "kklfkkilkyL"
212
+ ],
213
+ "0": [
214
+ "KKLFKkilkyl",
215
+ "KKLFKKiLKYL",
216
+ "KKLFKKILKyL",
217
+ "KKLFKKILKYl",
218
+ "KKlFKKILkYL",
219
+ "KKLFKKIlkyl"
220
+ ]
221
+ },
222
+ "KFFKRLLKSVRRAVKKFRK": {
223
+ "1": [],
224
+ "0": [
225
+ "kFFkrLLkSVrrAVkkFrk",
226
+ "kffkrllksvrravkkfrk"
227
+ ]
228
+ },
229
+ "RWRWRWK": {
230
+ "1": [
231
+ "rWRWRWK",
232
+ "rWRWRwK",
233
+ "rWRWrWK",
234
+ "rWRwRWK",
235
+ "rWrWRWK",
236
+ "rwRWRWK",
237
+ "rWRWrwK",
238
+ "rWRwRwK",
239
+ "rWrWRwK",
240
+ "rwRWRwK",
241
+ "rWRwrWK",
242
+ "rWrWrWK",
243
+ "rwRWrWK",
244
+ "rWrwRWK",
245
+ "rwRwRWK",
246
+ "rwrWRWK",
247
+ "rWRwrwK",
248
+ "rWrWrwK",
249
+ "rwRWrwK",
250
+ "rWrwRwK",
251
+ "rwrWRwK",
252
+ "rWrwrWK",
253
+ "rwRwrWK",
254
+ "rwrWrWK",
255
+ "rwrwRWK",
256
+ "rWrwrwK",
257
+ "rwRwrwK",
258
+ "rwrWrwK",
259
+ "rwrwRwK",
260
+ "rwrwrWK",
261
+ "rwrwrwK"
262
+ ],
263
+ "0": [
264
+ "rwRwRwK"
265
+ ]
266
+ },
267
+ "KWKSFLKTFKSLKKTVLHTLLKAISS": {
268
+ "1": [
269
+ "KWKSFLkTFKSLKKTVLHTLLKAISS",
270
+ "KWKSFLKTFKSLKkTVLHTLLKAISS",
271
+ "KWKSFLKTFKSLKKTVLHTLLkAISS",
272
+ "KWKSFLkTFKSLKkTVLHTLLKAISS",
273
+ "KWKSFLKTFKSLKkTVLHTLLkAISS",
274
+ "KWKSFlKTFKSLKKTVLHTLLKAISS",
275
+ "KWKSFLKTFKSlKKTVLHTLLKAISS",
276
+ "KWKSFLKTFKSLKKTVLHTlLKAISS",
277
+ "KWKSFLKTFKSlKKTVLHTlLKAISS",
278
+ "KWKSFlKTFKSlKKTVLHTlLKAISS"
279
+ ],
280
+ "0": [
281
+ "KWKSFLkTFKSLKkTVLHTLLkAISS",
282
+ "KWKSFLkTFkSLKkTVLHTLLkAISS",
283
+ "KWkSFLkTFkSLKkTVLHTLLkAISS",
284
+ "kWkSFLkTFkSLKkTVLHTLLkAISS",
285
+ "KWKSFlKTFKSlKKTVLHTLLKAISS",
286
+ "KWKSFlKTFKSlKKTVlHTlLKAISS",
287
+ "KWKSFlKTFKSlKKTVlHTllKAISS"
288
+ ]
289
+ },
290
+ "GWLDVAKKIGKAAFNVAKNFL": {
291
+ "1": [],
292
+ "0": [
293
+ "GWLDvAKKIGKAAFNvAKNFL",
294
+ "GWLDVAKKIGKAAFNvAKNFL"
295
+ ]
296
+ },
297
+ "GFGMALKLLKKVL": {
298
+ "1": [
299
+ "GfGmalkllkkvl",
300
+ "GfGMALKLLKKVL"
301
+ ],
302
+ "0": [
303
+ "GFGMALKLLKKVl",
304
+ "GFGMALKLLKKvL",
305
+ "GFGMALKLLKkVL",
306
+ "GFGMALKLLkKVL",
307
+ "GFGMALKLlKKVL",
308
+ "GFGMALKlLKKVL",
309
+ "GFGMALkLLKKVL",
310
+ "GFGMAlKLLKKVL",
311
+ "GFGMaLKLLKKVL",
312
+ "GFGmALKLLKKVL"
313
+ ]
314
+ },
315
+ "RGLRRLGRKIAHGVKKYGPTVLRIIRIA": {
316
+ "1": [],
317
+ "0": [
318
+ "rglrrlgrkiahgvkkygptvlriiria",
319
+ "RGLRRLGRKIAHGVKKYGptvlriiria"
320
+ ]
321
+ },
322
+ "KVLGRLVKVLGRLV": {
323
+ "1": [
324
+ "kVLGRLVKVLGRLV"
325
+ ],
326
+ "0": [
327
+ "KVLGRLVkVLGRLV",
328
+ "kVLGRLVkVLGRLV"
329
+ ]
330
+ },
331
+ "RRLFRRILRWL": {
332
+ "1": [
333
+ "RRLfRRILRWL",
334
+ "RRLFrRILRWL",
335
+ "rrlfrrilrwl"
336
+ ],
337
+ "0": [
338
+ "rRLFRRILRWL",
339
+ "RrLFRRILRWL",
340
+ "RRlFRRILRWL",
341
+ "RRLFRrILRWL",
342
+ "RRLFRRiLRWL",
343
+ "RRLFRRIlRWL",
344
+ "RRLFRRILrWL",
345
+ "RRLFRRILRwL",
346
+ "RRLFRRILRWl"
347
+ ]
348
+ },
349
+ "KWKSFLKTFKSAVKTVLHTALKAISS": {
350
+ "1": [
351
+ "KWKSFLKTFKSAvKTVLHTALKAISS",
352
+ "KWKSFLKTFKsAVKTVLHTALKAISS"
353
+ ],
354
+ "0": [
355
+ "kwksflktfksavktvlhtalkaiss"
356
+ ]
357
+ },
358
+ "RRWVRRVRRVWRRVVRVVRRWVRR": {
359
+ "1": [],
360
+ "0": [
361
+ "RRWVRRvRRVWRRVvRvVRRWvRR",
362
+ "RRWVRRvRRvWRRVvRvvRRWvRR",
363
+ "RRWvRRvRRvWRRvvRvvRRWvRR"
364
+ ]
365
+ },
366
+ "TVGGLVKWILKTVKKFA": {
367
+ "1": [
368
+ "tvgglvkwilktvkkfa",
369
+ "TVGGLVKWILkTVKKFA"
370
+ ],
371
+ "0": [
372
+ "TVGGLVkWILkTVKkFA"
373
+ ]
374
+ },
375
+ "INLKALAALAKKIL": {
376
+ "1": [],
377
+ "0": [
378
+ "iNLKALAALAKKIL",
379
+ "InLKALAALAKKIL",
380
+ "inLKALAALAKKIL",
381
+ "inlkalaalakkil"
382
+ ]
383
+ },
384
+ "FLSLIPKAIKAVGVKAKKF": {
385
+ "1": [],
386
+ "0": [
387
+ "FLSLIPkAIkAVGVkAkkF",
388
+ "FLSLIPkAIKAVGVKAKKF"
389
+ ]
390
+ },
391
+ "KKLLKLLKLLL": {
392
+ "1": [
393
+ "kkllkllklll",
394
+ "KkLLKLLKLLL",
395
+ "KkLLkLLKLLL",
396
+ "KkLlKLLKLLL",
397
+ "kKLLKLLKLLl",
398
+ "kkLLKLLKLLl",
399
+ "KkllKLLKLLL",
400
+ "kkLLkLLKLLL",
401
+ "KkllKLlKLLL"
402
+ ],
403
+ "0": [
404
+ "kkLLKLLKLLL",
405
+ "KKLLKllKLLL",
406
+ "KKLLkllKLLL",
407
+ "KkllKlLKLLL",
408
+ "KKLLkllkLLL",
409
+ "KKllKllKLLL",
410
+ "KKlLkLlKlLL",
411
+ "KKLlkLLklLL",
412
+ "KklLKLLKllL",
413
+ "kkLLKLLKLll",
414
+ "kkLLkLLKLLl",
415
+ "KKllKLLklLL",
416
+ "KklLKlLKlLL",
417
+ "KKllKLlKLlL",
418
+ "KKLlkLLkLLl",
419
+ "KkllKllKLLL",
420
+ "KKllKllKlLL",
421
+ "kkLLkLLKLll",
422
+ "kkLLkLLkLLl",
423
+ "kKLLkllKLLl",
424
+ "KKLlkllkLLL",
425
+ "KkLlKlLkLlL",
426
+ "kKlLkLlKlLl"
427
+ ]
428
+ },
429
+ "KKVVFKVKFK": {
430
+ "1": [],
431
+ "0": [
432
+ "KKVVFKVKFk",
433
+ "kKVVFKVKFk",
434
+ "kkVVFKVKFk",
435
+ "KKVVfkvKFK",
436
+ "kKVVfkvKFk"
437
+ ]
438
+ },
439
+ "LKLLKKLLKKLLKLL": {
440
+ "1": [
441
+ "LKlLKkLlkKLLkLL"
442
+ ],
443
+ "0": []
444
+ },
445
+ "KLKLLKLLKLLKLLK": {
446
+ "1": [],
447
+ "0": [
448
+ "KLkLLkLlkLLKlLK"
449
+ ]
450
+ },
451
+ "KKKLLLLLLLLLKKK": {
452
+ "1": [
453
+ "KKkLLlLllLLLkKK"
454
+ ],
455
+ "0": []
456
+ },
457
+ "KKFKKTAKWLIKSAWLLLKSLALKMK": {
458
+ "1": [
459
+ "kkfkktakwliksawlllkslalkmk"
460
+ ],
461
+ "0": []
462
+ },
463
+ "WWWLRRRW": {
464
+ "1": [],
465
+ "0": [
466
+ "wwwlrrrw"
467
+ ]
468
+ },
469
+ "RRRWWWWV": {
470
+ "1": [],
471
+ "0": [
472
+ "rrrwwwwv"
473
+ ]
474
+ },
475
+ "KWFRVYRGIYRRR": {
476
+ "1": [],
477
+ "0": [
478
+ "kwfrvyrgiyrrr"
479
+ ]
480
+ },
481
+ "RRRYIGRYVRFWK": {
482
+ "1": [],
483
+ "0": [
484
+ "rrryigryvrfwk"
485
+ ]
486
+ },
487
+ "GKIIKLKASLKLL": {
488
+ "1": [
489
+ "gkiiklkaslkll"
490
+ ],
491
+ "0": []
492
+ },
493
+ "KLFKKLFKKLFK": {
494
+ "1": [],
495
+ "0": [
496
+ "kLFkkLFkkLFk"
497
+ ]
498
+ },
499
+ "GFFALIPKIISSPLFKTLLSAV": {
500
+ "1": [],
501
+ "0": [
502
+ "GFFALIpKIISSPLFKTllSAV"
503
+ ]
504
+ },
505
+ "KGFFALIPKIISSPLFKTLLSAV": {
506
+ "1": [],
507
+ "0": [
508
+ "KGFFALIpKIISSPLFKTllSAV"
509
+ ]
510
+ },
511
+ "RGLRRLGRKIAHGVKKYG": {
512
+ "1": [
513
+ "rglrrlgrkiahgvkkyg"
514
+ ],
515
+ "0": []
516
+ },
517
+ "FLGGLIKIVPAMICAVTKKC": {
518
+ "1": [
519
+ "flGGlikivpamicavtkkc"
520
+ ],
521
+ "0": []
522
+ },
523
+ "AKRLKKLAKKIWKWK": {
524
+ "1": [],
525
+ "0": [
526
+ "AkRLkkLAkkIWkWk"
527
+ ]
528
+ },
529
+ "VDKPPYLPRPRPIRRPGGR": {
530
+ "1": [
531
+ "VDkPPYLPrPrPIrrPGGr"
532
+ ],
533
+ "0": [
534
+ "VDKPPYLPrPRPIrRPGGR",
535
+ "VDKPPYLPrPRPIRrPGGR",
536
+ "VDKPPYLPrPRPIRRPGGr",
537
+ "VDKPPYLPRPrPIrRPGGR",
538
+ "VDKPPYLPRPrPIRrPGGR",
539
+ "VDKPPYLPRPrPIRRPGGr",
540
+ "VDKPPYLPRPRPIrRPGGr",
541
+ "VDKPPYLPRPRPIRrPGGr"
542
+ ]
543
+ },
544
+ "GIGAVLKVLTTGLPALISWIKRKRQQ": {
545
+ "1": [
546
+ "GIGAVlKVLTTGlPALISWiKRKRQQ",
547
+ "gigavlkvlttglpaliswikrkrqq"
548
+ ],
549
+ "0": [
550
+ "GIGAvLKvLTTGLPALiSWIkRKRQQ"
551
+ ]
552
+ },
553
+ "FWGALAKGALKLIPSLFSSFSKKD": {
554
+ "1": [
555
+ "fwGalakGalklipslfssfskkd"
556
+ ],
557
+ "0": []
558
+ },
559
+ "IRVKIRVKIRVK": {
560
+ "1": [
561
+ "irvkirvkirvk"
562
+ ],
563
+ "0": []
564
+ },
565
+ "LIKKALAALAKLNI": {
566
+ "1": [],
567
+ "0": [
568
+ "likkalaalaklni"
569
+ ]
570
+ },
571
+ "RSMRLSFRARGYGFR": {
572
+ "1": [
573
+ "rsmrlsfrarGyGfr"
574
+ ],
575
+ "0": []
576
+ },
577
+ "GLLKRIKTLL": {
578
+ "1": [],
579
+ "0": [
580
+ "GLLkRIkTLL",
581
+ "Gllkriktll"
582
+ ]
583
+ },
584
+ "KKLFKKILRYL": {
585
+ "1": [
586
+ "KKLfKKILRYL"
587
+ ],
588
+ "0": [
589
+ "KKLFKkilryl",
590
+ "kklfkkilryl"
591
+ ]
592
+ },
593
+ "FQWQRNMRKVR": {
594
+ "1": [
595
+ "fqwqrnmrkvr"
596
+ ],
597
+ "0": []
598
+ },
599
+ "KKKKKKAAFAAWAAFAA": {
600
+ "1": [],
601
+ "0": [
602
+ "kkkkkkaafaawaafaa"
603
+ ]
604
+ },
605
+ "RRWWRF": {
606
+ "1": [],
607
+ "0": [
608
+ "rrwwrf"
609
+ ]
610
+ },
611
+ "KWKSFLKTFKSALKTVLHTALKAISS": {
612
+ "1": [
613
+ "KWKSFLKTFKSAlKTVLHTALKAISS"
614
+ ],
615
+ "0": []
616
+ },
617
+ "KWKSFLKTFKSAAKTVLHTALKAISS": {
618
+ "1": [
619
+ "KWKSFLKTFKSAaKTVLHTALKAISS"
620
+ ],
621
+ "0": []
622
+ },
623
+ "KWKSFLKTFKSASKTVLHTALKAISS": {
624
+ "1": [],
625
+ "0": [
626
+ "KWKSFLKTFKSAsKTVLHTALKAISS"
627
+ ]
628
+ },
629
+ "KWKSFLKTFKLAVKTVLHTALKAISS": {
630
+ "1": [
631
+ "KWKSFLKTFKlAVKTVLHTALKAISS"
632
+ ],
633
+ "0": []
634
+ },
635
+ "KWKSFLKTFKVAVKTVLHTALKAISS": {
636
+ "1": [
637
+ "KWKSFLKTFKvAVKTVLHTALKAISS"
638
+ ],
639
+ "0": []
640
+ },
641
+ "KWKSFLKTFKAAVKTVLHTALKAISS": {
642
+ "1": [
643
+ "KWKSFLKTFKaAVKTVLHTALKAISS"
644
+ ],
645
+ "0": []
646
+ },
647
+ "KWKSFLKTFKKAVKTVLHTALKAISS": {
648
+ "1": [
649
+ "KWKSFLKTFKkAVKTVLHTALKAISS"
650
+ ],
651
+ "0": []
652
+ },
653
+ "GFKMALKLLKKVL": {
654
+ "1": [],
655
+ "0": [
656
+ "GFkMALKLLKKVL",
657
+ "GfkMALKLLKKVL"
658
+ ]
659
+ },
660
+ "AFGMALKLLKKVL": {
661
+ "1": [],
662
+ "0": [
663
+ "aFGMALKLLKKVL"
664
+ ]
665
+ },
666
+ "RRLLRLLRLLL": {
667
+ "1": [
668
+ "rrLLrLLrLLL"
669
+ ],
670
+ "0": []
671
+ },
672
+ "KKIIKIIKIII": {
673
+ "1": [
674
+ "kkIIkIIkIII"
675
+ ],
676
+ "0": []
677
+ },
678
+ "RRIIRIIRIII": {
679
+ "1": [
680
+ "rrIIrIIrIII"
681
+ ],
682
+ "0": []
683
+ },
684
+ "KRFKKFFKKVKKSVKKRLKKIFKKPMVIGVTIPF": {
685
+ "1": [],
686
+ "0": [
687
+ "krfkkffkkvkksvkkrlkkifkkpmviGvtipf"
688
+ ]
689
+ },
690
+ "KKRLKKIFKKPMVIGVTIPF": {
691
+ "1": [],
692
+ "0": [
693
+ "kkrlkkifkkpmviGvtipf"
694
+ ]
695
+ },
696
+ "RLFRRVKKVAGKIAKRIWK": {
697
+ "1": [],
698
+ "0": [
699
+ "rlfrrvkkvagkiakriwk"
700
+ ]
701
+ },
702
+ "FIRRIARLLRRIF": {
703
+ "1": [],
704
+ "0": [
705
+ "firriarllrrif"
706
+ ]
707
+ },
708
+ "GIGAVLKVLALISWIKRKR": {
709
+ "1": [],
710
+ "0": [
711
+ "GIGAvLKvLAlISWIkRKR"
712
+ ]
713
+ },
714
+ "WKKLKKLLKKLKKL": {
715
+ "1": [],
716
+ "0": [
717
+ "Wkklkkllkklkkl"
718
+ ]
719
+ },
720
+ "KFWSLLKKALRLWANVL": {
721
+ "1": [
722
+ "kFwSLLkKALRLwANVL"
723
+ ],
724
+ "0": []
725
+ },
726
+ "KFWKLLKKALRLWAKVL": {
727
+ "1": [
728
+ "kFwKLLkKALrLwAkVL"
729
+ ],
730
+ "0": [
731
+ "kFWKlLKkAlrLWAkVL"
732
+ ]
733
+ },
734
+ "WFKKLLKKALRLWKKVL": {
735
+ "1": [
736
+ "wFKKlLKkAlrLWKkVL"
737
+ ],
738
+ "0": []
739
+ },
740
+ "ILLKKLLKKI": {
741
+ "1": [
742
+ "illkkllkki"
743
+ ],
744
+ "0": []
745
+ },
746
+ "GRFKRFRKKFKKLFKKLS": {
747
+ "1": [
748
+ "GRfKRfRKKfKKLfKKLS"
749
+ ],
750
+ "0": [
751
+ "grfkrfrkkfkklfkkls"
752
+ ]
753
+ },
754
+ "RAGLQFPVGRVHRLLRK": {
755
+ "1": [
756
+ "raglqfpvgrvhrllrk"
757
+ ],
758
+ "0": []
759
+ },
760
+ "KLKLLLLLKLK": {
761
+ "1": [
762
+ "klklllllklk"
763
+ ],
764
+ "0": []
765
+ },
766
+ "KLKLLLKLK": {
767
+ "1": [
768
+ "klklllklk"
769
+ ],
770
+ "0": []
771
+ },
772
+ "FIKRIARLLRKIF": {
773
+ "1": [],
774
+ "0": [
775
+ "fikriarllrkif"
776
+ ]
777
+ },
778
+ "INLKAIAALAKKLL": {
779
+ "1": [],
780
+ "0": [
781
+ "inlkaiaalakkll"
782
+ ]
783
+ },
784
+ "FLPLIGRVLSGIL": {
785
+ "1": [],
786
+ "0": [
787
+ "flpligrvlsgil"
788
+ ]
789
+ },
790
+ "KLLKKAGKLLKKAGKLLKKAG": {
791
+ "1": [],
792
+ "0": [
793
+ "KlLkKaGkLlKkAGKlLkKaG"
794
+ ]
795
+ },
796
+ "LLAKKKGLLAKKKGLLAKKKG": {
797
+ "1": [
798
+ "LlAkKkGlLaKkKgLlAkKkG"
799
+ ],
800
+ "0": []
801
+ },
802
+ "RPFTRAQWFAIQHISPRTIAMRAINNYRWR": {
803
+ "1": [],
804
+ "0": [
805
+ "rpftraqwfaiqhisprtiamrainnyrwr"
806
+ ]
807
+ },
808
+ "RLWLAIWRR": {
809
+ "1": [
810
+ "rlwlaiwrr"
811
+ ],
812
+ "0": []
813
+ },
814
+ "KLWLAIWKK": {
815
+ "1": [
816
+ "klwlaiwkk"
817
+ ],
818
+ "0": []
819
+ },
820
+ "FLKLLKKLLFLKLLKKLL": {
821
+ "1": [
822
+ "fLKLLKKLLfLKLLKKLL"
823
+ ],
824
+ "0": []
825
+ },
826
+ "VDKPPYLPRPRPPRRIYNR": {
827
+ "1": [
828
+ "VDKPPYLPRPrpprriynr",
829
+ "VDKPPYLPRPRpPRRIYNR",
830
+ "VDKPPYLPRPRPPRrIYNr"
831
+ ],
832
+ "0": [
833
+ "VDKPPYLPRPRPPRriynr",
834
+ "VDKPPYLPRPRpprriynr",
835
+ "VDKPPYLPRPrPPRRIYNR",
836
+ "VDKPPYLPRpRPPRRIYNR",
837
+ "VDKPPYLPrPRPPRRIYNR",
838
+ "VDKPPYLpRPRPPRRIYNR",
839
+ "VDKPPYlPRPRPPRRIYNR",
840
+ "VDKPPyLPRPRPPRRIYNR",
841
+ "VDKPpYLPRPRPPRRIYNR",
842
+ "VDKppYLPRPRPPRRIYNR",
843
+ "VDKpPYLPRPRPPRRIYNR",
844
+ "vdkppylprprpprriynr",
845
+ "VDKPPYLPRPRPPRRIYNr",
846
+ "VDKPPYLPRPRPPRrIYNR"
847
+ ]
848
+ },
849
+ "VRLIVAVRIWRR": {
850
+ "1": [],
851
+ "0": [
852
+ "vrlivavriwrr"
853
+ ]
854
+ },
855
+ "VRLRWWRRRWRR": {
856
+ "1": [],
857
+ "0": [
858
+ "vrlrwwrrrwrr"
859
+ ]
860
+ },
861
+ "RRW": {
862
+ "1": [],
863
+ "0": [
864
+ "rRW",
865
+ "RrW",
866
+ "RRw",
867
+ "rrW",
868
+ "Rrw",
869
+ "rRw",
870
+ "rrw"
871
+ ]
872
+ },
873
+ "FLGTVLKVAAKVLPAALCQIFKKC": {
874
+ "1": [
875
+ "FlGTVlKVAAKVlPAAlCQIFKKC"
876
+ ],
877
+ "0": [
878
+ "FLGTVLkVAAkVLPAALCQIFkkC"
879
+ ]
880
+ },
881
+ "FLGTVLKVLAKVLPAALCQIFKKC": {
882
+ "1": [
883
+ "FlGTVlKVlAKVlPAAlCQIFKKC"
884
+ ],
885
+ "0": []
886
+ },
887
+ "FLGTVLRVAARVLPAALCQIFRRC": {
888
+ "1": [],
889
+ "0": [
890
+ "FLGTVLrVAArVLPAALCQIFrrC"
891
+ ]
892
+ },
893
+ "RWKIFKKIEKMGRNIRDGIVKAGPAIQVLGSAKAI": {
894
+ "1": [],
895
+ "0": [
896
+ "rwkifkkiekmgrnirdgivkagpaiqvlgsakai"
897
+ ]
898
+ },
899
+ "GPLGVRGKRLWDIVRRWVGWL": {
900
+ "1": [
901
+ "GPlGvRGKRLWDIVRRWVGWL"
902
+ ],
903
+ "0": []
904
+ },
905
+ "RIVQRIKKWLR": {
906
+ "1": [
907
+ "rivqrikkwlr"
908
+ ],
909
+ "0": []
910
+ },
911
+ "KRIWQRIK": {
912
+ "1": [
913
+ "kriwqrik"
914
+ ],
915
+ "0": []
916
+ },
917
+ "KRIWQRIKDF": {
918
+ "1": [
919
+ "kriwqrikdf"
920
+ ],
921
+ "0": []
922
+ },
923
+ "KYKKALKKLAKLL": {
924
+ "1": [
925
+ "kykkalkklakll"
926
+ ],
927
+ "0": []
928
+ },
929
+ "VQWRAIRVRVIR": {
930
+ "1": [
931
+ "vqwrairvrvir"
932
+ ],
933
+ "0": []
934
+ },
935
+ "GFAWNVCVYRNGVRVCHRRAN": {
936
+ "1": [],
937
+ "0": [
938
+ "GfawnvcvyrnGvrvchrran"
939
+ ]
940
+ },
941
+ "RKRWWRWWKWWKR": {
942
+ "1": [],
943
+ "0": [
944
+ "RKrWWrWwkWWkR"
945
+ ]
946
+ },
947
+ "WRWWKWW": {
948
+ "1": [],
949
+ "0": [
950
+ "WrWwkWW"
951
+ ]
952
+ },
953
+ "WWRWWKWW": {
954
+ "1": [],
955
+ "0": [
956
+ "WWrWwkWW"
957
+ ]
958
+ },
959
+ "RRGKKLLLLLKKKG": {
960
+ "1": [
961
+ "rrgkklllllkkkg"
962
+ ],
963
+ "0": []
964
+ },
965
+ "LLWIALRKK": {
966
+ "1": [
967
+ "llwialrkk"
968
+ ],
969
+ "0": []
970
+ },
971
+ "PRPRPRP": {
972
+ "1": [],
973
+ "0": [
974
+ "prprprp"
975
+ ]
976
+ },
977
+ "KWLKKWLKWLKK": {
978
+ "1": [],
979
+ "0": [
980
+ "kwLkkwLkwLkk"
981
+ ]
982
+ },
983
+ "ILRWPWWPWRRK": {
984
+ "1": [],
985
+ "0": [
986
+ "ilrwpwwpwrrk"
987
+ ]
988
+ },
989
+ "KRKIFLRTKILV": {
990
+ "1": [
991
+ "KrKiFlRtKiLv"
992
+ ],
993
+ "0": [
994
+ "kRkIfLrTkIlV"
995
+ ]
996
+ },
997
+ "VLIKTRLFIKRK": {
998
+ "1": [
999
+ "vLiKtRlFiKrK"
1000
+ ],
1001
+ "0": []
1002
+ },
1003
+ "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK": {
1004
+ "1": [],
1005
+ "0": [
1006
+ "kwklfkkiekvgqnirdgiikagpavavvgqatqiak"
1007
+ ]
1008
+ },
1009
+ "GIGKFLHSAKKFGKAFVGEIMNS": {
1010
+ "1": [
1011
+ "gigkflhsakkfgkafvgeimns"
1012
+ ],
1013
+ "0": []
1014
+ },
1015
+ "KWKLFKKIEKVGQGIGAVLKVLTTGL": {
1016
+ "1": [],
1017
+ "0": [
1018
+ "kwklfkkiekvgqgigavlkvlttgl"
1019
+ ]
1020
+ },
1021
+ "KWKLFKKIGIGAVLKVLTTGLPALIS": {
1022
+ "1": [
1023
+ "kwklfkkigigavlkvlttglpalis"
1024
+ ],
1025
+ "0": []
1026
+ },
1027
+ "KWKLFKKGIGAVLKV": {
1028
+ "1": [
1029
+ "kwklfkkgigavlkv"
1030
+ ],
1031
+ "0": []
1032
+ },
1033
+ "KWKLFKKIGAVLKVL": {
1034
+ "1": [
1035
+ "kwklfkkigavlkvl"
1036
+ ],
1037
+ "0": []
1038
+ },
1039
+ "KWKLFKKGAVLKVLT": {
1040
+ "1": [
1041
+ "kwklfkkgavlkvlt"
1042
+ ],
1043
+ "0": []
1044
+ },
1045
+ "KWKLFKKAVLKVLTT": {
1046
+ "1": [
1047
+ "kwklfkkavlkvltt"
1048
+ ],
1049
+ "0": []
1050
+ },
1051
+ "KWKLFKKVLKVLTTG": {
1052
+ "1": [
1053
+ "kwklfkkvlkvlttg"
1054
+ ],
1055
+ "0": []
1056
+ },
1057
+ "GSKKPVPIIYCNRRTGKCQRM": {
1058
+ "1": [],
1059
+ "0": [
1060
+ "gskkpvpiiycnrrtgkcqrm"
1061
+ ]
1062
+ },
1063
+ "RRWQWRMKK": {
1064
+ "1": [
1065
+ "rrwqwrmkk"
1066
+ ],
1067
+ "0": []
1068
+ },
1069
+ "FKCRRWQWRMKKLGA": {
1070
+ "1": [
1071
+ "fkcrrwqwrmkklga"
1072
+ ],
1073
+ "0": []
1074
+ },
1075
+ "PKLLKTFLSKWIG": {
1076
+ "1": [],
1077
+ "0": [
1078
+ "pkllktflskwig",
1079
+ "pkllktflskwiG"
1080
+ ]
1081
+ },
1082
+ "KLPLIGRVLSGIL": {
1083
+ "1": [
1084
+ "klpligrvlsgil"
1085
+ ],
1086
+ "0": []
1087
+ },
1088
+ "KKHRKHRKHRKHGGSGGSKNLRRIIRKGIHIIKKYG": {
1089
+ "1": [],
1090
+ "0": [
1091
+ "kkhrkhrkhrkhggsggsknlrriirkgihiikkyg"
1092
+ ]
1093
+ },
1094
+ "FKRIVQRIKDFLRNLV": {
1095
+ "1": [],
1096
+ "0": [
1097
+ "FKRiVQRiKDFlRNLV"
1098
+ ]
1099
+ },
1100
+ "GWGSFFKKAAHVGKHVGKAALTHYL": {
1101
+ "1": [],
1102
+ "0": [
1103
+ "gwgsffkkaahvgkhvgkaalthyl",
1104
+ "GwGsffkkaahvGkhvGkaalthyl"
1105
+ ]
1106
+ },
1107
+ "RRGWVLALVLRYGRR": {
1108
+ "1": [],
1109
+ "0": [
1110
+ "RRGWVLALVlRYGRR"
1111
+ ]
1112
+ },
1113
+ "RRGWVLALYLRYGRR": {
1114
+ "1": [],
1115
+ "0": [
1116
+ "RRGWVLALYlRYGRR"
1117
+ ]
1118
+ },
1119
+ "RRGWALRLVLAY": {
1120
+ "1": [],
1121
+ "0": [
1122
+ "RRGWALRLVlAY"
1123
+ ]
1124
+ },
1125
+ "KWKKLLKKPLLKKLLKKL": {
1126
+ "1": [
1127
+ "kwkkllkkpllkkllkkl"
1128
+ ],
1129
+ "0": []
1130
+ },
1131
+ "NKKAGLFVVQFPKKY": {
1132
+ "1": [
1133
+ "nkkaglfvvqfpkky"
1134
+ ],
1135
+ "0": []
1136
+ },
1137
+ "LVKKLLKLAMGFG": {
1138
+ "1": [
1139
+ "lvkkllklamgfg"
1140
+ ],
1141
+ "0": []
1142
+ },
1143
+ "WLRRIKAWLRRIKA": {
1144
+ "1": [
1145
+ "wlrrikawlrrika"
1146
+ ],
1147
+ "0": []
1148
+ },
1149
+ "RRGWARRLAFAFGRR": {
1150
+ "1": [
1151
+ "rrgwarrlafafgrr"
1152
+ ],
1153
+ "0": []
1154
+ },
1155
+ "GKKLLKKLKKLLKKG": {
1156
+ "1": [],
1157
+ "0": [
1158
+ "GKKllKKlKKllKKG"
1159
+ ]
1160
+ },
1161
+ "GLLSVLGSVAKHVLPHVVPVIAEHL": {
1162
+ "1": [],
1163
+ "0": [
1164
+ "GllsvlGsvakhvlphvvpviaehl"
1165
+ ]
1166
+ },
1167
+ "EFKRIVQRIKDFLRNLV": {
1168
+ "1": [],
1169
+ "0": [
1170
+ "EfKRiVQRiKDfLRNLV"
1171
+ ]
1172
+ },
1173
+ "GLFDVIKKVASVIGGL": {
1174
+ "1": [
1175
+ "GlfdvikkvasviGGl"
1176
+ ],
1177
+ "0": []
1178
+ },
1179
+ "GIGKFLKKAKKFGKAFVKILKK": {
1180
+ "1": [
1181
+ "GiGkflkkakkfGkafvkilkk"
1182
+ ],
1183
+ "0": []
1184
+ },
1185
+ "GFKKLLKGAAKALVKTVLF": {
1186
+ "1": [],
1187
+ "0": [
1188
+ "GFKkLLKGAAKALVKTVLF"
1189
+ ]
1190
+ },
1191
+ "GFKDLLKKAAKALVKTVLF": {
1192
+ "1": [],
1193
+ "0": [
1194
+ "GFKDLLKkAAKALVKTVLF"
1195
+ ]
1196
+ },
1197
+ "GFKDLLKGAKKALVKTVLF": {
1198
+ "1": [],
1199
+ "0": [
1200
+ "GFKDLLKGAKkALVKTVLF"
1201
+ ]
1202
+ },
1203
+ "GFKDLLKGAAKALKKTVLF": {
1204
+ "1": [],
1205
+ "0": [
1206
+ "GFKDLLKGAAKALkKTVLF"
1207
+ ]
1208
+ },
1209
+ "GFKDLLKGAAKALVKTVKF": {
1210
+ "1": [],
1211
+ "0": [
1212
+ "GFKDLLKGAAKALVKTVkF"
1213
+ ]
1214
+ },
1215
+ "KLWKKWKKWLK": {
1216
+ "1": [],
1217
+ "0": [
1218
+ "klwkkwkkwlk"
1219
+ ]
1220
+ },
1221
+ "RLWRRWRRWLR": {
1222
+ "1": [
1223
+ "rlwrrwrrwlr"
1224
+ ],
1225
+ "0": []
1226
+ },
1227
+ "GMWSKILGHLIR": {
1228
+ "1": [
1229
+ "GmwskilGhlir"
1230
+ ],
1231
+ "0": [
1232
+ "GMWSKIlGHLIR",
1233
+ "GMWSKiLGHLIR",
1234
+ "GMWSkILGHLIR"
1235
+ ]
1236
+ },
1237
+ "GKWMSLLKHILK": {
1238
+ "1": [
1239
+ "Gkwmsllkhilk"
1240
+ ],
1241
+ "0": [
1242
+ "GKWMSLLKhILK",
1243
+ "GKwMSLLKHILK"
1244
+ ]
1245
+ },
1246
+ "GVCRCVCRRGVCRCVCRR": {
1247
+ "1": [
1248
+ "GvcrcvcrrGvcrcvcrr"
1249
+ ],
1250
+ "0": []
1251
+ },
1252
+ "RGGRLCYCRRRFCVCVGR": {
1253
+ "1": [
1254
+ "rGGrlcycrrrfcvcvGr"
1255
+ ],
1256
+ "0": []
1257
+ },
1258
+ "RRWCFRVCYRGFCYRKCR": {
1259
+ "1": [],
1260
+ "0": [
1261
+ "rrwcfrvcyrGfcyrkcr"
1262
+ ]
1263
+ },
1264
+ "GLFVGLAKVAAHVVPAIAEHF": {
1265
+ "1": [],
1266
+ "0": [
1267
+ "GlfvGlakvaahvvpaiaehf"
1268
+ ]
1269
+ },
1270
+ "ILGKLLKTAAGLLSNL": {
1271
+ "1": [],
1272
+ "0": [
1273
+ "ILGKLLkTAAGLLSNL"
1274
+ ]
1275
+ },
1276
+ "ILGKLLSTAAKLLSNL": {
1277
+ "1": [],
1278
+ "0": [
1279
+ "ILGKLLSTAAkLLSNL"
1280
+ ]
1281
+ },
1282
+ "ILGKLLKTAAKLLSNL": {
1283
+ "1": [],
1284
+ "0": [
1285
+ "ILGKLLkTAAkLLSNL"
1286
+ ]
1287
+ },
1288
+ "WLLKRWKKLL": {
1289
+ "1": [
1290
+ "wllkrwkkll"
1291
+ ],
1292
+ "0": []
1293
+ },
1294
+ "KLLKWWKKLL": {
1295
+ "1": [
1296
+ "kllkwwkkll"
1297
+ ],
1298
+ "0": []
1299
+ },
1300
+ "RRIRPRPPRLPRPRPRPLPYPRP": {
1301
+ "1": [],
1302
+ "0": [
1303
+ "rrIRPRPPRLPRPRPRPLPYPRP"
1304
+ ]
1305
+ },
1306
+ "KRWWKWWRR": {
1307
+ "1": [
1308
+ "krwwkwwrr"
1309
+ ],
1310
+ "0": []
1311
+ },
1312
+ "GIMSSLMKKLKKIIAK": {
1313
+ "1": [
1314
+ "Gimsslmkklkkiiak"
1315
+ ],
1316
+ "0": []
1317
+ },
1318
+ "GILSSLLKKLKKIIAK": {
1319
+ "1": [
1320
+ "Gilssllkklkkiiak"
1321
+ ],
1322
+ "0": []
1323
+ },
1324
+ "GILSSLWKKLKKIIAK": {
1325
+ "1": [],
1326
+ "0": [
1327
+ "Gilsslwkklkkiiak"
1328
+ ]
1329
+ },
1330
+ "FFFLSRIF": {
1331
+ "1": [],
1332
+ "0": [
1333
+ "ffflsrif"
1334
+ ]
1335
+ },
1336
+ "FIRSLFFF": {
1337
+ "1": [
1338
+ "firslfff"
1339
+ ],
1340
+ "0": []
1341
+ },
1342
+ "IKIPSFFRNILKKVGKEAVSLIAGALKQS": {
1343
+ "1": [],
1344
+ "0": [
1345
+ "IKIPSFFrNILKKVGKEAVSLIAGALKQS"
1346
+ ]
1347
+ },
1348
+ "WWWLRKIW": {
1349
+ "1": [
1350
+ "wwwlrkiw"
1351
+ ],
1352
+ "0": []
1353
+ },
1354
+ "LLGMIPVAIKAISALSKL": {
1355
+ "1": [
1356
+ "LlGMIPVAIKAISALSKL"
1357
+ ],
1358
+ "0": []
1359
+ },
1360
+ "RLLRKFFRKLKKSV": {
1361
+ "1": [],
1362
+ "0": [
1363
+ "rllrkffrklkksv"
1364
+ ]
1365
+ },
1366
+ "GGLRSLGRKILRAWKKYGPIIVPIIRIG": {
1367
+ "1": [
1368
+ "GGlrslGrkilrawkkyGpiivpiiriG"
1369
+ ],
1370
+ "0": []
1371
+ },
1372
+ "WKIVFWWRR": {
1373
+ "1": [],
1374
+ "0": [
1375
+ "wkivfwwrr"
1376
+ ]
1377
+ },
1378
+ "RRWRIVVIRVRR": {
1379
+ "1": [
1380
+ "rrwrivvirvrr"
1381
+ ],
1382
+ "0": []
1383
+ },
1384
+ "GFGSLLGKALRLGANVL": {
1385
+ "1": [
1386
+ "GfGsllGkalrlGanvl"
1387
+ ],
1388
+ "0": []
1389
+ },
1390
+ "GFGSLLGKALRLWKKVL": {
1391
+ "1": [],
1392
+ "0": [
1393
+ "GFGSLLGKALRLwKkVL",
1394
+ "GFGSLLGKAlrLwKkVL"
1395
+ ]
1396
+ },
1397
+ "GKWKKILGKLIR": {
1398
+ "1": [],
1399
+ "0": [
1400
+ "GkwkkilGklir"
1401
+ ]
1402
+ },
1403
+ "KKWRKWLKWLAKK": {
1404
+ "1": [],
1405
+ "0": [
1406
+ "kkwrkwlkwlakk"
1407
+ ]
1408
+ },
1409
+ "KWRRWIRWL": {
1410
+ "1": [],
1411
+ "0": [
1412
+ "kwrrwirwl"
1413
+ ]
1414
+ },
1415
+ "RRWVRRVRRWVRRVVRVVRRWVRR": {
1416
+ "1": [
1417
+ "RRWvRRvRRWvRRvvRvvRRWvRR"
1418
+ ],
1419
+ "0": []
1420
+ },
1421
+ "VFRLKKWIQKVI": {
1422
+ "1": [
1423
+ "vfrlkkwiqkvi"
1424
+ ],
1425
+ "0": []
1426
+ },
1427
+ "IVKQIWKKLRFV": {
1428
+ "1": [
1429
+ "ivkqiwkklrfv"
1430
+ ],
1431
+ "0": []
1432
+ },
1433
+ "LPLIAGLWGKIW": {
1434
+ "1": [],
1435
+ "0": [
1436
+ "LPLIAGLwGKIw"
1437
+ ]
1438
+ },
1439
+ "FVQWFSKFLGRIL": {
1440
+ "1": [],
1441
+ "0": [
1442
+ "fqvqwfskflgril"
1443
+ ]
1444
+ },
1445
+ "FVPWFSKFLPRIL": {
1446
+ "1": [],
1447
+ "0": [
1448
+ "FVPWFSKFLpRIL"
1449
+ ]
1450
+ },
1451
+ "FFHHIFRAIVHVAKTIHRLVTG": {
1452
+ "1": [
1453
+ "FFHHIFRaIVHVaKTIHRLVTG"
1454
+ ],
1455
+ "0": []
1456
+ },
1457
+ "HFLKTLVNLAKKIL": {
1458
+ "1": [],
1459
+ "0": [
1460
+ "HFLkTLVNLAKKIL"
1461
+ ]
1462
+ },
1463
+ "HFLGKLVNLAKKIL": {
1464
+ "1": [],
1465
+ "0": [
1466
+ "HFLGkLVNLAKKIL"
1467
+ ]
1468
+ },
1469
+ "HFLGTLKNLAKKIL": {
1470
+ "1": [],
1471
+ "0": [
1472
+ "HFLGTLkNLAKKIL"
1473
+ ]
1474
+ },
1475
+ "HFLGTLVKLAKKIL": {
1476
+ "1": [],
1477
+ "0": [
1478
+ "HFLGTLVkLAKKIL"
1479
+ ]
1480
+ },
1481
+ "HFLGTLVNLAKKIL": {
1482
+ "1": [],
1483
+ "0": [
1484
+ "HFLGTLVNLAkKIL",
1485
+ "HFLGTLVNLAKkIL"
1486
+ ]
1487
+ },
1488
+ "ACPIFTKIQGTYRGRAKCR": {
1489
+ "1": [],
1490
+ "0": [
1491
+ "ACPiFTKiQGTYrGrAKCR"
1492
+ ]
1493
+ },
1494
+ "KLALKLALKALKAAKLA": {
1495
+ "1": [
1496
+ "KLalKLALKALKAAKLA"
1497
+ ],
1498
+ "0": [
1499
+ "klALKLALKALKAAKLA",
1500
+ "KLaLKLALKALKAAKLA",
1501
+ "KLALklALKALKAAKlA",
1502
+ "KLALklALKALKAAKLA",
1503
+ "KLALKLalKALKAALKLA",
1504
+ "KLALKLALkaLKAALKLA",
1505
+ "KLALKLALKAlkAALKLA",
1506
+ "KLALKLALKALKaaLKLA",
1507
+ "KLALKLALKALKAAlkLA",
1508
+ "KLALKLALKALKAALKla"
1509
+ ]
1510
+ },
1511
+ "KWKLFKKIPKFLHLAKKF": {
1512
+ "1": [],
1513
+ "0": [
1514
+ "KWKLFKKIpKFLHLAKKF"
1515
+ ]
1516
+ },
1517
+ "FFGSVLKLIPKIL": {
1518
+ "1": [],
1519
+ "0": [
1520
+ "ffGsvlklipkil"
1521
+ ]
1522
+ },
1523
+ "IKLSPKTKDNLKKVLKGAIKGAIAVAKMV": {
1524
+ "1": [
1525
+ "IKLSPkTKDNLKKVLKGAIKGAIAVAKMV"
1526
+ ],
1527
+ "0": []
1528
+ },
1529
+ "IKLSPETKKNLKKVLKGAIKGAIAVAKMV": {
1530
+ "1": [
1531
+ "IKLSPETKkNLKKVLKGAIKGAIAVAKMV"
1532
+ ],
1533
+ "0": []
1534
+ },
1535
+ "IKLSPKTKKNLKKVLKGAIKGAIAVAKMV": {
1536
+ "1": [
1537
+ "IKLSPkTKkNLKKVLKGAIKGAIAVAKMV"
1538
+ ],
1539
+ "0": []
1540
+ },
1541
+ "GLKKIFKAGLGSLVKGIAAHVAS": {
1542
+ "1": [],
1543
+ "0": [
1544
+ "GLKkIFKAGLGSLVKGIAAHVAS"
1545
+ ]
1546
+ },
1547
+ "GLKKIFKKGLGSLVKGIAAHVAS": {
1548
+ "1": [
1549
+ "GLKkIFKKGLGSLVKGIAAHVAS"
1550
+ ],
1551
+ "0": []
1552
+ },
1553
+ "GLKKIFKAGLGSLKKGIAAHVAS": {
1554
+ "1": [],
1555
+ "0": [
1556
+ "GLKkIFKAGLGSLKKGIAAHVAS"
1557
+ ]
1558
+ },
1559
+ "GLKKIFKAGLGSLVKGIKAHVAS": {
1560
+ "1": [],
1561
+ "0": [
1562
+ "GLKkIFKAGLGSLVKGIKAHVAS"
1563
+ ]
1564
+ },
1565
+ "ILGKLLSTAAGLLSKL": {
1566
+ "1": [
1567
+ "ILGKLLSTAAGLLSkL"
1568
+ ],
1569
+ "0": []
1570
+ },
1571
+ "ILGKLLSTAAKLLSKL": {
1572
+ "1": [],
1573
+ "0": [
1574
+ "ILGKLLSTAAkLLSKL"
1575
+ ]
1576
+ },
1577
+ "GFKRIVQRIKDFLRNLV": {
1578
+ "1": [],
1579
+ "0": [
1580
+ "GFKRiVQRiKDFlRNLV"
1581
+ ]
1582
+ },
1583
+ "GLKALKKVFKGIHKAIKLINNHVQ": {
1584
+ "1": [],
1585
+ "0": [
1586
+ "GLkALKKVFkGIHkAIKLINNHVQ"
1587
+ ]
1588
+ },
1589
+ "KFFKKLKNSVKKRAKKFFKKPRVIGVSIPF": {
1590
+ "1": [],
1591
+ "0": [
1592
+ "kffkklknsvkkrakkffkkprvigvsipf"
1593
+ ]
1594
+ },
1595
+ "KFFKKLKKAVKKGFKKFAKV": {
1596
+ "1": [],
1597
+ "0": [
1598
+ "kffkklkkavkkGfkkfakv"
1599
+ ]
1600
+ },
1601
+ "WGIRRILKYGKRS": {
1602
+ "1": [
1603
+ "wglrrllkygkrs"
1604
+ ],
1605
+ "0": []
1606
+ },
1607
+ "IKKILSKIKKLL": {
1608
+ "1": [],
1609
+ "0": [
1610
+ "IKKILSkIKKLL"
1611
+ ]
1612
+ },
1613
+ "IKKIVSKIKKVLK": {
1614
+ "1": [],
1615
+ "0": [
1616
+ "IkKIVSKIKKVLK"
1617
+ ]
1618
+ },
1619
+ "KGKPRPYPPRPPPHPRPIRV": {
1620
+ "1": [],
1621
+ "0": [
1622
+ "kgkprpypprppphprpirv"
1623
+ ]
1624
+ },
1625
+ "GKWMKLLKKILK": {
1626
+ "1": [],
1627
+ "0": [
1628
+ "Gkwmkllkkilk"
1629
+ ]
1630
+ },
1631
+ "GKWVKLLKKILK": {
1632
+ "1": [],
1633
+ "0": [
1634
+ "Gkwvkllkkilk"
1635
+ ]
1636
+ },
1637
+ "KWMKLLKKILK": {
1638
+ "1": [],
1639
+ "0": [
1640
+ "kwmkllkkilk"
1641
+ ]
1642
+ },
1643
+ "LRRLLlRWLRRLLRR": {
1644
+ "1": [],
1645
+ "0": [
1646
+ "LRRllRWlRRLLRR"
1647
+ ]
1648
+ },
1649
+ "ILKKIWKPIKKLF": {
1650
+ "1": [],
1651
+ "0": [
1652
+ "ILKKIWKpIKKLF"
1653
+ ]
1654
+ },
1655
+ "RWLKLPGRWLKL": {
1656
+ "1": [],
1657
+ "0": [
1658
+ "RWLKLpGRWLKL"
1659
+ ]
1660
+ },
1661
+ "RWFKFPGRWFKF": {
1662
+ "1": [],
1663
+ "0": [
1664
+ "RWFKFpGRWFKF"
1665
+ ]
1666
+ },
1667
+ "RWLRLPGRWLRL": {
1668
+ "1": [],
1669
+ "0": [
1670
+ "RWLRLpGRWLRL"
1671
+ ]
1672
+ },
1673
+ "RWFRFPGRWFRF": {
1674
+ "1": [],
1675
+ "0": [
1676
+ "RWFRFpGRWFRF"
1677
+ ]
1678
+ },
1679
+ "RWLHLPGRWLHL": {
1680
+ "1": [],
1681
+ "0": [
1682
+ "RWLHLpGRWLHL"
1683
+ ]
1684
+ },
1685
+ "RWFHFPGRWFHF": {
1686
+ "1": [],
1687
+ "0": [
1688
+ "RWFHFpGRWFHF"
1689
+ ]
1690
+ },
1691
+ "GIFSKLAPKKIKNLLISGLKG": {
1692
+ "1": [],
1693
+ "0": [
1694
+ "GIFSKLApKKIKNLLISGLKG"
1695
+ ]
1696
+ },
1697
+ "WGRRGWGPGRRYVRW": {
1698
+ "1": [
1699
+ "WGRRGWGpGRRYVRW"
1700
+ ],
1701
+ "0": []
1702
+ },
1703
+ "KKYRYHLKPF": {
1704
+ "1": [
1705
+ "kkyryhlkpf"
1706
+ ],
1707
+ "0": []
1708
+ },
1709
+ "RFLRRIFFFF": {
1710
+ "1": [],
1711
+ "0": [
1712
+ "rflrriffff"
1713
+ ]
1714
+ },
1715
+ "FFFFLRRIF": {
1716
+ "1": [
1717
+ "FFFFLrrIF"
1718
+ ],
1719
+ "0": [
1720
+ "FFFFLrRIF",
1721
+ "FFFFLRrIF"
1722
+ ]
1723
+ },
1724
+ "WLLWIALRKKR": {
1725
+ "1": [
1726
+ "wllwialrkkr"
1727
+ ],
1728
+ "0": []
1729
+ },
1730
+ "WLVWIWRRR": {
1731
+ "1": [
1732
+ "wlvwiwrrr"
1733
+ ],
1734
+ "0": []
1735
+ }
1736
+ }
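The augmentation files above map each all-uppercase (all-L) parent peptide sequence to case-flipped variants, where lowercase letters appear to mark D-amino acid substitutions, bucketed under `"1"` (positive) and `"0"` (negative) labels. A minimal loading sketch, assuming that schema (the inline `sample` string and the `d_positions` helper are illustrative, not part of the repository):

```python
import json

# Hypothetical two-entry sample mirroring the schema seen in the diff:
# parent sequence -> {"1": [positive variants], "0": [negative variants]},
# with lowercase letters assumed to denote D-amino acid substitutions.
sample = """
{
  "KLWLAIWKK": {"1": ["klwlaiwkk"], "0": []},
  "RRW": {"1": [], "0": ["rRW", "rrw"]}
}
"""

data = json.loads(sample)

def d_positions(variant):
    """Return indices of assumed D-amino acids (lowercase letters)."""
    return [i for i, aa in enumerate(variant) if aa.islower()]

for parent, buckets in data.items():
    for label in ("1", "0"):
        for variant in buckets[label]:
            # Each variant should match its parent case-insensitively.
            assert variant.upper() == parent
            print(parent, label, d_positions(variant))
```

To load the real files, replace `sample` with the contents of e.g. `dataset/train_set_llm_aug.json`.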
dataset/train_set_llm_aug.json ADDED
@@ -0,0 +1,2719 @@
1
+ {
2
+ "GIMSSLMKKLAAHIAK": {
3
+ "1": [
4
+ "GIMSSLMkKLAAHIAK",
5
+ "GIMSSLMKkLAAHIAK",
6
+ "GIMSSLMKKLAAHIAk",
7
+ "GIMSSLMkkLAAHIAK",
8
+ "GIMSSLMkKLAAHIAk",
9
+ "GIMSSLMKkLAAHIAk",
10
+ "GIMSSLMkkLAAHIAk",
11
+ "gIMSSLMkKLAAHIAK",
12
+ "GiMSSLMKkLAAHIAK",
13
+ "GIMsSLMKKlAAHIAK",
14
+ "GIMSSlmKkLAAHIAK",
15
+ "GIMSsLMkKLAAHIaK"
16
+ ],
17
+ "0": [
18
+ "gIMSSLMKKLAAHIAK",
19
+ "GImSSLMKKLAAHIAK",
20
+ "GIMsSLMKKLAAHIAK",
21
+ "GIMSSlMKKLAAHIAK",
22
+ "GIMSSLMkklAAHIAK"
23
+ ]
24
+ },
25
+ "ILGTILGLLKSL": {
26
+ "1": [
27
+ "iLGTILGLLKSL",
28
+ "ILgTILGLLKSL",
29
+ "ILGtILGLLKSL",
30
+ "ILGTILGLLKsL",
31
+ "ILGTILGLLKSl"
32
+ ],
33
+ "0": [
34
+ "ILGTILGLLkSL",
35
+ "ilgtilgllksl",
36
+ "ILGTILGLLksL",
37
+ "ILGTILGLLkSl",
38
+ "ILGTIlGLLkSL",
39
+ "ILGTiLGLLkSL"
40
+ ]
41
+ },
42
+ "KRLFKKLLKYLRKF": {
43
+ "1": [
44
+ "KRLFkkLLKYLRkF",
45
+ "krLFkkLLKYLRkF",
46
+ "krLFkkLLkYLrkF",
47
+ "KRlFkkLLKYLRkF",
48
+ "KRLfkkLLKYLRkF",
49
+ "KRLFkkllKYLRkF",
50
+ "KRLFkkLLkYLrkF",
51
+ "KRLFkkLLKyLRkF"
52
+ ],
53
+ "0": [
54
+ "KRLFKKLLKYLRkF",
55
+ "KRLFkKLLKYLRkF",
56
+ "KRLFKkLLKYLRkF",
57
+ "KRlFKKLLKYLRkF",
58
+ "kRLFkKLLKYLRKF",
59
+ "KRLFKKLLkYLRKF"
60
+ ]
61
+ },
62
+ "ILGTILGLLKGL": {
63
+ "1": [
64
+ "ilgtilgllkgl",
65
+ "IlGtiLgllkgl",
66
+ "ILgTilgllkgl",
67
+ "ilgtiLgllkgL",
68
+ "ilGTilgllkgl",
69
+ "ILgtilGllkgl"
70
+ ],
71
+ "0": [
72
+ "ILGTILGLLkGL",
73
+ "ILGTiLgLLKGL",
74
+ "ilgTILGLLKGL",
75
+ "ILgtiLGLLKGL",
76
+ "ILGTILGlLKGL",
77
+ "ILGTILgLLKGl"
78
+ ]
79
+ },
80
+ "IDWKKLLDAAKQIL": {
81
+ "1": [
82
+ "idwkklldaakqil",
83
+ "IDwkkllDaakQil",
84
+ "IDwkkllDAaKQIl",
85
+ "idwKKlldaaKqil",
86
+ "iDwkklLDAakqil",
87
+ "IDwkkLldaakqIl"
88
+ ],
89
+ "0": [
90
+ "IDWkkLLDAAkQIL",
91
+ "iDWKKLLDAAKQIL",
92
+ "IdWKKLLDAAKQIL",
93
+ "IDWKKLLdAAKQIL",
94
+ "IDWKKLLDAaKQIL",
95
+ "IDWKKLLDAAkQIL"
96
+ ]
97
+ },
98
+ "VWRRWRRFWRR": {
99
+ "1": [
100
+ "vWRRWRRFWRR",
101
+ "VwRRWRRFWRR",
102
+ "VWRRwRRFWRR",
103
+ "VWRRWRRfWRR",
104
+ "VWRRWRRFwRR"
105
+ ],
106
+ "0": [
107
+ "vwrrwrrfwrr",
108
+ "VWrrWrrFWrr",
109
+ "VWrrWrrFWRR",
110
+ "VWRRWrrFWrr",
111
+ "VWrrWRRFWrr",
112
+ "VwrrWrrFWrr",
113
+ "VWrrwrrFWrr"
114
+ ]
115
+ },
116
+ "FLKLLKKLL": {
117
+ "1": [
118
+ "fLKLLKKLL",
119
+ "FlKLLKKLL",
120
+ "FLkLLKKLL",
121
+ "flkllkkll",
122
+ "flKLLKKLL",
123
+ "fLkLLKKLL",
124
+ "FlkLLKKLL",
125
+ "flkLLKKLL",
126
+ "flkLLKKll"
127
+ ],
128
+ "0": [
129
+ "FLKlLKKLL",
130
+ "FLKLlKKLL",
131
+ "FLKLLkKLL",
132
+ "FLKLLKkLL",
133
+ "FLKLLKKlL",
134
+ "FLKLLKKLl",
135
+ "FLKllKKLL",
136
+ "FLKlLkKLL",
137
+ "FLKLlkKLL",
138
+ "FLKllkKLL",
139
+ "FLKllKKll"
140
+ ]
141
+ },
142
+ "KKVVFWVKFK": {
143
+ "1": [
144
+ "KKVVFWVKFk",
145
+ "KKVVFWVKfk",
146
+ "KKVVFWVkFk",
147
+ "KKVVFWvKFk",
148
+ "KKVVFwVKFk",
149
+ "KKVVfWVKFk"
150
+ ],
151
+ "0": [
152
+ "KKVVFWVKfK",
153
+ "KKVVFWVkFK",
154
+ "KKVVFWvKFK",
155
+ "KKVVFwVKFK",
156
+ "KKVVfWVKFK",
157
+ "KKVvFWVKFK",
158
+ "KKvVFWVKFK",
159
+ "KkVVFWVKFK",
160
+ "kKVVFWVKFK",
161
+ "kkVVFWVKFK",
162
+ "KKvvFWVKFK",
163
+ "KKVVfWVKfK",
164
+ "KKVVFwvKFK",
165
+ "KKVVFWVkfK"
166
+ ]
167
+ },
168
+ "KRIVKLILKWLR": {
169
+ "1": [
170
+ "KRIVkLILKWLR",
171
+ "KRIVKlILKWLR",
172
+ "KRIVklILKWLR",
173
+ "KRIVkLIlKWLR",
174
+ "KRIVkLILkWLR",
175
+ "KRIVKlilKWLR",
176
+ "KRIVkLiLkWLR"
177
+ ],
178
+ "0": [
179
+ "kRIVKLILKWLR",
180
+ "KRivKLILKWLR",
181
+ "KrIVKLILKWLR",
182
+ "KRIVkliLKWLR",
183
+ "KRIVKLiLKWLR"
184
+ ]
185
+ },
186
+ "KKVVFKVKFKK": {
187
+ "1": [
188
+ "kKVVFKVKFKk",
189
+ "kKVVFKVKFKK",
190
+ "KKVVFKVKFKk",
191
+ "KkVVFKVKFKK",
192
+ "KKVVFKVKFkK",
193
+ "kKVVFKVKFkK"
194
+ ],
195
+ "0": [
196
+ "kkVVFKVKFKK",
197
+ "KKVVFKVKFkk",
198
+ "kkVVFKVKFkk",
199
+ "KKVVFkVKFKK",
200
+ "kkVVFkVKFkk",
201
+ "kkvvfkvkfkk",
202
+ "KKvVFKVKFKK",
203
+ "KKVVfKVKFKK",
204
+ "KkVvFKVKFKK",
205
+ "KKVvFKVkFKK"
206
+ ]
207
+ },
208
+ "KWKSFLKTFKSAKKTVLHTALKAISS": {
209
+ "1": [
210
+ "kWKSFLKTFKSAKKTVLHTALKAISS",
211
+ "KwKSFLKTFKSAKKTVLHTALKAISS",
212
+ "KWkSFLKTFKSAKKTVLHTALKAISS",
213
+ "KWKsFLKTFKSAKKTVLHTALKAISS",
214
+ "KWKSFLkTFKSAKKTVLHTALKAISS"
215
+ ],
216
+ "0": [
217
+ "KWKSFLKTFKSAKkTVLHTALKAISS",
218
+ "KWKSFLKTFKsAKkTVLHTALKAISS",
219
+ "KWKSFLKTFKSAKktVLHTALKAISS",
220
+ "KWKSFLKTFKsAKktVLHTALKAISS",
221
+ "KWKSFLKTFKSaKKTVLHTALKAISS",
222
+ "KWKSFLKTfKSaKKTVLHTALKAISS",
223
+ "KWKSFLKTFKSaKKTvLHTALKAISS",
224
+ "KWKSFLKTfKSaKKTvLHTALKAISS",
225
+ "KWKSFLKTFKSAkKTVLHTALKAISS",
226
+ "kwksflktfksakktvlhtalkaiss",
227
+ "KWKSFLKTfKSAKKTVLHTALKAISS",
228
+ "KWKSFLKTFkSAKKTVLHTALKAISS",
229
+ "KWKSFLKTFKSAKKtVLHTALKAISS",
230
+ "KWKSFLKTFKSAKKTvLHTALKAISS",
231
+ "KWKSFLKTfkSAKKTVLHTALKAISS"
232
+ ]
233
+ },
234
+ "FLPLIIGALSSLLPKIF": {
235
+ "1": [
236
+ "fLPLIIGALSSLLPKIF",
237
+ "FLPLIIGALsSLLPKIF",
238
+ "FLPLIIGALSSLLPkIF",
239
+ "FLPLiIGALSSLLPKIF",
240
+ "FLPLIIgaLSSLLPKIF"
241
+ ],
242
+ "0": [
243
+ "FLPLIIGALSSLLPKiF",
244
+ "FLPLiiGALSSLLPKiF",
245
+ "FLPLIIGALSSLLPkiF",
246
+ "FLPLiIGALSSLLPKiF",
247
+ "FLPLIIGaLSSLLPKiF",
248
+ "FLPLIIGALSSllPKiF",
249
+ "FLPLiiGALSSLLPkiF"
250
+ ]
251
+ },
252
+ "KLKKLLKKWLKLLKKLLK": {
253
+ "1": [
254
+ "KLKKLlKKWLKlLKKLLk",
255
+ "KLKKlLKKWlKLLKkLLK",
256
+ "KLKkLLKkWLKlLKKlLK",
257
+ "KLkKLlKKwLKlLKkLLk",
258
+ "KlKkLlKkWlKlLkKlLk",
259
+ "KLKKLLKKWlkllkkllk",
260
+ "klKKLLKKWLKLLKKLLK",
261
+ "KLKKLLkkWLKLLKKLLK",
262
+ "kLKKLLKKWLKLLKKLLk",
263
+ "KLKKLLKKWLkLLKKLLK"
264
+ ],
265
+ "0": [
266
+ "KlkKLLKKWLKLLKKLLK",
267
+ "KLKklLKKWLKLLKKLLK",
268
+ "KLKKLLKKWLKLLKKlLK",
269
+ "klkKLLKKWLKLLKKLLK",
270
+ "KLKKLLKKWLKLLKKLlk"
271
+ ]
272
+ },
273
+ "KKAAAAAAAAAAAAWAAAAAAKKKK": {
274
+ "1": [
275
+ "kkAAAAAAAAAAAAWAAAAAAKKKK",
276
+ "KKAAAAAAAAAAAAwaAAAAAKKKK",
277
+ "KKAAAAAAAAAAAAWAaaAAAKKKK",
278
+ "KKAAAAAAAAAAAAWAAAaaAKKKK",
279
+ "KKAAAAAAAAAAAAWAAAAaaKKKK",
280
+ "kKAAAAAAAAAAAAwAAAAAAKKKK",
281
+ "KKAAAAAAAAAAAAwAAAaAAKKKK"
282
+ ],
283
+ "0": [
284
+ "KKaaAAAAAAAAAAWAAAAAAKKKK",
285
+ "KKAAaaAAAAAAAAWAAAAAAKKKK",
286
+ "KKAAAAaaAAAAAAWAAAAAAKKKK",
287
+ "KKAAAAAAaaAAAAWAAAAAAKKKK",
288
+ "KKAAAAAAAAaaAAWAAAAAAKKKK",
289
+ "KKAAAAAAAAAAaaWAAAAAAKKKK",
290
+ "KKAAAAAAAAAAAAWAAAAAakKKK",
291
+ "KKAAAAAAAAAAAAWAAAAAAKkkK",
292
+ "KKAAAAAAAaaAAAWAAAAAAKKKK",
293
+ "KKAAAAAAAAAaaAWAAAAAAKKKK"
294
+ ]
295
+ },
296
+ "FVPWFSKFLGRIL": {
297
+ "1": [
298
+ "fVPWFSKFLGRIL",
299
+ "FvPWFSKFLGRIL",
300
+ "FVpWFSKFLGRIL",
301
+ "FVPwFSKFLGRIL",
302
+ "FVPWfSKFLGRIL"
303
+ ],
304
+ "0": [
305
+ "FVPWFSkFLGRIL",
306
+ "FVPWFSKfLGRIL",
307
+ "FVPWFSKFlGRIL",
308
+ "FVPWFSKFLGrIL",
309
+ "FVPWFSKFLGRiL",
310
+ "FVPWFSKFLGRIl",
311
+ "FVPWFSKflGRIL",
312
+ "FVPWFSKfLGrIL",
313
+ "FVPWFSKFlGrIL",
314
+ "FVPWFSKFLgriL"
315
+ ]
316
+ },
317
+ "IRIKIRIK": {
318
+ "1": [
319
+ "irikirik",
320
+ "IRIkIrIK",
321
+ "irikIRIK",
322
+ "IRIKirik",
323
+ "IRikirIK",
324
+ "IRIkiriK",
325
+ "irIKIRik"
326
+ ],
327
+ "0": [
328
+ "IrIkIrIk",
329
+ "iRiKiRiK",
330
+ "iRIkIriK",
331
+ "IRiKirIk",
332
+ "iRIKIRIK",
333
+ "IRIKIRIk"
334
+ ]
335
+ },
336
+ "IIRKIIRK": {
337
+ "1": [
338
+ "iirkiirk",
339
+ "IirKIirK",
340
+ "IIRKiirk",
341
+ "IiRkIiRk",
342
+ "iIrKiIrK",
343
+ "Iirkiirk",
344
+ "iirkiirK"
345
+ ],
346
+ "0": [
347
+ "iIRKIIRK",
348
+ "IIRkIIRK",
349
+ "IirKIIRK",
350
+ "IIRKiiRK",
351
+ "iiRKiirk"
352
+ ]
353
+ },
354
+ "KKLFKKILKYL": {
355
+ "1": [
356
+ "KKLfKKILKYL",
357
+ "KKLFKKILkYL",
358
+ "KKLFKKIlKYL",
359
+ "KKLFkKILKYL",
360
+ "KKlFKKILKYL",
361
+ "KkLFKKILKYL",
362
+ "KKLFKkILKYL",
363
+ "kKLFKKILKYL",
364
+ "KKLFKKIlkYL",
365
+ "KKlFKkILkYL",
366
+ "KKLFKKilkYL",
367
+ "kklfkkilkyl",
368
+ "kkLfKKILKYL",
369
+ "KKLFKKilkyl",
370
+ "KKLFkkilkyl",
371
+ "KKLfkkilkyl",
372
+ "KKlfkkilkyl",
373
+ "Kklfkkilkyl",
374
+ "kklfKKILKYL",
375
+ "kklfkKILKYL",
376
+ "kklfkkILKYL",
377
+ "kklfkkiLKYL",
378
+ "kklfkkilKYL",
379
+ "kklfkkilkYL",
380
+ "kklfkkilkyL",
381
+ "KkLFkKILKYL",
382
+ "kKLFKKILkYL",
383
+ "KKLFKKIlKyL",
384
+ "KKLfKkILKYL",
385
+ "KKlFKKiLKYL"
386
+ ],
387
+ "0": [
388
+ "KKLFKkilkyl",
389
+ "KKLFKKiLKYL",
390
+ "KKLFKKILKyL",
391
+ "KKLFKKILKYl",
392
+ "KKlFKKILkYL",
393
+ "KKLFKKIlkyl",
394
+ "KKLFKKILkyL",
395
+ "kKLFKKILKYl",
396
+ "KkLFKKILKyL",
397
+ "KKLfKKILkyl",
398
+ "KKLFKKiLkyL"
+ ]
+ },
+ "KFFKRLLKSVRRAVKKFRK": {
+ "1": [
+ "KffkRLLKSVRRAVKKFRK",
+ "kFfkrLLkSVrrAVKKfrK",
+ "KFfKRLlKSvrRAVKkFRK",
+ "kFFKrLLkSVRravKkFrK",
+ "KfFKRlLKSVRRAVKKfRK"
+ ],
+ "0": [
+ "kFFkrLLkSVrrAVkkFrk",
+ "kffkrllksvrravkkfrk",
+ "kffkrLLksvRRaVKKfrk",
+ "KFFKrlLkSvrrAVKKfRk",
+ "kffKrlLkSVrravkKFRk",
+ "KfFkRllKSVrrAvkKfrK",
+ "kFFKrlLksvrraVkkfRk"
+ ]
+ },
+ "KWKSFLKTFKSLKKTVLHTLLKAISS": {
+ "1": [
+ "KWKSFLkTFKSLKKTVLHTLLKAISS",
+ "KWKSFLKTFKSLKkTVLHTLLKAISS",
+ "KWKSFLKTFKSLKKTVLHTLLkAISS",
+ "KWKSFLkTFKSLKkTVLHTLLKAISS",
+ "KWKSFLKTFKSLKkTVLHTLLkAISS",
+ "KWKSFlKTFKSLKKTVLHTLLKAISS",
+ "KWKSFLKTFKSlKKTVLHTLLKAISS",
+ "KWKSFLKTFKSLKKTVLHTlLKAISS",
+ "KWKSFLKTFKSlKKTVLHTlLKAISS",
+ "KWKSFlkTFKSLKKTVLHTLLKAISS",
+ "KWKSFLkTFKSLKKTVLHTLLkAISS",
+ "KWKSFLKTFKSlKKTVLHTLLkAISS",
+ "KWKSFLKTFKSLkKTVLHTlLKAISS",
+ "KWKSFLKTFKSLKKTVLHTlLkAISS"
+ ],
+ "0": [
+ "KWKSFLkTFKSLKkTVLHTLLkAISS",
+ "KWKSFLkTFkSLKkTVLHTLLkAISS",
+ "KWkSFLkTFkSLKkTVLHTLLkAISS",
+ "kWkSFLkTFkSLKkTVLHTLLkAISS",
+ "KWKSFlKTFKSlKKTVLHTLLKAISS",
+ "KWKSFlKTFKSlKKTVLHTlLKAISS",
+ "KWKSFlKTFKSlKKTVlHTlLKAISS",
+ "KWKSFlKTFKSlKKTVlHTllKAISS",
+ "KWKSFlkTFKSlKKTVLHTLLKAISS",
+ "KWKSFlKTFkSLKKTVLHTLLKAISS",
+ "KWKSFLKTFkSlkKTVLHTLLKAISS",
+ "KWKSFLKTFKsLkKTVLHTLLKAISS",
+ "kwkSFLKTFKSLKKTVLHTLLKAISS"
+ ]
+ },
+ "GWLDVAKKIGKAAFNVAKNFL": {
+ "1": [
+ "gWLDVAKKIGKAAFNVAKNFL",
+ "GwLDVAKKIGKAAFNVAKNFL",
+ "GWlDVAKKIGKAAFNVAKNFL",
+ "GWLdVAKKIGKAAFNVAKNFL",
+ "GWLDvAKKIGKAAFNVAKNFL"
+ ],
+ "0": [
+ "GWLDvAKKIGKAAFNvAKNFL",
+ "GWLDVAKKIGKAAFNvAKNFL",
+ "gWLDVAKKIGKAAFNvAKNFL",
+ "GwLDVAKKIGKAAFNvAKNFL",
+ "GWlDVAKKIGKAAFNvAKNFL",
+ "GWLdVAKKIGKAAFNvAKNFL",
+ "GWLDVaKKIGKAAFNvAKNFL"
+ ]
+ },
+ "GFGMALKLLKKVL": {
+ "1": [
+ "GfGmalkllkkvl",
+ "GfGMALKLLKKVL",
+ "gfGMALKLLKKVL",
+ "GfgMALKLLKKVL",
+ "GfGmALKLLKKVL",
+ "GfGMalkllkkvl",
+ "GfGMALKLLKKvl"
+ ],
+ "0": [
+ "GFGMALKLLKKVl",
+ "GFGMALKLLKKvL",
+ "GFGMALKLLKkVL",
+ "GFGMALKLLkKVL",
+ "GFGMALKLlKKVL",
+ "GFGMALKlLKKVL",
+ "GFGMALkLLKKVL",
+ "GFGMAlKLLKKVL",
+ "GFGMaLKLLKKVL",
+ "GFGmALKLLKKVL",
+ "gFGMALKLLKKVL",
+ "GFgMALKLLKKVL",
+ "gFgMALKLLKKVL",
+ "GFGmaLKLLKKVL",
+ "gFgmalkllkkvl"
+ ]
+ },
+ "RGLRRLGRKIAHGVKKYGPTVLRIIRIA": {
+ "1": [
+ "rGLRRLGRKIAHGVKKYGPTVLRIIRIA",
+ "RGLRRLGRKiAHGVKKYGPTVLRIIRIA",
+ "RGLRRLGRKIAHGVkkYGPTVLRIIRIA",
+ "RGLRRLGRKIAHGVKKYGPtVLRIIRIa",
+ "RGLRrLGRKiahGVKKYGPTVLRIIRIA"
+ ],
+ "0": [
+ "rglrrlgrkiahgvkkygptvlriiria",
+ "RGLRRLGRKIAHGVKKYGptvlriiria",
+ "RGLRRLGRKIAHgvkkygptvlriiria",
+ "RGLRRLGRKIAHGVKKYgpTVLRIIRIA",
+ "rglrrlGRKIAHGVKKYGPTVLRIIRIA",
+ "RGLRRlgrkiahgvkkYGptvlriiria"
+ ]
+ },
+ "KVLGRLVKVLGRLV": {
+ "1": [
+ "kVLGRLVKVLGRLV",
+ "kvLGRLVKVLGRLV",
+ "kVlGRLVKVLGRLV",
+ "kVLgRLVKVLGRLV",
+ "kVLGrLVKVLGRLV",
+ "kVLGRlVKVLGRLV"
+ ],
+ "0": [
+ "KVLGRLVkVLGRLV",
+ "kVLGRLVkVLGRLV",
+ "KvLGRLVkVLGRLV",
+ "KVlGRLVkVLGRLV",
+ "KVLgRLVkVLGRLV",
+ "KVLGrLVkVLGRLV",
+ "KVLGRlVkVLGRLV"
+ ]
+ },
+ "RRLFRRILRWL": {
+ "1": [
+ "RRLfRRILRWL",
+ "RRLFrRILRWL",
+ "rrlfrrilrwl",
+ "RRLfrRILRWL",
+ "RRLfRRILRwL",
+ "RRLFrRILrWL",
+ "rrlfrRILRWL",
+ "RRLfrRILRwL"
+ ],
+ "0": [
+ "rRLFRRILRWL",
+ "RrLFRRILRWL",
+ "RRlFRRILRWL",
+ "RRLFRrILRWL",
+ "RRLFRRiLRWL",
+ "RRLFRRIlRWL",
+ "RRLFRRILrWL",
+ "RRLFRRILRwL",
+ "RRLFRRILRWl",
+ "rRLFRrILRWL",
+ "RrLFRRiLRWL",
+ "RRlFRRIlRWL",
+ "RRLFRrILrWL",
+ "RRLFRRILRwl"
+ ]
+ },
+ "KWKSFLKTFKSAVKTVLHTALKAISS": {
+ "1": [
+ "KWKSFLKTFKSAvKTVLHTALKAISS",
+ "KWKSFLKTFKsAVKTVLHTALKAISS",
+ "KWKSFLkTfKSAVKTVLHTALKAISS",
+ "kWKSFLKTFKSAvkTVLHTALKAISS",
+ "KWKSFLKTFKSAVKTVLhTaLKAISS",
+ "KwKSfLKTFKsavKTVLHTALKAISS",
+ "KWKsFLKtFKSAVKtVLHTALKAISS"
+ ],
+ "0": [
+ "kwksflktfksavktvlhtalkaiss",
+ "KWKSFLKTFKSAvKTVLhtaLKAISS",
+ "kwksfLKTFKSAVKTVLHTALKAISS",
+ "KWKSFlktfkSAVKTVLHTALKAISS",
+ "kWKsfLKTFKSAVKTVLHTalkaiss",
+ "KwKsFLKTFksAVKtVLHTaLKAISs"
+ ]
+ },
+ "RRWVRRVRRVWRRVVRVVRRWVRR": {
+ "1": [
+ "rRWVRRVRRVWRRVVRVVRRWVrR",
+ "RRwVRRVRRVwRRVVRVVRRWVRR",
+ "RRWVRrVRRVWRRVVrVVRRWVrR",
+ "RRWVrRVRRVWRRVVRVVRRwVRR",
+ "rRWVRRVRRVWrrVVRVVRRWVRr"
+ ],
+ "0": [
+ "RRWVRRvRRVWRRVvRvVRRWvRR",
+ "RRWVRRvRRvWRRVvRvvRRWvRR",
+ "RRWvRRvRRvWRRvvRvvRRWvRR",
+ "RRWvRRVRRVWRRVvRVvRRWvRR",
+ "RRWVRRVRRvWRRvVRvvRRWvRR"
+ ]
+ },
+ "TVGGLVKWILKTVKKFA": {
+ "1": [
+ "tvgglvkwilktvkkfa",
+ "TVGGLVKWILkTVKKFA",
+ "tVGGLVKWILKTVKKFA",
+ "TVgGLVKWILKTVKKFA",
+ "TVGGlVKWILKTVKKFA",
+ "TVGGLvKWILKTVKKFA",
+ "TVGGLVKWILKTVkKFA"
+ ],
+ "0": [
+ "TVGGLVkWILkTVKkFA",
+ "TVGGLVkWILkTVKKFA",
+ "TVGGLVkWILKTVKkFA",
+ "TVGGLVKWILkTVKkFA",
+ "TVGGLVkWILkTVKkfA",
+ "tVGGLVkWILkTVKkFA"
+ ]
+ },
+ "INLKALAALAKKIL": {
+ "1": [
+ "INLKAlAALAKKIL",
+ "INLKALaALAKKIL",
+ "INLKALAALaKKIL",
+ "INLKaLAALAKKIL",
+ "INLkALaALAKKIL"
+ ],
+ "0": [
+ "iNLKALAALAKKIL",
+ "InLKALAALAKKIL",
+ "inLKALAALAKKIL",
+ "inlkalaalakkil",
+ "iNlKALAALAKKIL",
+ "iNLKAaAALAKKIL",
+ "iNlkALAALAKKIL",
+ "InLkALAALAKKIL",
+ "inlKALAALAKKIL"
+ ]
+ },
+ "FLSLIPKAIKAVGVKAKKF": {
+ "1": [
+ "FlSLIPKAIKAVGVKAKKF",
+ "FLsLIPKAIKAVGVKAKKF",
+ "FLSLiPKAIKAVGVKAKKF",
+ "FLSLIPkAkKAVGVKAKKF"
+ ],
+ "0": [
+ "FLSLIPkAIkAVGVkAkkF",
+ "FLSLIPkAIKAVGVKAKKF",
+ "FLSLIPkAIkAVGVKAkKF",
+ "FLSLIPKaiKAVGVKAKKF",
+ "FLSLIPKAIkAVgVKAKKF",
+ "FLSLIPkAIKAVGvkAKKF",
+ "fLSLIPKAIKAVGVKAKkF"
+ ]
+ },
+ "KKLLKLLKLLL": {
+ "1": [
+ "kkllkllklll",
+ "KkLLKLLKLLL",
+ "KkLLkLLKLLL",
+ "KkLlKLLKLLL",
+ "kKLLKLLKLLl",
+ "kkLLKLLKLLl",
+ "KkllKLLKLLL",
+ "kkLLkLLKLLL",
+ "KkllKLlKLLL",
+ "KkLLKLLkLLL",
+ "kklLKLLKLLL",
+ "KkLLkLLKLLl",
+ "kkLLKLlKLLL",
+ "kKlLKLLKLLL"
+ ],
+ "0": [
+ "kkLLKLLKLLL",
+ "KKLLKllKLLL",
+ "KKLLkllKLLL",
+ "KkllKlLKLLL",
+ "KKLLkllkLLL",
+ "KKllKllKLLL",
+ "KKlLkLlKlLL",
+ "KKLlkLLklLL",
+ "KklLKLLKllL",
+ "kkLLKLLKLll",
+ "kkLLkLLKLLl",
+ "KKllKLLklLL",
+ "KklLKlLKlLL",
+ "KKllKLlKLlL",
+ "KKLlkLLkLLl",
+ "KkllKllKLLL",
+ "KKllKllKlLL",
+ "kkLLkLLKLll",
+ "kkLLkLLkLLl",
+ "kKLLkllKLLl",
+ "KKLlkllkLLL",
+ "KkLlKlLkLlL",
+ "kKlLkLlKlLl",
+ "kKLLKLLKLLL",
+ "KKLLKLLKLLl",
+ "KKllKLLKLLL",
+ "KKLLkLLKLLL",
+ "KKLLKlLKLLL"
+ ]
+ },
+ "KKVVFKVKFK": {
+ "1": [
+ "KKVVfKvKFK",
+ "KKVvFKVKFK",
+ "kKVVFkVKFK",
+ "KKVvFkVkFK",
+ "kKvvFKVKFK"
+ ],
+ "0": [
+ "KKVVFKVKFk",
+ "kKVVFKVKFk",
+ "kkVVFKVKFk",
+ "KKVVfkvKFK",
+ "kKVVfkvKFK",
+ "kkVVFkVkFK",
+ "KKVVFKVKfK",
+ "kKVVfkKFKk",
+ "KKVvFkKfKf",
+ "kKVvFkvKfK"
+ ]
+ },
+ "LKLLKKLLKKLLKLL": {
+ "1": [
+ "LKlLKkLlkKLLkLL",
+ "lkLlKKlLKkLLKLL",
+ "LkLLkKLlKKlLKLl",
+ "LKLlKkLlKkLlKlL",
+ "lKLLkKLLkKLLkLL"
+ ],
+ "0": [
+ "KKkLLlLllLLLkKK",
+ "LKllkkllKKLLKLL",
+ "LkLlKkLlKkLLKLL",
+ "lKLLkKLLkklLKLL",
+ "lklLKKLLKKLlkll",
+ "LKLlkkllkKLLKLL"
+ ]
+ },
+ "KLKLLKLLKLLKLLK": {
+ "1": [
+ "kLKLLKLLKLLKLLK",
+ "KlKLLKLLKLLKLLK",
+ "KLKLLkLLKLLKLLK",
+ "KLKLLKLLkLLKLLK",
+ "KLKLLKLLKLLkLLK"
+ ],
+ "0": [
+ "KLkLLkLlkLLKlLK",
+ "KLKLLKLLKLLKLLk",
+ "kLkLlKLKlLKLKLL",
+ "KkLLkLLlkLkLLKk",
+ "KlkLkLKLLkLlKLk",
+ "klKLklKLLklKLLk"
+ ]
+ },
+ "KKKLLLLLLLLLKKK": {
+ "1": [
+ "KKKLLLLlllllKKK",
+ "kkkLLLlllLLLkkk"
+ ],
+ "0": [
+ "KKKLLlLLlLLlKKK",
+ "kkklllLLLLLLkkk"
+ ]
+ },
+ "KKFKKTAKWLIKSAWLLLKSLALKMK": {
+ "1": [
+ "kkfkktakwliksawlllkslalkmk",
+ "KKFkktaKwliksawlllkslalkmk",
+ "kkfKKtaKwliksawlllkslalkmk",
+ "kkfkkTAKwliksawlllkslalkmk",
+ "kkfkktaKWliksawlllkslalkmk",
+ "kkfkktaKwLIksawlllkslalkmk"
+ ],
+ "0": [
+ "KKFKKTAkwliksawlllkslalkmk",
+ "KKFKKTAKwliksawlllkslalkmk",
+ "KKFKKTAKwlIksawlllkslalkmk",
+ "KKFKKTAKwliKSAWLLLKSLALKMK",
+ "KKFKKTAKwliksawlllKSLALKMK"
+ ]
+ },
+ "WWWLRRRW": {
+ "1": [
+ "wWWLRRRW",
+ "WwWLRRRW",
+ "WWwLRRRW",
+ "WWWlRRRW",
+ "WWWLrRRW"
+ ],
+ "0": [
+ "wwwlrrrw",
+ "Wwwlrrrw",
+ "wWwlrrrw",
+ "wwWlrrrw",
+ "wwwLrrrw",
+ "wwwlRrrw"
+ ]
+ },
+ "RRRWWWWV": {
+ "1": [
+ "rRRWWWWV",
+ "RRRWWWWv",
+ "RRRwWWWV",
+ "RrrWWWWV",
+ "RRRwwWWv"
+ ],
+ "0": [
+ "rrrwwwwv",
+ "rrRWWWWv",
+ "RRRwwwwV",
+ "rrrWWWwv",
+ "RrwWWWWv",
+ "rrRwWWwv"
+ ]
+ },
+ "KWFRVYRGIYRRR": {
+ "1": [
+ "KwFRVYRGIYRRR",
+ "KWfRVYRGIYRRR",
+ "KWFRvYRGIYRRR",
+ "KWFRVyrGIYRRR",
+ "KWFrVYRGiYRRR"
+ ],
+ "0": [
+ "kwfrvyrgiyrrr",
+ "kwfrvyrgiyrrR",
+ "kWfrvyrgiyrrr",
+ "kwfRvyrgiyrrr",
+ "kwfrvyRgiyrrr",
+ "kwfrvyrgIyRrr"
+ ]
+ },
+ "RRRYIGRYVRFWK": {
+ "1": [
+ "RRRYIGRYvRFWK",
+ "rRRYIGRYVRFWK",
+ "RRRyIGRYVRFWK",
+ "RRRYIGRyVRFWK",
+ "RRRYIGRYVrFWK"
+ ],
+ "0": [
+ "rrryigryvrfwk",
+ "rrryigrYVRFWK",
+ "RRRYIgryVRFWK",
+ "rrryiGRYvRFWK",
+ "rRRyIGRYVrfwK",
+ "RRRYIGRyVrFwK"
+ ]
+ },
+ "GKIIKLKASLKLL": {
+ "1": [
+ "gkiiklkaslkll",
+ "GkIIKLKASLKLL",
+ "GKiIKLKASLKLL",
+ "GKIiKLKASLKLL",
+ "GKIIkLKASLKLL",
+ "GKIIKlKASLKLL"
+ ],
+ "0": [
+ "GKIIKLkASLKLL",
+ "GKIIKLKaSLKLL",
+ "GKIIKLKAsLKLL",
+ "GKIIKLKASlKLL",
+ "GKIIKLKASLkLL"
+ ]
+ },
+ "KLFKKLFKKLFK": {
+ "1": [
+ "KlFkKlFkKlFk",
+ "KlfKKlFKKlFk",
+ "kLfKkLfKkLfK",
+ "KLFkKLFkKLFk",
+ "kLFKkLFKkLFK"
+ ],
+ "0": [
+ "kLFkkLFkkLFk",
+ "klfkkfkkfkkk",
+ "KlfklfklfklK",
+ "klkklkkklkkk"
+ ]
+ },
+ "GFFALIPKIISSPLFKTLLSAV": {
+ "1": [
+ "gFFALIPKIISSPLFKTLLSAV",
+ "GFFALIPkiISSPLFKTLLSAV",
+ "GFFALIPKIIsSPLFKTLLSAV",
+ "GFFALiPKIISSPLFKTLLSAV",
+ "GFFALIPKIISSPLFKtLLSAV"
+ ],
+ "0": [
+ "GFFALIpKIISSPLFKTllSAV",
+ "GFFALIPKIISSPLFKTLLsaV",
+ "GFFALIPKIISSPlFKTLLSAV",
+ "GfFALIPKIISSPLfKTLLSAV",
+ "GFFALIPKIISSPLFKTlLSAV",
+ "GFFALIPkIISSPLFKTLLSAV"
+ ]
+ },
+ "KGFFALIPKIISSPLFKTLLSAV": {
+ "1": [
+ "kGFFALIPKIISSPLFKTLLSAV",
+ "KGfFALIPKIISSPLFKTLLSAV",
+ "KGFFALIPkIISSPLFKTLLSAV",
+ "KGFFALIPKIISsPLFKTLLSAV",
+ "KGFFALIPKIISSPLfKTLLSAV"
+ ],
+ "0": [
+ "KGFFALIpKIISSPLFKTllSAV",
+ "KGFFALIPKIISSPLFKTllSAv",
+ "KGFFALIPKIISSPLFKTlLSAV",
+ "KGFFALIpKIISSPLFKTlLSAV",
+ "KGFFALIpKIISSPLFktLLSAV",
+ "KGfFALIPKIISsPLFKTllSAV"
+ ]
+ },
+ "RGLRRLGRKIAHGVKKYG": {
+ "1": [
+ "rglrrlgrkiahgvkkyg",
+ "rGLRRLGRKIAHGVKKYG",
+ "RgLRRLGRKIAHGVKKYG",
+ "RGlRRLGRKIAHGVKKYG",
+ "RGLrRLGRKIAHGVKKYG",
+ "RGLRrLGRKIAHGVKKYG"
+ ],
+ "0": [
+ "RGLRRlGRKIAHGVKKYG",
+ "RGLRRLgRKIAHGVKKYG",
+ "RGLRRLGrKIAHGVKKYG",
+ "RGLRRLGRkIAHGVKKYG",
+ "RGLRRLGRKiAHGVKKYG"
+ ]
+ },
+ "FLGGLIKIVPAMICAVTKKC": {
+ "1": [
+ "flGGlikivpamicavtkkc",
+ "flGGLikivpamicavtkkc",
+ "flGGliKivpamicavtkkc",
+ "flGGlikivpamicavtKkc",
+ "flGGlikivpamicavtkkC"
+ ],
+ "0": [
+ "FLGGlikivpamicavtkkc",
+ "flgglikivpamicavtkkc",
+ "flGgLIKivpamicavtkkc",
+ "fLGGlikivpamicavtkkc",
+ "FlGGLikivpamicavtkkc"
+ ]
+ },
+ "AKRLKKLAKKIWKWK": {
+ "1": [
+ "aKRLKKLAKKIWKWK",
+ "AKRlKKLAKKIWKWK",
+ "AKRLKKLAKkIWKWK",
+ "AKRLKklAKKIWKWK",
+ "aKRLKKlAKKIWKkK"
+ ],
+ "0": [
+ "AkRLkkLAkkIWkWk",
+ "AKrLkkkAKkIWkWk",
+ "AkRLKkLAKKkwKWK",
+ "akRLkKLAkkIWKWK",
+ "AkRLkklAKKIWKWk",
+ "akrLkKkAKkIWKWk"
+ ]
+ },
+ "VDKPPYLPRPRPIRRPGGR": {
+ "1": [
+ "VDkPPYLPRPRPIRRPGGR",
+ "VDKpPYLPRPRPIRRPGGR",
+ "VDKPPyLPRPRPIRRPGGR",
+ "VDKPPYLPRPRPIRRPGgR",
+ "VDKPPYLPRPRPIRRPgGR"
+ ],
+ "0": [
+ "VDKPPYLPrPRPIrRPGGR",
+ "VDKPPYLPrPRPIRrPGGR",
+ "VDKPPYLPrPRPIRRPGGr",
+ "VDKPPYLPRPrPIrRPGGR",
+ "VDKPPYLPRPrPIRrPGGR",
+ "VDKPPYLPRPrPIRRPGGr",
+ "VDKPPYLPRPRPIrRPGGr",
+ "VDKPPYLPRPRPIRrPGGr",
+ "VDkPPYLPrPrPIrrPGGr",
+ "VDKPPYLPRPRPIrrPGGR",
+ "VDKPPYLPRPRPIrrPGGr",
+ "VDKPPYLPrPrPIRrPGGR",
+ "VDKPPYLPRPrPIrrPGGR"
+ ]
+ },
+ "GIGAVLKVLTTGLPALISWIKRKRQQ": {
+ "1": [
+ "GIGAVlKVLTTGlPALISWiKRKRQQ",
+ "gigavlkvlttglpaliswikrkrqq",
+ "GIGAvLKVLTTgLPALISwIKRKRQQ",
+ "GIGAVLKVlTTGLPALISWIKRkRQQ",
+ "GIGAVLKvLTTGLPAlISWiKRKRQQ",
+ "gIGAVLkVLTTGLPALiSWIKRKRQQ",
+ "GIGaVlKVLTTGlPALISWikRKRQQ"
+ ],
+ "0": [
+ "GIGAVLKVLTTgLPALIsWIKRKRQQ",
+ "GIGAVLKvLTTGLpALISWIKRKRqQ",
+ "gIgAVLKVLTTGLPALISWiKRKRQQ",
+ "GIGAVLKVLTtGLPALISWIKrKRQQ",
+ "GIGAVlKVLTtGLPALiSWIKRKRQq"
+ ]
+ },
+ "FWGALAKGALKLIPSLFSSFSKKD": {
+ "1": [
+ "fwGalakGalklipslfssfskkd",
+ "fwgalakgalklipslfssfskkd",
+ "FwGalakGalklIPSLFSSFSKKD",
+ "FWgALaKGALKliPSlFssfskkd",
+ "fwGAlakgalKLIPsLfSSFSKkD",
+ "FwgaLAkgaLKlipsLfssfSKKd"
+ ],
+ "0": [
+ "FwgalakGAlklIpslFsSFSkKd",
+ "FWGALaKGalkLIPsLFSSfSkkD",
+ "fWGalAKgaLklIpSLfssFSKKd",
+ "FWgALAkgaLkliPSLFSsfSkkD",
+ "FWgaLAKgaLKLIpslFSSfskkd"
+ ]
+ },
+ "IRVKIRVKIRVK": {
+ "1": [
+ "irvkirvkirvk",
+ "irvkirvKirvk",
+ "iRvKiRvKiRvK",
+ "IRvkIRvkIRvk",
+ "IRVkiRVkiRVk",
+ "irvkiRvKirVK"
+ ],
+ "0": [
+ "IrVkIrVkIrVk",
+ "irVKIRVKIRVK",
+ "iRvkIrVKIRVk",
+ "IRvKIrVKiRvK",
+ "iRvKirvKIRVk"
+ ]
+ },
+ "LIKKALAALAKLNI": {
+ "1": [
+ "lIKKALAALAKLNI",
+ "LIkKALAALAKLNI",
+ "LIKkALAALAKLNI",
+ "LIKKAlAALAKLNI",
+ "LIKKALAAlAKLNI"
+ ],
+ "0": [
+ "likkalaalaklni",
+ "likkaLAALAKLNI",
+ "LIKKalAALAKLNI",
+ "LIKKALaaLAKLNI",
+ "LIKKALAALaKLNI",
+ "lIKKALAALAKLNi"
+ ]
+ },
+ "RSMRLSFRARGYGFR": {
+ "1": [
+ "rsmrlsfrarGyGfr",
+ "rsmrlSfRARGyGfR",
+ "RSmRLSFRARGyGfr",
+ "RSMRLsfRaRGygFR",
+ "RSmRLsfRARgyGFR",
+ "RSmrLSFRaRGYGfR"
+ ],
+ "0": [
+ "RSMRLSFRaRgYGFR",
+ "rsmRLSFRARGygFr",
+ "RSmrlSFRARgYgFr",
+ "RSMRLsFrarGyGFr",
+ "rsmRLSFrARGygfR"
+ ]
+ },
+ "GLLKRIKTLL": {
+ "1": [
+ "Gllkriktll",
+ "gllkriktll",
+ "gLlkriktll",
+ "gllkriKtll",
+ "gllkrikTlL",
+ "gllkRikTll"
+ ],
+ "0": [
+ "GLLkRIkTLL",
+ "GLlKRIKTLL",
+ "GLLKRIkTLl",
+ "GLLKrIKTLL",
+ "GllkriktLl",
+ "GLLKriKTLL"
+ ]
+ },
+ "KKLFKKILRYL": {
+ "1": [
+ "KKLfKKILRYL",
+ "KKLFKKiLRYL",
+ "KKLFKKILRyL",
+ "KKlFKKILRYL",
+ "KKLFKKIlRYL",
+ "KKLFKKILRYl"
+ ],
+ "0": [
+ "KKLFKkilryl",
+ "kklfkkilryl",
+ "kkLFKKILRYL",
+ "KKLFKKILryL",
+ "KKlfkkiLRYL",
+ "kKlFKKILRYL",
+ "kklFkkilryl"
+ ]
+ },
+ "FQWQRNMRKVR": {
+ "1": [
+ "fqwqrnmrkvr",
+ "Fqwqrnmrkvr",
+ "fQwqrnmrkvr",
+ "fqWqrnmrkvr",
+ "fqwQrnmrkvr",
+ "fqwqRnmrkvr"
+ ],
+ "0": [
+ "fQWQRNMRKVR",
+ "FqWQRNMRKVR",
+ "FQwQRNMRKVR",
+ "FQWqRNMRKVR",
+ "FQWQrNMRKVR"
+ ]
+ },
+ "KKKKKKAAFAAWAAFAA": {
+ "1": [
+ "kkkkkkAAFAAWAAFAA",
+ "KKKKKKaafaaWaafaa",
+ "KkKkKkAaFaAwAaFaA",
+ "KKKKKKAAFAAwaafaa",
+ "KKKKKKAAfaawaafAA"
+ ],
+ "0": [
+ "kkkkkkaafaawaafaa",
+ "KKKKKKAAFAAwAAFAA",
+ "kkKKKKAAFAAWAAFAA",
+ "KKKKKKAAFAAWAAFaa",
+ "KKKKKKaaFaaWaaFaa",
+ "KKkkkkAAFAAWAAFAA"
+ ]
+ },
+ "RRWWRF": {
+ "1": [
+ "rRWWRF",
+ "RrWWRF",
+ "RRWwRF",
+ "RRWWrF",
+ "RRWWRf",
+ "rrWWRF",
+ "rRWwRF",
+ "rRWWrF",
+ "rRWWRf",
+ "RrWwRF"
+ ],
+ "0": [
+ "rrwwrf",
+ "RRwWRF",
+ "rRwWRF",
+ "RrwWRF",
+ "RRwwRF",
+ "RRwWrF",
+ "RRwWRf"
+ ]
+ },
+ "KWKSFLKTFKSALKTVLHTALKAISS": {
+ "1": [
+ "KWKSFLKTFKSAlKTVLHTALKAISS",
+ "KWKSFlKTFKSALKTVLHTALKAISS",
+ "KWKSFLKtFKSALKTVLHTALKAISS",
+ "KWKSFLKTFKSaLKTVLHTALKAISS",
+ "KWKSFLKTFKSALKTVlHTALKAISS",
+ "KWKSFLKTFKSALKTVLHtALKAISS"
+ ],
+ "0": [
+ "kWKSFLKTFKSALKTVLHTALKAISS",
+ "KwKSFLKTFKSALKTVLHTALKAISS",
+ "KWkSFLKTFKSALKTVLHTALKAISS",
+ "KWKSfLKTFKSALKTVLHTALKAISS"
+ ]
+ },
+ "KWKSFLKTFKSAAKTVLHTALKAISS": {
+ "1": [
+ "KWKSFLKTFKSAaKTVLHTALKAISS",
+ "KWKSFLKTFKsaAKTVLHTALKAISS",
+ "KWKSFLKTFKSAAkTVLHTALKAISS",
+ "KWKSFLKTFKSAAKTvLHTALKAISS",
+ "KWKSFLKTFKSAAKTVLHTaLKAISS",
+ "KWKSFLKTFKSAAKTVLHTALKaISS"
+ ],
+ "0": [
+ "kWKSFLKTFKSAAKTVLHTALKAISS",
+ "KwKSFLKTFKSAAKTVLHTALKAISS",
+ "KWKSfLKTFKSAAKTVLHTALKAISS",
+ "KWKSFLKTFKSAAKTVLHTALKAIsS",
+ "KWKSFLKTFKSAAKTVLHTALKAISs"
+ ]
+ },
+ "KWKSFLKTFKSASKTVLHTALKAISS": {
+ "1": [
+ "kWKSFLKTFKSASKTVLHTALKAISS",
+ "KwKSFLKTFKSASKTVLHTALKAISS",
+ "KWKsFLKTFKSASKTVLHTALKAISS",
+ "KWKSFlKTFKSASKTVLHTALKAISS",
+ "KWKSFLKtFKSASKTVLHTALKAISS"
+ ],
+ "0": [
+ "KWKSFLKTFKSAsKTVLHTALKAISS",
+ "kWKSFLKTFKSAsKTVLHTALKAISS",
+ "KWkSFLKTFKSAsKTVLHTALKAISS",
+ "KWKSfLKTFKSAsKTVLHTALKAISS",
+ "KWKSFLkTFKSAsKTVLHTALKAISS",
+ "KWKSFLKTfKSAsKTVLHTALKAISS"
+ ]
+ },
+ "KWKSFLKTFKLAVKTVLHTALKAISS": {
+ "1": [
+ "KWKSFLKTFKlAVKTVLHTALKAISS",
+ "KWKSFLKtFKLAVKTVLHTALKAISS",
+ "KWKSFLKTFKLAVKtvLHTALKAISS",
+ "KWKSFLKTFKLAvKTVLHTALKAISS",
+ "kWKSFLKTFKLAVKTVLHTALKAISS",
+ "KWKsFLKTFKLAVKTVLHTALKAISS"
+ ],
+ "0": [
+ "KWKSFLKTFkLAVKTVLHTALKAISS",
+ "KWKSFLKTFKLAVKTVLhTALKAISS",
+ "KWKSFlKTFKLAVKTVLHTALKAISS",
+ "KWKSFLKTFKLAVKTVLHTALkAISS",
+ "KWKSFLKTFKLAVKTVLHTAlKAISS"
+ ]
+ },
+ "KWKSFLKTFKVAVKTVLHTALKAISS": {
+ "1": [
+ "KWKSFLKTFKvAVKTVLHTALKAISS",
+ "kWKSFLKTFKVAVKTVLHTALKAISS",
+ "KWKSfLKTFKVAVKTVLHTALKAISS",
+ "KWKSFLKTFKVaVKTVLHTALKAISS",
+ "KWKSFLKTFKVAVKtVLHTALKAISS",
+ "KWKSFLKTFKVAVKTVLHtALKAISS"
+ ],
+ "0": [
+ "KWKSFlKTFKVAVKTVLHTALKAISS",
+ "KWKSFLkTfKVAVKTVLHTALKAISS",
+ "KWKSFLKTFKVavKTVLHTALKAISS",
+ "KWKSFLKTFKVAVKTVlHTALKAISS",
+ "KWKSFLKTFKVAVKTVLHTALKaiSS"
+ ]
+ },
+ "KWKSFLKTFKAAVKTVLHTALKAISS": {
+ "1": [
+ "KWKSFLKTFKaAVKTVLHTALKAISS",
+ "KWKSFLKtFKAAVKTVLHTALKAISS",
+ "KWKSFLKTFkAAVKTVLHTALKAISS",
+ "KWKSFLKTFKAvVKTVLHTALKAISS",
+ "KWKSFLKTFKAAVKtVLHTALKAISS",
+ "KWKSFLKTFKAAVKTvLHTALKAISS"
+ ],
+ "0": [
+ "kWKSFLKTFKAAVKTVLHTALKAISS",
+ "KWKsFLKTFKAAVKTVLHTALKAISS",
+ "KWKSFlKTFKAAVKTVLHTALKAISS",
+ "KWKSFLKTfKAAVKTVLHTALKAISS",
+ "KWKSFLKTFKAAVKTVLhTALKAISS"
+ ]
+ },
+ "KWKSFLKTFKKAVKTVLHTALKAISS": {
+ "1": [
+ "KWKSFLKTFKkAVKTVLHTALKAISS",
+ "kWKSFLKTFKKAVKTVLHTALKAISS",
+ "KWkSFLKTFKKAVKTVLHTALKAISS",
+ "KWKSFLkTFKKAVKTVLHTALKAISS",
+ "KWKSFLKTFKKAVkTVLHTALKAISS"
+ ],
+ "0": [
+ "KwKSFLKTFKKAVKTVLHTALKAISS",
+ "KWKsFLKTFKKAVKTVLHTALKAISS",
+ "KWKSfLKTFKKAVKTVLHTALKAISS",
+ "KWKSFlKTFKKAVKTVLHTALKAISS",
+ "KWKSFLKtFKKAVKTVLHTALKAISS"
+ ]
+ },
+ "GFKMALKLLKKVL": {
+ "1": [
+ "GFKMALKLLKKvl",
+ "GFKMALklLKKVL",
+ "GFKMALKLLKkVl",
+ "GFKMALKLLKkvl"
+ ],
+ "0": [
+ "GFkMALKLLKKVL",
+ "GfkMALKLLKKVL",
+ "GfKMALKLLKKVL",
+ "GFkMaLKLLKKVL",
+ "GFKMALKLLkKVL",
+ "GfkMaLKLLKKVL",
+ "GfKmaLKLLKKVL"
+ ]
+ },
+ "AFGMALKLLKKVL": {
+ "1": [
+ "AFGMALKLLKKvL",
+ "AFGMALKLLKKVl",
+ "AFGmALKLLKKVL",
+ "AFGMaLKLLKKVL",
+ "AFGMAlKLLKKVL"
+ ],
+ "0": [
+ "aFGMALKLLKKVL",
+ "aFGMALkLLKKVL",
+ "AFgMALKLLKKVL",
+ "aFGMALKLLKKvL",
+ "AFGMALKllKKVL",
+ "AFGMALKLLkkVL"
+ ]
+ },
+ "RRLLRLLRLLL": {
+ "1": [
+ "rrLLrLLrLLL",
+ "rrlLrLLrLLL",
+ "rrLlrLLrLLL",
+ "rRLLrlLrLLL",
+ "rrLLrLlrLLL",
+ "rRlLrLLrLLL"
+ ],
+ "0": [
+ "rRLLRLLRLLL",
+ "RrLLRLLRLLL",
+ "RRlLRLLRLLL",
+ "rRlLRLLRLLL",
+ "RRLLrLLRLLL"
+ ]
+ },
+ "KKIIKIIKIII": {
+ "1": [
+ "kkIIkIIkIII",
+ "kkiIkIIkIII",
+ "kkIikIIkIII",
+ "kkIIkiIkIII",
+ "kkIIkIikIII",
+ "kkIIkIIkiII"
+ ],
+ "0": [
+ "kKIIKIIKIII",
+ "KkIIKIIKIII",
+ "KKIIKIIkIII",
+ "KKiiKIIKIII",
+ "KKIIKIIKIIi"
+ ]
+ },
+ "RRIIRIIRIII": {
+ "1": [
+ "RRIIRIIRIII",
+ "RRIIRIIRIIi",
+ "RRIiRIIRIII",
+ "RRIIRiIRIII",
+ "RRIIRIiRIII"
+ ],
+ "0": [
+ "rrIIrIIrIII",
+ "rRIIRIIRIII",
+ "RrIIRIIRIII",
+ "RRIIrIIRIII",
+ "RRIIRIIrIII",
+ "rRIIrIIRIII"
+ ]
+ },
+ "ALWKKLLKK": {
+ "1": [
+ "AlWkkllkk",
+ "aLWkkllkk",
+ "AlWkkllkK",
+ "ALWkkllkk",
+ "alwkkllKK",
+ "alWkkllKk"
+ ],
+ "0": [
+ "ALwkkLLKK",
+ "aLwKkLLKK",
+ "ALWKKllKK",
+ "ALWKKLLkk",
+ "ALwKKLLKK"
+ ]
+ },
+ "KRFKKFFKKVKKSVKKRLKKIFKKPMVIGVTIPF": {
+ "1": [
+ "kRFKKFFKKVKKSVKKRLKKIFKKPMVIGVTIPF",
+ "KRFKKFFKKVKKSVKKRLKKIFKKPMVIGVTIpF",
+ "KRFKKFFKKVKKSVKKRlKKIFKKPMVIGVTIPF",
+ "KRFKKFFKKvKKSVKKRLKkIFKKPMVIGVTIPF",
+ "KRFKKFFKKVKKSVKKRLKKIFKKPMVIGvtIPF"
+ ],
+ "0": [
+ "krfkkffkkvkksvkkrlkkifkkpmviGvtipf",
+ "Krfkkffkkvkksvkkrlkkifkkpmvigvtipf",
+ "krfkkffkkvkksvkkrLkkifkkpmviGvtipf",
+ "krfkkffkkVkksvkkrlkkifkkpmvigvtipF",
+ "krfkkffkkvkksvKkrlkkifkkpmviGvtipf"
+ ]
+ },
+ "KKRLKKIFKKPMVIGVTIPF": {
+ "1": [
+ "kKRLKKIFKKPMVIGVTIPF",
+ "KKRLKKIFKKPMVIGVTIPf",
+ "kkRLKKIFKKPMVIGVTIPF",
+ "KKRLKKIFKKPMVIGVTIpf",
+ "kKRLKKIFKKPMVIGVTIPf"
+ ],
+ "0": [
+ "kkrlkkifkkpmviGvtipf",
+ "Kkrlkkifkkpmvigvtipf",
+ "kkrlKkifkkpmvigvtipf",
+ "kkrlkkifkKpmvigvtipf",
+ "kkrlkkifkkpmvIgvtipf",
+ "kkrlkkifkkpmvigVtipf"
+ ]
+ },
+ "RLFRRVKKVAGKIAKRIWK": {
+ "1": [
+ "rLFRRVKKVAGKIAKRIWK",
+ "RLfrRVKKVAGKIAKRIWK",
+ "RLFRRVKKVAGKiAKRIWK",
+ "RLFRRVKKVAGKIAKrIWK",
+ "RLFRRvkkVAGKIAKRIWK"
+ ],
+ "0": [
+ "rlfrrvkkvagkiakriwk",
+ "rlFrrVKKVAGKIAKRIWK",
+ "RLFrRVKKVAGKIAkRIWK",
+ "RLFRRvKKVagkIAKRIWK",
+ "RLFRRVKKVAgkiakriWK",
+ "RLFRRVKKvAGKIAKrIwK"
+ ]
+ },
+ "FIRRIARLLRRIF": {
+ "1": [
+ "fIRRIARLLRRIF",
+ "FiRRIARLLRRIF",
+ "FIrRIARLLRRIF",
+ "FIrrIARLLRRIF",
+ "FIRRiARLLRRIF"
+ ],
+ "0": [
+ "firriarllrrif",
+ "fiRRIARLLRRIF",
+ "firRIARLLRRIF",
+ "firrIARLLRRIF",
+ "firriARLLRRIF",
+ "firriaRLLRRIF"
+ ]
+ },
+ "GIGAVLKVLALISWIKRKR": {
+ "1": [
+ "gIGAVLKVLALISWIKRKR",
+ "GIGaVLKVLALISWIKRKR",
+ "GIGAVLkVLALISWIKRKR",
+ "GIGAVLKVLAlISWIKRKR",
+ "GIGAVLKVLALISWiKRKR"
+ ],
+ "0": [
+ "GIGAvLKvLAlISWIkRKR",
+ "GIGAvLKVLALISWIKRKR",
+ "GIGAVLKvLALISWIKRKR",
+ "GIGAVLKVLAlISWIKRKR"
+ ]
+ },
+ "FKCRRWQWRMKKLG": {
+ "1": [
+ "fkcrrwqwrmkklg",
+ "Fkcrrwqwrmkklg",
+ "fKcrrwqwrmkklg",
+ "fkCrrwqwrmkklg",
+ "fkcRrwqwrmkklg",
+ "fkcrRwqwrmkklg"
+ ],
+ "0": [
+ "fKCRRWQWRMKKLG",
+ "FkCRRWQWRMKKLG",
+ "FKcRRWQWRMKKLG",
+ "FKCrRWQWRMKKLG",
+ "FKCRrWQWRMKKLG"
+ ]
+ },
+ "WKKLKKLLKKLKKL": {
+ "1": [
+ "WKKlKKLLKKLKKL",
+ "WKKLKKlLKKLKKL",
+ "WKKLKKLlKKLKKL",
+ "WKKLKKLLKKlKKL",
+ "WKKLKKLLKKLKKl"
+ ],
+ "0": [
+ "Wkklkkllkklkkl",
+ "wKKLKKLLKKLKKL",
+ "wkKLKKLLKKLKKL",
+ "wkkLKKLLKKLKKL",
+ "wkklKKLLKKLKKL",
+ "wkklkKLLKKLKKL"
+ ]
+ },
+ "KFWSLLKKALRLWANVL": {
+ "1": [
+ "kFwSLLkKALRLwANVL",
+ "kFwSLLkKALRLwANvL",
+ "KFwSLLkKALRLwANVL",
+ "kFwSLLKKALRLwANVL",
+ "kFWSLLkKALRLwANVL",
+ "kFWsLLkKALRLwANVL"
+ ],
+ "0": [
+ "KFWSLLKKALRLWANVL",
+ "kFWSLLKKALRLWANVL",
+ "kfWSLLKKALRLWANVL",
+ "KFWSLLKKALRLWANvL",
+ "KFWSllKKALRLWANVL"
+ ]
+ },
+ "KFWKLLKKALRLWAKVL": {
+ "1": [
+ "kFwKLLkKALrLwAkVL",
+ "KfWkLlKkAlRlWAKVL",
+ "kFWKLLkKAlRLwAKvL",
+ "KfWKLLKkALrLWaKVl",
+ "KFwKLlKKaLRlWAkvL",
+ "kfWKLLkkALrLWAKvL"
+ ],
+ "0": [
+ "kFWKlLKkAlrLWAkVL",
+ "kFWKLlkkalRLWAKVL",
+ "KfwkllKKALRLWAKvL",
+ "KFwkllkKALRLWAKVl",
+ "kfwKLLKKALRLWAkvl",
+ "KFWKLLKKalrlwaKVL"
+ ]
+ },
+ "WFKKLLKKALRLWKKVL": {
+ "1": [
+ "wFKKlLKkAlrLWKkVL",
+ "wFKKlLKKAlrlWKkVL",
+ "wFKKlLKkAlRlWKkVL",
+ "wFKKlLkkAlrLWKkVL",
+ "wfKKlLKkAlrLWKkVL",
+ "wFKKlLKkALrLWkkVL"
+ ],
+ "0": [
+ "WFKKLlKKALRLWKKVL",
+ "WFKKLLKkaLRLWKKVL",
+ "WFkKLLKKALRLWKKVL",
+ "WFKKlLKKALRLWKKVL",
+ "WFKKLLKKALrlWKkVL"
+ ]
+ },
+ "ACPIFTKIQGTYRGRAKCR": {
+ "1": [
+ "aCPIFTKIQGTYRGRAKCR",
+ "AcPIFTKIQGTYRGRAKCR",
+ "ACpIFTKIQGTYRGRAKCR",
+ "ACPIfTKIQGTYRGRAKCR",
+ "ACPIFtKIQGTYRGRAKCR"
+ ],
+ "0": [
+ "ACPiFTKiQGTYrGrAKCR",
+ "ACPiFTKiQGTYrGrAKCr",
+ "aCPiFTKiQGTYrGrAKCR",
+ "AcPiFTKiQGTYrGrAKCR",
+ "ACPifTKiQGTYrGrAKCR",
+ "ACPiFTKiQGTYrGrAkCR"
+ ]
+ },
+ "ILLKKLLKKI": {
+ "1": [
+ "illkkllkki",
+ "Illkkllkki",
+ "iLlkkllkki",
+ "ilLkkllkki",
+ "illKkllkki",
+ "illkKllkki"
+ ],
+ "0": [
+ "iLLKKLLKKI",
+ "IlLKKLLKKI",
+ "ILlKKLLKKI",
+ "ILLkKLLKKI",
+ "ILLKkLLKKI"
+ ]
+ },
+ "GRFKRFRKKFKKLFKKLS": {
+ "1": [
+ "grfkrfrkkfkklfkkls",
+ "Grfkrfrkkfkklfkkls",
+ "gRfkrfrkkfkklfkkls",
+ "grfkrfrkkfkklfkklS",
+ "grfkrFrkkfKklfkkls"
+ ],
+ "0": [
+ "gRFKRFRKKFKKLFKKLS",
+ "GRFKRFRKKFKKLFKKLs",
+ "gRFKRFRKKFKKLFKKLs",
+ "grfKRFRKKFKKLFKKLS",
+ "grfkRFRKKFKKLFKKLS"
+ ]
+ },
+ "RAGLQFPVGRVHRLLRK": {
+ "1": [
+ "raglqfpvgrvhrllrk",
+ "Raglqfpvgrvhrllrk",
+ "rAglqfpvgrvhrllrk",
+ "rAgLqfpvgrvhrllrk",
+ "RaglqfpvgrVhrllrk",
+ "raglqfpvgrVhrllrk"
+ ],
+ "0": [
+ "rAGLQFPVGRVHRLLRK",
+ "RaglQFPVGRVHRLLRK",
+ "RAGLQfpVGRVHRLLRK",
+ "RAGLQFPvgRVHRLLRK",
+ "RAGLQFPVGRvhRLLRK"
+ ]
+ },
+ "KLKLLLLLKLK": {
+ "1": [
+ "klklllllklk",
+ "KLklllllklk",
+ "klKLllllklk",
+ "KLKlLllLklk",
+ "klkllLlklKk",
+ "klKlLlLLKlk"
+ ],
+ "0": [
+ "kLklLLlllkK",
+ "KLkLlllLkLk",
+ "KlklllllKLK",
+ "kLKLLlLKkLk",
+ "KLKLllLLkKk"
+ ]
+ },
+ "KLKLLLKLK": {
+ "1": [
+ "klklllklk",
+ "kLKLLLKLK",
+ "KLKllLKLK",
+ "kLkLlLkLk",
+ "kLKlllkLK"
+ ],
+ "0": [
+ "KLkLLLKLK",
+ "KlKlllklK",
+ "kLKLllKLk",
+ "KLKLLkLKk"
+ ]
+ },
+ "FIKRIARLLRKIF": {
+ "1": [
+ "fIKRIARLLRKIF",
+ "FIKRIArLLRKIF",
+ "FIKrIARLLRKIF",
+ "fIKRIARLLRKIf",
+ "FIkRIARLLrKIF"
+ ],
+ "0": [
+ "fikriarllrkif",
+ "Fikriarllrkif",
+ "fikriarllrkiF",
+ "fikriArllrkif",
+ "fIkrIarllrkif",
+ "fiKriarLlrkif"
+ ]
+ },
+ "INLKAIAALAKKLL": {
+ "1": [
+ "iNLKAIAALAKKLL",
+ "InLKAIAALAKKLL",
+ "INlKAIAALAKKLL",
+ "INLkAIAALAKKLL",
+ "INLKaIAALAKKLL"
+ ],
+ "0": [
+ "inlkaiaalakkll",
+ "Inlkaiaalakkll",
+ "iNlkaiaalakkll",
+ "inLkaiaalakkll",
+ "inlKaiaalakkll",
+ "inlkAiaalaakll"
+ ]
+ },
+ "FLPLIGRVLSGIL": {
+ "1": [
+ "fLPLIGRVLSGIL",
+ "FLPlIGRVLSGIL",
+ "FLPliGRVLSGIL",
+ "FLpLIGRVLSGIL",
+ "FlPLIGRvLSGIL"
+ ],
+ "0": [
+ "flpligrvlsgil",
+ "FLPLiGRVLSGIL",
+ "FLPLIGrVLSGIL",
+ "FLPLigrvLSGIL",
+ "FLPLIGRVLsGIL",
+ "flPLIGRvlsGIL"
+ ]
+ },
+ "KLLKKAGKLLKKAGKLLKKAG": {
+ "1": [
+ "KlLkKaGkLlKkAGkLlKkAG",
+ "kLlKkAGKlLkKaGkLlKkAG",
+ "KLLkkaGKLLkkaGKLLkkaG",
+ "KlkKKAGKlkKKAGKlkKKAG",
+ "KkLKKAGKkLKKAGKkLKKAG"
+ ],
+ "0": [
+ "KlLkKaGkLlKkAGKlLkKaG",
+ "KlLkKaGKLLKKAGkLlKkAG",
+ "KkLKKAGKlLKKAGkLlKkAG",
+ "kLlKkAGKlLKKAGKlLkKaG",
+ "KlLkKaGkLLKKAGkLlKkAG",
+ "KlLKKAGkLlKkAGKlLkKaG"
+ ]
+ },
+ "LLAKKKGLLAKKKGLLAKKKG": {
+ "1": [
+ "LlAkKkGlLaKkKgLlAkKkG",
+ "LlAkKkGlLaKkKgLlAkKKg",
+ "LlAkKkGlLaKkKgLlAKkkG",
+ "LlAkKkGlLaKkKgllAkKkG",
+ "lLAkKkGlLaKkKgLlAkKkG",
+ "LlAkKkGllaKkKgLlAkKkG"
+ ],
+ "0": [
+ "llakkkgllaKKKGLLAKKKG",
+ "LLAKKKGLLAKkkgllakkkg",
+ "LlAKkkgLLaKkKgLlAkKkg",
+ "llAkKKglLAkKKglLAkKkG"
+ ]
+ },
+ "RPFTRAQWFAIQHISPRTIAMRAINNYRWR": {
+ "1": [
+ "rPFTRAQWFAIQHISPRTIAMRAINNYRWR",
+ "RpFTRAQWFAIQHISPRTIAMRAINNYRWR",
+ "RPFTRaQWFAIQHISPRTIAMRAINNYRWR",
+ "RPFTRAQWFAiQHISPRTIAMRAINNYRWR",
+ "RPftRAQWFAIQHISPRTIAMRAINNYRWR"
+ ],
+ "0": [
+ "rpftraqwfaiqhisprtiamrainnyrwr",
+ "rpftraqwfaIQHISPRTIAMRAINNYRWR",
+ "RPFTRaqwfaiqhisprtiamrainNYRWR",
+ "rpftraqwfaIQHISPRTIAmrainnyrwr",
+ "rPfrAqWfAiQhIsPrTiAmRainNynRwR",
+ "RpFtRaQwFaIqHispRtIaMRAINNYRWR"
+ ]
+ },
+ "RLWLAIWRR": {
+ "1": [
+ "rlwlaiwrr",
+ "rLwlaiwrr",
+ "rlwLaiwrr",
+ "rlwlAiwrr",
+ "rlwlaIwrr",
+ "rlwlaiwRr"
+ ],
+ "0": [
+ "rLWLAIWRR",
+ "RlWLAIWRR",
+ "RLwLAIWRR",
+ "RLWlAIWRR",
+ "RLWLaIWRR"
+ ]
+ },
+ "KLWLAIWKK": {
+ "1": [
+ "klwlaiwkk",
+ "klwlaIWKK",
+ "KLWLAiWKK",
+ "KlWLAIwKK",
+ "klWLAIWKK",
+ "KLwlaIWKK"
+ ],
+ "0": [
+ "KLWLAIwKK",
+ "KlwlAIWKK",
+ "KLWLaiWKK",
+ "kLWlAIWKK",
+ "kLWLalWKK"
+ ]
+ },
+ "LKWLKKL": {
+ "1": [
+ "lkwlkkl",
+ "LKWlKKL",
+ "lKwLKKl",
+ "LkWLKKl",
+ "LKWlkkl",
+ "lkwLKkl"
+ ],
+ "0": [
+ "LkWLkkL",
+ "lKwlKKl",
+ "LKWLKKl",
+ "lkwlKKL",
+ "lKWLKkL",
+ "LkWLKkl",
+ "lKwLkKL"
+ ]
+ },
+ "LRWLRRL": {
+ "1": [
+ "lrwlrrl",
+ "lRwlrrl",
+ "lrwlRrl",
+ "lrwlrRl",
+ "lRwlRrl",
+ "lRwlrRl"
+ ],
+ "0": [
+ "LrWLrrL",
+ "lRwlRRl",
+ "Lrwlrrl",
+ "lrWlrrl",
+ "lrwLrrl",
+ "lrwlrrL",
+ "LrwLrrl"
+ ]
+ },
+ "FLKLLKKLLFLKLLKKLL": {
+ "1": [
+ "fLKLLKKLLfLKLLKKLL",
+ "fLKLlKKLLfLKLLKKLL",
+ "FLkLLKKLLflKLLKKLL",
+ "flKLlKKLLfLKLLKKLL",
+ "fLKLLkkLLfLKLLKKLL",
+ "fLKLLKKLLfLkLLKKLL"
+ ],
+ "0": [
+ "FLklLKKLLFLKLLKKLL",
+ "FLKLLKKllFLKLLKKLL",
+ "FlkLLKKLLFLKLLKKLL",
+ "FLKllKKLLfLKLLKKLL",
+ "FLKLLKKLLFLKLLkKLL"
+ ]
+ },
+ "VDKPPYLPRPRPPRRIYNR": {
+ "1": [
+ "VDKPPYLPRPRPPRriynr",
+ "VDKPPYLPRPRpprriynr",
+ "VDKPPYLPRPrpprriynr",
+ "VDKPPYLPRPRpPRRIYNR",
+ "VDKPPYLPRPrPPRRIYNR",
+ "VDKPPyLPRPRPPRRIYNR",
+ "VDKPPYLPRPRPPrriynr",
+ "VDKPPYLPRpRPPrriynr",
+ "VDKPPYLPRPRPprriynr",
+ "VDKPPyLPRPRPPrRIYNR",
+ "VDKPPYLPrpRPPRRIYNR"
+ ],
+ "0": [
+ "VDKPPYLPRpRPPRRIYNR",
+ "VDKPPYLPrPRPPRRIYNR",
+ "VDKPPYLpRPRPPRRIYNR",
+ "VDKPPYlPRPRPPRRIYNR",
+ "VDKPpYLPRPRPPRRIYNR",
+ "VDKppYLPRPRPPRRIYNR",
+ "VDKpPYLPRPRPPRRIYNR",
+ "vdkppylprprpprriynr",
+ "vDKPPYLPRPRPPRRIYNR",
+ "VDKpPYLPRPRPPRRIYNr",
+ "VDKPpyLPRPRPPRRIYNR",
+ "VDKPPYLpRPRPpRRIYNR",
+ "VDKPPYLPRpRPPRRIYnR"
+ ]
+ },
+ "VRLIVAVRIWRR": {
+ "1": [
+ "VRLIVAVRIWRR",
+ "vRLIVAVRIWRR",
+ "VRlIVAVRIWRR",
+ "VRLIvAVRIWRR",
+ "VRLIVAvRIWRR"
+ ],
+ "0": [
+ "vrlivavriwrr",
+ "Vrlivavriwrr",
+ "vRlivavriwrr",
+ "vrLivavriwrr",
+ "vrlIvavriwrr",
+ "vrliVavriwrr"
+ ]
+ },
+ "VRLRWWRRRWRR": {
+ "1": [
+ "vRLRWWRRRWRR",
+ "VRlRWWRRRWRR",
+ "VRLRwWRRRWRR",
+ "VRLRWwRRRWRR",
+ "vRlRwwRRRWRR"
+ ],
+ "0": [
+ "vrlrwwrrrwrr",
+ "vrlrwwrrrwrR",
+ "vrlrwwrrrwRr",
+ "Vrlrwwrrrwrr",
+ "VRlrwwrrrwrr",
+ "VrLrWWrrrWrr"
+ ]
+ },
+ "RRW": {
+ "1": [],
+ "0": [
+ "rRW",
+ "RrW",
+ "RRw",
+ "rrW",
+ "Rrw",
+ "rRw",
+ "rrw"
+ ]
+ },
+ "FLGTVLKVAAKVLPAALCQIFKKC": {
+ "1": [
+ "FlGTVlKVAAKVlPAAlCQIFKKC",
+ "FlGTVlKVAAKVlPAALCQIFKKC",
+ "FlGTVlKVAAKVLPAAlCQIFKKC",
+ "FlGTVLKVAAKVlPAAlCQIFKKC",
+ "FLGTVlKVAAKVlPAAlCQIFKKC",
+ "FlGTVlKVAAKVLPAALCQIFKKC"
+ ],
+ "0": [
+ "FLGTVLkVAAkVLPAALCQIFkkC",
+ "FLGTVLkVAAkVLPAALCQIFKkC",
+ "FLGTVLkVAAKVLPAALCQIFkkC",
+ "FLGTVLKVAAkVLPAALCQIFkkC",
+ "FLGTVLkVAAkVLPAALCQIFKKC"
+ ]
+ },
+ "FLGTVLKVLAKVLPAALCQIFKKC": {
+ "1": [
+ "FlGTVlKVlAKVlPAAlCQIFKKC",
+ "FLGTVlKVlAKVlPAAlCQIFKKC",
+ "FlGTVLKVlAKVlPAAlCQIFKKC",
+ "FlGTVlKVLAKVlPAAlCQIFKKC",
+ "FlGTVlKVlAKVLPAAlCQIFKKC",
+ "FlGTVlKVlAKVlPAALCQIFKKC"
+ ],
+ "0": [
+ "fLGTVLKVLAKVLPAALCQIFKKC",
+ "FLgTVLKVLAKVLPAALCQIFKKC",
+ "FLGtVLKVLAKVLPAALCQIFKKC",
+ "FLGTvLKVLAKVLPAALCQIFKKC",
+ "FLGTVLkVLAKVLPAALCQIFKKC"
+ ]
+ },
+ "FLGTVLRVAARVLPAALCQIFRRC": {
+ "1": [
+ "FLGtvLRVAARVLPAALCQIFRRC",
+ "FLGTVLRvaarVLPAALCQIFRRC",
+ "FLGTVLRVAARvlpAALCQIFRRC",
+ "fLGTVLRVAARVLPAALcqIFRRC",
+ "FLGTVLrvAARVLPAALCQiFRRC"
+ ],
+ "0": [
+ "FLGTVLrVAArVLPAALCQIFrrC",
+ "FLGTVlrVAARVLPAALCQIFRRC",
+ "FLGTVLrVaaRVLPAALCQIFRRC",
+ "FLGTVLrVAARVLPAALCQIFrRC",
+ "FLGTVLRVAaRVLPAALCQIFrrC",
+ "FLGTVLRVAARVLPAALCqifrRC"
+ ]
+ },
+ "RWKIFKKIEKMGRNIRDGIVKAGPAIQVLGSAKAI": {
+ "1": [
+ "rWKIFKKIEKMGRNIRDGIVKAGPAIQVLGSAKAI",
+ "RWKIFKKIEKMGRNIRDGIVKAGPAIQVLGSAKAi",
+ "RWKIFKKIEKmGRNIRDGIVKAGPAIQVLGSAKAI",
+ "rWKIFKKIEKMGRNIRDGIVKAGPAIQVLGSAKAi"
+ ],
+ "0": [
1982
+ "rwkifkkiekmgrnirdgivkagpaiqvlgsakai",
1983
+ "Rwkifkkiekmgrnirdgivkagpaiqvlgsakai",
1984
+ "rwkifkkiekmgrnirdgivkagpaiqvlgsakaI",
1985
+ "rwkifkkiekMgrnirdgivKagpaiqvlgsakai",
1986
+ "RWKIFKKIEKmgrnirdgivkagpaiqvlgsakai",
1987
+ "RwKiFkKiEkMgRnIrDgIvKaGpAiQvLgSaKaI"
1988
+ ]
1989
+ },
1990
+ "GPLGVRGKRLWDIVRRWVGWL": {
1991
+ "1": [
1992
+ "GPlGvRGKRLWDIVRRWVGWL",
1993
+ "GPlGvRGKRLWDIvRRWVGWL",
1994
+ "GPLGvRGKRlWDIVRRWVGWL",
1995
+ "GPLGVRGKRlWDIVRRWVGWl",
1996
+ "GPLGVRGKRLWDIvRRWvGWL",
1997
+ "GPlGVRGKRlWDIvRRWVgWL"
1998
+ ],
1999
+ "0": [
2000
+ "gPLGVRGKRLWDIVRRWVGWL",
2001
+ "GPLGVRgKRLWDIVRRWVGWL",
2002
+ "GPLGVRGKRLWDIvRrWVGWL",
2003
+ "GPLGVRGKRLWDIVrRwVGWL",
2004
+ "GPLGVRGKRLWDIVRRWVgWl"
2005
+ ]
2006
+ },
2007
+ "RIVQRIKKWLR": {
2008
+ "1": [
2009
+ "rivqrikkwlr",
2010
+ "rIVQRIKKwlr",
2011
+ "riVqRIKKWLR",
2012
+ "RivqRIKKwlr",
2013
+ "rivQRiKKWLR",
2014
+ "RIvQrIKKWLr"
2015
+ ],
2016
+ "0": [
2017
+ "RIVqRIKKWLr",
2018
+ "riVQRiKKwlR",
2019
+ "RiVQRIkKwLr",
2020
+ "rIVQrIKKwLR",
2021
+ "rivQrIKKwLR"
2022
+ ]
2023
+ },
2024
+ "KRIWQRIK": {
2025
+ "1": [
2026
+ "kriwqrik",
2027
+ "KrIWQRIK",
2028
+ "KRIwQRIK",
2029
+ "kRIWQRiK",
2030
+ "kriwqRIK",
2031
+ "KriwqRIk"
2032
+ ],
2033
+ "0": [
2034
+ "KRIWqRIK",
2035
+ "kRIwQRIk",
2036
+ "KRIWqriK",
2037
+ "kRIWqRIK",
2038
+ "KRIWQrIk"
2039
+ ]
2040
+ },
2041
+ "KRIWQRIKDF": {
2042
+ "1": [
2043
+ "kriwqrikdf",
2044
+ "Kriwqrikdf",
2045
+ "krIwqrikdf",
2046
+ "kriwQrikdf",
2047
+ "kriwqrIkdf",
2048
+ "kriwqrikDf"
2049
+ ],
2050
+ "0": [
2051
+ "kRIWQRIKDF",
2052
+ "KRiWQRIKDF",
2053
+ "KRIWQRIKDf",
2054
+ "KrIWqRIKDF",
2055
+ "KRIwQrIKDF"
2056
+ ]
2057
+ },
2058
+ "KYKKALKKLAKLL": {
2059
+ "1": [
2060
+ "kykkalkklakll",
2061
+ "Kykkalkklakll",
2062
+ "kYkkalkklakll",
2063
+ "kyKkalkklakll",
2064
+ "kykKalkklakll",
2065
+ "kykkAlkklakll"
2066
+ ],
2067
+ "0": [
2068
+ "kYKKALKKLAKLL",
2069
+ "KyKKALKKLAKLL",
2070
+ "KYkKALKKLAKLL",
2071
+ "KYKkALKKLAKLL",
2072
+ "KYKKaLKKLAKLL"
2073
+ ]
2074
+ },
2075
+ "VQWRAIRVRVIR": {
2076
+ "1": [
2077
+ "vqwrairvrvir",
2078
+ "vQWRAIRVRVIR",
2079
+ "vqWRAIRVRVIR",
2080
+ "vqwRAIRVRVIR",
2081
+ "vqwrAIRVRVIR",
2082
+ "vqwraIRVRVIR"
2083
+ ],
2084
+ "0": [
2085
+ "vqwraiRVRVIR",
2086
+ "vqwrairVRVIR",
2087
+ "vqwrairvRVIR",
2088
+ "vqwrairvrVIR",
2089
+ "vqwrairvrvIR"
2090
+ ]
2091
+ },
2092
+ "GFAWNVCVYRNGVRVCHRRAN": {
2093
+ "1": [
2094
+ "gFAWNVCVYRNGVRVCHRRAN",
2095
+ "GfAWNVCVYRNGVRVCHRRAN",
2096
+ "GFawNVCVYRNGVRVCHRRAN",
2097
+ "GFAWNVCVYRNGVRVCHRRAn",
2098
+ "GFAWNVCVyRNGVRVCHRRAN"
2099
+ ],
2100
+ "0": [
2101
+ "GfawnvcvyrnGvrvchrran",
2102
+ "gfawnvcvyrngvrvchrran",
2103
+ "Gfawnvcvyrngvrvchrran",
2104
+ "GfawnvcvyrngvrvchrraN",
2105
+ "gfawnvcvyrNgvrvChrran",
2106
+ "gfawnvcvyRNgvrvchrran"
2107
+ ]
2108
+ },
2109
+ "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES": {
2110
+ "1": [
2111
+ "llgdffrkskekigkefkrivqrikdflrnlvprtes",
2112
+ "LLGDFFRkskeKIGKEFKRIVQRIKDFLRNLVPRTES",
2113
+ "LLGDffrkskeKIGKEFKRIVQRIKDFLRNLVPRTES",
2114
+ "LLGDFFRKSKEKIGKefkrIVQRIKDFLRNLVPRTES",
2115
+ "LLGDFFRKSKEKigkeFKRIvqrikdflrnLVPRTES",
2116
+ "llgdFFRKSKEKIGKEFKRIVQrikdflrnlvprtes"
2117
+ ],
2118
+ "0": [
2119
+ "LLgDfFRKsKEkIgKeFkRiVqRIKdFlRnLvPRtEs",
2120
+ "lLGDFfRkSKeKIGKeFkRIvQRIkDfLrnlVPrTeS",
2121
+ "LlGdFfRkSKEkIGKeFkRIVQRIKdflRNLvPRTeS",
2122
+ "LLgDFFRKSkEkIgKeFKRivQRIkdfLrnlVPrTeS",
2123
+ "LLgDFfRksKekIGkEfKrivQrIKdflRNlVpRtEs"
2124
+ ]
2125
+ },
2126
+ "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNL": {
2127
+ "1": [
2128
+ "llgdffrkskekigkefkrivqrikdflrnl",
2129
+ "Llgdffrkskekigkefkrivqrikdflrnl",
2130
+ "llgdfFrkskekigkefkrivqrikdflrnl",
2131
+ "llgdffrkskEkigkefkrivqrikdflrnl",
2132
+ "llgdffrkskekigkEfkrivqrikdflrnl",
2133
+ "llgdffrkskekigkefkriVqrikdflrnl"
2134
+ ],
2135
+ "0": [
2136
+ "lLGDFFRKSKEKIGKEFKRIVQRIKDFLRNL",
2137
+ "LLGDFFRKSKeKIGKEFKRIVQRIKDFLRNL",
2138
+ "llgDFFRKSKEKIGKEFKRIVQRIKDFLRNL",
2139
+ "LLGDFFRKSKeKIGKEFKRIvQRIKDFLRNl",
2140
+ "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNl"
2141
+ ]
2142
+ },
2143
+ "RKRWWRWWKWWKR": {
2144
+ "1": [],
2145
+ "0": [
2146
+ "RKrWWrWwkWWkR"
2147
+ ]
2148
+ },
2149
+ "WRWWKWW": {
2150
+ "1": [
2151
+ "wRWWKWW",
2152
+ "WRwWKWW",
2153
+ "wRWwKWW",
2154
+ "WRWWKwW",
2155
+ "wrWwKWW"
2156
+ ],
2157
+ "0": [
2158
+ "WrWwkWW",
2159
+ "WrWWKWW",
2160
+ "WRWwKWW",
2161
+ "WRWWkWW",
2162
+ "WrWWkWW",
2163
+ "WRWwkWW"
2164
+ ]
2165
+ },
2166
+ "WWRWWKWW": {
2167
+ "1": [
2168
+ "wWRWWKWW",
2169
+ "WwRWWKWW",
2170
+ "WWRwWKWW",
2171
+ "WWRWWKwW",
2172
+ "wWRWWKWw"
2173
+ ],
2174
+ "0": [
2175
+ "WWrWwkWW",
2176
+ "WWrWWKWW",
2177
+ "WWRWwKWW",
2178
+ "WWRWWkWW",
2179
+ "WWrWwKWW",
2180
+ "WWRWwkWW"
2181
+ ]
2182
+ },
2183
+ "RRGKKLLLLLKKKG": {
2184
+ "1": [
2185
+ "rrgkklllllkkkg",
2186
+ "RRGKKlllllKKKG",
2187
+ "rrGKKlllllKKKG",
2188
+ "RRgKKlllllKKKG",
2189
+ "RRGKKlllllkkkG",
2190
+ "RRGkklllllKKKG"
2191
+ ],
2192
+ "0": [
2193
+ "rrgkkLLLLLKKKG",
2194
+ "RRGKKLLLLLkkkg",
2195
+ "rRGKKllllLKKKG",
2196
+ "RrGKKLllllKKKG",
2197
+ "RRGkkLLLllKKKG"
2198
+ ]
2199
+ },
2200
+ "LLWIALRKK": {
2201
+ "1": [
2202
+ "llwialrkk",
2203
+ "llwIaLRKK",
2204
+ "LLwiaLRKK",
2205
+ "LLWIALrKK",
2206
+ "llWialRKK",
2207
+ "LLwiAlrKK"
2208
+ ],
2209
+ "0": [
2210
+ "lLwIALRKK",
2211
+ "LLWiaLrKK",
2212
+ "LLwiALRKK",
2213
+ "llwIaLrkK",
2214
+ "LLWIAlrKk"
2215
+ ]
2216
+ },
2217
+ "PRPRPRP": {
2218
+ "1": [
2219
+ "PrPrPrP",
2220
+ "pRpRpRp",
2221
+ "PRpRPRP",
2222
+ "PrPRPRP",
2223
+ "pRPRPRP"
2224
+ ],
2225
+ "0": [
2226
+ "prprprp",
2227
+ "prPRPRP",
2228
+ "PRprPRP",
2229
+ "PRPRprP",
2230
+ "prprprP",
2231
+ "pRPRPRp"
2232
+ ]
2233
+ },
2234
+ "KWLKKWLKWLKK": {
2235
+ "1": [
2236
+ "kwlKkWLKWLKK",
2237
+ "KWlKkKWLKWLK",
2238
+ "KWLKkwLKWLKk",
2239
+ "KWlKKwLkwLKK",
2240
+ "kWLKKWLKwLkK"
2241
+ ],
2242
+ "0": [
2243
+ "kwLkkwLkwLkk",
2244
+ "kwlkkwLkwLkk",
2245
+ "kwLKKwlkwlkk",
2246
+ "kWLkkwLkWLkk",
2247
+ "kwLKKwLkWLkk",
2248
+ "KwLKKwLkwlkk"
2249
+ ]
2250
+ },
2251
+ "ILRWPWWPWRRK": {
2252
+ "1": [
2253
+ "iLRWPWWPWRRK",
2254
+ "ILrWPWWPWRRK",
2255
+ "ILRwPWWPWRRK",
2256
+ "ILRWPwWPWRRK",
2257
+ "ILRWPWWpWRRK"
2258
+ ],
2259
+ "0": [
2260
+ "ilrwpwwpwrrk",
2261
+ "Ilrwpwwpwrrk",
2262
+ "ilRwpwwpwrrk",
2263
+ "ilrWpwwpwrrk",
2264
+ "ilrwPwwpwrrk",
2265
+ "ilrwpwWpwrrk"
2266
+ ]
2267
+ },
2268
+ "KRKIFLRTKILV": {
2269
+ "1": [
2270
+ "KrKiFlRtKiLv",
2271
+ "KrKiFLRTKILV",
2272
+ "KrKIFlRTKILV",
2273
+ "KRKiFlRTKILV",
2274
+ "KrKIFLRTKILv",
2275
+ "KRKiFLRTKILv"
2276
+ ],
2277
+ "0": [
2278
+ "kRkIfLrTkIlV",
2279
+ "kRKIFLRTKILV",
2280
+ "KRkIFLRTKILV",
2281
+ "kRkIFLRTKILV",
2282
+ "kRKIfLRTKILV",
2283
+ "KRKIFLrTKILV"
2284
+ ]
2285
+ },
2286
+ "VLIKTRLFIKRK": {
2287
+ "1": [
2288
+ "vLiKtRlFiKrK",
2289
+ "vLiKtRLFIKrK",
2290
+ "VLIKtRlFiKrK",
2291
+ "vLIKtRlFiKrK",
2292
+ "VLiKtRLfiKRk",
2293
+ "VLIKTlFirKrk"
2294
+ ],
2295
+ "0": [
2296
+ "VliKTrlfiKRK",
2297
+ "vLIkTrLfIkRK",
2298
+ "VLkTrLFiKrkK",
2299
+ "vlIKtrLFikRk",
2300
+ "VLiKTlFiKrkk"
2301
+ ]
2302
+ },
2303
+ "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK": {
2304
+ "1": [
2305
+ "kWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK",
2306
+ "KwKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK",
2307
+ "KWkLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK",
2308
+ "kWkLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK",
2309
+ "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAk"
2310
+ ],
2311
+ "0": [
2312
+ "kwklfkkiekvgqnirdgiikagpavavvgqatqiak",
2313
+ "Kwklfkkiekvgqnirdgiikagpavavvgqatqiak",
2314
+ "kwklfkkiekvgqnirdgiikagpavavvgqatqiAk",
2315
+ "kwklfkkiekvgqnirdgiiKAGPAVAVVGQATQIAK",
2316
+ "KwKlFkKiEkVgQnIrDgIiKaGpAvAvVgQaTqIaK"
2317
+ ]
2318
+ },
2319
+ "GIGKFLHSAKKFGKAFVGEIMNS": {
2320
+ "1": [
2321
+ "gigkflhsakkfgkafvgeimns",
2322
+ "gIgKFLHSAKKFGKAFVGEIMNS",
2323
+ "GIgKFLHSAKKFGKAFVGEIMNS",
2324
+ "GIGkfLHSAKKFGKAFVGEIMNS",
2325
+ "GIGKFlHSAKKFGKaFVGEIMNS",
2326
+ "GIGKFLHSaKKFGKAFVGEiMNS"
2327
+ ],
2328
+ "0": [
2329
+ "GIGkfLHSaKKFGKAFVGEIMNS",
2330
+ "GIGKFLHsakkfgKAFVGEIMNS",
2331
+ "GIGKFLHSAKKfGKafvgeimns",
2332
+ "GigkflhsakKfGKAFVGEIMNS",
2333
+ "GIGKFLHSaKKFGKAFVGEImnS"
2334
+ ]
2335
+ },
2336
+ "KWKLFKKIEKVGQGIGAVLKVLTTGL": {
2337
+ "1": [
2338
+ "KWKLfKKIEKVGQGIGAVLKVLTTGL",
2339
+ "kKWLFKKIEKVGQGIGAVLKVLTTGL",
2340
+ "KWKkFKKIEKVGQGIGAVLKVLTTGL",
2341
+ "KWKlFKKiEKVGQGIGAVLKVLTTGL",
2342
+ "KwKLFKKIEkVGQGIGAVLKVLTTGL"
2343
+ ],
2344
+ "0": [
2345
+ "kwklfkkiekvgqgigavlkvlttgl",
2346
+ "kwkLfkkiekvgqgigavlkvlttgl",
2347
+ "kwklfkkiekvgqgigavlkVLttgl",
2348
+ "KwklfkkiekvgqgigavLKVlttgl",
2349
+ "kwklfKkiekvgqgigavlkvlttgl",
2350
+ "kWkLfkkiekvgqgigavlkvlttgl"
2351
+ ]
2352
+ },
2353
+ "KWKLFKKIGIGAVLKVLTTGLPALIS": {
2354
+ "1": [
2355
+ "kwklfkkigigavlkvlttglpalis",
2356
+ "kwklfkkigigavlkvlttgLPALIS",
2357
+ "KWKLFKkigigavlkvlttglpalis",
2358
+ "KwklfkkigIgavlkvlttGlpalis",
2359
+ "kwklFkkigigavlKvlttglpalIs"
2360
+ ],
2361
+ "0": [
2362
+ "kWKLFKKIGIGAVLKVLTTGLPALIS",
2363
+ "kwKLFKKIGIGAVLKVLTTGLPALIS",
2364
+ "KWKLFKKIGiGAVLKVLTTGLPALIS",
2365
+ "KWKLFKKIGIGAVLKVLTTgLPALIS",
2366
+ "KWKLfKKIGIGAVLKVLTTGLPALIS"
2367
+ ]
2368
+ },
2369
+ "KWKLFKKGIGAVLKV": {
2370
+ "1": [
2371
+ "kwklfkkgigavlkv",
2372
+ "kWKLFKKGIGAVLKv",
2373
+ "kwKLFKKGIGAVLKV",
2374
+ "kwkLFKKGIGAVLKV",
2375
+ "KWKLFKKGIGAVLkv",
2376
+ "KWKLFKKgIGAVlkV"
2377
+ ],
2378
+ "0": [
2379
+ "KWKlfKKGIGAVLKV",
2380
+ "KWKLFKKGiGAVLKV",
2381
+ "KWKlFKKGiGAVLKV",
2382
+ "KWKLfKKGiGAVLKV"
2383
+ ]
2384
+ },
2385
+ "KWKLFKKIGAVLKVL": {
2386
+ "1": [
2387
+ "kwklfkkigavlkvl",
2388
+ "kWKLFKKIGAVLKVL",
2389
+ "KwKLFKKIGAVLKVL",
2390
+ "kwKLFKKIGAVLKVL",
2391
+ "KWKLFKKIGAVLKVl",
2392
+ "kWkLfKkIgAvLkVl"
2393
+ ],
2394
+ "0": [
2395
+ "KWklFKKIGAVLKVL",
2396
+ "kwkLFKKIGAVLKVL",
2397
+ "KWKLFKKIGAVLkvl",
2398
+ "KwKlFkKiGaVlKvL",
2399
+ "kwKlFKkIGAvLKVL"
2400
+ ]
2401
+ },
2402
+ "KWKLFKKGAVLKVLT": {
2403
+ "1": [
2404
+ "kwklfkkgavlkvlt",
2405
+ "KWKlfkkgavlkvlt",
2406
+ "kwklFKKgavlkvlt",
2407
+ "Kwklfkkgavlkvlt",
2408
+ "kwklfkkkgVLkvlt"
2409
+ ],
2410
+ "0": [
2411
+ "kWKLFKKGAVLKVLT",
2412
+ "kWKLFKKGAVLKVLt",
2413
+ "kwkLFKKGAVLKVLT",
2414
+ "KWKlfkkGAVLKVLT",
2415
+ "kWkLfKkGaVLKVLT"
2416
+ ]
2417
+ },
2418
+ "KWKLFKKAVLKVLTT": {
2419
+ "1": [
2420
+ "kwklfkkavlkvltt",
2421
+ "Kwklfkkavlkvltt",
2422
+ "kWklfkkavlkvltt",
2423
+ "kwKlfkkavlkvltt",
2424
+ "kwkLfkkavlkvltt",
2425
+ "kwklFkkavlkvltt"
2426
+ ],
2427
+ "0": [
2428
+ "KWKLFkKAVLKVLTT",
2429
+ "KWKLFKkAVLKVLTT",
2430
+ "KWKLFKKaVLKVLTT",
2431
+ "KWKLFKKAvLKVLTT",
2432
+ "KWKLFKKAVlKVLTT"
2433
+ ]
2434
+ },
2435
+ "KWKLFKKVLKVLTTG": {
2436
+ "1": [
2437
+ "kwklfkkvlkvlttg",
2438
+ "Kwklfkkvlkvlttg",
2439
+ "kWklfkkvlkvlttg",
2440
+ "kwKlfkkvlkvlttg",
2441
+ "kwkLfkkvlkvlttg",
2442
+ "kwklFkkvlkvlttg"
2443
+ ],
2444
+ "0": [
2445
+ "kWKLFKKVLKVLTTG",
2446
+ "kwKLFKKVLKVLTTG",
2447
+ "kwkLFKKVLKVLTTG",
2448
+ "kwklFKKVLKVLTTG",
2449
+ "kwklfKKVLKVLTTG"
2450
+ ]
2451
+ },
2452
+ "GSKKPVPIIYCNRRTGKCQRM": {
2453
+ "1": [
2454
+ "GsKKPVPIIYCNRRTGKCQRM",
2455
+ "GSKkpvpiiyCNRRTGKCQRM",
2456
+ "GSKKPVPIIYCNrRTgKCQRM",
2457
+ "gSKKPVPIIYCNRRTGkCQRM",
2458
+ "GSKKPVPIIycnrrTGKCQRM"
2459
+ ],
2460
+ "0": [
2461
+ "gskkpvpiiycnrrtgkcqrm",
2462
+ "gskkpvpiiycnrrtgkCQRM",
2463
+ "gskkpvpiIYCNRRTGKcqrM",
2464
+ "GSKKPVPiiycnrrtgkcqrm",
2465
+ "gskkpVPIIYCNRRTgkcqrm",
2466
+ "gskkPVPIIYcnrrtgkcqrm"
2467
+ ]
2468
+ },
2469
+ "RRWQWRMKK": {
2470
+ "1": [
2471
+ "rrwqwrmkk",
2472
+ "Rrwqwrmkk",
2473
+ "rRwqwrmkk",
2474
+ "rrwQwrmkk",
2475
+ "rrwqWrmkk",
2476
+ "rrwqwRmkk"
2477
+ ],
2478
+ "0": [
2479
+ "rRWQWRMKK",
2480
+ "RrWQWRMKK",
2481
+ "RRwQWRMKK",
2482
+ "RRWqWRMKK",
2483
+ "RRWQwRMKK"
2484
+ ]
2485
+ },
2486
+ "FKCRRWQWRMKKLGA": {
2487
+ "1": [
2488
+ "fkcrrwqwrmkklga",
2489
+ "fkcrrwqwRMKKLGA",
2490
+ "fKcRrWqWrMkKlGa",
2491
+ "FKCRRwQwRMKKLGA",
2492
+ "fkCrrwqwrmkklga"
2493
+ ],
2494
+ "0": [
2495
+ "fKCRRWQWRMKKLGA",
2496
+ "FkCRRWQWRMKKLGA",
2497
+ "FKCrRWQWRMKKLGA",
2498
+ "FKCRRWqWRMKKLGA",
2499
+ "FKCRRWQWRMKKLGa"
2500
+ ]
2501
+ },
2502
+ "PKLLKTFLSKWIG": {
2503
+ "1": [
2504
+ "pKLLKTFLSKWIG",
2505
+ "PKlLKTFLSKWIG",
2506
+ "PKLLkTFLSKWIG",
2507
+ "PKLLKTfLSKWIG",
2508
+ "PKLLKTFLsKWIG"
2509
+ ],
2510
+ "0": [
2511
+ "pkllktflskwig",
2512
+ "pkllktflskwiG",
2513
+ "pkllktflskwIg",
2514
+ "pkllktflsKwIg",
2515
+ "pkllktfLSkwiG",
2516
+ "pkllkTflskwiG"
2517
+ ]
2518
+ },
2519
+ "KLPLIGRVLSGIL": {
2520
+ "1": [
2521
+ "klpligrvlsgil",
2522
+ "KLPLigrvlsgil",
2523
+ "kLPLigrvlsgil",
2524
+ "klPLigrvlsgil",
2525
+ "klpLIGRvlsgil",
2526
+ "klpliGrVLSGIL"
2527
+ ],
2528
+ "0": [
2529
+ "KlPLigrvlsgil",
2530
+ "klpLIGRVlSGIL",
2531
+ "KLPLigRvLSGIl",
2532
+ "klPLIGRvlsgil",
2533
+ "KlpligRVLSGiL"
2534
+ ]
2535
+ },
2536
+ "KKHRKHRKHRKHGGSGGSKNLRRIIRKGIHIIKKYG": {
2537
+ "1": [
2538
+ "KKHRKHRKHRKHGGSGGSKNLRRIIRKGIHIIKKYG",
2539
+ "kKHRKHRKHRKHGGSGGSKNLRRIIRKGIHIIKKYG",
2540
+ "KKHRKHRKHRKHGGSGGSKNLRRIIRKGIHIIKKYg",
2541
+ "KKHRKHRKHRKHGGsGGSKNLRRIIRKGIHIIKKYG"
2542
+ ],
2543
+ "0": [
2544
+ "kkhrkhrkhrkhggsggsknlrriirkgihiikkyg",
2545
+ "kkhrkhrkhrkhggsggsKnlrriirkgihiikkyg",
2546
+ "KKHRKhrkhrkhggsggsknlrriirkgihiikkyg",
2547
+ "KkHrKhRkHrKhGgSgGsKnLrRiIrKgIhIiKkYg",
2548
+ "KKHRKHRKHRKHGGSGGSknlrriirkgihiikkyg"
2549
+ ]
2550
+ },
2551
+ "FKRIVQRIKDFLRNLV": {
2552
+ "1": [
2553
+ "fKRIVQRIKDFLRNLV",
2554
+ "FkRIVQRIKDFLRNLV",
2555
+ "FKrIVQRIKDFLRNLV",
2556
+ "FKRIVQRIKDFLRrLV",
2557
+ "FKRIVQRIKdFLRNLV"
2558
+ ],
2559
+ "0": [
2560
+ "FKRiVQRiKDFlRNLV",
2561
+ "FKRIvQRiKDFlRNLV",
2562
+ "FKRIVQrIKDFLRNlV",
2563
+ "fKRiVQRIKDFLRNLV",
2564
+ "FKRIVQRikDFLRnLV",
2565
+ "FKRIVQRIKDfLRNLV"
2566
+ ]
2567
+ },
2568
+ "GWGSFFKKAAHVGKHVGKAALTHYL": {
2569
+ "1": [
2570
+ "gwgsffkKAAHVGKHVGKAALTHYL",
2571
+ "GWGSFFKKAAhvgkhvgkaalTHYL",
2572
+ "gWgSfFkKaAhVgKhVgKaAlThYl",
2573
+ "GwGSffKKaaHvGKHvGKaalTHyl",
2574
+ "GWGSFFkkAAHVGKHVGKAALTHYL"
2575
+ ],
2576
+ "0": [
2577
+ "gwgsffkkaahvgkhvgkaalthyl",
2578
+ "Gwgsffkkaahvgkhvgkaalthyl",
2579
+ "GWGSFFkkAAHVGkHVGkAALTHYL",
2580
+ "gwgsffkkaahvgKHVGKAALTHYL",
2581
+ "gwgsffkkaahvgKhvgKaalthyl",
2582
+ "GwGsffkkaahvGkhvGkaalthyl"
2583
+ ]
2584
+ },
2585
+ "RRGWVLALVLRYGRR": {
2586
+ "1": [
2587
+ "rRGWVLALVLRYGRR",
2588
+ "RrGWVLALVLRYGRR",
2589
+ "RRgWVLALVLRYGRR",
2590
+ "RRGwVLALVLRYGRR",
2591
+ "RRGWvLALVLRYGRR"
2592
+ ],
2593
+ "0": [
2594
+ "RRGWVLALVlRYGRR",
2595
+ "rRGWVLALVlRYGRR",
2596
+ "RrGWVLALVlRYGRR",
2597
+ "RRgWVLALVlRYGRR",
2598
+ "RRGwVLALVlRYGRR",
2599
+ "RRGWvLALVlRYGRR"
2600
+ ]
2601
+ },
2602
+ "RRGWVLALYLRYGRR": {
2603
+ "1": [
2604
+ "rRGWVLALYLRYGRR",
2605
+ "RRgWVLALYLRYGRR",
2606
+ "RRGWVLalYLRyGRR",
2607
+ "RrGWvLALYlRYgRR"
2608
+ ],
2609
+ "0": [
2610
+ "RRGWVLALYlRYGRR",
2611
+ "RRGWVLALyLrYGRR",
2612
+ "RRGwVLalYLRYGRR",
2613
+ "rRGWVLALYLryGRR",
2614
+ "RRGWvLALyLRYGrR",
2615
+ "RRgWVLalYLRyGRr"
2616
+ ]
2617
+ },
2618
+ "RRGWALRLVLAY": {
2619
+ "1": [
2620
+ "rRGWALRLVLAY",
2621
+ "RrGWALRLVLAY",
2622
+ "RRgWALRLVLAY",
2623
+ "RRGWaLRLVLAY",
2624
+ "RRGWALrLVLAY"
2625
+ ],
2626
+ "0": [
2627
+ "RRGWALRLVlAY",
2628
+ "RRGwALRLVLAY",
2629
+ "RRGWAlRLVLAY",
2630
+ "RRGWALRlVLAY",
2631
+ "RRGWALRLVLaY",
2632
+ "RRGWALRLVLAy"
2633
+ ]
2634
+ },
2635
+ "KWKKLLKKPLLKKLLKKL": {
2636
+ "1": [
2637
+ "kwkkllkkpllkkllkkl",
2638
+ "Kwkkllkkpllkkllkkl",
2639
+ "kWkkllkkpllkkllkkl",
2640
+ "kwkkllkkpLLkkllkkl",
2641
+ "kwkkllkkpllkkllkkL",
2642
+ "KWkkllkkpllkkllkkl"
2643
+ ],
2644
+ "0": [
2645
+ "kWKKLLKKPLLKKLLKKL",
2646
+ "KWKKLLKKPLLKKLLKKl",
2647
+ "KWKKLLKKpLLKKLLKKL",
2648
+ "kwKKLLKKPLLKKLLKKL",
2649
+ "KWKKLLKKPllKKLLKKL"
2650
+ ]
2651
+ },
2652
+ "NKKAGLFVVQFPKKY": {
2653
+ "1": [
2654
+ "nkkaglfvvqfpkky",
2655
+ "nKkAGLFVVQFPKKY",
2656
+ "NkKAGLFVVQFPKKy",
2657
+ "NKkaglfVVQFPKKY",
2658
+ "NkkaGlfVVQFPKKY",
2659
+ "nKKaGLfvvqFPkky"
2660
+ ],
2661
+ "0": [
2662
+ "NKkAGlFVVQfPKKy",
2663
+ "NkkAglFvvQFPKkY",
2664
+ "nkkaglFVVqfpKKY",
2665
+ "NKKAGLfVvQfPKkY",
2666
+ "nkKAGlFvVQFPkKy"
2667
+ ]
2668
+ },
2669
+ "LVKKLLKLAMGFG": {
2670
+ "1": [
2671
+ "lvkkllklamgfg",
2672
+ "Lvkkllklamgfg",
2673
+ "lVkkllklamgfg",
2674
+ "lvKkllklamgfg",
2675
+ "lvkKllklamgfg",
2676
+ "lvkkLlklamgfg"
2677
+ ],
2678
+ "0": [
2679
+ "VkKKLLKLAMGFG",
2680
+ "LvKLLKLAMGFGg",
2681
+ "LVKKllKLaMGFG",
2682
+ "LVKKLLkLAmGfG"
2683
+ ]
2684
+ },
2685
+ "WLRRIKAWLRRIKA": {
2686
+ "1": [
2687
+ "wlrrikawlrrika",
2688
+ "Wlrrikawlrrika",
2689
+ "wLrrikawlrrika",
2690
+ "wlRrikawlrrika",
2691
+ "wlrRikawlrrika",
2692
+ "WLrrikawlrrika"
2693
+ ],
2694
+ "0": [
2695
+ "wLRRIKAWLRRIKA",
2696
+ "WlRRIKAWLRRIKA",
2697
+ "WLrRIKAWLRRIKA",
2698
+ "WLRrIKAWLRRIKA",
2699
+ "WLrrIKAWLRRIKA"
2700
+ ]
2701
+ },
2702
+ "RRGWARRLAFAFGRR": {
2703
+ "1": [
2704
+ "rrgwarrlafafgrr",
2705
+ "Rrgwarrlafafgrr",
2706
+ "rrgwarRLafafgrr",
2707
+ "RrGWarRLafafgrr",
2708
+ "rrgWarrlafafgRR",
2709
+ "rrgwarrlafafgRr"
2710
+ ],
2711
+ "0": [
2712
+ "rRGWARRLAFAFGRR",
2713
+ "rrGWARRLAFAFGRR",
2714
+ "RRGWARRLAFaFGRR",
2715
+ "RRGWaRRLAFAFGRR",
2716
+ "RRGWARRLafAFGRR"
2717
+ ]
2718
+ }
2719
+ }
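The block above closes a JSON map from each parent peptide to its variant sequences, grouped under `"1"`/`"0"` labels; the mixed case appears to mark substitution positions (lowercase for D-residues, in keeping with the repository's D-amino-acid theme — an assumption, not stated in this file). A minimal sketch of flattening such a structure into `(parent, variant, label)` pairs; the inline dictionary is a hypothetical one-key subset:

```python
import json

# Hypothetical one-key subset of the variant dictionary above;
# lowercase residues are assumed to mark substituted positions.
raw = '{"RRW": {"1": [], "0": ["rRW", "RrW", "RRw"]}}'
data = json.loads(raw)

pairs = [(parent, variant, int(label))
         for parent, groups in data.items()
         for label, variants in groups.items()
         for variant in variants]
# -> [("RRW", "rRW", 0), ("RRW", "RrW", 0), ("RRW", "RRw", 0)]
```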
finetune.py ADDED
@@ -0,0 +1,201 @@
+ import argparse
+ import json
+ import logging
+ import os
+ import time
+
+ from dataset import PeptidePairDataset, PeptidePairPicDataset, SimplePairClsDataset
+ from network import DMutaPeptide, DMutaPeptideCNN  # , DMutaPeptideWiden
+ from sklearn.model_selection import KFold
+ from train import train_cls
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader, WeightedRandomSampler, RandomSampler, Subset
+ import numpy as np
+ from loss import MLCE, SuperLoss, LogCoshLoss, BMCLoss
+ from utils import set_seed
+
+
+ parser = argparse.ArgumentParser(description='Fine-tune DMutaPeptide models')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='cnn',
+     help='lstm mamba mla')
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default='lstm',
+     help="use side features")
+ parser.add_argument('--channels', type=int, default=16)
+ parser.add_argument('--fusion', type=str, default='att',
+     help='mlp att diff')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+     help="use non-siamese architecture")
+ parser.add_argument('--widen', action='store_true', default=False,
+     help='use widen non-siamese architecture')
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='cls',
+     help='reg or cls')
+ parser.add_argument('--pdb-src', type=str, dest='pdb_src', default='af',
+     help='af or hf')
+ parser.add_argument('--data-ver', type=str, dest='data_ver', default='250228',
+     help='data version')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=True,
+     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+     help='Max length for sequence filtering')
+ parser.add_argument('--split', type=int, default=5,
+     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--run-folds', type=int, dest='run_folds', nargs='+', default=-1,
+     help='specify which folds to run')
+ parser.add_argument('--seed', type=int, default=1,
+     help="Seed (default: 1)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+     help='Consider protease cut site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+     help='Consider protease cut site')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+     help='resize the image')
+ parser.add_argument('--llm-data', action='store_true', default=False,
+     help='Use LLM augmentation data')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+     help='input batch size for training (default: 32)')
+ parser.add_argument('--epochs', type=int, default=50,
+     help='number of epochs to train (default: 50)')
+ parser.add_argument('--lr', type=float, default=0.001,
+     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+     help='weight decay (default: 0.0005)')
+ parser.add_argument('--warm-steps', type=int, dest='warm_steps', default=0,
+     help='number of warm start steps for learning rate (default: 0)')
+ parser.add_argument('--patience', type=int, default=10,
+     help='patience for early stopping (default: 10)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+     help='path of the pretrain model')
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='ce',
+     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+     help='use DIR')
+
+ parser.add_argument('--bias-curri', dest='bias_curri', action='store_true', default=False,
+     help='directly use loss as the training data (biased) or not (unbiased)')
+ parser.add_argument('--anti-curri', dest='anti_curri', action='store_true', default=False,
+     help='easy to hard (curri), hard to easy (anti)')
+ parser.add_argument('--std-coff', dest='std_coff', type=float, default=1,
+     help='the hyper-parameter of std')
+
+ parser.add_argument('--ft-epochs', dest='ft_epochs', type=int, default=15,
+     help='fine-tune epochs')
+ parser.add_argument('--ft-lr', dest='ft_lr', type=float, default=0.0002,
+     help='fine-tune learning rate')
+
+ parser.add_argument('--simple', dest='simple', action='store_true', default=False)
+
+ args = parser.parse_args()
+
+ if args.llm_data:
+     args.simple = True
+
+ if args.simple:
+     args.one_way = True
+
+ if args.run_folds == -1:
+     args.run_folds = list(range(args.split))
+
+ def main():
+     set_seed(args.seed)
+     if args.task == 'reg':
+         args.classes = 1
+         if args.loss in ['mse', 'ce']:
+             args.loss = 'mse'
+             criterion = nn.MSELoss()
+         elif args.loss == "smoothl1":
+             criterion = nn.SmoothL1Loss()
+         elif args.loss == "super":
+             criterion = SuperLoss()
+         elif args.loss in ["bmc", "bmc_ln"]:
+             criterion = BMCLoss()
+         else:
+             raise NotImplementedError("unimplemented regression task loss function")
+     elif args.task == 'cls':
+         args.classes = 2
+         if args.loss in ['ce', 'mse', 'smoothl1', 'super']:
+             args.loss = 'ce'
+             criterion = nn.CrossEntropyLoss()
+         else:
+             raise NotImplementedError("unimplemented classification task loss function")
+     else:
+         raise NotImplementedError("unimplemented task")
+
+     if args.q_encoder in ['cnn', 'rn18']:
+         weight_dir = f'./run-{args.task}/{args.q_encoder}{"-non-siamese" if args.non_siamese else ""}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs else ""}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+     else:
+         weight_dir = f'./run-{args.task}/{args.q_encoder}{"-non-siamese" if args.non_siamese else ""}-{args.fusion}-{args.channels}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+
+     logging.basicConfig(handlers=[
+         logging.FileHandler(filename=os.path.join(weight_dir, "finetune.log"), encoding='utf-8', mode='w+'),
+         logging.StreamHandler()],
+         format="%(asctime)s: %(message)s", datefmt="%F %T", level=logging.INFO)
+
+     logging.info(f'Finetuning: {weight_dir}')
+
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
+
+     logging.info('Loading Training Dataset')
+     train_set = SimplePairClsDataset(pad_length=args.max_length, ftr2=True, gf=args.glob_feat, q_encoder=args.q_encoder, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
+
+     logging.info('Loading Test Dataset')
+     if args.q_encoder in ['cnn', 'rn18']:
+         test_set = PeptidePairPicDataset(mode='r2_case', pad_length=args.max_length, task=args.task, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
+     else:
+         test_set = PeptidePairDataset(mode='r2_case', pad_length=args.max_length, task=args.task, gf=args.glob_feat)
+
+     train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True, drop_last=True, num_workers=8, pin_memory=True)
+     test_loader = DataLoader(test_set, batch_size=args.batch_size, shuffle=False, num_workers=8, pin_memory=True)
+
+     best_perform_list = [[] for _ in range(args.split)]  # one slot per fold (was hard-coded to 5)
+
+     for fold in range(args.split):
+         logging.info(f'Finetuning Fold {fold}')
+         logging.info(f'Fold {fold} Train set: {len(train_set)}, Test set: {len(test_set)}')
+         # if args.widen:
+         #     model = DMutaPeptideWiden(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, fusion=args.fusion, side_enc=args.side_enc)
+         # else:
+         if args.q_encoder in ['cnn', 'rn18']:
+             model = DMutaPeptideCNN(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese)
+         else:
+             model = DMutaPeptide(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, fusion=args.fusion, non_siamese=args.non_siamese)
+
+         weights_path = f"{weight_dir}/model_{fold}.pth"
+
+         model.to(device)
+         # model.load_state_dict(torch.load(weights_path.replace('.pth', '_test.pth'), map_location=device), strict=False)
+         model.load_state_dict(torch.load(weights_path, map_location=device), strict=False)
+
+         optimizer = torch.optim.AdamW(model.parameters(), lr=args.ft_lr)
+
+         best_metric = -float('inf')
+
+         if args.task == 'cls':
+             for epoch in range(1, args.ft_epochs + 1):
+                 train_loss, ap, auc, f1, acc = train_cls(args, epoch, model, train_loader, test_loader, device, criterion, optimizer)
+                 logging.info(f'Epoch: {epoch:03d} Train Loss: {train_loss:.3f}, ap: {ap:.3f}, auc: {auc:.3f}, f1: {f1:.3f}, acc: {acc:.3f}')
+                 avg_metric = ap + auc  # + f1 + acc
+                 if avg_metric > best_metric:
+                     logging.info(f'Epoch: {epoch:03d} New best VALIDATION metrics')
+                     best_metric = avg_metric
+                     best_perform_list[fold] = np.asarray([ap, auc, f1, acc])
+                     torch.save(model.state_dict(), weights_path.replace('.pth', '_ft.pth'))
+
+
+ if __name__ == "__main__":
+     main()
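The fine-tuning loop in `finetune.py` keeps the checkpoint with the best combined validation score, ranking epochs by `ap + auc`. A minimal, standalone sketch of that selection logic (the metric values below are hypothetical, standing in for real epochs):

```python
def select_best(epoch_metrics):
    """Return (best_index, best_record) for records of (ap, auc, f1, acc),
    ranked by ap + auc as in the fine-tuning loop above."""
    best_idx, best_metric = -1, float('-inf')
    for i, (ap, auc, f1, acc) in enumerate(epoch_metrics):
        if ap + auc > best_metric:  # same criterion as avg_metric in finetune.py
            best_metric = ap + auc
            best_idx = i
    return best_idx, epoch_metrics[best_idx]

# hypothetical per-epoch (ap, auc, f1, acc) values
history = [(0.70, 0.75, 0.68, 0.71), (0.74, 0.78, 0.70, 0.72), (0.73, 0.77, 0.72, 0.74)]
best_idx, best = select_best(history)
# -> best_idx == 1
```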
gradcam.py ADDED
@@ -0,0 +1,407 @@
+ import torch
+ import torch.nn.functional as F
+ import numpy as np
+ from PIL import Image
+ import matplotlib.pyplot as plt
+ from matplotlib.colors import ListedColormap
+ from torchvision import transforms
+ from network import DMutaPeptideCNN
+ from dataset import draw_peptide, encode_sequence
+
+ class GradCAMMulti:
+     def __init__(self, model):
+         self.model = model
+         self.has_side_enc = hasattr(model, 'side_encoder') and model.side_encoder is not None
+
+     def generate(self, img1, img2, seq1=None, seq2=None, target_class=1):
+         self.model.eval()
+
+         # First compute the raw (unnormalized) CAMs for both images
+         cam1_raw = self._compute_cam_for_input(img1, img2, seq1, seq2, target_class, analyze_idx=0, normalize=False)
+         cam2_raw = self._compute_cam_for_input(img1, img2, seq1, seq2, target_class, analyze_idx=1, normalize=False)
+
+         # Normalize both CAMs with a shared global min/max
+         global_min = min(cam1_raw.min(), cam2_raw.min())
+         global_max = max(cam1_raw.max(), cam2_raw.max())
+
+         hm_cnn1 = self._normalize_cam(cam1_raw, global_min, global_max)
+         hm_cnn2 = self._normalize_cam(cam2_raw, global_min, global_max)
+
+         if not self.has_side_enc:
+             return hm_cnn1, hm_cnn2
+
+         # Apply the same strategy to the sequence heatmaps
+         seq1_raw = self._compute_seq_cam_for_input(img1, img2, seq1, seq2, target_class, analyze_idx=0, normalize=False)
+         seq2_raw = self._compute_seq_cam_for_input(img1, img2, seq1, seq2, target_class, analyze_idx=1, normalize=False)
+
+         seq_global_min = min(seq1_raw.min(), seq2_raw.min())
+         seq_global_max = max(seq1_raw.max(), seq2_raw.max())
+
+         hm_seq1 = self._normalize_cam(seq1_raw, seq_global_min, seq_global_max)
+         hm_seq2 = self._normalize_cam(seq2_raw, seq_global_min, seq_global_max)
+
+         return hm_cnn1, hm_cnn2, hm_seq1, hm_seq2
+
+     def _normalize_cam(self, cam, global_min, global_max):
+         """Normalize with the global min/max shared across both inputs."""
+         cam_norm = (cam - global_min) / (global_max - global_min + 1e-8)
+         return np.uint8(cam_norm * 255)
+
+     def _compute_cam_for_input(self, img1, img2, seq1, seq2, target_class, analyze_idx, normalize=True):
+         """
+         analyze_idx: 0 analyzes img1, 1 analyzes img2
+         normalize: whether to normalize inside this function (when False, the raw numpy array is returned)
+         """
+         if analyze_idx == 0:
+             img_analyze = img1.clone().requires_grad_(True)
+             img_other = img2.detach()
+         else:
+             img_analyze = img2.clone().requires_grad_(True)
+             img_other = img1.detach()
+
+         activations = []
+         gradients = []
+
+         def fwd_hook(mod, inp, out):
+             activations.append(out)
+             return out
+
+         def bwd_hook(mod, grad_in, grad_out):
+             gradients.append(grad_out[0])
+
+         last_conv = self.model.q_encoder[7][-1].conv2
+         fwd_h = last_conv.register_forward_hook(fwd_hook)
+         bwd_h = last_conv.register_full_backward_hook(bwd_hook)
+
+         try:
+             if self.has_side_enc:
+                 if analyze_idx == 0:
+                     inputs = ((img_analyze, seq1), (img_other, seq2))
+                 else:
+                     inputs = ((img_other, seq1), (img_analyze, seq2))
+             else:
+                 if analyze_idx == 0:
+                     inputs = (img_analyze, img_other)
+                 else:
+                     inputs = (img_other, img_analyze)
+
+             logits = self.model(inputs)
+             if isinstance(logits, tuple):
+                 logits = logits[0]
+             score = logits[0, target_class]
+
+             self.model.zero_grad()
+             score.backward()
+
+             act = activations[analyze_idx]
+             grad = gradients[-(analyze_idx + 1)]
+
+             if grad is None:
+                 cam = np.zeros((img1.shape[2], img1.shape[3]), dtype=np.float32)
+                 return np.uint8(cam * 255) if normalize else cam
+
+             # Weight channels by the mean absolute gradient
+             alpha = grad.abs().mean(dim=(2, 3), keepdim=True)
+             cam = (alpha * act).sum(dim=1, keepdim=True)
+             cam = cam.abs()  # take the absolute value
+
+             cam = F.interpolate(cam, size=img1.shape[2:], mode='bilinear', align_corners=False)
+             cam = cam.squeeze().detach().cpu().numpy()
+
+             if normalize:
+                 cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
+                 return np.uint8(cam * 255)
+             else:
+                 return cam  # raw float array
+
+         finally:
+             fwd_h.remove()
+             bwd_h.remove()
+
+     def _compute_seq_cam_for_input(self, img1, img2, seq1, seq2, target_class, analyze_idx, normalize=True):
+         """Compute the CAM for the sequence branch."""
+         if analyze_idx == 0:
+             seq_analyze = seq1.clone().requires_grad_(True)
+             seq_other = seq2.detach()
+         else:
+             seq_analyze = seq2.clone().requires_grad_(True)
+             seq_other = seq1.detach()
+
+         activations = []
+         gradients = []
+
+         def fwd_hook(mod, inp, out):
+             activations.append(out)
+             return out
+
+         def bwd_hook(mod, grad_in, grad_out):
+             gradients.append(grad_out[0])
+
+         fwd_h = self.model.side_encoder.mamba.register_forward_hook(fwd_hook)
+         bwd_h = self.model.side_encoder.mamba.register_full_backward_hook(bwd_hook)
+
+         try:
+             if analyze_idx == 0:
+                 inputs = ((img1.detach(), seq_analyze), (img2.detach(), seq_other))
+             else:
+                 inputs = ((img1.detach(), seq_other), (img2.detach(), seq_analyze))
+
+             logits = self.model(inputs)
+             if isinstance(logits, tuple):
+                 logits = logits[0]
+             score = logits[0, target_class]
+
+             self.model.zero_grad()
+             score.backward()
+
+             act = activations[analyze_idx]
+             grad = gradients[-(analyze_idx + 1)]
+
+             if grad is None:
+                 cam_seq = np.zeros(seq1.shape[1], dtype=np.float32)
+                 return np.uint8(cam_seq * 255) if normalize else cam_seq
+
+             # Use absolute gradients here as well
+             alpha = grad.abs().mean(dim=1, keepdim=True)
+             cam_seq = (act * alpha).sum(dim=2).abs()
+
+             cam_seq = cam_seq.squeeze().detach().cpu().numpy()
+
+             if normalize:
+                 cam_seq = (cam_seq - cam_seq.min()) / (cam_seq.max() - cam_seq.min() + 1e-8)
+                 return np.uint8(cam_seq * 255)
+             else:
+                 return cam_seq  # raw float array
+
+         finally:
+             fwd_h.remove()
+             bwd_h.remove()
+
+
+ def plot_seq_heat_tailpad(
+     seq: str,
+     heatmap: np.ndarray,
+     keep_pad: int = 2,
+     ax=None,
+     cmap='Oranges',
+     border_width: float = 2.0,
+     figsize_per_base: float = 0.3
+ ):
+     """
+     seq: raw amino-acid sequence, without padding
+     heatmap: np.uint8 array of length N = L + padding_length
+     keep_pad: number of padding cells kept at the tail
+     ax: matplotlib Axes
+     cmap: colormap
+     border_width: width of the outer border
+     figsize_per_base: width per position, used to auto-compute figsize
+     """
+     N = len(heatmap)
+     L = len(seq)
+     # Actual displayed length: positions 0 to end_pos
+     end_pos = min(L + keep_pad, N)
+     data = heatmap[:end_pos].astype(np.float32) / 255.0  # normalize to [0, 1]
+     M = end_pos
+
+     # Build x-axis labels: letters for the first L positions, blanks for the trailing keep_pad
+     xticks = [seq[i] if i < L else '' for i in range(M)]
+
+     if ax is None:
+         fig, ax = plt.subplots(
+             figsize=(figsize_per_base * M, 1.5),
+             dpi=100
+         )
+     im = ax.imshow(
+         data[np.newaxis, :],  # reshape to (1, M)
+         cmap=cmap,
+         aspect='auto',
+         interpolation='nearest',
+         vmin=0, vmax=1
+     )
+
+     # Show the x axis on top
+     ax.set_xticks(np.arange(M))
+     ax.set_xticklabels(xticks, fontsize=12)
+     ax.xaxis.set_ticks_position('top')
+     ax.xaxis.set_label_position('top')
+
+     # Hide the y axis
+     ax.set_yticks([])
+
+     # Draw a thick border around the plot
+     for spine in ax.spines.values():
+         spine.set_visible(True)
+         spine.set_linewidth(border_width)
+         spine.set_edgecolor('black')
+
+     return im, ax
+
+
+ def inv_norm(tensor: torch.Tensor, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
+     tensor = tensor.clone()
+     for t, m, s in zip(tensor, mean, std):
+         t.mul_(s).add_(m)
+     return -tensor  # note: returns the negated (color-inverted) tensor
+
+
+ def diff_hm(hm1, hm2):
+     diff = hm2.astype(np.float32) - hm1.astype(np.float32) + 127.
+     return np.clip(diff, 0, 255).astype(np.uint8)
+
+ def get_resnet18_last_conv(model):
+     """
+     Return the last convolutional layer of ResNet18.
+     From the printed architecture:
+     - model.q_encoder[7] is layer4 (Sequential with 2 BasicBlocks)
+     - model.q_encoder[7][-1] is the last BasicBlock
+     - model.q_encoder[7][-1].conv2 is the last conv layer
+     """
+     return model.q_encoder[7][-1].conv2
+
+ def add_alpha_to_cmap(base_cmap='RdBu_r', name='RdBu_alpha', center_alpha=0.0):
+     """
+     Add an alpha channel to an existing colormap.
+
+     Args:
+         base_cmap: name of the base colormap
+         name: name of the new colormap
+         center_alpha: transparency at the center
+     """
+     from matplotlib import colormaps as cm
+
+     # Get the base colormap
+     base = cm.get_cmap(base_cmap)
+
+     # Build the new color array
+     n = 256
+     colors = base(np.linspace(0, 1, n))
+
+     # Modify the alpha channel: transparent at the center, opaque at both ends
+     alpha_values = np.abs(np.linspace(-1, 1, n))  # V-shaped curve
+     alpha_values = alpha_values ** 0.7  # adjust the curve shape
+     alpha_values = alpha_values * (1 - center_alpha) + center_alpha
+
+     colors[:, 3] = alpha_values
+
+     return ListedColormap(colors, name=name)
+
+ def main(sequence1, sequence2, model):
+     img1 = draw_peptide(sequence1, pcs=True)
+     img2 = draw_peptide(sequence2, pcs=True)
+     img1_raw = transforms.ToPILImage()(inv_norm(img1))
+     img2_raw = transforms.ToPILImage()(inv_norm(img2))
+
+     # img1_raw.save('./gradcam/img1.png')
+
+     img1 = img1.unsqueeze(0).to(torch.device('cuda'))
+     img2 = img2.unsqueeze(0).to(torch.device('cuda'))
+
+     has_side_enc = hasattr(model, 'side_enc') and model.side_enc
+
+     if has_side_enc:
+         # Sequences are already one-hot / embedded, so pass them directly as tensors
+         seq1 = encode_sequence(sequence1, 30).unsqueeze(0).to(torch.device('cuda'))
+         seq2 = encode_sequence(sequence2, 30).unsqueeze(0).to(torch.device('cuda'))
+
+         # Hook into the last conv layer of ResNet18
+         cam = GradCAMMulti(model)
+
+         # Generate the heatmaps
+         hm_c1, hm_c2, hm_s1, hm_s2 = cam.generate(
+             img1, img2, seq1, seq2,
+             target_class=1
+         )
+     else:
+         seq1 = seq2 = None
+         cam = GradCAMMulti(model)
+
+         hm_c1, hm_c2 = cam.generate(
+             img1, img2, seq1, seq2,
+             target_class=1
+         )
+
+     # Visualize the CNN heatmaps
+     def show_img_heat(img_pil, hm, name, cmap='jet', alpha=0.4):
+         plt.figure(figsize=(5, 5))
+         img = np.array(img_pil.resize(hm.shape[::-1]))
+         plt.imshow(img, alpha=0.8)
+         plt.imshow(hm, cmap=cmap, alpha=alpha)
+         plt.axis('off')
+         plt.savefig(f'{name}.png',
+                     bbox_inches='tight',
+                     pad_inches=0,
+                     dpi=200)
+         plt.close()
+
+     diff_cmap = add_alpha_to_cmap()
+
+     hm_diff = diff_hm(hm_c1, hm_c2)
+
+     show_img_heat(img1_raw, hm_c1, f'./gradcam/{sequence1}-temp')
+     show_img_heat(img2_raw, hm_c2, f'./gradcam/{sequence2}-muta')
+     show_img_heat(img2_raw, hm_diff, f'./gradcam/{sequence2}-diff', cmap=diff_cmap, alpha=0.8)
+
+     # Visualize the sequence heatmaps (if present)
+     if has_side_enc:
+         fig, axes = plt.subplots(
+             2, 1,
+             figsize=(len(sequence1) * 0.3, 1.25),
+             constrained_layout=True
+         )
+         plot_seq_heat_tailpad(
+             sequence1, hm_s1,
+             keep_pad=0,
+             ax=axes[0],
+             cmap='jet'
+         )
+         plot_seq_heat_tailpad(
+             sequence2, hm_s2,
+             keep_pad=0,
+             ax=axes[1],
+             cmap='jet'
+         )
+         plt.savefig(f'./gradcam/{sequence1}_seq.svg')
+         plt.close()
+
+         fig, ax = plt.subplots(
+             1, 1,
+             figsize=(len(sequence1) * 0.3, 0.625),
+             constrained_layout=True
+         )
+         plot_seq_heat_tailpad(
+             sequence2, diff_hm(hm_s1, hm_s2),
+             keep_pad=0,
+             ax=ax,
+             cmap=diff_cmap
+         )
+         plt.savefig(f'./gradcam/{sequence2}_diff.svg')
+         plt.close()
+
+
+ # ---- Usage example ---- #
+ if __name__ == "__main__":
+     # 1) load model
+     model = DMutaPeptideCNN(
+         q_encoder='rn18',
+         classes=2,
+         channels=16,
+         dir=False,
+         gf=False,
+         side_enc='mamba',
+         fusion='diff'
+     )
+     model.eval().to(torch.device('cuda'))
+     model.load_state_dict(
+         torch.load("run-cls/rn18-diff-16-mamba-pcs-768-ce-32-0.001-50/model_0.pth",
+                    map_location=torch.device('cuda')),
+         strict=True
+     )
+
+     # 2) prepare data
+     sequence1 = "KWKIKWPVKWFKML"
+     sequence2 = "KWKIKWPVKWfKML"
+     main(sequence1, sequence2, model)
+
+     sequence1 = "KKLFKKILKYL"
+     sequence2 = "KKLFKKiLKYL"
+     main(sequence1, sequence2, model)
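The shared-scale normalization and the centered difference used above can be checked in isolation. The following is a minimal NumPy sketch (standalone re-implementations of `_normalize_cam` and `diff_hm` from this file; the CAM values are illustrative, not model outputs). Normalizing both CAMs with one global min/max keeps their intensities comparable, and centering the signed difference at 127 lets a diverging colormap split increases and decreases around the midpoint:

```python
import numpy as np

def normalize_cam(cam, global_min, global_max):
    # Shared-scale normalization: both CAMs are mapped with one global min/max
    cam_norm = (cam - global_min) / (global_max - global_min + 1e-8)
    return np.uint8(cam_norm * 255)

def diff_hm(hm1, hm2):
    # Signed difference centered at 127 so a diverging colormap splits
    # increases and decreases around the midpoint
    diff = hm2.astype(np.float32) - hm1.astype(np.float32) + 127.
    return np.clip(diff, 0, 255).astype(np.uint8)

cam1 = np.array([0.0, 1.0, 2.0])  # illustrative raw CAM of the template
cam2 = np.array([1.0, 3.0, 4.0])  # illustrative raw CAM of the mutant
gmin = min(cam1.min(), cam2.min())
gmax = max(cam1.max(), cam2.max())
hm1 = normalize_cam(cam1, gmin, gmax)   # [0, 63, 127]
hm2 = normalize_cam(cam2, gmin, gmax)   # [63, 191, 254]
hm_diff = diff_hm(hm1, hm2)             # [190, 255, 254]
```

Values above 127 in `hm_diff` mark positions where the mutant attracts more attention than the template; values below 127 mark the opposite.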
gradcam/KKLFKKILKYL-temp.png ADDED

Git LFS Details

  • SHA256: 3418b14be289d12e2c3242fcbf2f2952bf8fc0be60a664d06fff79a8bd4b3e02
  • Pointer size: 131 Bytes
  • Size of remote file: 238 kB
gradcam/KKLFKKILKYL_seq.svg ADDED
gradcam/KKLFKKiLKYL-diff.png ADDED

Git LFS Details

  • SHA256: 4e1d2bd73b75d54cb4cc757e6166375dc590e9ae321207e5bb185d6d37e58abb
  • Pointer size: 131 Bytes
  • Size of remote file: 372 kB
gradcam/KKLFKKiLKYL-muta.png ADDED

Git LFS Details

  • SHA256: bb816e2944edbb571f272ed0fd6fd6f21fe8111562d87fd6897d72e6c5f4094e
  • Pointer size: 131 Bytes
  • Size of remote file: 273 kB
gradcam/KKLFKKiLKYL_diff.svg ADDED
gradcam/KWKIKWPVKWFKML-temp.png ADDED

Git LFS Details

  • SHA256: ea2a037791e6747203f40cf82797a3a9e981721f401f941e5d138dd25adec63e
  • Pointer size: 131 Bytes
  • Size of remote file: 272 kB
gradcam/KWKIKWPVKWFKML_seq.svg ADDED
gradcam/KWKIKWPVKWfKML-diff.png ADDED

Git LFS Details

  • SHA256: 9a650fd05bd74bf265dfb3ee6b5b014ce527360cd73f0b2fcdb0e56f59c011e5
  • Pointer size: 131 Bytes
  • Size of remote file: 262 kB
gradcam/KWKIKWPVKWfKML-muta.png ADDED

Git LFS Details

  • SHA256: e82859a83a4edaa8f11492f65044023888a202ffc5cd93696cc723b25bf3073b
  • Pointer size: 131 Bytes
  • Size of remote file: 277 kB
gradcam/KWKIKWPVKWfKML_diff.svg ADDED
gradcam/img1.png ADDED
infer.py ADDED
@@ -0,0 +1,201 @@
+ import argparse
+ from dataset import PeptidePairDataset, PeptidePairPicDataset
+ from network import DMutaPeptide, DMutaPeptideCNN
+ from train import move_to_device
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader
+ import numpy as np
+ from utils import set_seed
+ import pandas as pd
+ from torchmetrics import MeanAbsoluteError, RelativeSquaredError, PearsonCorrCoef, KendallRankCorrCoef, F1Score, Accuracy, AveragePrecision, AUROC
+
+ parser = argparse.ArgumentParser(description='resnet26')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='cnn',
+                     help='lstm mamba mla')
+ parser.add_argument('--channels', type=int, default=16)
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default=None,
+                     help="use side features")
+ parser.add_argument('--fusion', type=str, default='mlp',
+                     help='mlp att')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='cls',
+                     help='reg or cls')
+ parser.add_argument('--pdb-src', type=str, dest='pdb_src', default='af',
+                     help='af or hf')
+ parser.add_argument('--data-ver', type=str, dest='data_ver', default='250228',
+                     help='data version')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=False,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=1,
+                     help="Seed (default: 1)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 128)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 100)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--warm-steps', type=int, dest='warm_steps', default=0,
+                     help='number of warm start steps for learning rate (default: 10)')
+ parser.add_argument('--patience', type=int, default=10,
+                     help='patience for early stopping (default: 10)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrain model')  # /home/duadua/Desktop/fetal/3dpretrain/runs/e50.pth
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='ce',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ parser.add_argument('--simple', dest='simple', action='store_true', default=False)
+ parser.add_argument('--llm-data', dest='llm_data', action='store_true', default=False)
+ parser.add_argument('--uda', type=str, default=None)
+
+ args = parser.parse_args()
+
+ if args.llm_data:
+     args.simple = True
+
+ if args.simple:
+     args.one_way = True
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+ if args.q_encoder in ['cnn', 'rn18']:
+     weight_dir = f'./run-{args.task}/{f"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs==True else ""}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+ else:
+     weight_dir = f'./run-{args.task}/{f"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+
+ if args.uda:
+     weight_dir += f'/uda_{args.uda}'
+
+ print(weight_dir)
+
+ def metrics(preds, gt, task):
+     avg = 'macro'
+     device = preds.device
+     if task == 'cls':
+         metric_1 = AveragePrecision(average=avg, task='binary').to(device)
+         metric_2 = AUROC(average=avg, task='binary').to(device)
+         metric_3 = F1Score(average=avg, task='binary').to(device)
+         metric_4 = Accuracy(average=avg, task='binary').to(device)
+         all_metrics = [metric_1(preds, gt).item(),
+                        metric_2(preds, gt).item(),
+                        metric_3(preds, gt).item(),
+                        metric_4(preds, gt).item()]
+
+     elif task == 'reg':
+         metric_1 = MeanAbsoluteError().to(device)
+         metric_2 = RelativeSquaredError(num_outputs=1).to(device)
+         metric_3 = PearsonCorrCoef(num_outputs=1).to(device)
+         metric_4 = KendallRankCorrCoef(num_outputs=1).to(device)
+         all_metrics = [metric_1(preds, gt).item(),
+                        metric_2(preds, gt).item(),
+                        metric_3(preds.squeeze(), gt.squeeze()).mean().item(),
+                        metric_4(preds.squeeze(), gt.squeeze()).mean().item()]
+
+     return [f'{i * 100:.2f}' for i in all_metrics]
+
+
+ def main(dataset):
+     set_seed(args.seed)
+     if args.task == 'reg':
+         args.classes = 1
+     elif args.task == 'cls':
+         args.classes = 2
+     else:
+         raise NotImplementedError("unimplemented task")
+
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
+
+     if args.q_encoder in ['cnn', 'rn18']:
+         model = DMutaPeptideCNN(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese).to(device).eval()
+         test_set = PeptidePairPicDataset(mode=dataset, pad_length=args.max_length, task=args.task, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
+     else:
+         model = DMutaPeptide(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, fusion=args.fusion, non_siamese=args.non_siamese).to(device).eval()
+         test_set = PeptidePairDataset(mode=dataset, pad_length=args.max_length, task=args.task, gf=args.glob_feat)
+
+     test_loader = DataLoader(test_set, batch_size=args.batch_size, shuffle=False)
+
+     df = pd.DataFrame()
+     raw_preds = []
+     ckpt_names = ['model_uda_teacher'] if args.uda else [f'model_{i}_test' for i in range(5)]
+     for i in ckpt_names:
+         model.load_state_dict(torch.load(f'{weight_dir}/{i}.pth', map_location=device))
+         preds = []
+         gt_list_valid = []
+         with torch.no_grad():
+             for data in test_loader:
+                 x, gt = data
+                 gt_list_valid.append(gt.to(device))
+                 out = model(move_to_device(x, device))
+                 if args.dir:
+                     out, _ = out
+                 preds.append(out)
+         r_pred = torch.cat(preds, dim=0)
+         if args.task == 'reg':
+             preds = r_pred.cpu().numpy()
+         elif args.task == 'cls':
+             preds = torch.softmax(r_pred, dim=-1)[:, 1].cpu().numpy()
+         gt_tensor = torch.cat(gt_list_valid, dim=0)
+         gt_list_valid = gt_tensor.cpu().numpy()
+         df[f'{i}'] = preds
+         raw_preds.append(r_pred)
+     if args.task == 'cls':
+         preds_tensor = torch.softmax(torch.stack(raw_preds, 0).mean(0), dim=-1)[:, 1]
+     elif args.task == 'reg':
+         preds_tensor = torch.stack(raw_preds, 0).mean(0)
+     df['fusion'] = preds_tensor.cpu().numpy()
+     df['gt'] = gt_list_valid
+     df.to_csv(f'{weight_dir}/preds_{dataset}.csv', index=False)
+     return metrics(preds_tensor, gt_tensor, args.task)
+
+
+ if __name__ == '__main__':
+     if args.task == 'cls':
+         df = pd.DataFrame(columns=['dataset', 'AUPRC', 'AUROC', 'F1', 'ACC'])
+     elif args.task == 'reg':
+         df = pd.DataFrame(columns=['dataset', 'MAE', 'RSE', 'PCC', 'KCC'])
+
+     datasets = [
+         'r2_case',
+         # 'r2_case_'
+         "test",
+         # "mhb",
+         # "nacl",
+         # "125fbs",
+         # "25fbs",
+     ]
+
+     for dataset in datasets:
+         results = main(dataset)
+         df.loc[len(df) + 1] = [dataset] + results
+         df.to_csv(f'{weight_dir}/inference_results.csv', index=False)
+     print(df)
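For the `fusion` column, `infer.py` averages the raw logits across the fold checkpoints and applies softmax once, rather than averaging the per-model probabilities. A minimal NumPy sketch of that step (logit values are illustrative, not real model outputs):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical raw logits from three fold checkpoints for one sample, shape (m, 2)
raw_preds = np.array([[0.2, 1.0],
                      [0.0, 2.0],
                      [0.4, 0.6]])

# Per-checkpoint positive-class probabilities (the per-model columns of the CSV)
per_model = softmax(raw_preds, axis=-1)[:, 1]

# The 'fusion' column: average the raw logits first, softmax once, take class 1
fusion = softmax(raw_preds.mean(axis=0))[1]
```

The two aggregation orders generally give different numbers, since softmax is nonlinear; averaging in logit space is what the script does before computing the ensemble metrics.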
infer_case.py ADDED
@@ -0,0 +1,245 @@
+ import argparse
+ import time
+ from dataset import PeptidePairPicCaseDataset, encode_sequence
+ from network import DMutaPeptideCNN
+ from train import move_to_device
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader
+ import numpy as np
+ from utils import set_seed
+ import pandas as pd
+
+ parser = argparse.ArgumentParser(description='resnet26')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='cnn',
+                     help='lstm mamba mla')
+ parser.add_argument('--channels', type=int, default=16)
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default='lstm',
+                     help="use side features")
+ parser.add_argument('--fusion', type=str, default='att',
+                     help='mlp att')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='cls',
+                     help='reg or cls')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=False,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=1,
+                     help="Seed for model initialization (default: 1)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 128)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 100)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrain model')
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='ce',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ parser.add_argument('--simple', dest='simple', action='store_true', default=False)
+ parser.add_argument('--llm-data', dest='llm_data', action='store_true', default=False)
+
+ # Case Study Specific
+ parser.add_argument('--case', type=str, default='r2',
+                     help='case to infer')
+ parser.add_argument('--use-ft', dest='use_ft', type=str, default='')
+
+ args = parser.parse_args()
+
+ if args.llm_data:
+     args.simple = True
+
+ if args.simple:
+     args.one_way = True
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+ if args.gpu != -1:
+     torch.backends.cudnn.benchmark = True
+     torch.set_float32_matmul_precision('high')
+
+
+ class FasterModelForCase(DMutaPeptideCNN):
+     def cache_temp_vector(self, seq):
+         if self.side_enc:
+             seq_seq = seq[1]
+             seq = seq[0]
+             if self.side_encoder.__class__.__name__ == 'MambaModel':
+                 self.temp_seq_vector = self.norm(self.side_encoder(seq_seq))
+             else:
+                 self.temp_seq_vector = self.norm(self.side_encoder(seq_seq)[0][:, -1, :])
+         self.temp_vector = self.norm(self.q_encoder(seq))
+
+     def forward(self, x, labels=None, epoch=0):
+         seq2 = x
+
+         if self.side_enc:
+             seq2_seq = seq2[1]
+             seq2 = seq2[0]
+
+         batch_size = seq2.shape[0]
+
+         fusion = []
+
+         # Collect the encodings of the cached template and the variant
+         fusion.append(self.temp_vector.expand(batch_size, -1))
+         fusion.append(self.norm(self.q_encoder_2(seq2)))
+         if self.side_enc:
+             fusion.append(self.temp_seq_vector.expand(batch_size, -1))
+             if self.side_encoder.__class__.__name__ == 'MambaModel':
+                 fusion.append(self.norm(self.side_encoder_2(seq2_seq)))
+             else:
+                 fusion.append(self.norm(self.side_encoder_2(seq2_seq)[0][:, -1, :]))
+
+         # Choose the fusion strategy according to fusion_method
+         if self.fusion_method == 'mlp':
+             # Keep the original behavior: concatenate the vectors
+             fusion = torch.cat(fusion, dim=-1)
+         elif self.fusion_method == 'diff':
+             if not self.side_enc:
+                 fusion = torch.cat([fusion[1] - fusion[0]] + fusion[2:], dim=-1)
+             else:
+                 fusion = torch.cat([fusion[1] - fusion[0], fusion[3] - fusion[2]] + fusion[4:], dim=-1)
+         elif self.fusion_method == 'att':
+             # Attention-based fusion:
+             # stack the vectors into "tokens" of shape (batch, 2, embed_dim)
+             tokens = torch.stack(fusion, dim=1)  # embed_dim should be final_dim // 2
+             # Self-attention via MultiheadAttention
+             # Note: with batch_first=True the input shape is (batch, seq_len, embed_dim)
+             attn_output, _ = self.attn(tokens, tokens, tokens)
+             # Flatten the attention output to (batch, 2 * embed_dim), i.e. (batch, final_dim)
+             fusion = attn_output.reshape(attn_output.size(0), -1)
+         else:
+             raise ValueError("Invalid fusion method: choose 'mlp', 'diff', or 'att'.")
+
+         # When DIR is enabled, keep the feature representation before it enters FDS
+         if self.DIR:
+             features = fusion
+             fusion = self.FDS.smooth(fusion, labels, epoch)
+
+         pred = self.fc(fusion)
+
+         if self.DIR:
+             return pred, features
+         else:
+             return pred
+
+ class CustomDataset(PeptidePairPicCaseDataset):
+     def __getitem__(self, idx):
+         variant = self.variants[idx]
+         seq2, label = variant, variant
+         img2 = self.read_img(variant)
+
+         if self.side_enc:
+             img2 = (img2, encode_sequence(seq2, self.pad_length))
+
+         return img2, label
+
+ def load_model(args, weight_path, device, temp_batch):
+     model = FasterModelForCase(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese).to(device).eval()
+     model.load_state_dict(torch.load(weight_path, map_location=device), strict=False)
+     model.cache_temp_vector(move_to_device(temp_batch, device))
+     model.compile()
+     return model
+
+
+ def main():
+     set_seed(args.seed)
+     if args.task == 'reg':
+         args.classes = 1
+     elif args.task == 'cls':
+         args.classes = 2
+     else:
+         raise NotImplementedError("unimplemented task")
+     weight_dir = f'./run-{args.task}/{args.q_encoder}{f"-non-siamese" if args.non_siamese else ""}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs==True else ""}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
+     print(weight_dir)
+     print(device)
+
+     test_set = CustomDataset(case=args.case, pad_length=args.max_length, side_enc=args.side_enc, pcs=True, resize=args.resize, gf=args.glob_feat)
+     test_loader = DataLoader(test_set, batch_size=192, shuffle=False, num_workers=16, pin_memory=True)
+     # test_loader = DataLoader(test_set, batch_size=192, shuffle=False, num_workers=8)
+
+     temp_batch = test_set.template_pic.unsqueeze(0)
+     if args.side_enc:
+         temp_batch = (temp_batch, test_set.template_seq.unsqueeze(0))
+
+     models = [load_model(args, f'{weight_dir}/model_{i}{f"_{args.use_ft}" if args.use_ft else ""}.pth', device, temp_batch) for i in range(args.split)]
+     # models = [load_model(args, f'{weight_dir}/model_{i}{"_ft" if args.use_ft else ""}.pth', device, temp_batch) for i in [0]]
+
+     all_seqs = []
+     logits_batches = []  # per-batch [m, B, 2] logits, kept on the CPU
+
+     start_time = time.time()
+
+     with torch.no_grad():
+         for x, gt in test_loader:
+             # x: [B, ...] in CPU pinned memory; gt: tuple of B strings
+             x = move_to_device(x, device, non_blocking=True)
+             # x = move_to_device(x, device)
+
+             # 1) record the logits of all fold models
+             # logits: [m, B, 2]
+             logits = torch.zeros(len(models), len(gt), args.classes, device=device)
+             for i, m in enumerate(models):
+                 logits[i] = m(x)
+             # avg_logits = sum_logits.div_(len(models))
+
+             # 2) move to CPU immediately (non_blocking works with pin_memory)
+             logits_batches.append(logits.cpu())
+             all_seqs.extend(gt)
+
+     # Concatenate into [m, n, 2], n = total number of samples
+     all_logits = torch.cat(logits_batches, dim=1)
+
+     if args.task == 'reg':
+         preds = all_logits.mean(0).squeeze().tolist()
+     elif args.task == 'cls':
+         # One final softmax; take the positive-class probability
+         preds = torch.softmax(all_logits, dim=-1)[:, :, 1].mean(0).tolist()
+
+     consumed_time = time.time() - start_time
+     print(f'total consumed time: {consumed_time} s')
+     print(f'time per sample: {consumed_time / len(test_set)} s')
+
+     # Save to CSV
+     df = pd.DataFrame({
+         "seq": all_seqs,
+         "pred": preds,
+     })
+
+     df.to_csv(f'{weight_dir}/preds_case{f"_{args.use_ft}" if args.use_ft else ""}.csv', index=False)
+
+
+ if __name__ == '__main__':
+     main()
infer_case_feature.py ADDED
@@ -0,0 +1,223 @@
+ import argparse
+ import time
+ from dataset import PeptidePairPicCaseDataset, encode_sequence
+ from network import DMutaPeptideCNN
+ from train import move_to_device
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader
+ import numpy as np
+ from utils import set_seed
+ import pandas as pd
+
+ parser = argparse.ArgumentParser(description='resnet26')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='cnn',
+                     help='lstm mamba mla')
+ parser.add_argument('--channels', type=int, default=16)
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default=None,
+                     help="use side features")
+ parser.add_argument('--fusion', type=str, default='att',
+                     help='mlp att')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='cls',
+                     help='reg or cls')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=False,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=1,
+                     help="Seed for model initialization (default: 1)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 128)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 100)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrain model')
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='ce',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ parser.add_argument('--simple', dest='simple', action='store_true', default=False)
+ parser.add_argument('--llm-data', dest='llm_data', action='store_true', default=False)
+
+ # Case Study Specific
+ parser.add_argument('--case', type=str, default='r2',
+                     help='case to infer')
+ parser.add_argument('--uda', action='store_true', default=False)
+
+ args = parser.parse_args()
+
+ if args.llm_data:
+     args.simple = True
+
+ if args.simple:
+     args.one_way = True
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+ if args.gpu != -1:
+     torch.backends.cudnn.benchmark = True
+     torch.set_float32_matmul_precision('high')
+
+
+ class FasterModelForCase(DMutaPeptideCNN):
+     def cache_temp_vector(self, seq):
+         if self.side_enc:
+             seq_seq = seq[1]
+             seq = seq[0]
+             if self.side_encoder.__class__.__name__ == 'MambaModel':
+                 self.temp_seq_vector = self.norm(self.side_encoder(seq_seq))
+             else:
+                 self.temp_seq_vector = self.norm(self.side_encoder(seq_seq)[0][:, -1, :])
+         self.temp_vector = self.norm(self.q_encoder(seq))
+
+     def forward(self, x, labels=None, epoch=0):
+         seq2 = x
+
+         if self.side_enc:
+             seq2_seq = seq2[1]
+             seq2 = seq2[0]
+
+         batch_size = seq2.shape[0]
+
+         fusion = []
+
+         # encode both sequences
+         fusion.append(self.temp_vector.expand(batch_size, -1))
+         fusion.append(self.norm(self.q_encoder_2(seq2)))
+         if self.side_enc:
+             fusion.append(self.temp_seq_vector.expand(batch_size, -1))
+             if self.side_encoder.__class__.__name__ == 'MambaModel':
+                 fusion.append(self.norm(self.side_encoder_2(seq2_seq)))
+             else:
+                 fusion.append(self.norm(self.side_encoder_2(seq2_seq)[0][:, -1, :]))
+
+         # select the fusion strategy according to fusion_method
+         if self.fusion_method == 'mlp':
+             # keep the original behavior: concatenate the vectors
+             fusion = torch.cat(fusion, dim=-1)
+         elif self.fusion_method == 'diff':
+             if not self.side_enc:
+                 fusion = torch.cat([fusion[1] - fusion[0]] + fusion[2:], dim=-1)
+             else:
+                 fusion = torch.cat([fusion[1] - fusion[0], fusion[3] - fusion[2]] + fusion[4:], dim=-1)
+         elif self.fusion_method == 'att':
+             # attention-based fusion:
+             # stack the vectors into "tokens" of shape (batch, num_tokens, embed_dim)
+             tokens = torch.stack(fusion, dim=1)  # embed_dim should be final_dim // 2
+             # run self-attention with MultiheadAttention
+             # note: with batch_first=True, the input shape is (batch, seq_len, embed_dim)
+             attn_output, _ = self.attn(tokens, tokens, tokens)
+             # flatten the attention output to (batch, num_tokens * embed_dim), i.e. (batch, final_dim)
+             fusion = attn_output.reshape(attn_output.size(0), -1)
+         else:
+             raise ValueError("Invalid fusion method: choose 'mlp', 'diff', or 'att'.")
+
+         feature = self.fc[:-1](fusion)
+         pred = self.fc[-1](feature)
+
+         return pred, feature
+
+ class CustomDataset(PeptidePairPicCaseDataset):
+     def __getitem__(self, idx):
+         variant = self.variants[idx]
+         seq2, label = variant, variant
+         img2 = self.read_img(variant)
+
+         if self.side_enc:
+             img2 = (img2, encode_sequence(seq2, self.pad_length))
+
+         return img2, label
+
+ def load_model(args, weight_path, device, temp_batch):
+     model = FasterModelForCase(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese).to(device).eval()
+     model.load_state_dict(torch.load(weight_path, map_location=device), strict=False)
+     model.cache_temp_vector(move_to_device(temp_batch, device))
+     model.compile()
+     return model
+
+
+ def main():
+     set_seed(args.seed)
+     if args.task == 'reg':
+         args.classes = 1
+     elif args.task == 'cls':
+         args.classes = 2
+     else:
+         raise NotImplementedError("unimplemented task")
+     weight_dir = f'./run-{args.task}/{args.q_encoder}{f"-non-siamese" if args.non_siamese else ""}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs==True else ""}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+     if args.uda:
+         weight_dir += f'/uda_{args.case}'
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
+     print(weight_dir)
+     print(device)
+
+     test_set = CustomDataset(case=args.case, pad_length=args.max_length, side_enc=args.side_enc, pcs=True, resize=args.resize, gf=args.glob_feat)
+     test_loader = DataLoader(test_set, batch_size=192, shuffle=False, num_workers=16, pin_memory=True)
+     # test_loader = DataLoader(test_set, batch_size=192, shuffle=False, num_workers=8)
+
+     temp_batch = test_set.template_pic.unsqueeze(0)
+     if args.side_enc:
+         temp_batch = (temp_batch, test_set.template_seq.unsqueeze(0))
+
+     pth_path = f'{weight_dir}/model_uda_teacher.pth' if args.uda else f'{weight_dir}/model_0_test.pth'
+
+     model = load_model(args, pth_path, device, temp_batch)
+     # models = [load_model(args, f'{weight_dir}/model_{i}{"_ft" if args.use_ft else ""}.pth', device, temp_batch) for i in [0]]
+
+     all_features = {}
+     all_preds = {}
+
+     start_time = time.time()
+
+     with torch.no_grad():
+         for x, gt in test_loader:
+             # x: [B, ...] in CPU pinned memory; gt: tuple of B strings
+             x = move_to_device(x, device, non_blocking=True)
+             preds, feats = model(x)
+             if args.task == 'cls':
+                 preds = torch.softmax(preds, dim=1)[:, 1]
+             for name, feat, pred in zip(gt, feats, preds):
+                 all_features[name] = feat.cpu()
+                 all_preds[name] = pred.item()
+
+     consumed_time = time.time() - start_time
+     print(f'total consumed time: {consumed_time} s')
+     print(f'time per sample: {consumed_time / len(test_set)} s')
+
+     torch.save(all_features, f'{weight_dir}/features.pth')
+
+     df = pd.DataFrame(list(all_preds.items()), columns=['seq', 'pred'])
+     df.to_csv(f'{weight_dir}/feature_preds.csv', index=False)
+
+ if __name__ == '__main__':
+     main()
infer_case_uda.py ADDED
@@ -0,0 +1,247 @@
+ import argparse
+ import time
+ from dataset import PeptidePairPicCaseDataset, encode_sequence
+ from network import DMutaPeptideCNN
+ from train import move_to_device
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader
+ import numpy as np
+ from utils import set_seed
+ import pandas as pd
+
+ parser = argparse.ArgumentParser(description='resnet26')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='cnn',
+                     help='lstm mamba mla')
+ parser.add_argument('--channels', type=int, default=16)
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default=None,
+                     help="use side features")
+ parser.add_argument('--fusion', type=str, default='att',
+                     help='mlp att')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='cls',
+                     help='reg or cls')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=False,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=1,
+                     help="Seed for model initialization (default: 1)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 128)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 100)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrain model')
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='ce',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ parser.add_argument('--simple', dest='simple', action='store_true', default=False)
+ parser.add_argument('--llm-data', dest='llm_data', action='store_true', default=False)
+
+ # Case Study Specific
+ parser.add_argument('--case', type=str, default='r2',
+                     help='case to infer')
+ parser.add_argument('--use-variant', dest='use_variant', type=str, default='')
+
+ args = parser.parse_args()
+
+ if args.llm_data:
+     args.simple = True
+
+ if args.simple:
+     args.one_way = True
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+ if args.gpu != -1:
+     torch.backends.cudnn.benchmark = True
+     torch.set_float32_matmul_precision('high')
+
+
+ class FasterModelForCase(DMutaPeptideCNN):
+     def cache_temp_vector(self, seq):
+         if self.side_enc:
+             seq_seq = seq[1]
+             seq = seq[0]
+             if self.side_encoder.__class__.__name__ == 'MambaModel':
+                 self.temp_seq_vector = self.norm(self.side_encoder(seq_seq))
+             else:
+                 self.temp_seq_vector = self.norm(self.side_encoder(seq_seq)[0][:, -1, :])
+         self.temp_vector = self.norm(self.q_encoder(seq))
+
+     def forward(self, x, labels=None, epoch=0):
+         seq2 = x
+
+         if self.side_enc:
+             seq2_seq = seq2[1]
+             seq2 = seq2[0]
+
+         batch_size = seq2.shape[0]
+
+         fusion = []
+
+         # encode both sequences
+         fusion.append(self.temp_vector.expand(batch_size, -1))
+         fusion.append(self.norm(self.q_encoder_2(seq2)))
+         if self.side_enc:
+             fusion.append(self.temp_seq_vector.expand(batch_size, -1))
+             if self.side_encoder.__class__.__name__ == 'MambaModel':
+                 fusion.append(self.norm(self.side_encoder_2(seq2_seq)))
+             else:
+                 fusion.append(self.norm(self.side_encoder_2(seq2_seq)[0][:, -1, :]))
+
+         # select the fusion strategy according to fusion_method
+         if self.fusion_method == 'mlp':
+             # keep the original behavior: concatenate the vectors
+             fusion = torch.cat(fusion, dim=-1)
+         elif self.fusion_method == 'diff':
+             if not self.side_enc:
+                 fusion = torch.cat([fusion[1] - fusion[0]] + fusion[2:], dim=-1)
+             else:
+                 fusion = torch.cat([fusion[1] - fusion[0], fusion[3] - fusion[2]] + fusion[4:], dim=-1)
+         elif self.fusion_method == 'att':
+             # attention-based fusion:
+             # stack the vectors into "tokens" of shape (batch, num_tokens, embed_dim)
+             tokens = torch.stack(fusion, dim=1)  # embed_dim should be final_dim // 2
+             # run self-attention with MultiheadAttention
+             # note: with batch_first=True, the input shape is (batch, seq_len, embed_dim)
+             attn_output, _ = self.attn(tokens, tokens, tokens)
+             # flatten the attention output to (batch, num_tokens * embed_dim), i.e. (batch, final_dim)
+             fusion = attn_output.reshape(attn_output.size(0), -1)
+         else:
+             raise ValueError("Invalid fusion method: choose 'mlp', 'diff', or 'att'.")
+
+         # if the DIR module is enabled, keep the feature representation before it is passed to FDS
+         if self.DIR:
+             features = fusion
+             fusion = self.FDS.smooth(fusion, labels, epoch)
+
+         pred = self.fc(fusion)
+
+         if self.DIR:
+             return pred, features
+         else:
+             return pred
+
+
+ class CustomDataset(PeptidePairPicCaseDataset):
+     def __getitem__(self, idx):
+         variant = self.variants[idx]
+         seq2, label = variant, variant
+         img2 = self.read_img(variant)
+
+         if self.side_enc:
+             img2 = (img2, encode_sequence(seq2, self.pad_length))
+
+         return img2, label
+
+
+ def load_model(args, weight_path, device, temp_batch):
+     model = FasterModelForCase(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese).to(device).eval()
+     model.load_state_dict(torch.load(weight_path, map_location=device), strict=False)
+     model.cache_temp_vector(move_to_device(temp_batch, device))
+     model.compile()
+     return model
+
+
+ def main():
+     set_seed(args.seed)
+     if args.task == 'reg':
+         args.classes = 1
+     elif args.task == 'cls':
+         args.classes = 2
+     else:
+         raise NotImplementedError("unimplemented task")
+     weight_dir = f'./run-{args.task}/{args.q_encoder}{f"-non-siamese" if args.non_siamese else ""}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs==True else ""}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}/uda_{args.case}'
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
+     print(weight_dir)
+     print(device)
+
+     test_set = CustomDataset(case=args.case, pad_length=args.max_length, side_enc=args.side_enc, pcs=True, resize=args.resize, gf=args.glob_feat)
+     test_loader = DataLoader(test_set, batch_size=192, shuffle=False, num_workers=16, pin_memory=True)
+     # test_loader = DataLoader(test_set, batch_size=192, shuffle=False, num_workers=8)
+
+     temp_batch = test_set.template_pic.unsqueeze(0)
+     if args.side_enc:
+         temp_batch = (temp_batch, test_set.template_seq.unsqueeze(0))
+
+     models = [load_model(args, f'{weight_dir}/model_uda_{role}{f"_{args.use_variant}" if args.use_variant else ""}.pth', device, temp_batch) for role in ('teacher',)]
+     # models = [load_model(args, f'{weight_dir}/model_{i}{"_ft" if args.use_ft else ""}.pth', device, temp_batch) for i in [0]]
+
+     all_seqs = []
+     logits_batches = []  # per-batch [m, B, 2] logits, kept on the CPU
+
+     start_time = time.time()
+
+     with torch.no_grad():  # , torch.autocast(device_type=device.type):
+         for x, gt in test_loader:
+             # x: [B, ...] in CPU pinned memory; gt: tuple of B strings
+             x = move_to_device(x, device, non_blocking=True)
+             # x = move_to_device(x, device)
+
+             # 1) record the logits of all models
+             # logits: [m, B, 2]
+             logits = torch.zeros(len(models), len(gt), args.classes, device=device)
+             for i, m in enumerate(models):
+                 logits[i] = m(x)
+             # avg_logits = sum_logits.div_(len(models))
+
+             # 3) move to CPU immediately (non_blocking works with pin_memory)
+             logits_batches.append(logits.cpu())
+             all_seqs.extend(gt)
+
+     # concatenate into [m, n, 2], where n = sum of the batch sizes
+     all_logits = torch.cat(logits_batches, dim=1)  # [m, n, 2]
+
+     if args.task == 'reg':
+         preds = all_logits.mean(0).squeeze().tolist()
+     elif args.task == 'cls':
+         # apply softmax once at the end and take the positive-class probability
+         preds = torch.softmax(all_logits, dim=-1)[:, :, 1].mean(0).tolist()
+
+     consumed_time = time.time() - start_time
+     print(f'total consumed time: {consumed_time} s')
+     print(f'time per sample: {consumed_time / len(test_set)} s')
+
+     # save the predictions to CSV
+     df = pd.DataFrame({
+         "seq": all_seqs,
+         "pred": preds,
+     })
+
+     df.to_csv(f'{weight_dir}/preds_case{f"_{args.use_variant}" if args.use_variant else ""}.csv', index=False)
+
+
+ if __name__ == '__main__':
+     main()
infer_case_unoptimized.py ADDED
@@ -0,0 +1,164 @@
+ import argparse
+ import time
+ from dataset import PeptidePairPicCaseDataset, encode_sequence
+ from network import DMutaPeptideCNN
+ from train import move_to_device
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader
+ import numpy as np
+ from utils import set_seed
+ import pandas as pd
+
+ parser = argparse.ArgumentParser(description='resnet26')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='cnn',
+                     help='lstm mamba mla')
+ parser.add_argument('--channels', type=int, default=16)
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default='lstm',
+                     help="use side features")
+ parser.add_argument('--fusion', type=str, default='att',
+                     help='mlp att')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='cls',
+                     help='reg or cls')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=False,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=1,
+                     help="Seed for model initialization (default: 1)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 128)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 100)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrain model')
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='ce',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ parser.add_argument('--simple', dest='simple', action='store_true', default=False)
+ parser.add_argument('--llm-data', dest='llm_data', action='store_true', default=False)
+
+ # Case Study Specific
+ parser.add_argument('--case', type=str, default='r2',
+                     help='case to infer')
+ parser.add_argument('--use-ft', dest='use_ft', action='store_true', default=False)
+
+ args = parser.parse_args()
+
+ if args.llm_data:
+     args.simple = True
+
+ if args.simple:
+     args.one_way = True
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+ if args.gpu != -1:
+     torch.backends.cudnn.benchmark = True
+     torch.set_float32_matmul_precision('high')
+
+
+ def load_model(args, weight_path, device):
+     model = DMutaPeptideCNN(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese).to(device).eval()
+     model.load_state_dict(torch.load(weight_path, map_location=device), strict=False)
+     model.compile()
+     return model
+
+
+ def main():
+     set_seed(args.seed)
+     if args.task == 'reg':
+         args.classes = 1
+     elif args.task == 'cls':
+         args.classes = 2
+     else:
+         raise NotImplementedError("unimplemented task")
+     weight_dir = f'./run-{args.task}/{args.q_encoder}{f"-non-siamese" if args.non_siamese else ""}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs==True else ""}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
+     print(weight_dir)
+     print(device)
+
+     test_set = PeptidePairPicCaseDataset(case=args.case, pad_length=args.max_length, side_enc=args.side_enc, pcs=True, resize=args.resize, gf=args.glob_feat)
+     test_loader = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=16, pin_memory=True)
+     # test_loader = DataLoader(test_set, batch_size=192, shuffle=False, num_workers=8)
+
+     models = [load_model(args, f'{weight_dir}/model_{i}{"_ft" if args.use_ft else ""}.pth', device) for i in range(args.split)]
+
+     all_seqs = []
+     logits_batches = []  # per-batch [m, B, 2] logits, kept on the CPU
+
+     start_time = time.time()
+
+     with torch.no_grad():
+         for x, gt in test_loader:
+             # x: [B, ...] in CPU pinned memory; gt: tuple of B strings
+             x = move_to_device(x, device, non_blocking=True)
+             # x = move_to_device(x, device)
+
+             # 1) record the logits of all models
+             # logits: [m, B, 2]
+             logits = torch.zeros(len(models), len(gt), args.classes, device=device)
+             for i, m in enumerate(models):
+                 logits[i] = m(x)
+
+             # 3) move to CPU immediately (non_blocking works with pin_memory)
+             logits_batches.append(logits.cpu())
+             all_seqs.extend(gt)
+
+     # concatenate into [m, n, 2], where n = sum of the batch sizes
+     all_logits = torch.cat(logits_batches, dim=1)  # [m, n, 2]
+
+     if args.task == 'reg':
+         preds = all_logits.mean(0).squeeze().tolist()
+     elif args.task == 'cls':
+         # apply softmax once at the end and take the positive-class probability
+         preds = torch.softmax(all_logits, dim=-1)[:, :, 1].mean(0).tolist()
+
+     consumed_time = time.time() - start_time
+     print(f'total consumed time: {consumed_time} s')
+     print(f'time per sample: {consumed_time / len(test_set)} s')
+
+     # save the predictions to CSV
+     df = pd.DataFrame({
+         "seq": all_seqs,
+         "pred": preds,
+     })
+
+     df.to_csv(f'{weight_dir}/preds_case.csv', index=False)
+
+
+ if __name__ == '__main__':
+     main()
infer_cf.py ADDED
@@ -0,0 +1,187 @@
+ import argparse
+ from dataset import PeptidePairDataset, PeptidePairPicDataset
+ from network import DMutaPeptide, DMutaPeptideCNN
+ from train import move_to_device
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader
+ import numpy as np
+ from utils import set_seed
+ import pandas as pd
+ from torchmetrics import MeanAbsoluteError, RelativeSquaredError, PearsonCorrCoef, KendallRankCorrCoef, F1Score, Accuracy, AveragePrecision, AUROC
+
+ parser = argparse.ArgumentParser(description='resnet26')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='cnn',
+                     help='lstm mamba mla')
+ parser.add_argument('--channels', type=int, default=16)
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default=None,
+                     help="use side features")
+ parser.add_argument('--fusion', type=str, default='diff',
+                     help='mlp att')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='cls',
+                     help='reg or cls')
+ parser.add_argument('--pdb-src', type=str, dest='pdb_src', default='af',
+                     help='af or hf')
+ parser.add_argument('--data-ver', type=str, dest='data_ver', default='250228',
+                     help='data version')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=False,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=1,
+                     help="Seed (default: 1)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 128)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 100)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--warm-steps', type=int, dest='warm_steps', default=0,
+                     help='number of warm start steps for learning rate (default: 10)')
+ parser.add_argument('--patience', type=int, default=10,
+                     help='patience for early stopping (default: 10)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrain model')  # /home/duadua/Desktop/fetal/3dpretrain/runs/e50.pth
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='ce',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ parser.add_argument('--simple', dest='simple', action='store_true', default=False)
+ parser.add_argument('--llm-data', dest='llm_data', action='store_true', default=False)
+ parser.add_argument('--uda', type=str, default=None)
+
+ args = parser.parse_args()
+
+ if args.llm_data:
+     args.simple = True
+
+ if args.simple:
+     args.one_way = True
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+ if args.q_encoder in ['cnn', 'rn18']:
+     weight_dir = f'./run-{args.task}/{f"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs==True else ""}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
93
+ else:
94
+ weight_dir = f'./run-{args.task}/{f"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{"-simple" if args.simple else ""}{"-llm" if args.llm_data else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
95
+
96
+ if args.uda:
97
+ weight_dir += f'/uda_{args.uda}'
98
+
99
+ print(weight_dir)
100
+
101
+ def metrics(preds, gt, task):
102
+ avg = 'marco'
103
+ device = preds.device
104
+ if task == 'cls':
105
+ metric_1 = AveragePrecision(average=avg, task='binary').to(device)
106
+ metric_2 = AUROC(average=avg, task='binary').to(device)
107
+ metric_3 = F1Score(average=avg, task='binary').to(device)
108
+ metric_4 = Accuracy(average=avg, task='binary').to(device)
109
+ all_metrics = [metric_1(preds, gt).item(),
110
+ metric_2(preds, gt).item(),
111
+ metric_3(preds, gt).item(),
112
+ metric_4(preds, gt).item()]
113
+
114
+ elif task == 'reg':
115
+ metric_1 = MeanAbsoluteError().to(device)
116
+ metric_2 = RelativeSquaredError(num_outputs=1).to(device)
117
+ metric_3 = PearsonCorrCoef(num_outputs=1).to(device)
118
+ metric_4 = KendallRankCorrCoef(num_outputs=1).to(device)
119
+ all_metrics = [metric_1(preds, gt).item(),
120
+ metric_2(preds, gt).item(),
121
+ metric_3(preds.squeeze(), gt.squeeze()).mean().item(),
122
+ metric_4(preds.squeeze(), gt.squeeze()).mean().item()]
123
+
124
+ return [f'{i * 100:.2f}' for i in all_metrics]
125
+
126
+
127
+ def main(dataset):
128
+ set_seed(args.seed)
129
+ if args.task == 'reg':
130
+ args.classes = 1
131
+ elif args.task == 'cls':
132
+ args.classes = 2
133
+ else:
134
+ raise NotImplementedError("unimplemented task")
135
+
136
+ device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
137
+
138
+ if args.q_encoder in ['cnn', 'rn18']:
139
+ model = DMutaPeptideCNN(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese).to(device).eval()
140
+ test_set = PeptidePairPicDataset(mode=dataset, pad_length=args.max_length, task=args.task, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
141
+ else:
142
+ model = DMutaPeptide(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, fusion=args.fusion, non_siamese=args.non_siamese).to(device).eval()
143
+ test_set = PeptidePairDataset(mode=dataset, pad_length=args.max_length, task=args.task, gf=args.glob_feat)
144
+
145
+ test_loader = DataLoader(test_set, batch_size=args.batch_size, shuffle=False)
146
+
147
+ raw_preds = []
148
+ ckpt_names = ['model_uda_teacher'] if args.uda else [f'model_{i}_test' for i in range(5)]
149
+ for i in ckpt_names:
150
+ model.load_state_dict(torch.load(f'{weight_dir}/{i}.pth', map_location=device))
151
+ preds = []
152
+ gt_list_valid = []
153
+ with torch.no_grad():
154
+ for data in test_loader:
155
+ x, gt = data
156
+ gt_list_valid.append(gt.to(device))
157
+ out = model(move_to_device(x, device))
158
+ if args.dir:
159
+ out, _ = out
160
+ preds.append(out)
161
+ r_pred = torch.cat(preds, dim=0)
162
+ if args.task == 'reg':
163
+ preds = r_pred.cpu().numpy()
164
+ elif args.task == 'cls':
165
+ preds = torch.softmax(r_pred, dim=-1)[:, 1].cpu().numpy()
166
+ gt_tensor = torch.cat(gt_list_valid, dim=0)
167
+ gt_list_valid = gt_tensor.cpu().numpy()
168
+ raw_preds.append(r_pred)
169
+ if args.task == 'cls':
170
+ preds_tensor = torch.softmax(torch.stack(raw_preds, 0), dim=-1)[:, :, 1]
171
+ elif args.task == 'reg':
172
+ preds_tensor = torch.stack(raw_preds, 0)
173
+
174
+ return [metrics(preds_tensor[i], gt_tensor, args.task) for i in range(len(ckpt_names))]
175
+
176
+
177
+ if __name__ == '__main__':
178
+ if args.task == 'cls':
179
+ # df = pd.DataFrame(columns=['dataset', 'AUPRC', 'AUROC', 'F1', 'ACC'])
180
+ print(','.join(['AUPRC', 'AUROC', 'F1', 'ACC']))
181
+ elif args.task == 'reg':
182
+ # df = pd.DataFrame(columns=['dataset', 'MAE', 'RSE', 'PCC', 'KCC'])
183
+ print(','.join(['MAE', 'RSE', 'PCC', 'KCC']))
184
+
185
+ results = main('r2_case')
186
+ for result in results:
187
+ print(','.join(result))
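
For the classification branch, `main` reduces each checkpoint's 2-class logits to a positive-class probability with `torch.softmax(r_pred, dim=-1)[:, 1]` before scoring. A dependency-free sketch of that reduction (the helper name `positive_prob` is hypothetical, with a plain list standing in for a logit tensor row):

```python
import math

def positive_prob(logits):
    """softmax(logits)[1] for a 2-class logit pair, as in
    torch.softmax(r_pred, dim=-1)[:, 1] above."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    return exps[1] / sum(exps)

print(positive_prob([0.0, 0.0]))  # equal logits -> 0.5
```

Swapping the two logits flips the probability around 0.5, which is why only the `[:, 1]` column needs to be kept for AUPRC/AUROC.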
inferthro.sh ADDED
@@ -0,0 +1,13 @@
+ #!/bin/bash
+ python infer.py --task cls --loss ce --q-encoder lstm --channels 256 --fusion diff
+ python infer.py --task cls --loss ce --q-encoder mamba --channels 256 --fusion diff
+ python infer.py --task cls --loss ce --q-encoder mha --channels 256 --fusion diff
+ python infer.py --task cls --loss ce --q-encoder gru --channels 256 --fusion diff
+ python infer.py --task cls --loss ce --q-encoder rn18 --channels 16 --fusion diff --pcs --side-enc mamba
+ python infer.py --task cls --loss ce --q-encoder rn18 --channels 16 --fusion diff --pcs --side-enc mamba --uda r2
+ python infer.py --task reg --loss mse --q-encoder lstm --channels 256 --fusion diff
+ python infer.py --task reg --loss mse --q-encoder mamba --channels 256 --fusion diff
+ python infer.py --task reg --loss mse --q-encoder mha --channels 256 --fusion diff
+ python infer.py --task reg --loss mse --q-encoder gru --channels 256 --fusion diff
+ python infer.py --task reg --loss mse --q-encoder rn18 --channels 16 --fusion diff --pcs --side-enc mamba
+ python infer.py --task reg --loss mse --q-encoder rn18 --channels 16 --fusion diff --pcs --side-enc mamba --uda r2
loss.py ADDED
@@ -0,0 +1,164 @@
+ import torch
+ from torch import nn
+ from torch.nn.modules.loss import _Loss
+ import torch.nn.functional as F
+ from math import cos, pi, sin
+ import math
+ import numpy as np
+ from scipy.special import lambertw
+
+
+ def mixup_criterion(criterion, pred, y_a, y_b, lam, pow=2):
+     y = lam ** pow * y_a + (1 - lam) ** pow * y_b
+     return criterion(pred, y)
+
+
+ def mixup_data(v, q, a):
+     '''Returns mixed inputs, pairs of targets, and lambda without organ constraint'''
+     lam = np.random.beta(1, 1)
+
+     batch_size = v.shape[0]
+     index = torch.randperm(batch_size)
+
+     mixed_v = lam * v + (1 - lam) * v[index, :]
+     mixed_q = lam * q + (1 - lam) * q[index, :]
+
+     a_1, a_2 = a, a[index]
+     return mixed_v, mixed_q, a_1, a_2, lam
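
The core operation in `mixup_data` is just a convex combination of a batch with a shuffled copy of itself, weighted by a Beta-sampled `lam`. A minimal pure-Python sketch (the `mix` helper is hypothetical, with lists standing in for tensors and a fixed `lam` instead of a Beta draw):

```python
def mix(a, b, lam):
    """Elementwise lam*a + (1 - lam)*b, as applied to both inputs in mixup_data."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]

lam = 0.3
mixed = mix([1.0, 2.0], [3.0, 6.0], lam)   # -> approximately [2.4, 4.8]
```

With `lam = 1.0` the mix degenerates to the first input, which is why mixup interpolates smoothly between the original and permuted batches.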
+
+
+ def linear(epoch, nepoch):
+     return 1 - epoch / nepoch
+
+
+ def convex(epoch, nepoch):
+     return epoch / (2 - nepoch)
+
+
+ def concave(epoch, nepoch):
+     return 1 - sin((epoch / nepoch) * (pi / 2))
+
+
+ def composite(epoch, nepoch):
+     return 0.5 * cos((epoch / nepoch) * pi) + 0.5
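
The `linear`, `concave`, and `composite` schedules all map epoch 0 to weight 1 and decay to 0 at `nepoch` (the `convex` variant as written divides by `2 - nepoch` and does not follow this pattern for `nepoch > 2`). A quick check of those three, restated in pure Python with `_s`-suffixed names to keep them distinct from the loss.py definitions:

```python
from math import cos, pi, sin

def linear_s(epoch, nepoch):    return 1 - epoch / nepoch
def concave_s(epoch, nepoch):   return 1 - sin((epoch / nepoch) * (pi / 2))
def composite_s(epoch, nepoch): return 0.5 * cos((epoch / nepoch) * pi) + 0.5

# each schedule starts at 1 and reaches 0 after nepoch epochs
for f in (linear_s, concave_s, composite_s):
    assert abs(f(0, 50) - 1.0) < 1e-9 and abs(f(50, 50)) < 1e-9
```

They differ only in how fast the weight falls in between, which is what `unbiased_curriculum_loss` below uses to down-weight hard samples as training progresses.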
+
+
+ class LogCoshLoss(nn.Module):
+     def __init__(self):
+         super().__init__()
+
+     def forward(self, y_t, y_prime_t):
+         ey_t = y_t - y_prime_t
+         return torch.mean(torch.log(torch.cosh(ey_t + 1e-12))) + F.mse_loss(y_t, y_prime_t)
+
+
+ class WeightedMSELoss(nn.Module):
+     def __init__(self):
+         super().__init__()
+
+     def forward(self, y, y_t, weights=None):
+         loss = (y - y_t) ** 2
+         if weights is not None:
+             loss *= weights.expand_as(loss)
+         return torch.mean(loss)
+
+
+ class MLCE(nn.Module):
+     def __init__(self):
+         super(MLCE, self).__init__()
+
+     def _mlcce(self, y_pred, y_true):
+         y_pred = (1 - 2 * y_true) * y_pred
+         y_pred_neg = y_pred - y_true * 1e12
+         y_pred_pos = y_pred - (1 - y_true) * 1e12
+         zeros = torch.zeros_like(y_pred[..., :1])
+         y_pred_neg = torch.cat([y_pred_neg, zeros], dim=-1)
+         y_pred_pos = torch.cat([y_pred_pos, zeros], dim=-1)
+         neg_loss = torch.logsumexp(y_pred_neg, dim=-1)
+         pos_loss = torch.logsumexp(y_pred_pos, dim=-1)
+         loss = torch.mean(neg_loss + pos_loss)
+         return loss
+
+     def __call__(self, y_pred, y_true):
+         return self._mlcce(y_pred, y_true)
+
+
+ class SuperLoss(nn.Module):
+     def __init__(self, C=10, lam=1, batch_size=256):
+         super(SuperLoss, self).__init__()
+         self.tau = math.log(C)
+         self.lam = lam  # set to 1 for CIFAR10 and 0.25 for CIFAR100
+         self.batch_size = batch_size
+
+     def forward(self, logits, targets):
+         l_i = F.mse_loss(logits, targets, reduction='none').detach()
+         sigma = self.sigma(l_i)
+         loss = (F.mse_loss(logits, targets, reduction='none') - self.tau) * sigma + self.lam * (
+             torch.log(sigma) ** 2)
+         loss = loss.sum() / self.batch_size
+         return loss
+
+     def sigma(self, l_i):
+         x = torch.ones_like(l_i) * (-2 / math.exp(1.))
+         y = 0.5 * torch.max(x, (l_i - self.tau) / self.lam)
+         y = y.cpu().numpy()
+         sigma = np.exp(-lambertw(y))
+         sigma = sigma.real.astype(np.float32)
+         sigma = torch.from_numpy(sigma).to(l_i.device)
+         return sigma
+
+
+ def unbiased_curriculum_loss(out, data, args, epoch, epochs, scheduler='linear'):
+     losses = []
+     scheduler = linear if scheduler == 'linear' else concave
+
+     # calculate difficulty measurement function
+     adjusted_losses = []
+     for idx in range(out.shape[0]):
+         ground_truth = max(1, abs(data[idx].item()))
+         loss = F.mse_loss(out[idx], data[idx])
+         losses.append(loss)
+         adjusted_losses.append(loss.item() / ground_truth)
+
+     mean_loss, std_loss = np.mean(adjusted_losses), np.std(adjusted_losses)
+
+     # re-weight losses
+     total_loss = 0
+     for i, loss in enumerate(losses):
+         if adjusted_losses[i] > mean_loss + 1 * std_loss:
+             schedule_factor = scheduler(epoch, args.epochs)
+             total_loss += schedule_factor * loss
+         else:
+             total_loss += loss
+
+     return total_loss
+
+
+ class BMCLoss(_Loss):
+     def __init__(self, init_noise_sigma=1.0):
+         super(BMCLoss, self).__init__()
+         self.noise_sigma = torch.nn.Parameter(torch.tensor(init_noise_sigma))
+
+     def bmc_loss(self, pred, target, noise_var):
+         """Compute the Balanced MSE Loss (BMC) between `pred` and the ground truth `target`.
+         Args:
+             pred: A float tensor of size [batch, 1].
+             target: A float tensor of size [batch, 1].
+             noise_var: A float number or tensor.
+         Returns:
+             loss: A float tensor. Balanced MSE Loss.
+         """
+         if len(pred.shape) == 1:
+             pred = pred.unsqueeze(1)
+         if len(target.shape) == 1:
+             target = target.unsqueeze(1)
+         logits = - (pred - target.T).pow(2) / (2 * noise_var)  # logit size: [batch, batch]
+         loss = F.cross_entropy(logits, torch.arange(pred.shape[0], device=pred.device))  # contrastive-like loss
+         loss = loss * (2 * noise_var).detach()  # optional: restore the loss scale; 'detach' when noise is learnable
+
+         return loss
+
+     def forward(self, pred, target):
+         noise_var = self.noise_sigma ** 2
+         return self.bmc_loss(pred, target, noise_var)
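
`BMCLoss` builds a `[batch, batch]` logit matrix where entry `(i, j)` scores prediction `i` against target `j`, then asks cross-entropy to pick the diagonal. A dependency-free sketch of just the logit construction (the `bmc_logits` helper is hypothetical, with nested lists standing in for tensors):

```python
def bmc_logits(pred, target, noise_var):
    # logits[i][j] = -(pred_i - target_j)^2 / (2 * noise_var); the true match is j == i
    return [[-(p - t) ** 2 / (2 * noise_var) for t in target] for p in pred]

pred, target = [0.1, 0.9, 2.0], [0.0, 1.0, 2.1]
L = bmc_logits(pred, target, noise_var=1.0)
# when predictions are close to their own targets, each row's largest
# logit sits on the diagonal, so the contrastive CE rewards the match
print(all(max(row) == row[i] for i, row in enumerate(L)))  # True
```

Because the logits are negated squared errors, larger `noise_var` flattens the matrix and softens the contrastive penalty, which is why the learnable `noise_sigma` acts as a temperature.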
main.py ADDED
@@ -0,0 +1,245 @@
+ import argparse
+ import json
+ import logging
+ import os
+ import time
+
+ from dataset import PeptidePairDataset, PeptidePairPicDataset
+ from network import DMutaPeptide, DMutaPeptideCNN
+ from sklearn.model_selection import KFold
+ from train import train, train_cls
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader, Subset
+ import numpy as np
+ from loss import MLCE, SuperLoss, LogCoshLoss, BMCLoss
+ from utils import set_seed
+
+
+ parser = argparse.ArgumentParser(description='DMutaPeptide training')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='lstm',
+                     help='lstm mamba mla')
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default=None,
+                     help="use side features")
+ parser.add_argument('--channels', type=int, default=256)
+ parser.add_argument('--fusion', type=str, default='att',
+                     help='mlp att diff')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='reg',
+                     help='reg or cls')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=False,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=42,
+                     help="Seed (default: 42)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cleavage site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Mix in protease-cleavage-site data')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ # parser.add_argument('--llm-data', action='store_true', default=False,
+ #                     help='Use LLM augmentation data')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 32)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 50)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrained model')
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='mse',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ args = parser.parse_args()
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+
+ def main():
+     set_seed(args.seed)
+     if args.task == 'reg':
+         args.classes = 1
+         trainer = train
+         if args.loss == "mse" or args.loss in ['ce']:
+             args.loss = 'mse'
+             criterion = nn.MSELoss()
+         elif args.loss == "smoothl1":
+             criterion = nn.SmoothL1Loss()
+         elif args.loss == "super":
+             criterion = SuperLoss()
+         elif args.loss in ["bmc", "bmc_ln"]:
+             criterion = BMCLoss()
+         else:
+             raise NotImplementedError("unimplemented regression task loss function")
+     elif args.task == 'cls':
+         trainer = train_cls
+         args.classes = 2
+         if args.loss == 'ce' or args.loss in ['mse', 'smoothl1', 'super']:
+             args.loss = 'ce'
+             criterion = nn.CrossEntropyLoss()
+         else:
+             raise NotImplementedError("unimplemented classification task loss function")
+     else:
+         raise NotImplementedError("unimplemented task")
+
+     if args.q_encoder in ['cnn', 'rn18']:
+         weight_dir = f'./run-{args.task}/{"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs is True else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+     else:
+         weight_dir = f'./run-{args.task}/{"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+
+     if not os.path.exists(weight_dir):
+         os.makedirs(weight_dir)
+
+     logging.basicConfig(handlers=[
+         logging.FileHandler(filename=os.path.join(weight_dir, "training.log"), encoding='utf-8', mode='w+'),
+         logging.StreamHandler()],
+         format="%(asctime)s: %(message)s", datefmt="%F %T", level=logging.INFO)
+
+     logging.info(f'saving_dir: {weight_dir}')
+
+     with open(os.path.join(weight_dir, "config.json"), "w") as f:
+         f.write(json.dumps(vars(args)))
+
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
+
+     if args.q_encoder in ['cnn', 'rn18']:
+         logging.info('Loading Training Dataset')
+         all_set = PeptidePairPicDataset(mode='train', pad_length=args.max_length, task=args.task, one_way=args.one_way, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
+         logging.info('Loading Test Dataset')
+         test_set = PeptidePairPicDataset(mode='test', pad_length=args.max_length, task=args.task, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
+     else:
+         logging.info('Loading Training Dataset')
+         all_set = PeptidePairDataset(mode='train', pad_length=args.max_length, task=args.task, one_way=args.one_way, gf=args.glob_feat)
+         logging.info('Loading Test Dataset')
+         test_set = PeptidePairDataset(mode='test', pad_length=args.max_length, task=args.task, gf=args.glob_feat)
+
+     test_loader = DataLoader(test_set, batch_size=args.batch_size, shuffle=False, num_workers=8, pin_memory=True)
+
+     best_perform_list = [[] for _ in range(args.split)]
+     test_perform_list = [[] for _ in range(args.split)]
+
+     kf = KFold(n_splits=args.split, shuffle=True, random_state=42)
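
`KFold` here simply partitions the shuffled sample indices into `k` disjoint validation folds, with the remaining indices forming each fold's training set. A stdlib-only sketch of the same idea (the `kfold_indices` helper is hypothetical, not the sklearn implementation):

```python
import random

def kfold_indices(n, k, seed=42):
    """Shuffled k-fold split: every index lands in exactly one validation fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # round-robin over shuffled indices
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

splits = kfold_indices(10, 5)
print([len(val) for _, val in splits])  # [2, 2, 2, 2, 2]
```

Each `(train, val)` pair then feeds `Subset(all_set, ...)` exactly as in the loop below, so every sample is validated on once across the k runs.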
+
+     for fold, (train_idx, val_idx) in enumerate(kf.split(all_set)):
+         train_set = Subset(all_set, train_idx)
+         valid_set = Subset(all_set, val_idx)
+
+         train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True, drop_last=True, num_workers=8, pin_memory=True)
+         valid_loader = DataLoader(valid_set, batch_size=args.batch_size, shuffle=False, num_workers=8, pin_memory=True)
+
+         if args.q_encoder in ['cnn', 'rn18']:
+             model = DMutaPeptideCNN(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese)
+         else:
+             model = DMutaPeptide(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, fusion=args.fusion, non_siamese=args.non_siamese)
+         if len(args.pretrain) != 0:  # TODO: load pretrained weights
+             pass
+         model.to(device)
+         # model.compile()
+
+         optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.decay)
+         # optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.decay)
+
+         # scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10], gamma=0.5)
+         if args.q_encoder == 'cnn':
+             scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
+         else:
+             scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
+
+         if args.loss == 'bmc_ln':
+             optimizer.add_param_group({'params': criterion.noise_sigma, 'lr': args.lr, 'name': 'noise_sigma'})
+         weights_path = f"{weight_dir}/model_{fold}.pth"
+         # early_stopping = EarlyStopping(patience=args.patience, path=weights_path)
+         logging.info(f'Running Cross Validation {fold}')
+         logging.info(f'Fold {fold} Train set: {len(train_set)}, Valid set: {len(valid_set)}, Test set: {len(test_set)}')
+         best_metric = -float('inf')
+         best_test = -float('inf')
+         start_time = time.time()
+         if args.task == 'reg':
+             for epoch in range(1, args.epochs + 1):
+                 train_loss, mae, rse, pcc, kcc = trainer(args, epoch, model, train_loader, valid_loader, device, criterion, optimizer)
+                 logging.info(f'Epoch: {epoch:03d} Train Loss: {train_loss:.3f}, mae: {mae:.3f}, rse: {rse:.3f}, pcc: {pcc:.3f}, kcc: {kcc:.3f}')
+                 scheduler.step()
+                 avg_metric = (pcc + kcc) - (mae + rse)
+                 if avg_metric > best_metric:
+                     logging.info(f'Epoch: {epoch:03d} New best VALIDATION metrics')
+                     torch.save(model.state_dict(), weights_path)
+                     best_metric = avg_metric
+                     best_perform_list[fold] = np.asarray([mae, rse, pcc, kcc])
+
+                 _, test_mae, test_rse, test_pcc, test_kcc = trainer(args, epoch, model, None, test_loader, device, None, None)
+                 logging.info(f'Epoch: {epoch:03d} Test results: mae: {test_mae:.3f}, rse: {test_rse:.3f}, pcc: {test_pcc:.3f}, kcc: {test_kcc:.3f}')
+                 test_metric = (test_pcc + test_kcc) - (test_mae + test_rse)
+                 if test_metric > best_test and epoch > 10:
+                     logging.info(f'Epoch: {epoch:03d} New best TEST metrics')
+                     best_test = test_metric
+                     test_perform_list[fold] = np.asarray([test_mae, test_rse, test_pcc, test_kcc])
+                     torch.save(model.state_dict(), weights_path.replace('.pth', '_test.pth'))
+
+         elif args.task == 'cls':
+             for epoch in range(1, args.epochs + 1):
+                 train_loss, ap, auc, f1, acc = trainer(args, epoch, model, train_loader, valid_loader, device, criterion, optimizer)
+                 logging.info(f'Epoch: {epoch:03d} Train Loss: {train_loss:.3f}, ap: {ap:.3f}, auc: {auc:.3f}, f1: {f1:.3f}, acc: {acc:.3f}')
+                 scheduler.step()
+                 avg_metric = ap + auc  # + f1 + acc
+                 if avg_metric > best_metric:
+                     logging.info(f'Epoch: {epoch:03d} New best VALIDATION metrics')
+                     torch.save(model.state_dict(), weights_path)
+                     best_metric = avg_metric
+                     best_perform_list[fold] = np.asarray([ap, auc, f1, acc])
+
+                 _, test_ap, test_auc, test_f1, test_acc = trainer(args, epoch, model, None, test_loader, device, None, None)
+                 logging.info(f'Epoch: {epoch:03d} Test results, ap: {test_ap:.3f}, auc: {test_auc:.3f}, f1: {test_f1:.3f}, acc: {test_acc:.3f}')
+                 test_metric = test_ap + test_auc  # + test_f1 + test_acc
+                 if test_metric > best_test and epoch > 10:
+                     logging.info(f'Epoch: {epoch:03d} New best TEST metrics')
+                     best_test = test_metric
+                     test_perform_list[fold] = np.asarray([test_ap, test_auc, test_f1, test_acc])
+                     torch.save(model.state_dict(), weights_path.replace('.pth', '_test.pth'))
+
+         torch.save(model.state_dict(), weights_path.replace('.pth', '_last.pth'))
+         logging.info(f'used time {(time.time() - start_time) / 3600:.2f}h')
+
+     logging.info('Cross Validation Finished!')
+     best_perform_list = np.asarray(best_perform_list)
+     test_perform_list = np.asarray(test_perform_list)
+     logging.info('Best validation perform list\n%s', best_perform_list)
+     logging.info('mean: %s', np.round(np.mean(best_perform_list, 0), 3))
+     logging.info('std: %s', np.round(np.std(best_perform_list, 0), 3))
+     logging.info('Best test perform list\n%s', test_perform_list)
+     logging.info('mean: %s', np.round(np.mean(test_perform_list, 0), 3))
+     logging.info('std: %s', np.round(np.std(test_perform_list, 0), 3))
+     with open(weight_dir + '/result.txt', 'w') as perform:
+         perform.write('Valid\n')
+         perform.write(','.join([str(i) for i in np.mean(best_perform_list, 0)]) + '\n')
+         perform.write(','.join([str(i) for i in np.std(best_perform_list, 0)]) + '\n')
+         perform.write('Test\n')
+         perform.write(','.join([str(i) for i in np.mean(test_perform_list, 0)]) + '\n')
+         perform.write(','.join([str(i) for i in np.std(test_perform_list, 0)]) + '\n')
+
+
+ if __name__ == "__main__":
+     main()
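
The final report aggregates the per-fold metric rows with `np.mean(..., 0)` and `np.std(..., 0)`, i.e. column-wise over folds. A dependency-free sketch of that aggregation (the `mean_std` helper is hypothetical):

```python
def mean_std(rows):
    """Column-wise mean and population std over per-fold metric rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    return means, stds

folds = [[0.80, 0.85], [0.82, 0.87], [0.78, 0.83]]
m, s = mean_std(folds)
print([round(x, 3) for x in m])  # [0.8, 0.85]
```

Reporting mean and std together (as `result.txt` does) makes it visible when a single lucky fold dominates the average.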
main_aug.py ADDED
@@ -0,0 +1,412 @@
+ import argparse
+ import json
+ import logging
+ import os
+ import time
+
+ from dataset import PeptidePairDataset, PeptidePairPicDataset
+ from network import DMutaPeptide, DMutaPeptideCNN
+ from sklearn.model_selection import KFold
+ from torchmetrics import MeanAbsoluteError, RelativeSquaredError, PearsonCorrCoef, KendallRankCorrCoef, F1Score, Accuracy, AveragePrecision, AUROC
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader, Subset
+ import torchvision.transforms.v2 as T
+ import numpy as np
+ from loss import MLCE, SuperLoss, LogCoshLoss, BMCLoss
+ from utils import set_seed
+
+
+ parser = argparse.ArgumentParser(description='DMutaPeptide training with augmentation')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='lstm',
+                     help='lstm mamba mla')
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default=None,
+                     help="use side features")
+ parser.add_argument('--channels', type=int, default=256)
+ parser.add_argument('--fusion', type=str, default='att',
+                     help='mlp att diff')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='reg',
+                     help='reg or cls')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=False,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=42,
+                     help="Seed (default: 42)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cleavage site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Mix in protease-cleavage-site data')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ # parser.add_argument('--llm-data', action='store_true', default=False,
+ #                     help='Use LLM augmentation data')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 32)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 50)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrained model')
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='mse',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ args = parser.parse_args()
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+
+ def main():
+     set_seed(args.seed)
+     if args.task == 'reg':
+         args.classes = 1
+         trainer = train
+         if args.loss == "mse" or args.loss in ['ce']:
+             args.loss = 'mse'
+             criterion = nn.MSELoss()
+         elif args.loss == "smoothl1":
+             criterion = nn.SmoothL1Loss()
+         elif args.loss == "super":
+             criterion = SuperLoss()
+         elif args.loss in ["bmc", "bmc_ln"]:
+             criterion = BMCLoss()
+         else:
+             raise NotImplementedError("unimplemented regression task loss function")
+     elif args.task == 'cls':
+         trainer = train_cls
+         args.classes = 2
+         if args.loss == 'ce' or args.loss in ['mse', 'smoothl1', 'super']:
+             args.loss = 'ce'
+             criterion = nn.CrossEntropyLoss()
+         else:
+             raise NotImplementedError("unimplemented classification task loss function")
+     else:
+         raise NotImplementedError("unimplemented task")
+
+     if args.q_encoder in ['cnn', 'rn18']:
+         weight_dir = f'./run-{args.task}/{"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs is True else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}_aug'
+     else:
+         weight_dir = f'./run-{args.task}/{"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}_aug'
+
+     if not os.path.exists(weight_dir):
+         os.makedirs(weight_dir)
+
+     logging.basicConfig(handlers=[
+         logging.FileHandler(filename=os.path.join(weight_dir, "training.log"), encoding='utf-8', mode='w+'),
+         logging.StreamHandler()],
+         format="%(asctime)s: %(message)s", datefmt="%F %T", level=logging.INFO)
+
+     logging.info(f'saving_dir: {weight_dir}')
+
+     with open(os.path.join(weight_dir, "config.json"), "w") as f:
+         f.write(json.dumps(vars(args)))
+
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
+
+     if args.q_encoder in ['cnn', 'rn18']:
+         logging.info('Loading Training Dataset')
+         all_set = PeptidePairPicDataset(mode='train', pad_length=args.max_length, task=args.task, one_way=args.one_way, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
+         logging.info('Loading Test Dataset')
+         test_set = PeptidePairPicDataset(mode='test', pad_length=args.max_length, task=args.task, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
135
+ else:
136
+ logging.info('Loading Train Dataset')
137
+ all_set = PeptidePairDataset(mode='train', pad_length=args.max_length, task=args.task, one_way=args.one_way, gf=args.glob_feat)
138
+ logging.info('Loading Test Dataset')
139
+ test_set = PeptidePairDataset(mode='test', pad_length=args.max_length, task=args.task, gf=args.glob_feat)
140
+
141
+ test_loader = DataLoader(test_set, batch_size=args.batch_size, shuffle=False, num_workers=8, pin_memory=True)
142
+
143
+ best_perform_list = [[] for i in range(5)]
144
+ test_perform_list = [[] for i in range(5)]
145
+
146
+ kf = KFold(n_splits=5, shuffle=True, random_state=42)
147
+
148
+ for fold, (train_idx, val_idx) in enumerate(kf.split(all_set)):
149
+ train_set= Subset(all_set, train_idx)
150
+ valid_set = Subset(all_set, val_idx)
151
+
152
+ train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True, drop_last=True, num_workers=8, pin_memory=True)
153
+ valid_loader = DataLoader(valid_set, batch_size=args.batch_size, shuffle=False, num_workers=8, pin_memory=True)
154
+
155
+ if args.q_encoder in ['cnn', 'rn18']:
156
+ model = DMutaPeptideCNN(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese)
157
+ else:
158
+ model = DMutaPeptide(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, fusion=args.fusion, non_siamese=args.non_siamese)
159
+ if len(args.pretrain) != 0: #TODO: load pretrain
160
+ pass
161
+ model.to(device)
162
+ # model.compile()
163
+
164
+ optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.decay)
165
+ # optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.decay)
166
+
167
+ # scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10], gamma=0.5)
168
+ if args.q_encoder == 'cnn':
169
+ scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
170
+ else:
171
+ scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
172
+
173
+ if args.loss == 'bmc_ln':
174
+ optimizer.add_param_group({'params': criterion.noise_sigma, 'lr': args.lr, 'name': 'noise_sigma'})
175
+ weights_path = f"{weight_dir}/model_{fold}.pth"
176
+ # early_stopping = EarlyStopping(patience=args.patience, path=weights_path)
177
+ logging.info(f'Running Cross Validation {fold}')
178
+ logging.info(f'Fold {fold} Train set:{len(train_set)}, Valid set:{len(valid_set)}, Test set: {len(test_set)}')
179
+ best_metric = -float('inf')
180
+ best_test = -float('inf')
181
+ start_time = time.time()
182
+ if args.task == 'reg':
183
+ for epoch in range(1, args.epochs + 1):
184
+ train_loss, mae, rse, pcc, kcc = trainer(args, epoch, model, train_loader, valid_loader, device, criterion, optimizer)
185
+ logging.info(f'Epoch: {epoch:03d} Train Loss: {train_loss:.3f}, mae: {mae:.3f}, rse: {rse:.3f}, pcc: {pcc:.3f}, kcc: {kcc:.3f}')
186
+ scheduler.step()
187
+ avg_metric = (pcc + kcc) - (mae + rse)
188
+ if avg_metric > best_metric:
189
+ logging.info(f'Epoch: {epoch:03d} New best VALIDATION metrics')
190
+ torch.save(model.state_dict(), weights_path)
191
+ best_metric = avg_metric
192
+ best_perform_list[fold] = np.asarray([mae, rse, pcc, kcc])
193
+
194
+ _, test_mae, test_rse, test_pcc, test_kcc = trainer(args, epoch, model, None, test_loader, device, None, None)
195
+ logging.info(f'Epoch: {epoch:03d} Test results, ap: mae: {test_mae:.3f}, rse: {test_rse:.3f}, pcc: {test_pcc:.3f}, kcc: {test_kcc:.3f}')
196
+ test_metric = (test_pcc + test_kcc) - (test_mae + test_rse)
197
+ if test_metric > best_test and epoch > 10:
198
+ logging.info(f'Epoch: {epoch:03d} New best TEST metrics')
199
+ best_test = test_metric
200
+ test_perform_list[fold] = np.asarray([test_mae, test_rse, test_pcc, test_kcc])
201
+ torch.save(model.state_dict(), weights_path.replace('.pth', '_test.pth'))
202
+
203
+ elif args.task == 'cls':
204
+ for epoch in range(1, args.epochs + 1):
205
+ train_loss, ap, auc, f1, acc = trainer(args, epoch, model, train_loader, valid_loader, device, criterion, optimizer)
206
+ logging.info(f'Epoch: {epoch:03d} Train Loss: {train_loss:.3f}, ap: {ap:.3f}, auc: {auc:.3f}, f1: {f1:.3f}, acc: {acc:.3f}')
207
+ scheduler.step()
208
+ avg_metric = ap + auc #+ f1 + acc
209
+ if avg_metric > best_metric:
210
+ logging.info(f'Epoch: {epoch:03d} New best VALIDATION metrics')
211
+ torch.save(model.state_dict(), weights_path)
212
+ best_metric = avg_metric
213
+ best_perform_list[fold] = np.asarray([ap, auc, f1, acc])
214
+
215
+ _, test_ap, test_auc, test_f1, test_acc = trainer(args, epoch, model, None, test_loader, device, None, None)
216
+ logging.info(f'Epoch: {epoch:03d} Test results, ap: {test_ap:.3f}, auc: {test_auc:.3f}, f1: {test_f1:.3f}, acc: {test_acc:.3f}')
217
+ test_metric = test_ap + test_auc #+ test_f1 + test_acc
218
+ if test_metric > best_test and epoch > 10:
219
+ logging.info(f'Epoch: {epoch:03d} New best TEST metrics')
220
+ best_test = test_metric
221
+ test_perform_list[fold] = np.asarray([test_ap, test_auc, test_f1, test_acc])
222
+ torch.save(model.state_dict(), weights_path.replace('.pth', '_test.pth'))
223
+
224
+ torch.save(model.state_dict(), weights_path.replace('.pth', '_last.pth'))
225
+ logging.info(f'used time {(time.time()-start_time)/3600:.2f}h')
226
+
227
+ logging.info(f'Cross Validation Finished!')
228
+ best_perform_list = np.asarray(best_perform_list)
229
+ test_perform_list = np.asarray(test_perform_list)
230
+ logging.info('Best validation perform list\n%s', best_perform_list)
231
+ logging.info('mean: %s', np.round(np.mean(best_perform_list, 0), 3))
232
+ logging.info('std: %s', np.round(np.std(best_perform_list, 0), 3))
233
+ logging.info('Best test perform list\n%s', test_perform_list)
234
+ logging.info('mean: %s', np.round(np.mean(test_perform_list, 0), 3))
235
+ logging.info('std: %s', np.round(np.std(test_perform_list, 0), 3))
236
+ perform = open(weight_dir+'/result.txt', 'w')
237
+ perform.write('Valid\n')
238
+ perform.write(','.join([str(i) for i in np.mean(best_perform_list, 0)])+'\n')
239
+ perform.write(','.join([str(i) for i in np.std(best_perform_list, 0)])+'\n')
240
+ perform.write('Test\n')
241
+ perform.write(','.join([str(i) for i in np.mean(test_perform_list, 0)])+'\n')
242
+ perform.write(','.join([str(i) for i in np.std(test_perform_list, 0)])+'\n')
243
+
244
+
245
+ def move_to_device(batch, device, non_blocking=False):
246
+ if isinstance(batch, (list, tuple)):
247
+ return type(batch)(move_to_device(item, device, non_blocking) for item in batch)
248
+ return batch.to(device, non_blocking=non_blocking)
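
The structure-preserving recursion in `move_to_device` can be checked without torch by swapping the `.to(device)` leaf call for an arbitrary function. A minimal sketch (`map_nested` and `fn` are illustrative names, not part of this repo):

```python
def map_nested(fn, batch):
    # Mirrors move_to_device: recurse through lists/tuples,
    # rebuild the same container type, apply fn only to leaves.
    if isinstance(batch, (list, tuple)):
        return type(batch)(map_nested(fn, item) for item in batch)
    return fn(batch)

print(map_nested(lambda x: x + 1, (1, [2, 3], (4,))))  # (2, [3, 4], (5,))
```

Because `type(batch)(...)` is called on a generator, tuples stay tuples and lists stay lists, which is exactly why `move_to_device` can handle the nested image/feature batches produced by the pair datasets.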
+
+
+ def move_and_aug(batch, device, transforms, non_blocking=False):
+     batch = move_to_device(batch, device, non_blocking)
+     if not isinstance(batch[0][0], (list, tuple)):
+         return batch
+
+     for i in range(batch[0][0][0].shape[0]):
+         img_pair = torch.stack((batch[0][0][0][i], batch[0][1][0][i]), dim=0)
+         img_pair = transforms(img_pair)
+         batch[0][0][0][i] = img_pair[0]
+         batch[0][1][0][i] = img_pair[1]
+     return batch
+
+
+ class GaussianNoise(nn.Module):
+     def __init__(self, mean=0., sigma=0.15):
+         super(GaussianNoise, self).__init__()
+         self.mean = mean
+         self.sigma = sigma
+
+     def forward(self, x):
+         return x + torch.randn_like(x) * self.sigma + self.mean
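
The `GaussianNoise` transform is a one-liner: additive noise drawn per element, same shape as the input. A torch-free NumPy sketch of the same arithmetic (the `gaussian_noise` helper is illustrative, not part of this repo; setting `sigma=0` isolates the mean shift for checking):

```python
import numpy as np

def gaussian_noise(x, mean=0.0, sigma=0.15, rng=None):
    # NumPy equivalent of GaussianNoise.forward: elementwise additive noise
    rng = rng or np.random.default_rng(0)
    return x + rng.standard_normal(x.shape) * sigma + mean

x = np.zeros((2, 3))
y = gaussian_noise(x, mean=1.0, sigma=0.0)  # sigma=0: pure mean shift, y == 1 everywhere
```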
+
+
+ Transforms = T.Compose([
+     T.RandomResizedCrop(args.resize, scale=(0.9, 1.0)),
+     T.RandomRotation(degrees=30),
+     GaussianNoise(0., 0.05),
+ ])
+
+
+ def train(args, epoch, model, train_loader, valid_loader, device, criterion, optimizer):
+     train_loss = 0
+     num_labels = model.classes
+     metric_mae = MeanAbsoluteError().to(device)
+     metric_rse = RelativeSquaredError(num_outputs=num_labels).to(device)
+     metric_pcc = PearsonCorrCoef(num_outputs=num_labels).to(device)
+     metric_kcc = KendallRankCorrCoef(num_outputs=num_labels).to(device)
+
+     if args.dir:
+         encodings, labels = [], []
+
+     if train_loader is not None:
+         model.train()
+         for data in train_loader:
+             x, gt = data
+             x = move_and_aug(x, device, Transforms)
+             if args.dir:
+                 out, features = model(x, gt.to(device), epoch)
+                 encodings.append(features.detach().cpu())
+                 labels.append(gt.cpu())
+             else:
+                 out = model(x)
+             loss = criterion(out, gt.to(device))
+             loss.backward()
+             optimizer.step()
+             optimizer.zero_grad()
+             train_loss += loss.item()
+         train_loss /= len(train_loader)
+
+         if args.dir:
+             encodings, labels = torch.cat(encodings), torch.cat(labels)
+             model.FDS.update_last_epoch_stats(epoch)
+             model.FDS.update_running_stats(encodings, labels, epoch)
+             encodings, labels = [], []
+
+     model.eval()
+     preds = []
+     gt_list_valid = []
+     with torch.no_grad():
+         for data in valid_loader:
+             x, gt = data
+             x = move_to_device(x, device)
+             gt_list_valid.append(gt.to(device))
+             out = model(x)
+             if args.dir:
+                 out, _ = out
+             preds.append(out)
+
+     # calculate metrics
+     preds = torch.cat(preds, dim=0)
+     gt_list_valid = torch.cat(gt_list_valid, dim=0)
+
+     mae = metric_mae(preds, gt_list_valid).item()
+     rse = metric_rse(preds, gt_list_valid).item()
+     pcc = metric_pcc(preds.squeeze(), gt_list_valid.squeeze()).mean().item()
+     kcc = metric_kcc(preds.squeeze(), gt_list_valid.squeeze()).mean().item()
+     return train_loss, mae, rse, pcc, kcc
+
+
+ def update_ce_loss_weight(loss_fn: torch.nn.CrossEntropyLoss, gt: torch.Tensor, num_classes: int, device):
+     """
+     Update the weight buffer of an nn.CrossEntropyLoss object from the current
+     batch's ground-truth labels, computing new weights by inverse frequency and
+     writing them in place via register_buffer.
+
+     Args:
+         loss_fn (nn.CrossEntropyLoss): an initialized nn.CrossEntropyLoss object;
+             the weight buffer must already be registered at construction time.
+         gt (torch.Tensor): ground-truth labels of the current batch, a 1D integer
+             tensor with values in [0, num_classes - 1].
+     """
+     class_counts = torch.bincount(gt, minlength=num_classes).float()
+     epsilon = 1e-6
+     new_weights = 1.0 / (class_counts + epsilon)
+     new_weights = new_weights / new_weights.sum() * num_classes
+     # update the weight buffer inside loss_fn via register_buffer
+     loss_fn.register_buffer('weight', new_weights.to(device))
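
The inverse-frequency weighting can be sanity-checked in plain Python (a sketch of the same arithmetic without torch; `inv_freq_weights` is an illustrative name, not part of this repo):

```python
def inv_freq_weights(labels, num_classes, epsilon=1e-6):
    # Same arithmetic as update_ce_loss_weight: count each class,
    # invert the counts, then normalize so the weights sum to num_classes.
    counts = [labels.count(c) for c in range(num_classes)]
    raw = [1.0 / (c + epsilon) for c in counts]
    total = sum(raw)
    return [w / total * num_classes for w in raw]

weights = inv_freq_weights([0, 0, 0, 1], num_classes=2)
# approximately [0.5, 1.5]: the minority class gets three times the weight
```

Note the per-batch nature of this scheme: a class absent from the batch has count 0, so `epsilon` keeps the inversion finite but gives that class a very large weight; the normalization then makes the weight vector sum to `num_classes` regardless.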
+
+
+ def train_cls(args, epoch, model, train_loader, valid_loader, device, criterion, optimizer):
+     train_loss = 0
+     num_labels = model.classes
+     avg = args.metric_avg
+     if num_labels == 1 or num_labels == 2:
+         task = 'binary'
+     else:
+         task = 'multiclass'
+     metric_acc = Accuracy(average=avg, task=task, num_classes=num_labels).to(device)
+     metric_f1 = F1Score(average=avg, task=task, num_classes=num_labels).to(device)
+     metric_ap = AveragePrecision(average=avg, task=task, num_classes=num_labels).to(device)
+     metric_auc = AUROC(average=avg, task=task, num_classes=num_labels).to(device)
+
+     if train_loader is not None:
+         model.train()
+         for data in train_loader:
+             x, gt = data
+             x = move_to_device(x, device)
+             out = model(x)
+             update_ce_loss_weight(criterion, gt, num_classes=num_labels, device=device)
+             loss = criterion(out, gt.to(device))
+             loss.backward()
+             optimizer.step()
+             optimizer.zero_grad()
+             train_loss += loss.item()
+         train_loss /= len(train_loader)
+
+     model.eval()
+     preds = []
+     gt_list_valid = []
+     with torch.no_grad():
+         for data in valid_loader:
+             x, gt = data
+             x = move_to_device(x, device)
+             gt_list_valid.append(gt.to(device))
+             out = model(x)
+             preds.append(out)
+
+     # calculate metrics
+     preds = torch.softmax(torch.cat(preds, dim=0), dim=-1).squeeze()
+     gt_list_valid = torch.cat(gt_list_valid, dim=0).int().squeeze()
+
+     if num_labels == 2:
+         preds = preds[:, 1]
+
+     ap = metric_ap(preds, gt_list_valid).item()
+     auc = metric_auc(preds, gt_list_valid).item()
+     f1 = metric_f1(preds, gt_list_valid).item()
+     acc = metric_acc(preds, gt_list_valid).item()
+     return train_loss, ap, auc, f1, acc
+
+
+ if __name__ == "__main__":
+     main()
main_imagemol.py ADDED
@@ -0,0 +1,246 @@
+ import argparse
+ import json
+ import logging
+ import os
+ import time
+
+ from dataset import PeptidePairDataset, PeptidePairPicDataset
+ from network import DMutaPeptide, DMutaPeptideCNN
+ from sklearn.model_selection import KFold
+ from train import train, train_cls
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader, Subset
+ import numpy as np
+ from loss import MLCE, SuperLoss, LogCoshLoss, BMCLoss
+ from utils import set_seed
+
+
+ parser = argparse.ArgumentParser(description='resnet26')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='rn18',
+                     help='lstm mamba mla')
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default=None,
+                     help="use side features")
+ parser.add_argument('--channels', type=int, default=256)
+ parser.add_argument('--fusion', type=str, default='att',
+                     help='mlp att diff')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='reg',
+                     help='reg or cls')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=False,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=42,
+                     help="Seed (default: 42)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cleavage site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Consider protease cleavage site')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ # parser.add_argument('--llm-data', action='store_true', default=False,
+ #                     help='Use LLM augmentation data')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 32)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 50)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrained model')
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='mse',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ args = parser.parse_args()
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+
+ def main():
+     set_seed(args.seed)
+     if args.task == 'reg':
+         args.classes = 1
+         trainer = train
+         if args.loss == "mse" or args.loss in ['ce']:
+             args.loss = 'mse'
+             criterion = nn.MSELoss()
+         elif args.loss == "smoothl1":
+             criterion = nn.SmoothL1Loss()
+         elif args.loss == "super":
+             criterion = SuperLoss()
+         elif args.loss in ["bmc", "bmc_ln"]:
+             criterion = BMCLoss()
+         else:
+             raise NotImplementedError("unimplemented regression task loss function")
+     elif args.task == 'cls':
+         trainer = train_cls
+         args.classes = 2
+         if args.loss == 'ce' or args.loss in ['mse', 'smoothl1', 'super']:
+             args.loss = 'ce'
+             criterion = nn.CrossEntropyLoss()
+         else:
+             raise NotImplementedError("unimplemented classification task loss function")
+     else:
+         raise NotImplementedError("unimplemented task")
+
+     if args.q_encoder in ['cnn', 'rn18']:
+         weight_dir = f'./run-{args.task}/{"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs==True else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}_ImageMol'
+     else:
+         weight_dir = f'./run-{args.task}/{"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}_ImageMol'
+
+     if not os.path.exists(weight_dir):
+         os.makedirs(weight_dir)
+
+     logging.basicConfig(handlers=[
+         logging.FileHandler(filename=os.path.join(weight_dir, "training.log"), encoding='utf-8', mode='w+'),
+         logging.StreamHandler()],
+         format="%(asctime)s: %(message)s", datefmt="%F %T", level=logging.INFO)
+
+     logging.info(f'saving_dir: {weight_dir}')
+
+     with open(os.path.join(weight_dir, "config.json"), "w") as f:
+         f.write(json.dumps(vars(args)))
+
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
+
+     if args.q_encoder in ['cnn', 'rn18']:
+         logging.info('Loading Training Dataset')
+         all_set = PeptidePairPicDataset(mode='train', pad_length=args.max_length, task=args.task, one_way=args.one_way, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
+         logging.info('Loading Test Dataset')
+         test_set = PeptidePairPicDataset(mode='test', pad_length=args.max_length, task=args.task, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
+     else:
+         logging.info('Loading Train Dataset')
+         all_set = PeptidePairDataset(mode='train', pad_length=args.max_length, task=args.task, one_way=args.one_way, gf=args.glob_feat)
+         logging.info('Loading Test Dataset')
+         test_set = PeptidePairDataset(mode='test', pad_length=args.max_length, task=args.task, gf=args.glob_feat)
+
+     test_loader = DataLoader(test_set, batch_size=args.batch_size, shuffle=False, num_workers=8, pin_memory=True)
+
+     best_perform_list = [[] for _ in range(5)]
+     test_perform_list = [[] for _ in range(5)]
+
+     kf = KFold(n_splits=5, shuffle=True, random_state=42)
+
+     for fold, (train_idx, val_idx) in enumerate(kf.split(all_set)):
+         train_set = Subset(all_set, train_idx)
+         valid_set = Subset(all_set, val_idx)
+
+         train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True, drop_last=True, num_workers=8, pin_memory=True)
+         valid_loader = DataLoader(valid_set, batch_size=args.batch_size, shuffle=False, num_workers=8, pin_memory=True)
+
+         if args.q_encoder in ['cnn', 'rn18']:
+             model = DMutaPeptideCNN(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese)
+             model.q_encoder.load_state_dict(torch.load('./ImageMolEncoder.pth', map_location=device))
+         else:
+             model = DMutaPeptide(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, fusion=args.fusion, non_siamese=args.non_siamese)
+         if len(args.pretrain) != 0:  # TODO: load pretrain
+             pass
+         model.to(device)
+         # model.compile()
+
+         optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.decay)
+         # optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.decay)
+
+         # scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10], gamma=0.5)
+         if args.q_encoder == 'cnn':
+             scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
+         else:
+             scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
+
+         if args.loss == 'bmc_ln':
+             optimizer.add_param_group({'params': criterion.noise_sigma, 'lr': args.lr, 'name': 'noise_sigma'})
+         weights_path = f"{weight_dir}/model_{fold}.pth"
+         # early_stopping = EarlyStopping(patience=args.patience, path=weights_path)
+         logging.info(f'Running Cross Validation {fold}')
+         logging.info(f'Fold {fold} Train set: {len(train_set)}, Valid set: {len(valid_set)}, Test set: {len(test_set)}')
+         best_metric = -float('inf')
+         best_test = -float('inf')
+         start_time = time.time()
+         if args.task == 'reg':
+             for epoch in range(1, args.epochs + 1):
+                 train_loss, mae, rse, pcc, kcc = trainer(args, epoch, model, train_loader, valid_loader, device, criterion, optimizer)
+                 logging.info(f'Epoch: {epoch:03d} Train Loss: {train_loss:.3f}, mae: {mae:.3f}, rse: {rse:.3f}, pcc: {pcc:.3f}, kcc: {kcc:.3f}')
+                 scheduler.step()
+                 avg_metric = (pcc + kcc) - (mae + rse)
+                 if avg_metric > best_metric:
+                     logging.info(f'Epoch: {epoch:03d} New best VALIDATION metrics')
+                     torch.save(model.state_dict(), weights_path)
+                     best_metric = avg_metric
+                     best_perform_list[fold] = np.asarray([mae, rse, pcc, kcc])
+
+                 _, test_mae, test_rse, test_pcc, test_kcc = trainer(args, epoch, model, None, test_loader, device, None, None)
+                 logging.info(f'Epoch: {epoch:03d} Test results, mae: {test_mae:.3f}, rse: {test_rse:.3f}, pcc: {test_pcc:.3f}, kcc: {test_kcc:.3f}')
+                 test_metric = (test_pcc + test_kcc) - (test_mae + test_rse)
+                 if test_metric > best_test and epoch > 10:
+                     logging.info(f'Epoch: {epoch:03d} New best TEST metrics')
+                     best_test = test_metric
+                     test_perform_list[fold] = np.asarray([test_mae, test_rse, test_pcc, test_kcc])
+                     torch.save(model.state_dict(), weights_path.replace('.pth', '_test.pth'))
+
+         elif args.task == 'cls':
+             for epoch in range(1, args.epochs + 1):
+                 train_loss, ap, auc, f1, acc = trainer(args, epoch, model, train_loader, valid_loader, device, criterion, optimizer)
+                 logging.info(f'Epoch: {epoch:03d} Train Loss: {train_loss:.3f}, ap: {ap:.3f}, auc: {auc:.3f}, f1: {f1:.3f}, acc: {acc:.3f}')
+                 scheduler.step()
+                 avg_metric = ap + auc  # + f1 + acc
+                 if avg_metric > best_metric:
+                     logging.info(f'Epoch: {epoch:03d} New best VALIDATION metrics')
+                     torch.save(model.state_dict(), weights_path)
+                     best_metric = avg_metric
+                     best_perform_list[fold] = np.asarray([ap, auc, f1, acc])
+
+                 _, test_ap, test_auc, test_f1, test_acc = trainer(args, epoch, model, None, test_loader, device, None, None)
+                 logging.info(f'Epoch: {epoch:03d} Test results, ap: {test_ap:.3f}, auc: {test_auc:.3f}, f1: {test_f1:.3f}, acc: {test_acc:.3f}')
+                 test_metric = test_ap + test_auc  # + test_f1 + test_acc
+                 if test_metric > best_test and epoch > 10:
+                     logging.info(f'Epoch: {epoch:03d} New best TEST metrics')
+                     best_test = test_metric
+                     test_perform_list[fold] = np.asarray([test_ap, test_auc, test_f1, test_acc])
+                     torch.save(model.state_dict(), weights_path.replace('.pth', '_test.pth'))
+
+         torch.save(model.state_dict(), weights_path.replace('.pth', '_last.pth'))
+         logging.info(f'used time {(time.time()-start_time)/3600:.2f}h')
+
+     logging.info('Cross Validation Finished!')
+     best_perform_list = np.asarray(best_perform_list)
+     test_perform_list = np.asarray(test_perform_list)
+     logging.info('Best validation perform list\n%s', best_perform_list)
+     logging.info('mean: %s', np.round(np.mean(best_perform_list, 0), 3))
+     logging.info('std: %s', np.round(np.std(best_perform_list, 0), 3))
+     logging.info('Best test perform list\n%s', test_perform_list)
+     logging.info('mean: %s', np.round(np.mean(test_perform_list, 0), 3))
+     logging.info('std: %s', np.round(np.std(test_perform_list, 0), 3))
+     with open(weight_dir + '/result.txt', 'w') as perform:
+         perform.write('Valid\n')
+         perform.write(','.join([str(i) for i in np.mean(best_perform_list, 0)]) + '\n')
+         perform.write(','.join([str(i) for i in np.std(best_perform_list, 0)]) + '\n')
+         perform.write('Test\n')
+         perform.write(','.join([str(i) for i in np.mean(test_perform_list, 0)]) + '\n')
+         perform.write(','.join([str(i) for i in np.std(test_perform_list, 0)]) + '\n')
+
+
+ if __name__ == "__main__":
+     main()
main_simple.py ADDED
@@ -0,0 +1,208 @@
+ import argparse
+ import json
+ import logging
+ import os
+ import time
+
+ from dataset import PeptidePairDataset, PeptidePairPicDataset, SimplePairClsDataset
+ from network import DMutaPeptide, DMutaPeptideCNN
+ from sklearn.model_selection import KFold
+ from train import train, train_cls
+ import torch
+ import torch.nn as nn
+ from torch.utils.data import DataLoader, Subset
+ import numpy as np
+ from loss import MLCE, SuperLoss, LogCoshLoss, BMCLoss
+ from utils import set_seed
+
+
+ parser = argparse.ArgumentParser(description='resnet26')
+ # model setting
+ parser.add_argument('--model', type=str, default='resnet34',
+                     help='resnet34 resnet50 densenet')
+ parser.add_argument('--q-encoder', dest='q_encoder', type=str, default='lstm',
+                     help='lstm mamba mla')
+ parser.add_argument("--side-enc", dest='side_enc', type=str, default=None,
+                     help="use side features")
+ parser.add_argument('--channels', type=int, default=256)
+ parser.add_argument('--fusion', type=str, default='att',
+                     help='mlp att diff')
+ parser.add_argument('--glob-feat', dest='glob_feat', action='store_true', default=False,
+                     help="use global features")
+ parser.add_argument('--non-siamese', dest='non_siamese', action='store_true', default=False,
+                     help="use non-siamese architecture")
+
+ # task & dataset setting
+ parser.add_argument('--task', type=str, default='cls',
+                     help='reg or cls')
+ parser.add_argument('--one-way', action='store_true', dest='one_way', default=True,
+                     help='use one-way constructed dataset')
+ parser.add_argument('--max-length', dest='max_length', type=int, default=30,
+                     help='Max length for sequence filtering')
+ parser.add_argument('--split', type=int, default=5,
+                     help="Split k fold in cross validation (default: 5)")
+ parser.add_argument('--seed', type=int, default=1,
+                     help="Seed (default: 1)")
+ parser.add_argument('--pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+ parser.add_argument('--mix-pcs', dest='mix_pcs', action='store_true', default=False,
+                     help='Consider protease cut site')
+ parser.add_argument('--resize', type=int, default=[768], nargs='+',
+                     help='resize the image')
+ parser.add_argument('--llm-data', action='store_true', default=False,
+                     help='Use LLM augmentation data')
+
+ # training setting
+ parser.add_argument('--gpu', type=int, default=0,
+                     help='GPU index to use, -1 for CPU (default: 0)')
+ parser.add_argument('--batch-size', type=int, dest='batch_size', default=32,
+                     help='input batch size for training (default: 32)')
+ parser.add_argument('--epochs', type=int, default=50,
+                     help='number of epochs to train (default: 50)')
+ parser.add_argument('--lr', type=float, default=0.001,
+                     help='learning rate (default: 0.001)')
+ parser.add_argument('--decay', type=float, default=0.0005,
+                     help='weight decay (default: 0.0005)')
+ parser.add_argument('--pretrain', type=str, dest='pretrain', default='',
+                     help='path of the pretrained model')
+ parser.add_argument('--metric-avg', type=str, dest='metric_avg', default='macro',
+                     help='metric average type')
+
+ parser.add_argument('--loss', type=str, default='ce',
+                     help='loss function')
+ parser.add_argument('--dir', action='store_true', default=False,
+                     help='use DIR')
+
+ args = parser.parse_args()
+
+ if args.mix_pcs:
+     args.pcs = 'mix'
+
+
+ def main():
+     set_seed(args.seed)
+     if args.task == 'reg':
+         raise NotImplementedError("unimplemented regression task")
+     elif args.task == 'cls':
+         trainer = train_cls
+         args.classes = 2
+         if args.loss == 'ce' or args.loss in ['mse', 'smoothl1', 'super']:
+             args.loss = 'ce'
+             criterion = nn.CrossEntropyLoss()
+         else:
+             raise NotImplementedError("unimplemented classification task loss function")
+     else:
+         raise NotImplementedError("unimplemented task")
+
+     if args.q_encoder in ['cnn', 'rn18']:
+         weight_dir = f'./run-{args.task}/{"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}{f"-{args.side_enc}" if args.side_enc else ""}{"-mixpcs" if args.mix_pcs else ""}{"-pcs" if args.pcs==True else ""}-simple{"-llm" if args.llm_data else ""}{"-" + "x".join(str(n) for n in args.resize) if args.resize else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+     else:
+         weight_dir = f'./run-{args.task}/{"non-siamese-" if args.non_siamese else ""}{args.q_encoder}-{args.fusion}-{args.channels}-simple{"-llm" if args.llm_data else ""}{"-gf" if args.glob_feat else ""}{"-oneway" if args.one_way else ""}-{args.loss + "-dir" if args.dir else args.loss}-{str(args.batch_size)}-{str(args.lr)}-{str(args.epochs)}'
+
+     if not os.path.exists(weight_dir):
+         os.makedirs(weight_dir)
+
+     logging.basicConfig(handlers=[
+         logging.FileHandler(filename=os.path.join(weight_dir, "training.log"), encoding='utf-8', mode='w+'),
+         logging.StreamHandler()],
+         format="%(asctime)s: %(message)s", datefmt="%F %T", level=logging.INFO)
+
+     logging.info(f'saving_dir: {weight_dir}')
+
+     with open(os.path.join(weight_dir, "config.json"), "w") as f:
+         f.write(json.dumps(vars(args)))
+
+     device = torch.device("cpu" if args.gpu == -1 or not torch.cuda.is_available() else f"cuda:{args.gpu}")
116
+
117
+ logging.info('Loading Training Dataset')
118
+ all_set = SimplePairClsDataset(pad_length=args.max_length, llm=args.llm_data, gf=args.glob_feat, q_encoder=args.q_encoder, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
119
+
120
+ logging.info('Loading Test Dataset')
121
+ if args.q_encoder in ['cnn', 'rn18']:
122
+ test_set = PeptidePairPicDataset(mode='test', pad_length=args.max_length, task=args.task, gf=args.glob_feat, side_enc=args.side_enc, pcs=args.pcs, resize=args.resize)
123
+ else:
124
+ test_set = PeptidePairDataset(mode='test', pad_length=args.max_length, task=args.task, gf=args.glob_feat)
125
+
126
+ test_loader = DataLoader(test_set, batch_size=args.batch_size, shuffle=False, num_workers=8, pin_memory=True)
127
+
128
+ best_perform_list = [[] for i in range(5)]
129
+ test_perform_list = [[] for i in range(5)]
130
+
131
+ kf = KFold(n_splits=5, shuffle=True, random_state=42)
132
+
133
+ for fold, (train_idx, val_idx) in enumerate(kf.split(all_set)):
134
+ train_set= Subset(all_set, train_idx)
135
+ valid_set = Subset(all_set, val_idx)
136
+
137
+ train_loader = DataLoader(train_set, batch_size=args.batch_size, shuffle=True, drop_last=True, num_workers=8, pin_memory=True)
138
+ valid_loader = DataLoader(valid_set, batch_size=args.batch_size, shuffle=False, num_workers=8, pin_memory=True)
139
+
140
+ if args.q_encoder in ['cnn', 'rn18']:
141
+ model = DMutaPeptideCNN(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, side_enc=args.side_enc, fusion=args.fusion, non_siamese=args.non_siamese)
142
+ else:
143
+ model = DMutaPeptide(q_encoder=args.q_encoder, classes=args.classes, channels=args.channels, dir=args.dir, gf=args.glob_feat, fusion=args.fusion, non_siamese=args.non_siamese)
144
+ if len(args.pretrain) != 0: #TODO: load pretrain
145
+ pass
146
+ model.to(device)
147
+ # model.compile()
148
+
149
+ optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.decay)
150
+
151
+ if args.q_encoder == 'cnn':
152
+ scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
153
+ else:
154
+ scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
155
+
156
+ if args.loss == 'bmc_ln':
157
+ optimizer.add_param_group({'params': criterion.noise_sigma, 'lr': args.lr, 'name': 'noise_sigma'})
158
+ weights_path = f"{weight_dir}/model_{fold}.pth"
159
+ # early_stopping = EarlyStopping(patience=args.patience, path=weights_path)
160
+ logging.info(f'Running Cross Validation {fold}')
161
+ logging.info(f'Fold {fold} Train set:{len(train_set)}, Valid set:{len(valid_set)}, Test set: {len(test_set)}')
162
+ best_metric = -float('inf')
163
+ best_test = -float('inf')
164
+ start_time = time.time()
165
+ if args.task == 'cls':
166
+ for epoch in range(1, args.epochs + 1):
167
+ train_loss, ap, auc, f1, acc = trainer(args, epoch, model, train_loader, valid_loader, device, criterion, optimizer)
168
+ logging.info(f'Epoch: {epoch:03d} Train Loss: {train_loss:.3f}, ap: {ap:.3f}, auc: {auc:.3f}, f1: {f1:.3f}, acc: {acc:.3f}')
169
+ scheduler.step()
170
+ avg_metric = ap + auc #+ f1 + acc
171
+ if avg_metric > best_metric:
172
+ logging.info(f'Epoch: {epoch:03d} New best VALIDATION metrics')
173
+ torch.save(model.state_dict(), weights_path)
174
+ best_metric = avg_metric
175
+ best_perform_list[fold] = np.asarray([ap, auc, f1, acc])
176
+
177
+ _, test_ap, test_auc, test_f1, test_acc = trainer(args, epoch, model, None, test_loader, device, None, None)
178
+ logging.info(f'Epoch: {epoch:03d} Test results, ap: {test_ap:.3f}, auc: {test_auc:.3f}, f1: {test_f1:.3f}, acc: {test_acc:.3f}')
179
+ test_metric = test_ap + test_auc #+ test_f1 + test_acc
180
+ if test_metric > best_test and epoch > 10:
181
+ logging.info(f'Epoch: {epoch:03d} New best TEST metrics')
182
+ best_test = test_metric
183
+ test_perform_list[fold] = np.asarray([test_ap, test_auc, test_f1, test_acc])
184
+ torch.save(model.state_dict(), weights_path.replace('.pth', '_test.pth'))
185
+
186
+ torch.save(model.state_dict(), weights_path.replace('.pth', '_last.pth'))
187
+ logging.info(f'used time {(time.time()-start_time)/3600:.2f}h')
188
+
189
+ logging.info(f'Cross Validation Finished!')
190
+ best_perform_list = np.asarray(best_perform_list)
191
+ test_perform_list = np.asarray(test_perform_list)
192
+ logging.info('Best validation perform list\n%s', best_perform_list)
193
+ logging.info('mean: %s', np.round(np.mean(best_perform_list, 0), 3))
194
+ logging.info('std: %s', np.round(np.std(best_perform_list, 0), 3))
195
+ logging.info('Best test perform list\n%s', test_perform_list)
196
+ logging.info('mean: %s', np.round(np.mean(test_perform_list, 0), 3))
197
+ logging.info('std: %s', np.round(np.std(test_perform_list, 0), 3))
198
+ perform = open(weight_dir+'/result.txt', 'w')
199
+ perform.write('Valid\n')
200
+ perform.write(','.join([str(i) for i in np.mean(best_perform_list, 0)])+'\n')
201
+ perform.write(','.join([str(i) for i in np.std(best_perform_list, 0)])+'\n')
202
+ perform.write('Test\n')
203
+ perform.write(','.join([str(i) for i in np.mean(test_perform_list, 0)])+'\n')
204
+ perform.write(','.join([str(i) for i in np.std(test_perform_list, 0)])+'\n')
205
+
206
+
207
+ if __name__ == "__main__":
208
+ main()
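The training loop above leans on `KFold(n_splits=args.split, shuffle=True, random_state=42)` to carve `all_set` into folds. As a minimal sketch of the index bookkeeping involved, here is a pure-Python stand-in (the helper name `kfold_indices` is hypothetical, not part of this repo) that mirrors how shuffled k-fold splitting hands out validation indices, with the first `n % k` folds getting one extra sample:

```python
import random

def kfold_indices(n_samples, n_splits=5, seed=42):
    # hypothetical helper: shuffled k-fold index bookkeeping,
    # mirroring the role of sklearn's KFold in the script above
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # the first n_samples % n_splits folds receive one extra sample
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    folds, start = [], 0
    for size in fold_sizes:
        val_idx = idx[start:start + size]               # this fold's validation set
        train_idx = idx[:start] + idx[start + size:]    # everything else trains
        folds.append((train_idx, val_idx))
        start += size
    return folds

folds = kfold_indices(103, n_splits=5)
print([len(v) for _, v in folds])  # → [21, 21, 21, 20, 20]
```

Every sample lands in exactly one validation fold, so the per-fold metric arrays (`best_perform_list[fold]`) can be averaged without overlap.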
network.py ADDED
@@ -0,0 +1,586 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from copy import deepcopy
+ from mamba_ssm import Mamba
+ from utils import FDS
+ from torchvision.models import resnet18
+
+
+ class MambaModel(nn.Module):
+     def __init__(self, d_model, max_length=30):
+         super(MambaModel, self).__init__()
+         self.linear = nn.Linear(in_features=21, out_features=d_model)
+         self.pos_encoder = PositionalEncoding(d_model, max_length)
+         self.mamba = Mamba(d_model=d_model, d_state=32, expand=4)
+         self.global_pool = nn.AdaptiveAvgPool1d(1)
+
+     def forward(self, x: torch.Tensor):
+         x = self.pos_encoder(self.linear(x))
+         y = self.mamba(x)
+         y_flip = self.mamba(x.flip([-2])).flip([-2])
+         y = torch.cat((y, y_flip), dim=-1)
+         y = self.global_pool(y.permute(0, 2, 1)).squeeze(-1)
+         return y
+
+
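`MambaModel.forward` builds a bidirectional representation from a causal model by running the same module forward and on the flipped sequence, then concatenating per position. The pattern itself can be sketched without torch; below, a running sum stands in for the causal scan (`scan` and `bidirectional` are illustrative names, not part of this repo):

```python
def scan(seq):
    # stand-in for a causal sequence model (e.g. Mamba):
    # each output depends only on the current and earlier elements
    out, total = [], 0
    for v in seq:
        total += v
        out.append(total)
    return out

def bidirectional(seq):
    # forward pass, plus flip -> scan -> flip for the backward pass,
    # concatenated per position, as in MambaModel.forward on dim -2
    fwd = scan(seq)
    bwd = scan(seq[::-1])[::-1]
    return list(zip(fwd, bwd))

print(bidirectional([1, 2, 3]))  # → [(1, 6), (3, 5), (6, 3)]
```

Each position now sees context from both directions, which is why the feature dimension doubles before global pooling.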
+ class MLP(nn.Module):
+     def __init__(self, input_dim, hidden_dim, output_dim, num_layers=3, dropout_rate=0.1):
+         super(MLP, self).__init__()
+         if isinstance(hidden_dim, int):
+             hidden_dim = [hidden_dim] * num_layers
+
+         layers = []
+         layers.append(nn.Linear(input_dim, hidden_dim[0]))
+         layers.append(nn.ReLU())
+         layers.append(nn.Dropout(dropout_rate))
+
+         for i in range(len(hidden_dim) - 1):
+             layers.append(nn.Linear(hidden_dim[i], hidden_dim[i + 1]))
+             layers.append(nn.ReLU())
+             layers.append(nn.Dropout(dropout_rate))
+
+         layers.append(nn.Linear(hidden_dim[-1], output_dim))
+
+         self.network = nn.Sequential(*layers)
+
+     def forward(self, x):
+         return self.network(x)
+
+
+ class PositionalEncoding(nn.Module):
+     def __init__(self, d_model, max_len=50):
+         super(PositionalEncoding, self).__init__()
+
+         pe = torch.zeros(max_len, d_model)  # (max_len, d_model)
+         position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
+         div_term = torch.exp(torch.arange(0, d_model, 2).float() *
+                              (-torch.log(torch.FloatTensor([10000.0])) / d_model))  # (d_model/2,)
+         pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
+         pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
+         pe = pe.unsqueeze(0)  # (1, max_len, d_model)
+         self.register_buffer('pe', pe)
+
+     def forward(self, x):
+         """
+         x: (B, N, d_model)
+         """
+         x = x + self.pe[:, :x.size(1), :]
+         return x
+
+
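The buffer built in `PositionalEncoding.__init__` is the standard sinusoidal table: `pe[pos, 2i] = sin(pos / 10000^(2i/d))` and `pe[pos, 2i+1] = cos(...)`, since `exp(2i * (-ln 10000) / d) = 10000^(-2i/d)`. A pure-Python re-derivation (the function name `sinusoidal_pe` is illustrative, not part of the repo) makes the values easy to spot-check:

```python
import math

def sinusoidal_pe(max_len, d_model):
    # pure-Python version of the PositionalEncoding buffer:
    # even columns get sin, odd columns get cos of the same angle
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_pe(4, 8)
print(pe[0][:4])  # position 0 → alternating [0.0, 1.0, 0.0, 1.0]
```

Position 0 always yields the alternating `[sin 0, cos 0, …] = [0, 1, …]` row, which is a quick sanity check when visualizing the buffer.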
+ class MHAModel(nn.Module):
+     def __init__(self, d_model, max_length=50):
+         super(MHAModel, self).__init__()
+         self.linear = nn.Linear(in_features=21, out_features=d_model)
+         self.pos_encoder = PositionalEncoding(d_model, max_length)
+         self.self_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
+         self.global_pool = nn.AdaptiveAvgPool1d(1)
+
+     def forward(self, x: torch.Tensor):
+         # linear projection + positional encoding
+         x = self.pos_encoder(self.linear(x))  # [batch, seq_len, d_model]
+
+         # forward self-attention
+         y, _ = self.self_attn(x, x, x)  # [batch, seq_len, d_model]
+
+         # backward self-attention
+         x_flip = x.flip([-2])  # flip along the sequence dimension
+         y_flip, _ = self.self_attn(x_flip, x_flip, x_flip)
+         y_flip = y_flip.flip([-2])  # flip back to the original order
+
+         # concatenate the forward and backward results
+         y = torch.cat((y, y_flip), dim=-1)  # [batch, seq_len, 2*d_model]
+
+         # global pooling
+         y = self.global_pool(y.permute(0, 2, 1))  # [batch, 2*d_model, 1]
+         return y.squeeze(-1)  # [batch, 2*d_model]
+
+
+ class MLAModel(nn.Module):
+     def __init__(self, d_model, max_length=50):
+         super(MLAModel, self).__init__()
+         self.linear = nn.Linear(in_features=21, out_features=d_model)
+         self.pos_encoder = PositionalEncoding(d_model, max_length)
+         self.MLA = MLA(d_model, n_heads=8, max_len=max_length)
+         self.global_pool = nn.AdaptiveAvgPool1d(1)
+
+     def forward(self, x: torch.Tensor):
+         x = self.pos_encoder(self.linear(x))
+         y = self.MLA(x)
+         y_flip = self.MLA(x.flip([-2])).flip([-2])
+         y = torch.cat((y, y_flip), dim=-1)
+         y = self.global_pool(y.permute(0, 2, 1)).squeeze(-1)
+         return y
+
+
+ class MLA(nn.Module):
+     def __init__(self, d_model, n_heads, max_len=50, rope_theta=10000.0):
+         super().__init__()
+         self.d_model = d_model
+         self.n_heads = n_heads
+         self.dh = d_model // n_heads
+         self.q_proj_dim = d_model // 2
+         self.kv_proj_dim = (2 * d_model) // 3
+
+         self.qk_nope_dim = self.dh // 2
+         self.qk_rope_dim = self.dh // 2
+
+         # Q projections (low-rank, LoRA-style)
+         self.W_dq = nn.Parameter(0.01 * torch.randn((d_model, self.q_proj_dim)))
+         self.W_uq = nn.Parameter(0.01 * torch.randn((self.q_proj_dim, self.d_model)))
+         self.q_layernorm = nn.LayerNorm(self.q_proj_dim)
+
+         # KV projections (low-rank, LoRA-style)
+         self.W_dkv = nn.Parameter(0.01 * torch.randn((d_model, self.kv_proj_dim + self.qk_rope_dim)))
+         self.W_ukv = nn.Parameter(0.01 * torch.randn((self.kv_proj_dim,
+                                                       self.d_model + (self.n_heads * self.qk_nope_dim))))
+         self.kv_layernorm = nn.LayerNorm(self.kv_proj_dim)
+
+         # output projection
+         self.W_o = nn.Parameter(0.01 * torch.randn((d_model, d_model)))
+
+         # RoPE
+         self.max_seq_len = max_len
+         self.rope_theta = rope_theta
+
+         # https://github.com/lucidrains/rotary-embedding-torch/tree/main
+         # self.dh is used here instead of self.qk_rope_dim so the cache is large enough
+         freqs = 1.0 / (rope_theta ** (torch.arange(0, self.dh, 2).float() / self.dh))
+         emb = torch.outer(torch.arange(self.max_seq_len).float(), freqs)
+         cos_cached = emb.cos()[None, None, :, :]
+         sin_cached = emb.sin()[None, None, :, :]
+
+         # these are constants, so register them as buffers rather than parameters
+         self.register_buffer("cos_cached", cos_cached)
+         self.register_buffer("sin_cached", sin_cached)
+
+     def apply_rope_x(self, x, cos, sin):
+         return (x * cos) + (self.rotate_half(x) * sin)
+
+     @staticmethod
+     def rotate_half(x):
+         x1, x2 = x.chunk(2, dim=-1)
+         return torch.cat((-x2, x1), dim=-1)
+
+     def forward(self, x, kv_cache=None, past_length=0):
+         B, S, D = x.size()
+
+         # Q projections
+         compressed_q = x @ self.W_dq
+         compressed_q = self.q_layernorm(compressed_q)
+         Q = compressed_q @ self.W_uq
+         Q = Q.view(B, -1, self.n_heads, self.dh).transpose(1, 2)
+         Q, Q_for_rope = torch.split(Q, [self.qk_nope_dim, self.qk_rope_dim], dim=-1)
+
+         # Q decoupled RoPE
+         cos_q = self.cos_cached[:, :, past_length:past_length + S, :self.qk_rope_dim // 2].repeat(1, 1, 1, 2)
+         sin_q = self.sin_cached[:, :, past_length:past_length + S, :self.qk_rope_dim // 2].repeat(1, 1, 1, 2)
+         Q_for_rope = self.apply_rope_x(Q_for_rope, cos_q, sin_q)
+
+         # KV projections
+         if kv_cache is None:
+             compressed_kv = x @ self.W_dkv
+             KV_for_lora, K_for_rope = torch.split(compressed_kv,
+                                                   [self.kv_proj_dim, self.qk_rope_dim],
+                                                   dim=-1)
+             KV_for_lora = self.kv_layernorm(KV_for_lora)
+         else:
+             new_kv = x @ self.W_dkv
+             compressed_kv = torch.cat([kv_cache, new_kv], dim=1)
+             new_kv, new_K_for_rope = torch.split(new_kv,
+                                                  [self.kv_proj_dim, self.qk_rope_dim],
+                                                  dim=-1)
+             old_kv, old_K_for_rope = torch.split(kv_cache,
+                                                  [self.kv_proj_dim, self.qk_rope_dim],
+                                                  dim=-1)
+             new_kv = self.kv_layernorm(new_kv)
+             old_kv = self.kv_layernorm(old_kv)
+             KV_for_lora = torch.cat([old_kv, new_kv], dim=1)
+             K_for_rope = torch.cat([old_K_for_rope, new_K_for_rope], dim=1)
+
+         KV = KV_for_lora @ self.W_ukv
+         KV = KV.view(B, -1, self.n_heads, self.dh + self.qk_nope_dim).transpose(1, 2)
+         K, V = torch.split(KV, [self.qk_nope_dim, self.dh], dim=-1)
+         S_full = K.size(2)
+
+         # K RoPE
+         K_for_rope = K_for_rope.view(B, -1, 1, self.qk_rope_dim).transpose(1, 2)
+         cos_k = self.cos_cached[:, :, :S_full, :self.qk_rope_dim // 2].repeat(1, 1, 1, 2)
+         sin_k = self.sin_cached[:, :, :S_full, :self.qk_rope_dim // 2].repeat(1, 1, 1, 2)
+         K_for_rope = self.apply_rope_x(K_for_rope, cos_k, sin_k)
+
+         # share the position-encoded key across all heads
+         K_for_rope = K_for_rope.repeat(1, self.n_heads, 1, 1)
+
+         # assemble the per-head queries, keys, and values
+         q_heads = torch.cat([Q, Q_for_rope], dim=-1)
+         k_heads = torch.cat([K, K_for_rope], dim=-1)
+         v_heads = V  # already reshaped before the split
+
+         # causal attention mask
+         mask = torch.ones((S, S_full), device=x.device)
+         mask = torch.tril(mask, diagonal=past_length)
+         mask = mask[None, None, :, :]
+
+         sq_mask = mask == 1
+
+         # attention
+         x = nn.functional.scaled_dot_product_attention(
+             q_heads, k_heads, v_heads,
+             attn_mask=sq_mask
+         )
+
+         x = x.transpose(1, 2).reshape(B, S, D)
+
+         # output projection
+         x = x @ self.W_o.T
+
+         return x
+
+
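`MLA.apply_rope_x` implements the rotary position embedding as `x * cos + rotate_half(x) * sin`, where `rotate_half` splits the last dimension in two, negates the second half, and swaps the halves. A list-based sketch of that helper (standalone, not the repo's tensor version) shows its effect concretely:

```python
def rotate_half(x):
    # list version of MLA.rotate_half: [x1, x2] -> [-x2, x1]
    h = len(x) // 2
    x1, x2 = x[:h], x[h:]
    return [-v for v in x2] + x1

print(rotate_half([1, 2, 3, 4]))  # → [-3, -4, 1, 2]
```

Applying `rotate_half` twice negates the vector (a 180° rotation in each 2-D plane), which is the algebraic property that lets `x * cos + rotate_half(x) * sin` act as a rotation by the position-dependent angle.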
+ class DMutaPeptide(nn.Module):
+     def __init__(self, q_encoder='lstm', classes=1, channels=128, dir=False, gf=False, fusion='mlp', non_siamese=False):
+         """
+         Args:
+             q_encoder: encoder type, one of 'lstm', 'gru', 'mamba', 'mla', 'mha'
+             classes: number of output classes
+             channels: channel count, controls the hidden-state dimension
+             dir: whether to use the DIR module
+             gf: whether to use the global-feature branch
+             fusion: fusion method, 'mlp' (default, plain concatenation), 'diff' (vector difference), or 'att' (attention fusion)
+         """
+         super().__init__()
+         self.classes = classes
+         self.DIR = dir
+         self.gf = gf
+         self.fusion_method = fusion
+         self.non_siamese = non_siamese
+         # dimension after concatenation is channels * 4
+         final_dim = channels * 4
+
+         # initialize the sequence encoder
+         if q_encoder == 'lstm':
+             self.q_encoder = nn.LSTM(
+                 input_size=21,
+                 hidden_size=channels,
+                 num_layers=2,
+                 batch_first=True,  # inputs and outputs are (batch, time_step, input_size)
+                 dropout=0.1,
+                 bidirectional=True
+             )
+         elif q_encoder == 'gru':
+             self.q_encoder = nn.GRU(
+                 input_size=21,
+                 hidden_size=channels,
+                 num_layers=2,
+                 batch_first=True,  # inputs and outputs are (batch, time_step, input_size)
+                 dropout=0.1,
+                 bidirectional=True
+             )
+         elif q_encoder == 'mamba':
+             self.q_encoder = MambaModel(channels, 30)
+         elif q_encoder == 'mla':
+             self.q_encoder = MLAModel(channels, 30)
+         elif q_encoder == 'mha':
+             self.q_encoder = MHAModel(channels, 30)
+         else:
+             raise NotImplementedError
+
+         if non_siamese:
+             self.q_encoder_2 = deepcopy(self.q_encoder)
+         else:
+             self.q_encoder_2 = self.q_encoder
+
+         if self.fusion_method == 'diff':
+             final_dim //= 2
+
+         if gf:
+             self.g_encoder = MLP(1024, [512, 256, 128], channels * 2, dropout_rate=0.3)
+             final_dim += channels * 2
+
+         # for 'att' fusion, fuse the vectors with MultiheadAttention
+         if self.fusion_method == 'att':
+             # each encoder output vector has dimension channels * 2
+             embed_dim = channels * 2
+             self.attn = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=4 if gf else 2, batch_first=True)
+
+         if self.DIR:
+             self.FDS = FDS(final_dim)
+
+         self.fc = nn.Sequential(
+             nn.Linear(final_dim, 128),
+             nn.Mish(),
+             nn.Dropout(0.3),
+             nn.Linear(128, 64),
+             nn.Mish(),
+             nn.Dropout(0.3),
+             nn.Linear(64, self.classes)
+         )
+
+     def norm(self, x, dim=-1, p=2):
+         return F.normalize(x, p=p, dim=dim)
+
+     def forward(self, x, labels=None, epoch=0):
+         if self.gf:
+             seq1, seq2, gf = x
+         else:
+             seq1, seq2 = x
+         fusion = []
+
+         # encode both sequences
+         if self.q_encoder.__class__.__name__ in ['LSTM', 'GRU']:
+             # for LSTM/GRU, take the output at the last time step; its dimension is channels*2 (bidirectional)
+             fusion.append(self.norm(self.q_encoder(seq1)[0][:, -1, :]))
+             fusion.append(self.norm(self.q_encoder_2(seq2)[0][:, -1, :]))
+         else:  # MambaModel, MLAModel, MHAModel
+             fusion.append(self.norm(self.q_encoder(seq1)))
+             fusion.append(self.norm(self.q_encoder_2(seq2)))
+
+         if self.gf:
+             fusion.append(self.g_encoder(gf))
+
+         # fuse according to fusion_method
+         if self.fusion_method == 'mlp':
+             # original behaviour: concatenate the vectors
+             fusion = torch.cat(fusion, dim=-1)
+         elif self.fusion_method == 'diff':
+             fusion = torch.cat([fusion[1] - fusion[0]] + fusion[2:], dim=-1)
+         elif self.fusion_method == 'att':
+             # attention fusion: stack the vectors as "tokens", shape (batch, n_tokens, embed_dim)
+             tokens = torch.stack(fusion, dim=1)
+             # self-attention over the tokens (batch_first=True, so input is (batch, seq_len, embed_dim))
+             attn_output, _ = self.attn(tokens, tokens, tokens)
+             # flatten the attention output to (batch, n_tokens * embed_dim), i.e. (batch, final_dim)
+             fusion = attn_output.reshape(attn_output.size(0), -1)
+         else:
+             raise ValueError("Invalid fusion method: choose 'mlp', 'diff', or 'att'.")
+
+         # with DIR enabled, keep the feature representation before it enters FDS
+         if self.DIR:
+             features = fusion
+             fusion = self.FDS.smooth(fusion, labels, epoch)
+
+         pred = self.fc(fusion).squeeze(-1)
+
+         if self.DIR:
+             return pred, features
+         else:
+             return pred
+
+
+ class CNNEncoder(nn.Module):
+     def __init__(self, feature_dim=256, base_channels=16, in_dim=3):
+         """
+         Args:
+             feature_dim: dimension of the output feature vector
+             base_channels: channel count of the base convolution block
+             in_dim: number of input image channels
+         """
+         super(CNNEncoder, self).__init__()
+
+         # convolutional layers
+         self.conv = nn.Sequential(
+             nn.Conv2d(in_dim, base_channels, kernel_size=3, stride=1, padding=1),
+             nn.BatchNorm2d(base_channels),
+             nn.Mish(inplace=True),
+             nn.MaxPool2d(kernel_size=2),
+
+             nn.Conv2d(base_channels, base_channels * 2, kernel_size=3, stride=1, padding=1),
+             nn.BatchNorm2d(base_channels * 2),
+             nn.Mish(inplace=True),
+             nn.MaxPool2d(kernel_size=2),
+
+             nn.Conv2d(base_channels * 2, base_channels * 4, kernel_size=3, stride=1, padding=1),
+             nn.BatchNorm2d(base_channels * 4),
+             nn.Mish(inplace=True),
+             nn.MaxPool2d(kernel_size=2)
+         )
+
+         # adaptive pooling to a fixed-size (1x1) feature map
+         self.adaptive_pool = nn.AdaptiveAvgPool2d((1, 1))
+
+         # fully connected layer mapping the conv features to a 1-D feature vector
+         self.fc = nn.Linear(base_channels * 4, feature_dim)
+
+     def forward(self, img):
+         """
+         img: [B, 3, 1024, 1024] input RGB image tensor
+         """
+         fused_conv = self.conv(img)
+         pooled = self.adaptive_pool(fused_conv)  # [B, base_channels*4, 1, 1]
+
+         # flatten and project to the output feature vector
+         flattened = pooled.view(pooled.size(0), -1)  # [B, base_channels*4]
+         feature_vector = self.fc(flattened)  # [B, feature_dim]
+         return feature_vector
+
+
+ class DMutaPeptideCNN(nn.Module):
+     def __init__(self, q_encoder='cnn', classes=1, channels=16, dir=False, gf=False, side_enc=None, fusion='mlp', non_siamese=False):
+         """
+         Args:
+             q_encoder: image encoder type, one of 'cnn', 'rn18'
+             classes: number of output classes
+             channels: base channel count of the CNN encoder
+             dir: whether to use the DIR module
+             gf: whether to use the global-feature branch
+             side_enc: optional sequence side encoder, 'lstm' or 'mamba'
+             fusion: fusion method, 'mlp' (default, plain concatenation), 'diff' (vector difference), or 'att' (attention fusion)
+         """
+         super().__init__()
+         self.classes = classes
+         self.DIR = dir
+         self.gf = gf
+         self.fusion_method = fusion
+         self.non_siamese = non_siamese
+         vector_dim = 512
+         final_dim = vector_dim * 2
+
+         # initialize the image encoder
+         if q_encoder == 'cnn':
+             self.q_encoder = CNNEncoder(feature_dim=vector_dim, base_channels=channels)
+         elif q_encoder == 'rn18':
+             self.q_encoder = resnet18_backbone(pretrained=True)
+         if non_siamese:
+             self.q_encoder_2 = deepcopy(self.q_encoder)
+         else:
+             self.q_encoder_2 = self.q_encoder
+
+         if side_enc:
+             self.side_enc = True
+             if side_enc == 'lstm':
+                 self.side_encoder = nn.LSTM(
+                     input_size=21,
+                     hidden_size=256,
+                     num_layers=2,
+                     batch_first=True,  # inputs and outputs are (batch, time_step, input_size)
+                     dropout=0.1,
+                     bidirectional=True
+                 )
+             elif side_enc == 'mamba':
+                 self.side_encoder = MambaModel(256, 30)
+             else:
+                 raise NotImplementedError
+
+             final_dim += vector_dim * 2
+
+             if non_siamese:
+                 self.side_encoder_2 = deepcopy(self.side_encoder)
+             else:
+                 self.side_encoder_2 = self.side_encoder
+         else:
+             self.side_enc = False
+
+         if self.fusion_method == 'diff':
+             final_dim //= 2
+
+         if gf:
+             self.g_encoder = MLP(1024, [512, 256, 128], vector_dim, dropout_rate=0.3)
+             final_dim += vector_dim
+
+         # for 'att' fusion, fuse the vectors with MultiheadAttention
+         if self.fusion_method == 'att':
+             # each encoder output vector has dimension vector_dim
+             embed_dim = vector_dim
+             self.attn = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=4 if gf else 2, batch_first=True)
+
+         if self.DIR:
+             self.FDS = FDS(final_dim)
+
+         self.fc = nn.Sequential(
+             nn.Linear(final_dim, 128),
+             nn.Mish(),
+             nn.Dropout(0.3),
+             nn.Linear(128, 64),
+             nn.Mish(),
+             nn.Dropout(0.3),
+             nn.Linear(64, self.classes)
+         )
+
+     def norm(self, x, dim=-1, p=2):
+         return F.normalize(x, p=p, dim=dim)
+
+     def forward(self, x, labels=None, epoch=0):
+         if self.gf:
+             seq1, seq2, gf = x
+         else:
+             seq1, seq2 = x
+
+         if self.side_enc:
+             seq1_seq = seq1[1]
+             seq1 = seq1[0]
+             seq2_seq = seq2[1]
+             seq2 = seq2[0]
+
+         fusion = []
+
+         # encode both images (and, optionally, both sequences)
+         fusion.append(self.norm(self.q_encoder(seq1)))
+         fusion.append(self.norm(self.q_encoder_2(seq2)))
+         if self.side_enc:
+             if self.side_encoder.__class__.__name__ == 'MambaModel':
+                 fusion.append(self.norm(self.side_encoder(seq1_seq)))
+                 fusion.append(self.norm(self.side_encoder_2(seq2_seq)))
+             else:  # LSTM
+                 fusion.append(self.norm(self.side_encoder(seq1_seq)[0][:, -1, :]))
+                 fusion.append(self.norm(self.side_encoder_2(seq2_seq)[0][:, -1, :]))
+
+         if self.gf:
+             fusion.append(self.g_encoder(gf))
+
+         # fuse according to fusion_method
+         if self.fusion_method == 'mlp':
+             # original behaviour: concatenate the vectors
+             fusion = torch.cat(fusion, dim=-1)
+         elif self.fusion_method == 'diff':
+             if not self.side_enc:
+                 fusion = torch.cat([fusion[1] - fusion[0]] + fusion[2:], dim=-1)
+             else:
+                 fusion = torch.cat([fusion[1] - fusion[0], fusion[3] - fusion[2]] + fusion[4:], dim=-1)
+         elif self.fusion_method == 'att':
+             # attention fusion: stack the vectors as "tokens", shape (batch, n_tokens, embed_dim)
+             tokens = torch.stack(fusion, dim=1)
+             # self-attention over the tokens (batch_first=True, so input is (batch, seq_len, embed_dim))
+             attn_output, _ = self.attn(tokens, tokens, tokens)
+             # flatten the attention output to (batch, n_tokens * embed_dim), i.e. (batch, final_dim)
+             fusion = attn_output.reshape(attn_output.size(0), -1)
+         else:
+             raise ValueError("Invalid fusion method: choose 'mlp', 'diff', or 'att'.")
+
+         # with DIR enabled, keep the feature representation before it enters FDS
+         if self.DIR:
+             features = fusion
+             fusion = self.FDS.smooth(fusion, labels, epoch)
+
+         pred = self.fc(fusion).squeeze(-1)
+
+         if self.DIR:
+             return pred, features
+         else:
+             return pred
+
+
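The classifier input width in `DMutaPeptideCNN.__init__` is accumulated in a specific order: two image-encoder vectors, optionally two side-encoder vectors, then halved for `'diff'` fusion, then widened by the global-feature branch. Re-deriving that arithmetic as a small standalone function (for illustration only; `cnn_final_dim` is not part of the repo) makes the resulting dimensions easy to check:

```python
def cnn_final_dim(side_enc=False, fusion='mlp', gf=False, vector_dim=512):
    # re-derivation of the final_dim bookkeeping in DMutaPeptideCNN.__init__,
    # applied in the same order as the constructor
    final_dim = vector_dim * 2           # two image encoders
    if side_enc:
        final_dim += vector_dim * 2      # two sequence side encoders
    if fusion == 'diff':
        final_dim //= 2                  # pairwise differences halve the width
    if gf:
        final_dim += vector_dim          # global-feature MLP output
    return final_dim

print(cnn_final_dim())                                        # → 1024
print(cnn_final_dim(side_enc=True, fusion='diff', gf=True))   # → 1536
```

Note that the order matters: the `'diff'` halving is applied after the side-encoder width is added but before the global-feature width, matching the constructor.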
+ def resnet18_backbone(pretrained=False):
+     weights = None
+     if pretrained:
+         weights = 'IMAGENET1K_V1'
+     model = resnet18(weights=weights, progress=False)
+     # drop the final fc layer and keep the pooled 512-d feature vector
+     return torch.nn.Sequential(*list(model.children())[:-1], nn.Flatten())
+
+
+ if __name__ == "__main__":
+     model = resnet18_backbone(pretrained=True)
+     print(model)
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ mamba_ssm==2.2.4
+ numpy==1.26.3
+ pandas==2.1.4
+ rdkit==2024.3.5
+ scikit_learn==1.4.1.post1
+ scipy==1.13.0
+ torch==2.2.0
+ torchmetrics==1.3.1
+ torchvision==0.17.0