chq1155 committed · Commit 021edb3 · verified · 1 Parent(s): ccbe063

Add Colab demo notebook + KaiB demo data

examples/SF_Cluster_Demo.ipynb ADDED
@@ -0,0 +1,290 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# SF-Cluster \u2014 frustration-guided MSA subset builder\n",
8
+ "\n",
9
+ "**What this notebook does.** Installs the open-source `sf_cluster` package, downloads a small KaiB demo bundle (a 364-sequence MSA + a per-residue Frustration Index matrix from FrustrAI-Seq), and builds two flavours of stratified MSA subsets (`mosaic` and `gradient`) using the contrast-HV/LV score. Everything runs on CPU in roughly two minutes.\n",
10
+ "\n",
11
+ "**Who it is for.** Biologists who want reproducible, frustration-stratified MSA slices to feed into an AF-Cluster-style multi-conformer prediction loop.\n",
12
+ "\n",
13
+ "**What you do next.** Take the 12 mosaic or 12 gradient A3M subsets emitted at the end of this notebook, run each through ColabFold AF2, and aggregate per the SF-Cluster \u00a79.1 hit criterion.\n",
14
+ "\n",
15
+ "---\n",
16
+ "\n",
17
+ "> ## LIMITATIONS \u2014 please read\n",
18
+ "> A controlled comparison on the Main-21 cases shows that **uniform random subsampling performs equivalently on most cases**. The frustration signal is **not** the active ingredient here \u2014 depth reduction is. See the OSS README for the full ablation.\n",
19
+ ">\n",
20
+ "> Use this tool when you want **stratified, reproducible MSA subsets** with a clear provenance story \u2014 not as a guaranteed conformational diversity engine. It is a research baseline, not a turnkey accuracy improvement."
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "markdown",
25
+ "metadata": {},
26
+ "source": [
27
+ "## 1. Install the package\n",
28
+ "\n",
29
+ "Pulls the OSS release from Hugging Face. Pure-Python; only depends on `numpy` and `scipy`."
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "code",
34
+ "metadata": {},
35
+ "execution_count": null,
36
+ "outputs": [],
37
+ "source": [
38
+ "!pip install -q git+https://huggingface.co/ChatterjeeLab/SF-Cluster"
39
+ ]
40
+ },
41
+ {
42
+ "cell_type": "markdown",
43
+ "metadata": {},
44
+ "source": [
45
+ "## 2. Download the KaiB demo bundle\n",
46
+ "\n",
47
+ "Three files, ~200 KB total: a filtered MSA, a per-residue FI matrix from FrustrAI-Seq, and the parallel sequence-ID list."
48
+ ]
49
+ },
50
+ {
51
+ "cell_type": "code",
52
+ "metadata": {},
53
+ "execution_count": null,
54
+ "outputs": [],
55
+ "source": [
56
+ "from huggingface_hub import hf_hub_download\n",
57
+ "from pathlib import Path\n",
58
+ "import os\n",
59
+ "\n",
60
+ "REPO = 'ChatterjeeLab/SF-Cluster'\n",
61
+ "FILES = ['examples/data/KaiB_filtered.a3m',\n",
62
+ " 'examples/data/KaiB_fi_matrix.npy',\n",
63
+ " 'examples/data/KaiB_seq_ids.txt']\n",
64
+ "\n",
65
+ "local = {}\n",
66
+ "for fname in FILES:\n",
67
+ " p = hf_hub_download(repo_id=REPO, filename=fname, repo_type='model')\n",
68
+ " local[fname] = p\n",
69
+ " print(f'{fname:50s} {os.path.getsize(p)/1024:7.1f} KB -> {p}')\n",
70
+ "\n",
71
+ "A3M = local['examples/data/KaiB_filtered.a3m']\n",
72
+ "FI = local['examples/data/KaiB_fi_matrix.npy']\n",
73
+ "IDS = local['examples/data/KaiB_seq_ids.txt']"
74
+ ]
75
+ },
76
+ {
77
+ "cell_type": "markdown",
78
+ "metadata": {},
79
+ "source": [
80
+ "## 3. Build the pool and stratified subsets\n",
81
+ "\n",
82
+ "The `pool_msa` call ties the MSA records to their per-residue FI vectors. `contrast_hvlv` computes the per-sequence high-variance / low-variance FI contrast (see README for the formula). `method_mosaic` and `method_gradient` then deterministically draw 12 subsets of 32 sequences each."
83
+ ]
84
+ },
85
+ {
86
+ "cell_type": "code",
87
+ "metadata": {},
88
+ "execution_count": null,
89
+ "outputs": [],
90
+ "source": [
91
+ "import numpy as np\n",
92
+ "from sf_cluster import pool_msa, contrast_hvlv, method_mosaic, method_gradient\n",
93
+ "\n",
94
+ "pool = pool_msa(A3M, FI)\n",
95
+ "print(f'pool: N_seq={pool.n_seq}, L={pool.n_cols}, query={pool.headers[0]!r}')\n",
96
+ "\n",
97
+ "score = contrast_hvlv(pool.fi_matrix)\n",
98
+ "print(f'contrast_hvlv: shape={score.shape}, '\n",
99
+ " f'min={score.min():+.3f}, median={np.median(score):+.3f}, max={score.max():+.3f}')\n",
100
+ "\n",
101
+ "mosaic_subsets = method_mosaic(score)\n",
102
+ "gradient_subsets = method_gradient(score)\n",
103
+ "\n",
104
+ "def summarize(name, subsets):\n",
105
+ " print(f'\\n[{name}] {len(subsets)} subsets')\n",
106
+ " print(f'{\"subset_id\":>10} {\"n_seqs\":>7} {\"mean_contrast\":>14}')\n",
107
+ " for i, sub in enumerate(subsets):\n",
108
+ " m = float(np.mean(score[sub]))\n",
109
+ " print(f'{i:>10d} {len(sub):>7d} {m:>+14.4f}')\n",
110
+ "\n",
111
+ "summarize('mosaic', mosaic_subsets)\n",
112
+ "summarize('gradient', gradient_subsets)"
113
+ ]
114
+ },
115
+ {
116
+ "cell_type": "markdown",
117
+ "metadata": {},
118
+ "source": [
119
+ "## 4. Visualise\n",
120
+ "\n",
121
+ "Three plots: the contrast score distribution with tercile / quartile boundaries marked, the per-subset mean contrast score for both methods, and the pairwise sequence-overlap heatmap between mosaic and gradient subsets."
122
+ ]
123
+ },
124
+ {
125
+ "cell_type": "code",
126
+ "metadata": {},
127
+ "execution_count": null,
128
+ "outputs": [],
129
+ "source": [
130
+ "import matplotlib.pyplot as plt\n",
131
+ "import numpy as np\n",
132
+ "\n",
133
+ "fig, axes = plt.subplots(1, 3, figsize=(15, 4))\n",
134
+ "\n",
135
+ "# (a) score histogram with tercile + quartile lines\n",
136
+ "ax = axes[0]\n",
137
+ "ax.hist(score, bins=40, color='#4477AA', edgecolor='white', alpha=0.85)\n",
138
+ "sorted_s = np.sort(score)\n",
139
+ "N = len(sorted_s)\n",
140
+ "terciles = [sorted_s[N//3], sorted_s[2*N//3]]\n",
141
+ "quartiles = [sorted_s[N//4], sorted_s[N//2], sorted_s[3*N//4]]\n",
142
+ "for t in terciles:\n",
143
+ " ax.axvline(t, color='#CC6677', linestyle='--', label='tercile (mosaic)' if t==terciles[0] else None)\n",
144
+ "for q in quartiles:\n",
145
+ " ax.axvline(q, color='#117733', linestyle=':', label='quartile (gradient)' if q==quartiles[0] else None)\n",
146
+ "ax.set_xlabel('contrast_hvlv')\n",
147
+ "ax.set_ylabel('count')\n",
148
+ "ax.set_title('(a) per-sequence contrast score')\n",
149
+ "ax.legend(fontsize=8)\n",
150
+ "\n",
151
+ "# (b) per-subset mean contrast\n",
152
+ "ax = axes[1]\n",
153
+ "x = np.arange(len(mosaic_subsets))  # one bar group per subset\n",
154
+ "m_means = np.array([score[s].mean() for s in mosaic_subsets])\n",
155
+ "g_means = np.array([score[s].mean() for s in gradient_subsets])\n",
156
+ "w = 0.4\n",
157
+ "ax.bar(x - w/2, m_means, width=w, label='mosaic', color='#4477AA')\n",
158
+ "ax.bar(x + w/2, g_means, width=w, label='gradient', color='#CC6677')\n",
159
+ "ax.axhline(0, color='black', lw=0.5)\n",
160
+ "ax.set_xlabel('subset id')\n",
161
+ "ax.set_ylabel('mean contrast_hvlv')\n",
162
+ "ax.set_title('(b) per-subset mean score')\n",
163
+ "ax.legend(fontsize=8)\n",
164
+ "\n",
165
+ "# (c) pairwise overlap heatmap (mosaic x gradient)\n",
166
+ "ax = axes[2]\n",
167
+ "M = np.zeros((len(mosaic_subsets), len(gradient_subsets)), dtype=int)\n",
168
+ "for i, si in enumerate(mosaic_subsets):\n",
169
+ " set_i = set(si)\n",
170
+ " for j, sj in enumerate(gradient_subsets):\n",
171
+ " M[i, j] = len(set_i & set(sj))\n",
172
+ "im = ax.imshow(M, cmap='magma', aspect='auto')\n",
173
+ "ax.set_xlabel('gradient subset')\n",
174
+ "ax.set_ylabel('mosaic subset')\n",
175
+ "ax.set_title('(c) sequence overlap (count)')\n",
176
+ "plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)\n",
177
+ "\n",
178
+ "plt.tight_layout()\n",
179
+ "plt.show()"
180
+ ]
181
+ },
182
+ {
183
+ "cell_type": "markdown",
184
+ "metadata": {},
185
+ "source": [
186
+ "## 5. Write subsets to A3M files\n",
187
+ "\n",
188
+ "Each subset is written as a ColabFold-compatible A3M with the query as the first record. Downstream you would feed one A3M per AF2 run."
189
+ ]
190
+ },
191
+ {
192
+ "cell_type": "code",
193
+ "metadata": {},
194
+ "execution_count": null,
195
+ "outputs": [],
196
+ "source": [
197
+ "from pathlib import Path\n",
198
+ "from sf_cluster import build_subsets\n",
199
+ "\n",
200
+ "out_mosaic = Path('./subsets_mosaic')\n",
201
+ "out_gradient = Path('./subsets_gradient')\n",
202
+ "\n",
203
+ "_, _, _, mosaic_paths = build_subsets(A3M, FI, method='mosaic', out_dir=out_mosaic)\n",
204
+ "_, _, _, gradient_paths = build_subsets(A3M, FI, method='gradient', out_dir=out_gradient)\n",
205
+ "\n",
206
+ "print(f'mosaic -> {len(mosaic_paths):2d} files in {out_mosaic}/')\n",
207
+ "print(f'gradient -> {len(gradient_paths):2d} files in {out_gradient}/')\n",
208
+ "\n",
209
+ "sample = mosaic_paths[0]\n",
210
+ "print(f'\\nFirst 3 records of {sample.name}:')\n",
211
+ "with open(sample) as f:\n",
212
+ " lines = f.read().splitlines()\n",
213
+ "shown = 0\n",
214
+ "i = 0\n",
215
+ "while i < len(lines) and shown < 3:\n",
216
+ " if lines[i].startswith('>'):\n",
217
+ " print(' ', lines[i])\n",
218
+ " if i+1 < len(lines):\n",
219
+ " seq = lines[i+1]\n",
220
+ " print(' ', seq[:80] + ('...' if len(seq) > 80 else ''))\n",
221
+ " shown += 1\n",
222
+ " i += 2\n",
223
+ " else:\n",
224
+ " i += 1"
225
+ ]
226
+ },
227
+ {
228
+ "cell_type": "markdown",
229
+ "metadata": {},
230
+ "source": [
231
+ "## 6. Bring your own protein\n",
232
+ "\n",
233
+ "The demo bundle is tiny and CPU-friendly. For your own target:\n",
234
+ "\n",
235
+ "1. **Build an MSA.** Use the official [ColabFold notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) (`mmseqs2_uniref_env` mode) to generate a deep `.a3m`, then filter it (e.g. 25%-gap filter) to obtain `your_msa.a3m`.\n",
236
+ "2. **Compute the FI matrix.** Run [FrustrAI-Seq](https://huggingface.co/leuschj/FrustrAI-Seq) on `your_msa.a3m` to obtain a per-residue Frustration Index matrix `your_fi.npy` of shape `(N_seq, L)`. **A GPU is required for this step.** See the FrustrAI-Seq model card for inference details.\n",
237
+ "3. **Re-run the cells above.** Just point `A3M` and `FI` at your files and re-execute from \u00a73 onward. The package will raise a `ValueError` if `N_seq` disagrees between the two."
238
+ ]
239
+ },
240
+ {
241
+ "cell_type": "markdown",
242
+ "metadata": {},
243
+ "source": [
244
+ "## 7. Next: run AF2 on each subset\n",
245
+ "\n",
246
+ "Feed each subset A3M into the official [ColabFold AlphaFold2 notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) \u2014 one subset per AF2 run. Aggregate per the SF-Cluster \u00a79.1 hit criterion:\n",
247
+ "\n",
248
+ "- C\u03b1 RMSD \u2264 3.0 \u00c5 on the `common_core` residues vs. each reference state,\n",
249
+ "- mean pLDDT \u2265 70 overall,\n",
250
+ "- mean pLDDT \u2265 70 inside the `switch_region`.\n",
251
+ "\n",
252
+ "**Compute budget disclosure (per `docs/protocol_lock.md`).** The SF-Cluster paper locks AF2 at 3 recycles \u00d7 4 seeds \u00d7 5 models for KaiB / Mpt53, and 0 recycles \u00d7 8 seeds \u00d7 5 models for the GA/GB cases. The GA/GB row was further trimmed to **4 subsets per case** during refinement to stay within the compute envelope. Global seed: `20260422`. Per-case seed = `hash(case_name) mod 2^31`; per-subset seed = `base_seed + subset_index`. All inference uses `templates=OFF`, `relax=OFF`, `dropout=OFF`."
253
+ ]
254
+ },
255
+ {
256
+ "cell_type": "markdown",
257
+ "metadata": {},
258
+ "source": [
259
+ "## 8. Citation, license, companion repo\n",
260
+ "\n",
261
+ "```bibtex\n",
262
+ "@misc{sf_cluster_2026,\n",
263
+ " title = {SF-Cluster: frustration-guided MSA subset builders for AF2 multi-conformer prediction},\n",
264
+ " author = {Cao, Hanqun and {Chatterjee Lab}},\n",
265
+ " year = {2026},\n",
266
+ " note = {Workshop release. Companion code: https://huggingface.co/ChatterjeeLab/SF-Cluster},\n",
267
+ " url = {https://huggingface.co/ChatterjeeLab/SF-Cluster}\n",
268
+ "}\n",
269
+ "```\n",
270
+ "\n",
271
+ "**License:** MIT. See `LICENSE` in the OSS repo.\n",
272
+ "\n",
273
+ "**Companion private dev repo.** Full Phase II benchmark code (DBSCAN baselines, all four arms, evaluation harness, region partition ablation) lives in the SF-Cluster private dev repository. The OSS release here is a slim, dependency-light subset \u2014 only the `mosaic` and `gradient` arms and their scoring function \u2014 intended for reuse, not full reproduction of the benchmark."
274
+ ]
275
+ }
276
+ ],
277
+ "metadata": {
278
+ "kernelspec": {
279
+ "display_name": "Python 3",
280
+ "language": "python",
281
+ "name": "python3"
282
+ },
283
+ "language_info": {
284
+ "name": "python",
285
+ "version": "3.10"
286
+ }
287
+ },
288
+ "nbformat": 4,
289
+ "nbformat_minor": 5
290
+ }
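Note on the seed rule in section 7 of the notebook: Python's built-in `hash()` on strings is randomized per process unless `PYTHONHASHSEED` is fixed, so `hash(case_name) mod 2^31` taken literally would not reproduce across runs. The following is a minimal sketch of one way to get stable seeds. It assumes a SHA-256 digest stands in for `hash` and that `base_seed` means the per-case seed; both are assumptions, as are the helper names `case_seed` and `subset_seed`. The notebook does not say how the global seed `20260422` enters, so the sketch leaves it out.

```python
import hashlib

SEED_MODULUS = 2 ** 31  # "mod 2^31" from section 7 of the notebook


def case_seed(case_name: str) -> int:
    """Stable per-case seed (hypothetical helper, not part of sf_cluster)."""
    # Assumption: SHA-256 replaces Python's salted built-in hash().
    digest = hashlib.sha256(case_name.encode("utf-8")).digest()
    # Fold the first 8 digest bytes into the range [0, 2^31).
    return int.from_bytes(digest[:8], "big") % SEED_MODULUS


def subset_seed(case_name: str, subset_index: int) -> int:
    """Per-subset seed = base_seed + subset_index.

    Assumption: base_seed is the per-case seed returned by case_seed().
    """
    return case_seed(case_name) + subset_index


if __name__ == "__main__":
    # Print the seeds for the first three KaiB subsets.
    for idx in range(3):
        print("KaiB", idx, subset_seed("KaiB", idx))
```

Because the digest depends only on the case name, two separate Python processes produce the same seeds, which the built-in `hash()` does not guarantee.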
examples/data/KaiB_fi_matrix.npy ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d7db0c13d574732a3465d209e6d542ce9737105c4e586a1fd53c976c2b79cfd
3
+ size 132624
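The three lines above are a Git LFS pointer, not the array itself: `oid` is the SHA-256 of the real file and `size` is its length in bytes. A minimal sketch for checking a downloaded copy against the pointer follows; the function names are hypothetical. The size also equals 128 + 364 * 91 * 4 bytes, which would be consistent with a float32 matrix of shape `(364, 91)` behind a 128-byte `.npy` header, but the dtype and header length are assumptions, not something the pointer states.

```python
import hashlib
import os


def matches_lfs_pointer(path: str, oid_sha256: str, size: int) -> bool:
    """Return True if the file at `path` has the given byte size and SHA-256."""
    if os.path.getsize(path) != size:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == oid_sha256


def expected_npy_size(n_rows: int, n_cols: int,
                      itemsize: int = 4, header_bytes: int = 128) -> int:
    """Byte size of a 2-D .npy file under the assumed dtype and header length."""
    return header_bytes + n_rows * n_cols * itemsize
```

With the `FI` path from the notebook's download cell, the check would be `matches_lfs_pointer(FI, '5d7db0c13d574732a3465d209e6d542ce9737105c4e586a1fd53c976c2b79cfd', 132624)`.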
examples/data/KaiB_filtered.a3m ADDED
@@ -0,0 +1,728 @@
1
+ >101
2
+ RKTYVLKLYVAGNTPNSVRALKTLNNILEKEFKGVYALKVIDVLKNPQLAEEDKILATPTLAKVLPPPVRRIIGDLSNREKVLIGLDLLYE
3
+ >MGYP000886600007 114 0.454 1.369E-26 2 89 91 23 110 116
4
+ --PYVLRLYIAGKTERSMHAIEQIRSVLEQRLPGRYELEVIDVHQHPEMVRADQVIAVPTLVKKLPEPLRKIIGSMADQDRLLIGLDLLP-
5
+ >UniRef100_A0A971TK21 109 0.459 4.603E-25 2 88 91 23 109 116
6
+ --PYVLRLYIAGKTERSMHAIEQIRSVLEQRLPGRYELEVIDVHQHPEMVRADQVIAVPTLVKKLPEPLRKIIGSMADQDRLLIGLDLL--
7
+ >ERR1044071_8151622 121 0.516 4.579E-29 1 89 91 15 103 111
8
+ -EEWILRLYVAGHSARSAAALRNLTMICEEHLAGRYRIELIDLLKQPQLARGDQIVAVPALVPHLPPPMKKIIGDLSNEERVLVGLDLLP-
9
+ >UniRef100_A0A1F2SDQ0 122 0.597 1.827E-29 1 87 91 15 101 106
10
+ -KAYVLRLYVAGQTPKSVLAFTNLKQICEDHLQGRYEIEIIDLLKNPQLARGDQILAVPTLVRRLPEPIKKIIGDLSNTERVLVGLDL---
11
+ >SRR5580692_7392111 122 0.483 2.431E-29 1 89 91 38 126 146
12
+ -PDYILRLYIAGATRQSATAIQNIRSICEERLRGRYELEVVDVYQEPAAAREDQVLALPTLIKRLPLPLRQLIGDLSNTKRVLLGLDLKP-
13
+ >K9S6Z6 113 0.505 2.579E-26 3 89 91 65 151 153
14
+ ---YCLRLYIAGGTSRSMSALQRLKEICETYLQGRYELEVIDVYQASPAVLTDNVVAIPTLIKQLPLPLRRVIGDLSDTEKVLLGLDLVP-
15
+ >SRR3954470_14000739 130 0.510 3.149E-32 2 90 91 13 102 125
16
+ --RYVLRLYVTGSTPRSSRAIQNIRAICEEHLRGRYDLEVIDIHQQPVLARGEQIIAAPTLIKTLPAPLRKVVGDLSNTERVLMGLDLrPAE
17
+ >UniRef100_UPI0018DCCDB9 121 0.528 3.442E-29 0 88 91 12 100 116
18
+ RDHWTLRLYVAGLTPRSITAFGNLKRLCETYLAGKYTIEVIDLVDHPERAREDQILAIPTLVRKLPEPVRRIIGDLANTERVLVGLELL--
19
+ >SRR5688500_3073545 124 0.528 4.991E-30 1 87 91 7 93 101
20
+ -ETFRLRLYIAGQTPRSVGALGNLKKICEEHLQGRYELEIIDLMQNPGLARGDQILAVPTLVRRLPEPIKKIIGDLANSERVIVGLDL---
21
+ >MGYP000149628109 121 0.500 3.336E-29 3 90 91 33 120 125
22
+ ---YHLRLFVTGSTLRSQQAIQNLRQICEEHLQGRYKVEVIDVSKDPAQARQHQILAVPTLLKELPPPFRKIVGDLSEKEKVLEGLDIQPQ
23
+ >3740|scaffold08918_4|-3862|00 120 0.563 1.184E-28 2 88 91 23 109 145
24
+ --KWVLRLYVAGQTPKAIAAFNNLKLICEEQLKGIYHIEVIDLLKKPQLARDNQILAVPTLVRKLPLPVKNIIGDLSNTERVLVGLDLI--
25
+ >SRR6187401_1816177 121 0.494 3.336E-29 1 87 91 130 216 223
26
+ -ERWVLKLYVAGQTARSAAALENLKAICDGHLGGKYTIEVIDLAQNPRLARTDQIVAVPTLVRKVPEPMRKIIGDLSNQQRVIVGLRL---
27
+ >SRR5512147_2964451 121 0.471 3.336E-29 3 90 91 82 170 172
28
+ ---YLLRLFVAGTTPRSARAIQNIRAICEERLHGSFALEVVDIYQHPEQAKPEQIIVAPTLVKELPLPVRKLIGDLSDKERVLVGLDIvPRE
29
+ >26123|scaffold_438712_c1_1|+1|10 123 0.505 6.850E-30 1 87 91 48 134 139
30
+ -ENWDLRLYVAGESERSRLAIRNLKKICETHLHASYTIEVIDLLKNPGLARGDQIVAVPTLVRKLPQPMRKIIGDLSNEDRVLVGLDL---
31
+ >12684|Ga0207652_11722284_1|+1|10 125 0.494 1.406E-30 1 90 91 60 150 151
32
+ -ERYILKLYVTGLTTRSARAIENLQVLCQKHLPGRYELQVIDVYQQPELARTEQVVAIPTLIKKLPLPLRRLIGDMSDEERVLVGLDiLPHE
33
+ >SRR5580704_16544830 124 0.453 4.991E-30 3 88 91 25 110 128
34
+ ---YILRLYITGFSPRSARAISNIRKICEAHLEGRYDLEVVDISQEPALAQSEQILAAPTLIKKWPLPARRFIGDMSQSDRILLGLDLP--
35
+ >UniRef100_A0A3N5XMK1 120 0.528 1.222E-28 1 87 91 17 103 121
36
+ -ERYVLRLYTTGMTPRSMRAVESIKAICEEHLKGRYELEIIDIHEQPVLARGDQIIAAPTLIKRLPEPLRRLIGDLSDSERVLLGLDL---
37
+ >UniRef100_A0A7W0IU42 121 0.627 3.442E-29 3 88 91 24 109 113
38
+ ---FVLRLYVAGQTPKSMAAFANLKKICEEHLAGQYQIEVIDLLENPHLARGDQILAIPTLVKKLPPPVRKIIGDLSNTERVLIGLDLL--
39
+ >ERR1043166_4517172 119 0.482 1.625E-28 3 89 91 8 94 120
40
+ ---YLLRLYVTGSTLKSARAIQNIQAICEEKLQGRYSLEVIDIYQHPEQVKPEQIVVAPTLVKKLPLPVRKIIGDLSNTERVLVGLDIKP-
41
+ >MGYP001057477004 125 0.563 1.930E-30 3 89 91 19 105 171
42
+ ---YRLRLYIAGQTPNSVAAITNLRQICRDKLEGRYRIEVIDLLEKPQLAKGDQILAVPTLVRKLPEPLKRIIGDLSNEERVLVGLDILT-
43
+ >26195|Ga0315277_11393154_2|-158|01 124 0.550 3.636E-30 3 90 91 48 136 149
44
+ ---WVLRLYVAGQTPKSLTAFANLKNLCEEHLKGKYKIEVFDLLQHPQLARGDQILAIPTIVRKLPAPVRKIIGDCSNTEHVLIGLGLrPRE
45
+ >UniRef100_A0A8J7LTM0 115 0.518 3.979E-27 3 85 91 20 102 113
46
+ ---WLLTLYVAGQTPRSVTAFSNLKQICEEHLPGRYDIEIVDVVTKPELATRDQVVALPTLVRKLPEPVRKVIGDLSNKEKVLVGL-----
47
+ >12918|scaffold901211_1|+1|10 124 0.528 3.636E-30 1 89 91 71 159 176
48
+ -PQYVLRLYVAGVTPRSVQAIETIKRICERNLQGRYHLEVIDIHQQPTLAKGDQIIAVPTLIRQLPAPLRTLIGDMSNEDRVLIGLDLKP-
49
+ >SRR5215510_15873647 112 0.430 4.859E-26 1 86 91 26 111 116
50
+ -KRYLLKLFIIGTRPNSARAIVNVRKLCDEYLAGRYMLEVVDISKHPERVKEEQVIAAPTLIKELPAPLRRFIGSMSDTEKLLVGLE----
51
+ >SRR5512142_679953 123 0.528 1.290E-29 1 89 91 15 103 105
52
+ -EVWRLRLYIAGQTARAAAAVANLRTICEKHLEGRYALEVVDLLETPQLARGDQILAIPTLVRRLPPPMKKIIGDLSNEERVLVGLDLQP-
53
+ >UniRef100_UPI00190A34C5 123 0.540 9.699E-30 1 87 91 12 98 103
54
+ -KGWCLRLYVAGQSPRSMSALQNLKAICETHLAGHYDIEVIDLMEDPKLARGDQIVAVPTLVRKLPEPVRKIIGDLSNTERVLVGLDL---
55
+ >SRR4051812_34338196 126 0.550 1.025E-30 1 89 91 49 137 141
56
+ -ENWELKLYVAGRTPKSVLALKNLRKYCEEHLEGRYKIEVIDLLEKPQLAEGDQIFAVPTLVRKVPVPIRKIIGDLSNEEKVLVGLNIVP-
57
+ >UniRef100_A0A3N5LPZ6 111 0.465 1.297E-25 0 87 91 20 107 114
58
+ RPKYVLRLYVAGISPRSERAIRSVKEVCEQRLKNRYELEIVDVYQHPESLKDGQVLAVPTLIKQLPLPLRRLIGDMSDKEKLIVGLDL---
59
+ >UniRef100_A0A950DFQ4 111 0.602 9.446E-26 5 87 91 23 105 118
60
+ -----LRLYVAGQTSRSLAAIDNLRRICEKNLKGRYTIEVIDLMQAPQLARTDQIVAIPTLVKKLPPPLRRIIGDLSNSERVLIGLDI---
61
+ >5937|scaffold842798_1|-17|01 115 0.453 7.266E-27 2 87 91 22 107 122
62
+ --KWDLRLYVAGPTTKSLAAFRNLEQLCKDHLPGKYHIEVVDLVKTPQLAKGEQILALPALVRQLPIPIRKVVGDLSDTERVLVGLDL---
63
+ >SRR4030095_15264891 128 0.611 2.104E-31 3 87 91 44 128 134
64
+ ---YQLRLYVAGQTPKSVLALKNLEQICEEHLQGRYEIEVIDLLQNPQLARGDQILALPTLVRRLPEPIKKIIGDLSNKERVLVGLDL---
65
+ >UniRef100_K9UBR2 49 0.287 3.904E-04 1 73 91 13 85 398
66
+ -KSLQLLLFLDERMTSYVQNQEIRDKLAILNAQDAFELQVVDVGKQPDLAEHYRVIATPALLKLYPTPRQILAG-----------------
67
+ >UniRef100_UPI001E525CD0 52 0.315 4.240E-05 2 73 91 19 90 437
68
+ --PLQLVLFI-DRRPSSFQKLREIRNyLLKLHEQYPFDLQIVDVAEQPYLAEYFKLVATPALLKIHPEPRQILAG-----------------
69
+ >UniRef100_A0A845X7U1 48 0.275 1.011E-03 5 73 91 22 90 388
70
+ -----LLLFVDERSGSQQHTQQILDYLNYLEQEHLFELEVLEVGEHPDLAEHFRLIATPSLVRIYPQPKHVLAG-----------------
71
+ >3300017444.a:Ga0185300_10001144_2 52 0.315 4.110E-05 2 73 91 13 84 457
72
+ --PLQLVLFI-DRRPSSFQKLREIRNyLLMLHEQYPFDLQIVDVAEQPYLAEYFKLVATPALLKIHPSPRQVLAG-----------------
73
+ >UniRef100_A0A349JMI1 49 0.324 3.904E-04 4 76 91 36 109 410
74
+ ----QLLLFVDKRLSSREQTRHIRKSLQDLKAEWDFELQIIDVGEQPDLAEHFKLLATPSLLKIHPDPRQTLAGsDLS--------------
75
+ >UniRef100_K9SDH8 56 0.356 1.778E-06 2 73 91 16 87 393
76
+ --PLQLLLFV-DRRPSSREQIRTIRnRLKNLEAETPFALEVIDVGEQPYLAEHFKLVATPALLKICPEPRQTLAG-----------------
77
+ >UniRef100_A0A3M1PH87 54 0.283 8.682E-06 1 73 91 12 84 391
78
+ -PSLQLLLFVDKR-PSSIEKIRQIRnRLKELEADYPFDLQVVDVGEQPHMAEHFRLIATPAIIKIHPEPRQTLAG-----------------
79
+ >UniRef100_A0A1C0V439 54 0.272 1.192E-05 0 73 91 11 84 398
80
+ RAPLQLLLFVDHRSSSWEQAsliQGRLQALKARY---PFAFDIVDVAEQPYVAEHFRLIATPALVKVHPHPRQTLTG-----------------
81
+ >UniRef100_UPI001E4DBDD3 44 0.295 3.304E-02 4 73 91 3 72 369
82
+ ----QLLLFVDER-PSSSEYVRLIRDYIEiIKKSCPCQLEVIEIRKQPHLVEHFRLVATPALVKVSPGQKQILAG-----------------
83
+ >UniRef100_A0A969FMT9 63 0.364 8.088E-09 0 73 91 20 92 421
84
+ QATIQLLLFV-DRRPSSWEQMRYIRRYLENTDEQNFSLEVVDVSKQPYLAEHFKLIATPTLLKLYPEPRQMLAG-----------------
85
+ >UniRef100_A0A0M2Q0K7 49 0.281 5.361E-04 4 73 91 18 87 393
86
+ ----QLLLFVDNRS-SSQEQMWQVRRTLETLsDPQTFQLEVMNVTEQPHLTEHFKLIATPALIRLYPTPQQILAG-----------------
87
+ >UniRef100_A0A978SUS5 52 0.356 4.240E-05 2 73 91 13 84 383
88
+ --PLQLLLFIDKR-PSSLEQSRQIQRYLqQTKAKHAFALQVIEVGEQPYLAEHFKIVATPALIKIYPEPRQIIAG-----------------
89
+ >UniRef100_A0A930TQ40 52 0.293 5.823E-05 2 73 91 13 86 408
90
+ --PLQLLLFVDER-PLSRERTRQIRKRLQElalNSPHAYTLQVVKVAEQPHLVEHFKLVATPSLIKISPEPRQILAG-----------------
91
+ >UniRef100_A0A2W4XV74 47 0.282 3.593E-03 2 73 91 13 89 395
92
+ --PFKLLLFIDKR-PHSSEQIRQiryhLKELMSRELmnKNQISLEIVDVTHQPDLAEHFKLVATPALVKVSADQHQTLTG-----------------
93
+ >UniRef100_A0A939KSU8 61 0.328 5.425E-08 2 73 91 13 84 391
94
+ --PLHLLLFVDKR-PISGEQIGQIsRQLKELHGDCDYELMVVDVGEQPYLAEHFKLVVTPTLIKIYPEPRQTLTG-----------------
95
+ >UniRef100_A0A8K1ZWU9 47 0.338 3.593E-03 4 73 91 19 88 385
96
+ ----QLILFVDKR-AASKRQNQQVQDYLKSiEPSHSWNLQVADVEEQPYLAEYYKLVATPALIKLYPEPRQILTG-----------------
97
+ >UniRef100_UPI001C72C3C0 62 0.342 2.095E-08 2 73 91 19 90 401
98
+ --PLQLLLFVDGR-PQSRQQVQRIRSYLRElETEYSFELQIIDVGQQPYLAEHFKLVATPALIKIHPEPQQTLAG-----------------
99
+ >UniRef100_A0A6P0TIA3 54 0.305 8.682E-06 4 74 91 11 81 119
100
+ ----QFLLFV-DRRPASSEKIRQIRQYLERKqGSSSIGLQVIDVGEQPSLVEHFRLVATPALVKIYPEPQQVFTGE----------------
101
+ >UniRef100_A0A5B8NIC7 51 0.252 7.996E-05 2 73 91 1 72 365
102
+ --PLQLLLFIDERSSsqEHLQGIQNyLEALKEDY---PFELQMVNVAEQPHLAENFRLVATPALVKIAPQPRQTIAG-----------------
103
+ >UniRef100_UPI002012D1AC 61 0.405 3.951E-08 2 73 91 13 84 388
104
+ --PLQLLLFVDER-PSSQEQLLRLHN-CIQELKTDYpfELEVVDVGEQPYLAEHFKLVATPALIKIHPPPQQTIAG-----------------
105
+ >UniRef100_UPI001C0312E0 49 0.310 3.904E-04 2 73 91 6 77 384
106
+ --PLKLLLFIDKR-PSSVEQVRQIRQHVEG-LEGEIDpvLEVVDVSEQPYLAEHFKLVVTPTLIKVSAAGRQTLTG-----------------
107
+ >UniRef100_A0A928Z921 53 0.319 3.088E-05 2 73 91 13 84 183
108
+ --PLQLLLFVDGRPSSWEQLRQVLAYLKEKNNEVDWDLKTIKVSEKPYLVEHFKLVATPALIKIHPEPQQTLAG-----------------
109
+ >UniRef100_A0A351L2B1 59 0.356 1.930E-07 2 73 91 13 84 168
110
+ --PLQLLLFVDER-PSSTEQLQQLRQYLERLKADyPLEFEVVDVGAQPYLAEHFKLVATPALIKINPPPRHVLAG-----------------
111
+ >UniRef100_A0A8J7E209 50 0.292 2.071E-04 1 77 91 18 95 383
112
+ -KSLQLLLFVDER-PSSQehikRIQSHLQSLQTEYL---FELEAINVGEHPDLVEHFRLIATPALVKIHPQPRQILAGsDLID-------------
113
+ >UniRef100_UPI00232B6744 50 0.283 2.843E-04 1 73 91 19 91 385
114
+ -KQLQLLLFVDERISTQNHA-QQIQSYLEELkQRDAFDLEILEISENPDLAEHFRLVATPSLVKIYPSPRQVLAG-----------------
115
+ >UniRef100_UPI0018EFA1D6 63 0.342 1.111E-08 2 73 91 13 84 395
116
+ --PLQLLLFVDGR-PKSRQQVQRIRAYLTELKTGyQFELQIIDVGQQPYMAEHFKLVATPALVKIHPEPRQIIAG-----------------
117
+ >UniRef100_U5DJK1 51 0.315 7.996E-05 2 73 91 19 90 404
118
+ --PLQLLLFVDERS-RTQEPMRRIQNYLLRLQQDNaFGLQVIGVEKQPHLAEHYRIVATPSLVRIWPPPRQTLAG-----------------
119
+ >UniRef100_A0A832M402 56 0.301 1.778E-06 2 73 91 13 84 431
120
+ --PLQLLLFVDKR-PSSREQVRQVRSALkERKEECDFEVQFIDVTEQPYLAEYFRLIATPALIKIHPEPRQTLAG-----------------
121
+ >UniRef100_A0A1C0VJA3 49 0.338 3.904E-04 4 73 91 37 106 411
122
+ ----QLLMFVDKR-PGSHEQIRLLRKSLETLKTDfAFDLQIIDVGEQPYLAEHFKLMATPCLLKIHPSPRQMLAG-----------------
123
+ >UniRef100_A0A969K2I2 52 0.323 5.823E-05 4 73 91 15 84 397
124
+ ----QLVLFVDKR-PSSNQKVRQIRNHLKDLRADYvFDLQIVDVGEQPHLAEYFKLVATPALIKIYPEPRQTLAG-----------------
125
+ >UniRef100_K9X4T7 62 0.328 2.095E-08 2 73 91 13 84 395
126
+ --PLQLLLFIDGR-PKSRQQVQRLRAHLKELEAEySFELQIIDVGQQPHLAEHFKLVATPALIKIHPEPRQVLAG-----------------
127
+ >UniRef100_W7Q8H4 65 0.365 1.206E-09 5 85 91 15 96 100
128
+ -----LILFVTGEAPRSRRAHLNLTAALEASGIGDVQPREIDLLVEPQEAIDFGIFATPALMHIDASGTRRVlYGDLSDEHSLRDFL-----
129
+ >W7Q8H4 68 0.365 1.743E-10 5 85 91 15 96 100
130
+ -----LILFVTGEAPRSRRAHLNLTAALEASGIGDVQPREIDLLVEPQEAIDFGIFATPALMHIDASGTRRVlYGDLSDEHSLRDFL-----
131
+ >A0A1H7MTT5 63 0.341 7.839E-09 5 85 91 16 97 101
132
+ -----LILFVTGEAPRSHRAQRNLAAALDASGISDAPAREIDLLHEPQQAIHFGIFATPALMHIDASGNRRVlYGDLSDEHRLKDFL-----
133
+ >UniRef100_A0A1H7MTT5 62 0.341 2.095E-08 5 88 91 16 100 101
134
+ -----LILFVTGEAPRSHRAQRNLAAALDASGISDAPAREIDLLHEPQQAIHFGIFATPALMHIDASGNRRVlYGDLSDEHRLKDFLSVL--
135
+ >A0A1P8R863 62 0.317 1.478E-08 5 85 91 9 90 94
136
+ -----LILFVTGEAPRSRRARYNLSSALKAAGLDKSPAHQIDLLRDPEQAISFGIFATPALMHIDAAGNRSVlYGDLSDQARLARFL-----
137
+ >UniRef100_A0A1P8R863 62 0.317 2.095E-08 5 85 91 9 90 94
138
+ -----LILFVTGEAPRSRRARYNLSSALKAAGLDKSPAHQIDLLRDPEQAISFGIFATPALMHIDAAGNRSVlYGDLSDQARLARFL-----
139
+ >UniRef100_UPI001CD124AC 61 0.357 3.951E-08 8 90 91 1 84 88
140
+ --------FVTGEAPRSRRAHQNLTTALAASGIGDAAPREIDLLGDPQEAINFGIFATPALMLVDASGRRTVlYGDLSDEHSLKTFLSPLRE
141
+ >UniRef100_A0A2N5Y751 59 0.341 1.405E-07 5 85 91 15 96 100
142
+ -----LILFVTGEAPRSRRAHLNLTAALDETGIDVAPPREIDLLREPQEAISFDIFATPALLhIEGSGHRRVLYGDLSDKQSLKDFL-----
143
+ >UniRef100_A0A2N7UCY5 58 0.337 4.998E-07 5 77 91 15 88 105
144
+ -----LILFITGEAPRSRRAQQHLKTALAASGADLAPAREIDLLSAPQEAIDFGIFATPALMIIDASGKRQVlYGDLSD-------------
145
+ >MGYP001039280987 63 0.333 7.839E-09 5 90 91 14 100 104
146
+ -----LMLFVTGNAPRSVRARRNLAGALDSLDLDDVKPMEIDLLSQPEQTVAYSVFATPALLKTEARGKMSVlYGDLSDEGKLQDFLQNLPE
147
+ >UniRef100_U5T3B0 61 0.341 5.425E-08 5 85 91 3 84 91
148
+ -----LILFVTGNAPRTQRARANLAKMLEQIGRGDLGPHEIDLLKQPQEGLTYSVFATPSLLKTDDgGGGSLLYGDLSDSDRLHRFL-----
149
+ >UniRef100_A0A540VSD6 64 0.304 2.274E-09 5 85 91 3 84 96
150
+ -----LILFVTGNAPRTVRARANLSRVLREMGLDSIKPQEVDLLETPQEGLTYSIFATPSLLRTdSGGDEGLLYGDLSDTDRLRRFL-----
151
+ >UniRef100_A0A2S6G6T8 61 0.337 2.877E-08 4 85 91 13 95 102
152
+ ----VLLLFVTGTAPRSQRARVNLARMLEQVGRDDIQPHEIDLLEKPKEGVKHSVFATPALMKVGQGGdVSVLYGDLSETDRLQQFL-----
153
+ >UniRef100_UPI00124EBEF7 65 0.337 1.656E-09 4 85 91 13 95 103
154
+ ----VLMLFVTGTAPRSQRARTNIAKMLEQLNCTDIQPYEIDLLEQPEQGIKHSVFATPSLLKVSPtGGVSVLYGDLSEEDRLRRFL-----
155
+ >UniRef100_UPI001439CAB5 64 0.350 4.289E-09 5 80 91 15 91 100
156
+ -----LVLFLAGDAPRSRRAHRNLSAALAATCPDLAPVHEIDLLREPQQAIDFGIFATPALLHIDAEGNRRlLYGDLSDEGR----------
157
+ >MGYP000294044467 63 0.317 1.077E-08 5 85 91 3 84 96
158
+ -----LILFVTGDAPRSRRARANLSNMLERLGRSDLTPREVDLLDQPQAGLSYSVFATPSLLKaDQQSDGALLYGDLSDEQRLERFL-----
159
+ >UniRef100_A0A4Q8CZA3 61 0.317 3.951E-08 5 85 91 14 95 104
160
+ -----LILFVTGNAPRTQRARANLARMLEEIGRDDLTPYEIDLLQQPQEGLTYSVFATPSLLKTDDEhGGSLLYGDLSDDDRLYRFL-----
161
+ >3104|Ga0306908_1123748_1|-11|01 68 0.329 9.241E-11 5 85 91 17 98 106
162
+ -----LMLFVTGTAPRSQRARTNLVHMLEQLNRTDVQPYEIDLLEQPEQGIKHSVFATPSLLKVSPTGEVSVlYGDLSEEDRLRRFL-----
163
+ >UniRef100_UPI00133045FA 64 0.329 4.289E-09 5 85 91 14 95 107
164
+ -----LILFVTGDAPRSRRARANLASMLERLGRTDLSPQEIDLLDQPQAGLSYAVFATPSLLKREPErDGALLYGDLSDSDRLERFL-----
165
+ >UniRef100_UPI00201FFA50 61 0.317 2.877E-08 5 85 91 13 94 99
166
+ -----LILFITGTAPRSQRARSNLGKMLDRLNLNDVKPFEVDLLEQPDQGIEHGIFATPSLLKFDSSGEVSIlYGDLSVEERLQKFL-----
167
+ >UniRef100_A0A845V233 60 0.312 1.023E-07 7 85 91 1 80 86
168
+ -------LFVTGSAPRSRRARKNLAAALKSLGLDSVKAMEIDLIDRPEKTVTYSVFATPALLRMDEAGEMRVlYGDLSDESKLLEFL-----
169
+ >UniRef100_UPI00082AD250 59 0.317 1.930E-07 5 85 91 14 95 105
170
+ -----LVLFVTGNSPRSLRARANLAKAVEETASDHITVRHVDLLEDTGGITEYGIFATPALVHVRDGGePAVLYGDLSNEAELQRFL-----
171
+ >MGYP001134272031 63 0.321 5.708E-09 5 90 91 14 100 104
172
+ -----LMLFLTGDAPRSVRARKNLSGALDKLELDEVTPMEIDLLDQPEQTVAYSVFATPALLRTDALGEMSVlYGDLSDEDKLHDFLQNLPE
173
+ >UniRef100_UPI00037CF37C 66 0.337 8.781E-10 4 85 91 13 95 103
174
+ ----VLTLFVTGSAPRSQRARANLARMLEQIGRSDMQPQEIDLLEQPEQGITQSVFATPSLLKTDTHGEVSVlYGDLSEEEKLRRFL-----
175
+ >UniRef100_UPI0003674D18 66 0.325 6.394E-10 4 85 91 13 95 103
176
+ ----VLTLFVTGSAPRSQRARANLARMLEQIGRSDLQPQEVDLLEHPQQGITQSVFATPSLLKTDANGEVSVlYGDLSEEEQLRRFL-----
177
+ >UniRef100_UPI00047687D0 61 0.313 2.877E-08 4 85 91 13 95 102
178
+ ----VLTLFLTGTAPRSQRARANLAHMLEQIGRSDLRPYEIDLLEQPEESITHSVFATPSLLKTSDtGGVLMLYGDLSDEDTLHRFL-----
179
+ >UniRef100_A0A3S0W7L6 59 0.324 1.405E-07 5 80 91 3 79 92
180
+ -----LTLFVTGTAPRSQRARVNLAQMLNRIGRSDIQPYEIDLLEQPGQGITHSVFATPSLVKANeTGEVSVLYGDLSDEER----------
181
+ >UniRef100_UPI001903F48C 63 0.329 1.111E-08 5 85 91 14 95 99
182
+ -----LVLLVTGDAPRSRRARQNLARALEQLGLGDIATREVDLAADPAQTLSYGIFATPALLRPGPnGQPDVLYGDLSERDMLERFL-----
183
+ >SRR4051794_37995438 82 0.354 1.920E-15 7 85 91 61 139 143
184
+ -------LFIAGDGPNSTAAVANLRAFLAQRSASHVEVEIIDVFMEPQRGVSASVFVTPMLVRVEPTPERRILGNLSDRTVLASVL-----
185
+ >UniRef100_A0A318VX16 61 0.301 2.877E-08 4 85 91 14 96 104
186
+ ----VLTLFVTGSAPRSQRARANLARMLEQLGRADLRPREVDLLEQPLQGITQSVFATPSLLKTDTNGEVSVlYGDLAEEEQLRRFL-----
187
+ >UniRef100_A0A372BWN2 67 0.337 2.469E-10 5 86 91 14 96 101
188
+ -----LTLFVTGDAPRSRRARRHLNAALKKLGQDSIKPLEIDLLEHPEQSINHSVFATPALLRARNDGEISVIyGDLSDESKLLDFLG----
189
+ >3300017992.a:Ga0180435_10008823_6 74 0.379 7.937E-13 0 86 91 10 96 103
190
+ RSRYVVRLFVAGDAHNSRIARENLNQLRDLLNDTELSIQVIDVEENPQLAIEHSIYVTPALQIVEPKPPTLVYGNLRDKETLLALFE----
191
+ >UniRef100_UPI000401CEFF 70 0.325 2.681E-11 7 86 91 18 97 110
192
+ -------LYIAGDAPNSRIALQNIKMIQENISQWNLKVAIVDVVATPEVALEKGIYLTPALEIEAHGMESLVYGNLSDKEKILALFG----
193
+ >16161|scaffold59688_2|+220|00 84 0.353 3.934E-16 0 81 91 11 92 102
194
+ KRRYVLRLFVAGDAPNSRIARENLRRLQESVAECDFEVEIVDVMENPQSALDHGVFVTPALQIIEPGPEKLIFGNLTNKEAL---------
195
+ >SRR5690606_35087643 76 0.305 3.066E-13 3 87 91 39 123 125
196
+ ---YVMRLFISDNAVNSRIARENLSNFLAEFPQHSFQIEIVDLYLQPEMALQNGIFITPTLQILAPQPGGIIYGNLSDRNALERILQI---
197
+ >UniRef100_A0A0R3M5F3 93 0.404 2.017E-19 4 87 91 10 93 100
198
+ ----VLRLYIAGNSAGSRRAEQNLHDLRALLDNQAWRIEIIDVMRRPELAEQAGIIATPTLSCEHSGRPRRIVGDLSDKKRVLEFLGI---
199
+ >UniRef100_A0A969HBU9 81 0.352 2.720E-15 1 85 91 12 96 101
200
+ -KHYVIRLFVAGNAPNSRLARENLDRFQAGFPEHEFKVEIIDLDIQPELALENGVFITPTLVVLEPAPGGMIYGNLSDQKVLAQVL-----
201
+ >MGYP000847580960 74 0.284 1.090E-12 0 87 91 10 97 98
202
+ KKRYALRLFVAGNATNSRIARENLEQLLARHPEHEFEVEIVDLNVQPEFALDQGVFISPALQILEPSSGGIVYGNLSQKEVLEKVLNL---
203
+ >A0A0R3M5F3 91 0.395 1.797E-18 4 89 91 10 95 100
204
+ ----VLRLYIAGNSAGSRRAEQNLHDLRALLDNQAWRIEIIDVMRRPELAEQAGIIATPTLSCEHSGRPRRIVGDLSDKKRVLEFLGIET-
205
+ >SRR5262249_39096779 91 0.469 9.533E-19 5 87 91 44 126 134
206
+ -----LRLYIAGNSAISRRAEQNLLHLQSLVKPGAWEVHVIDVLPKPELAEQAGILATPTLSYEHPVRPRRIIGDLSDTTRVLDFLGI---
207
+ >SRR5690348_3124064 88 0.481 1.652E-17 5 87 91 48 130 138
208
+ -----LHLYVAGNTASSRRAQQNLLRLREIMKEPQCEVRIIDVLVEPQLAEEAGILATPTLSYEHPQRPRRIVGDLGESKRILEFLGL---
209
+ >MGYP000666260026 97 0.404 8.218E-21 4 87 91 8 91 97
210
+ ----VLRLYVAGEGPNSVRARANIVDLCDRHLQGAYSLEIVDVFDEPGRALEEGVLMTPMLVVASASPPRRVVGTLDETSVVLTALGL---
211
+ >UniRef100_A0A838IZY5 88 0.392 1.705E-17 5 86 91 11 94 101
212
+ -----LRLYVAGEGPNSRQARENLRVICEAHLAGRHVIEVLDVFEEPERALDDGVYLTPQLLVLVlpPATPRTVVGNLSEREVVLRALG----
213
+ >MGYP001366537082 74 0.301 7.937E-13 4 86 91 16 98 121
214
+ ----VIRFFVAGEAPNSIIARDNLRRLRESLPEIHFEIEIVDVNVNPEIALQKGVFVTPALEVLEPPPGGIFYGNLSNSDPIRRLIE----
215
+ >A0A1Q2HNV8 77 0.394 8.623E-14 2 77 91 6 81 93
216
+ --KMVLRLYVAGKGLNSAMAIENLKQICRTCNSYDYDLKIVDVLKEPQTALDKGIFVTPALEIIEPAPGGMVYGTLAD-------------
217
+ >UniRef100_UPI002011AF87 91 0.471 1.854E-18 5 90 91 36 122 127
218
+ -----LRLYIAGPSATSRRAEQNLRRLRDvAKARDGLAVEIIDVLKNPELAEQAAIIATPTLALEHPVRPRRIIGDLSDVERVLDFLGIESE
219
+ >23258|scaffold4609030_1|+1|11 75 0.337 5.780E-13 6 85 91 20 99 104
220
+ ------RLFVCGDALNSRRARENLQRLREMFPHVEFKVEVIDVGETPQAALDQGIFVNPALQVLEPGPGMLIYGDLSDLQALAAML-----
221
+ >SRR5579871_6120579 90 0.435 3.387E-18 5 89 91 48 132 138
222
+ -----LRLYIAGNSASSRRAEHNLEHLRKFMNAEGWKIEVIDVLARPELAEEASILATPTLSYEYSGRPRRIVGDLSDTKRVLKFLGIEP-
223
+ >MGYP000105995723 87 0.428 4.277E-17 4 87 91 2 85 93
224
+ ----VLRLYIAGNSPSSRLAQQNLKHLRMLMKGGNEQVEVVDVLANPELAEKASILATPTLCYEHSGRQRRIVGDLGDPKRILAFLGI---
225
+ >SRR3954468_6301146 102 0.431 2.518E-22 0 87 91 46 133 148
226
+ KAPMVLRLYVAGDAPNSTRARANLRRLLSAVDPSRYNLEVIDFLTEPLRALDDGVLVTPTLMRVDPPPPQVVVGTLSALDRVADALDI---
227
+ >MGYP001433622665 94 0.494 1.037E-19 1 87 91 3 89 94
228
+ -KPYQLMLFVAQGQPNSVRAQKNLRQICEEVIPGKYHLKVIDVVKEPELAVENGIYLTPMLVVSDPPPPASITGDMAERKTVLAALKI---
229
+ >UniRef100_A0A3D6C093 96 0.494 2.194E-20 1 87 91 3 89 94
230
+ -KPYQLMLFVAQGQPNSVRAQKNLRQICEEVIPGKYHLKVIDVVKEPELAVENGIYLTPMLVVSDPPPPASITGDMAERKTVLAALKI---
231
+ >A0A1W6LJH8 78 0.418 6.280E-14 4 77 91 8 81 93
232
+ ----VLRLYVAGKNVNSTLAIENLEKLCRRCNSFEYDLKIVDVLKNPETALEKGIFVTPALEILEPAPGGMVYGTLSD-------------
233
+ >SRR5687767_11767969 91 0.404 1.797E-18 4 87 91 38 121 122
234
+ ----VLRLYIAGNSASSRRAEQNLHALRASLAQNAWEVEIIDVLSKPELAEQAGVIATPTLSYEHSGRSRRIIGDLSDKKRILEFLGI---
235
+ >SRR4051794_40104329 92 0.459 5.058E-19 5 90 91 41 127 130
236
+ -----LRLYIAGPSATSRRAEQNLLRLRDvAKAPNGLEVEVIDVLENPELAEQAAIIATPTLAFEHPVRPRRIIGDLSDVERVLDFLGIESE
237
+ >SRR3954454_22706284 106 0.441 1.060E-23 2 87 91 11 96 109
238
+ --PLVLRLYVAGDAPNSARARANLTRLLSDLDSSRYTLEIIDCLDEPARALGDGVFVTPTLVRLGPPPQRTIVGTLSATDRVADALDL---
239
+ >MGYP000010225417 75 0.407 4.209E-13 5 85 91 104 183 188
240
+ -----LVLYVAGDGPYSRRARANLQALMRE-AGIAAEVTVVDVLKSPDRALEHGIFATPALIVVHGKHETLIMGDLSERDTALEAL-----
241
+ >2271|Ga0209795_10171170_2|-245|01 90 0.406 2.467E-18 4 89 91 30 115 121
242
+ ----VLRLYIAGNSASSRRAEQNLHRMQAFIKSEAWDVEIIDVLSKPELAEKAGIIATPTLSFEHSARPRRIVGDLSDTKRVLEFLGIET-
243
+ >MGYP000738482073 88 0.329 1.652E-17 1 90 91 15 105 111
244
+ -PIYRLRLFIAGDEPNSVRAREALARLRNERLGPQCEVEVVDVFQDYQAAITHGVSVVPTLKIEGPRGGRTIVGSLRDEAVVLAALGLsPTE
245
+ >SRR6202012_1017248 94 0.447 1.037E-19 3 87 91 12 96 104
246
+ ---FSLRLYIAGDSITSRRARQQLARIREILKQHKFDVETIDVLAQPQLAEQERILATPTLASEHGGPPKRIVGDLSDTKRVLEFLGI---
247
+ >26133|Ga0268298_10010625_3|-7238|00 74 0.317 1.090E-12 3 87 91 15 99 100
248
+ ---YALRLFVAGNAANSQIARENLERLRARYPDYEFEVEVIDLNIDPEVALTHGIFISPALQVIDPPTGGVIYGNLSDERVLERVLKL---
249
+ >SRR5262249_6174883 86 0.452 5.872E-17 4 87 91 14 97 102
250
+ ----VLRLYIARNSPSSRRAEQNLDYLRRLMKADGGRVEVIDVLANPELAERESILATPTLCYEHSGHRRRIIGDLGDPERILAFLGI---
251
+ >UniRef100_UPI001904043C 77 0.383 8.896E-14 2 87 91 2 86 90
252
+ --PYQLRLFVSGPNPLCRKAERAIRELLIERGV-AYELDVIDVLADPDAAEEYALVATPTLECTAPPPVRRVVGYYEHYAEVFDALGI---
253
+ >MGYP001146183833 101 0.583 3.457E-22 4 87 91 2 85 90
254
+ ----ELRLYVIGKTPSAIKATEHLRALLEDQYKDEYALEVVDVLENPILASDDKILATPTVVRRLPHPIRKVIGDLSEREKVLLGLDL---
255
+ >UniRef100_A0A2U2N9L6 76 0.329 2.303E-13 4 85 91 7 88 93
256
+ ----QLTLFVAGDSPRSRHAREVLRRALAERGLDPGALELVDVLAEPERTLEHGVFATPALVLRADGATRSLYGDLSDEQGLQQFL-----
257
+ >UniRef100_A0A127EN01 85 0.433 1.568E-16 5 87 91 11 92 99
258
+ -----LRLYIAGRSAISQRAESHLRQ-LHRSIKLECNIEIIDVLKSPELAEQAGVLATPTLSYEHPSRSRRIIGDLSDTKRIVEFLGI---
259
+ >A0A127EN01 87 0.421 4.277E-17 5 87 91 11 92 99
260
+ -----LRLYIAGRSAISQRAESHLRQL-HRSIKLECNIEIIDVLKSPELAEQAGVLATPTLSYEHPSRSRRIIGDLSDTKRIVEFLGI---
261
+ >U2E7T8 67 0.387 3.286E-10 7 85 91 13 92 100
262
+ -------LFVAGDSPSSRRARRALESLIGSQSNEQkAQFEVVDVLREPERALESNLLATPTLLIERGGHVSRYVGDLHEREDVREEL-----
263
+ >UniRef100_UPI00190730C9 67 0.308 3.391E-10 5 85 91 15 95 97
264
+ -----LTVFIAGDAPSSRQAMAHLTGVLDSIGIPPERLQVVDVLTDPGAALDAGALVTPSLQIKRGERARWFLGDLTDQRDLLAFL-----
265
+ >UniRef100_UPI000D3E5BC6 66 0.347 6.394E-10 2 73 91 3 74 103
266
+ --RFRVNVYVVGGSNHASRAVALLRDVADTHFGGDAEITVIDVTSEPALADAAGVITTPTYDLLAPLPRRRIIG-----------------
267
+ >UniRef100_A0A1Y6FIV8 61 0.308 5.425E-08 5 85 91 11 91 95
268
+ -----LTLLVAGESQATRSARATLDSLIGDGLAEASHVRVIDILQQPDYALRYKAFFTPSLIVETPTTTTTIVGDLHELDEVRSLL-----
269
+ >SRR5919109_3969706 72 0.309 5.322E-12 2 85 91 7 90 100
270
+ --PVSLVLYVSAESPASQRARRHLESLLAQFDASQLAVEVCDVSADPVRGETDHVVFTPTLVARSGGLATWVLGDLADRSMLVDLL-----
271
+ >MGYP001077603090 119 0.505 1.625E-28 1 87 91 17 103 106
272
+ -EKWNFTLYVAGDNLSARRAKKNLQGICDEYLEGRYAIEIVDLVEHPEIAEEDQILAAPTLVRKLPLPLRRIIGDLSSREKVLIGLEI---
273
+ >SRR3954463_15126113 100 0.392 8.943E-22 3 86 91 17 100 101
274
+ ---FKFRLYVASSTPNAAKATANLQQLCREHLPGRHAIEVVDVFKQPKRALADQIYLTPTLLRLAPMPVRKIVGNLSEASALLAALG----
275
+ >ERR1039457_666370 113 0.443 3.540E-26 2 89 91 12 99 108
276
+ --RYRFKLYVTDHTLRSRQALAQLRKLCDEQFPQQYELEVVDVLEHPDEAAAQHIFATPTVVRERPLPIRRVIGDLSDMGKVLAGLALPP-
277
+ >UniRef100_UPI0005BD3569 63 0.370 8.088E-09 7 84 91 7 86 98
278
+ -------LFVAGGAPRSAAALRNLTAAIAatGRPEGTFRIELIDVLRDPARALEAGLLATPSLALTaANGRRRWFIGDF-DRPELLAG------
279
+ >10876|scaffold_592705_c1_2|-157|01 111 0.425 9.155E-26 1 87 91 56 142 164
280
+ -PVWKLHLYVADTTPRSVLATENLHSFCDQYLPGQYRVTIIDIVKQPALAREHEILATPTLIRVFPGPERTVVGSLSDTARVARALEL---
281
+ >UniRef100_A0A7V9DA05 104 0.360 5.330E-23 3 88 91 3 88 99
282
+ ---YSFRLYVTGETTLSREAEANLRALCKNRLVDDYEIEIVDILERSALAEEEQIVATPTIMRLAPLPRLRVIGDLSDHERAARAFGLP--
283
+ >MGYP000147404972 93 0.414 2.683E-19 7 88 91 17 98 99
284
+ -------LFVAGNAPNSVSAQANLRQVCEQRLKNGWELKIIDVLEDYGTALDHGILVTPALVILEPLPAVTVFGDLSDTDRLLKALRLI--
285
+ >SRR5207249_9234194 77 0.373 8.623E-14 3 85 91 34 115 125
286
+ ---FELVLYVSPGSPACARAQRNIHELLGRLDRAQVDLDVRDVSEDAERAEADRILFTPTLVVRRPL-LTWIVGDLTNGEEVLRVL-----
287
+ >UniRef100_A0A8T3N6J2 109 0.500 4.603E-25 2 87 91 1 86 97
288
+ --KYRLRLFVTGHTPASLSAQKNLRKLCEGELRGWCEFEVVDVLKQPELAEEARIIATPTLVKLTPEPQRKVIGDLSNHDQLLHVLDM---
289
+ >SRR3954451_13963006 68 0.325 1.269E-10 5 87 91 69 149 159
290
+ -----FTLYVDGPE-QGRHVSRRLLELCQPWGIAP-DLSVVDVGDGPDQAEQANIIGTPTVVREAPAPRRRIIGALDDDRRVVEALGL---
291
+ >UniRef100_A0A1T4Y342 59 0.308 1.930E-07 5 85 91 11 91 95
292
+ -----LTLLVAGESSAARSARATLDTLIGDGLAEASHVRVIDVLQQPDYALRYKVYFTPSLIVETSTTTTTIVGDLHEIDEVRSLL-----
293
+ >UniRef100_A0A2V7TZK6 77 0.373 1.222E-13 3 85 91 21 102 112
294
+ ---FELVLYVSPGSPACARAQRNIHELLGRLDRSQVDLDVRDVSEDAERAEADRILFTPTLVVRRPL-LTWIVGDLTNGEEVLRVL-----
295
+ >UniRef100_A0A831PRG7 79 0.301 1.823E-14 4 86 91 30 112 122
296
+ ----HLRLYILGTSARASLARQRVEEFCGQFPPGRLRLEVIDLLVDGEVAERDRIIATPSLRRVMPLPVVSLVGDMGDEQQLVALVN----
297
+ >SRR6185295_5137210 81 0.372 2.636E-15 0 85 91 100 184 194
298
+ RRPVELVLYVSPASPASARAQHNIHELIARLGGSKVDLDVCDVSEDAERAEADRILFTPTLVVRRPL-LTWIVGDLTNGEEVLRIL-----
299
+ >SRR5687767_10342612 86 0.380 8.062E-17 2 85 91 60 142 156
300
+ --PVELALYVTLPWPASLKAKRNLDRVLSGFSRSQVSLTVCDLAQEPERAEQDGIVFSPTLVKRMPEPRAWVMGDLSDR-KVLSNL-----
301
+ >SRR5881409_182087 64 0.290 4.157E-09 2 87 91 5 88 111
302
+ --PMVFTAYVDG-TEMGGQVRTRLLELCASRDVTA-EIRVVDVLSEPAAAETGNVVGVPTVVREQPHPRRRVIGVLDDTRRVAEALGL---
303
+ >UniRef100_A0A4Q3W5A8 102 0.459 1.893E-22 1 87 91 12 98 102
304
+ -KEFVLRLFVTGASPNSLKALNNIREICENHAKGNYSLEVIDVYQNAELVQQEQIIALPLLVRKNPLPERKLIGDLSEKEKVIKYLGL---
305
+ >UniRef100_A0A934QHJ2 66 0.320 8.781E-10 5 85 91 16 96 98
306
+ -----LTVFIAGDAPSSRQAMTNLTGVLDSLDIPPERLEVVDVLTNPAAALKAGALVTPSLQVKRGQQVYWFLGDLTEQRDLLAFL-----
307
+ >4460|scaffold_415991_c1_2|-159|00 113 0.448 2.579E-26 0 86 91 24 110 118
308
+ KARYTLRLYVAGFRQSSRSAIANIRRICDKHFEGSANLEVIDIYQQPELAAAQQIIASPTLIKEAPAPFRRVIGDLSDTTKVLAALG----
309
+ >MGYP000603749840 111 0.443 1.725E-25 2 89 91 69 156 158
310
+ --RFLLQLYVAGNSHRCVNARKNLREICEEHLPDSYTLEIIDIVENPEAAEEADIVAVPTLVKRSPSPVRKVVGDMSRTQNVLSGLNIEP-
311
+ >UPI0003E01BF4 75 0.317 4.209E-13 2 85 91 167 251 261
312
+ --RIELALYISASSPSSLKALRNLTRLLADHDAAQVRFTTYDLSkEHIAAAQEDRIAFTPTLVKRWPEPKVWILGDLDDIRVVSDLL-----
313
+ >10796|Ga0318514_12978242_1|+2|11 63 0.291 5.708E-09 0 77 91 12 89 92
314
+ QPRVELSLYTSSGSPSSLKAVRNLMSLLSNYDPLQVRLSVRDLsREAHEQAAEDRIAFTPTLVKRN-EPKVWVLGSLDD-------------
315
+ >UniRef100_UPI0021E125E4 71 0.313 1.035E-11 1 85 91 139 224 234
316
+ -KRIAFTLYISEASTASLRALRNLQKLLDGYDASQIDLRVVDLSkERPASFDEDRITFTPTLVKRSPEPRVYLLGTLEHIQSVADLL-----
317
+ >SRR3712207_3115678 67 0.311 2.393E-10 2 77 91 61 137 142
318
+ --RLELFFYVSSASACSLKALRNLDRFLADYQGAQVRLRVFDLSQDyPAEAEEDRIAFTPSLVRRYPTPKTWLLGSLDD-------------
319
+ >SRR6185295_1711097 83 0.388 1.018E-15 2 85 91 94 178 188
320
+ --RVQLTLYISTSSPSSLKALRNLQKLLNDYDPGTVSLTVCDLSRDtTGSAEEDRIVFTPTLVKRVPEPKVWILGDLENAEIVSDLL-----
321
+ >UniRef100_UPI00214A28C1 63 0.337 8.088E-09 5 86 91 139 221 230
322
+ -----LMLYISEASPASLRALRKLEKLLAGYERSQVRLTVVDLAkERPPSFDEDRIAFTPTLVKRYPTPRAYYLGALDQLQAVTDLLN----
323
+ >UniRef100_UPI00193BA132 68 0.337 1.798E-10 5 86 91 139 221 230
324
+ -----LTLYISEASPASLRALRNLEKLLANYERSQVRLAVVDLSkERPPSFDEDRIAFTPTLVKRFPAPRAYYLGTLDQFQAVTDLLN----
325
+ >SRR5688500_8380976 82 0.345 1.398E-15 2 85 91 168 251 262
326
+ --PIELVLYYTPPWPSSMKARRNLEKILGKYEADAVRLTVRDLGEHPDLAEADGVVFSPTLLKKSPGAPVWMLGDLSDATAVTDLL-----
327
+ >ETNmetMinimDraft_32_1059908.scaffolds.fasta_scaffold895325_1 83 0.365 5.401E-16 4 85 91 130 211 221
328
+ ----ELVLYYTPPWTSSLKALRNLEKILEGFDKDAVHLNVRDLAEHPEQAEADGVVFSPTLVKKAPGPPVWMLGDLSDSRAVTDLL-----
329
+ >SRR5512132_2975018 81 0.341 4.970E-15 4 85 91 33 114 125
330
+ ----ELVLYYTPPWPSSMKARRNLEKILEGYEAEAVHLTLRDLGDHPDLAESDGVVFSPTLIKRSPGAPVWMLGDLSDGSAVTDLL-----
331
+ >13960|scaffold210726_2|+957|01 84 0.341 2.865E-16 1 85 91 155 239 249
332
+ -PKAEFVLYISSASPSSLKALRNMQRLLGEYQASQVRFTVCDLLKEPGCFEEDHVAFTPTLVKRLPGPKTWIVGDLQDSSMVTDLL-----
333
+ >12613|JGI10216J12902_106548506_1|+3|10 64 0.365 4.157E-09 5 85 91 84 164 178
334
+ -----LTLYVS-DSMLSLRAAKNLRMVLARYRDEQVALTVINLSHDVDhHAEEDRIVVTPTLLRTFPAPRVWLVGNLDKRDLVERLL-----
335
+ >SRR5688572_30282090 83 0.341 7.416E-16 4 85 91 58 139 150
336
+ ----ELILYYTPPWPSSIKARRNLEKILEGYDADAVHLTVRDIAEQPDLAEADGVVFSPTLIKKSPGAPVWMLGDLSDASGITELL-----
337
+ >SRR5919106_6413091 63 0.320 7.839E-09 5 81 91 24 100 104
338
+ -----FVMYVNG-SNRSRRALRKVRALFDEYDAAQLTWSTIDVSSDtASRVEQDRIVVTPTLLKTYPSPAVWITGELENTDVV---------
339
+ >SRR5688572_2308330 79 0.317 2.426E-14 4 85 91 137 218 229
340
+ ----ELVFYYTPPWPSSMKARRNLEKILSGYAADAVHLTVRDLGEHPDLAEADGVVFSPTLIKKSPGAPVWMLGDLSDASGITELL-----
341
+ >3300018984.a:Ga0193605_1004274_5 73 0.318 2.055E-12 0 86 91 168 255 265
342
+ QKRIGFVLYISEASSSSLRALRNLQRLLDEYETSQIDLKVVDLSkERPASFDEDRITFTPTLVKRNPEPRVHLLGTLEHIQSVAELLG----
343
+ >GraSoiStandDraft_41_1057321.scaffolds.fasta_scaffold894400_3 78 0.346 6.280E-14 5 79 91 90 164 168
344
+ -----LVLYVSAASPASIQARRNLERMLSRFEPGQVRWTVRDLEREPLAGEEDRIAFTPTLVKRFPEPRMWVLGNLREAD-----------
345
+ >UniRef100_A0A7X0U868 87 0.345 2.341E-17 4 87 91 21 104 113
346
+ ----HFRLYVSTTSPISLRAIANARRILQEAYPGAHRLTVLNIAEHVALARTDQIIVSPTLLRLAPLPQRRFIGDLSDLNRLRRALGM---
347
+ >SRR5690242_21852208 83 0.395 1.018E-15 2 86 91 14 99 108
348
+ --RIELVLYTAGRSPASMRALRQMKNLLAQYESAQVDFKVFDLAEgRPASAEEDHILLTPTLVQRSPPPRTWVVGDLEDTTLIADLLD----
349
+ >SRR4051812_14954157 76 0.301 1.626E-13 2 83 91 38 120 132
350
+ --RLELVLYISAQSPASLRAIRNFQAILNEFDADEVDYSVCDLATDiSGAADEDRIAFTPSLVKRHPLPNEWFLGDLTNTDPIRA-------
351
+ >SRR5687768_5535903 80 0.329 6.824E-15 4 85 91 139 220 230
352
+ ----ELLLYYTPPWPSSMKARRNLEKILKAYEADAVHLTLRDLGEHPDLAEQDGVVFSPTLIKRSPGAPVWMLGDLSDSTAVTDLL-----
353
+ >GraSoiStandDraft_2_1057267.scaffolds.fasta_scaffold1451175_1 77 0.317 8.623E-14 4 85 91 155 236 246
354
+ ----EFVLYVSASSAASSQARRNFEQLLEGYDATQVRYTVCDLGRDPLAGDEDRVAFTPTLVKRYPPPRMWLIGNLRELEIVADIL-----
355
+ >SRR6185437_3447168 72 0.307 5.322E-12 2 79 91 62 139 141
356
+ --PIELMLYVSRHSAHSAQAIRNITSVLSRFKATQVKLTVCDLSADPGAGAKDNITYTPTLVRSGPGPRTYILGHISNPE-----------
357
+ >SRR4051794_7772101 82 0.376 1.920E-15 2 85 91 91 175 188
358
+ --RIELVLYVSGSSPSSLKAMRNLDGVLRQFDLACIQLEICDLSRgYPATAEEDRIAFTPTLVKRGPAPRAWVIGNLENRDLVADLL-----
359
+ >SRR5829696_6517560 79 0.294 1.767E-14 2 86 91 36 120 128
360
+ --KVELVLYISSASPSSIVARRNLEKVLDRFDSAQIHVTVCDLVADPLAGERDRIAFTPTLVKTYPAPKMWVLGNLRDPAIVEDLLG----
361
+ >MGYP000536513446 72 0.362 3.875E-12 5 81 91 139 218 236
362
+ -----LKLYFSGVSTESRRAVRNLRRVLKEFDRHRIRLDVHDISDRAAPVaplEQDRIVVTPTLVRKHPLPKLWILGDLSKLEVV---------
363
+ >SRR5687768_1210620 76 0.358 2.232E-13 5 81 91 136 216 217
364
+ -----LALYVSGTSPSSRKALRNLTQVLRNVEPHRVAVTVHDISEsdHPwvEAAEDDRVVVIPTLVRHAPLPRVWIAGDLSEIDTV---------
365
+ >SRR6059058_1924189 81 0.378 2.636E-15 2 75 91 28 101 104
366
+ --RIELVLYVSASSPPSVRAVANLRRILQRYKSDRIRCSICDLTANPEEGDVDQIAFTPTLCKRQPEPPMWILGDL---------------
367
+ >SRR5688500_14284300 87 0.383 4.277E-17 2 86 91 26 111 119
368
+ --RIELVLYVAGPSPASTRALRQMKNLLAQYDAAQVDFRVIDLAQgRPASAEEDRVLLTPALVRRSPLPRTWVVGDLEDTTLVADLLD----
369
+ >SRR5688572_7102903 68 0.373 1.269E-10 5 75 91 47 121 122
370
+ -----LVLYVSGDSQASRRALRNVRRALEGVDPAAIELDVRDVSSSdgaaIAAADADRIVVTPTLVRAAPSPKVWIAGDL---------------
371
+ >SRR4029453_1913048 75 0.261 5.780E-13 2 85 91 54 137 147
372
+ --RVEFVLYVSPNSAASLQARRNFDKLLARFDATQVKYSICDLIRDPLAGDSDRVAFTPTLVKRYPAPRMWLIGNLRETEVLADIL-----
373
+ >SRR6185503_893473 70 0.360 2.599E-11 2 87 91 34 118 126
374
+ --PIELVLYVAPNSPACVRARANLEAALDAYDRSRIRLTVCDVSRDFEDAERDRIVFTPTLL-LRGAEAGCVVGDLSLGDAVDALLSL---
375
+ >SRR5262245_8274589 79 0.285 1.767E-14 2 85 91 152 235 245
376
+ --RVELVLYVSPSSPACAQARHNLERVLDHFDPSQIKYSVFDLVRDPLAGEDDRVAFTPTLVKRYPAPRTWVLGNLRDTQIVGDLL-----
377
+ >SoimicMinimDraft_3_1059731.scaffolds.fasta_scaffold2181823_1 76 0.292 3.066E-13 4 85 91 147 228 238
378
+ ----ELVLYVSSASPACIQARRNLEQLLEKFDVSQVRFSICDLGREPTAGDADRIAFTPTLVKRYPEPKMWVLGNLREPQIIADLL-----
379
+ >23892|Ga0310888_10269646_2|-120|01 75 0.292 4.209E-13 4 85 91 189 270 280
380
+ ----ELVLYVSSASPASTQARRNLELVLDGFDRSQIKYTICDLGRDPMAGEIDRVAFTPTLVKRYPEPRMWLLGTLRETDLVADLL-----
381
+ >SRR3954471_22207506 105 0.388 1.455E-23 3 87 91 77 161 171
382
+ ---FRFRLYVAGTTPNSVQARANLSALCRRHLPGRYKIEIVDVSKQPDRALIEGIFMTPSLMKISPSPTRMIVGTLSPSDALMRALGL---
383
+ >SRR5581483_9829629 105 0.383 1.455E-23 3 88 91 11 96 111
384
+ ---YSFRLYLAGGTARAMQAEAQLRHLCETRLPEGFELEVLDVTDHPDRAEEDRVLVTPTVIRLSPPPARRVLGSLSDEHRVGLALGLP--
385
+ >Cyp2metagenome_2_1107375.scaffolds.fasta_scaffold1445432_1 88 0.363 1.652E-17 0 87 91 148 235 244
386
+ RPRVQLVLYVDAVWVTSVRAQENLEKVLAGFDRSQVHLRICDVAREPLDAEKDQIVFTPTLVKRSPAPRAWVVGDLSDHDVVTALLEM---
387
+ >UPI00034656C1 83 0.383 7.416E-16 2 87 91 153 238 252
388
+ --KLELALYVTMPWPSSLRAKANLSRVLARVPEGHVRLAVCDLAREPERAEIDNVLFSPTLVKVWPAPKMWILGDLSEANVLTDLLSL---
389
+ >UniRef100_A0A2V7Y706 73 0.413 2.912E-12 2 76 91 165 239 260
390
+ --KIELALYVTLPWPSSLRAQSNLSRVLSGVPDGEVRLSVCDLAREPERAERDNVLFSPTLVKVWPEPKMWILGDLS--------------
391
+ >SRR3954454_15902270 80 0.309 6.824E-15 2 85 91 154 237 247
392
+ --RVELVLYVSSASAASVQARRNLEQVLERFERSQIKCSVCDLVRDPLAGTDDRVAFTPTLVKRFPEPRMWVIGNFRDPEVVADLL-----
393
+ >SRR5438045_7802661 70 0.270 2.599E-11 0 84 91 42 125 126
394
+ RRKVELVLYTSAASEKCQRAIRSIQQVLERYERDQVSFTICDLCCDPEAGDADAVIFTPTLVKRGAEPKRWIVGSL-DRPRLGAG------
395
+ >AntDryMetagUQ889_1029465.scaffolds.fasta_scaffold07537_1 79 0.297 1.287E-14 2 85 91 66 149 159
396
+ --RVELVLYVSSASPASVQARRNLERLLAGFDGSQVKFTVCDLVRDPTAGDSDRVAFTPTLVKRYPEPRMWVLGNLREPQIVADML-----
397
+ >SRR5678816_3727367 82 0.333 1.920E-15 0 86 91 82 168 182
398
+ RPPIELVLYVSSLSPHSIAALRNLRQTLAQYGGGAVKLTVCDLSKDPSLADRDGVHFTPSLVTTGHGPRTWIVGHLGNPQVLQAFLE----
399
+ >18084|scaffold527937_2|-1158|01 84 0.380 3.934E-16 2 85 91 186 269 284
400
+ --KVELALYVTLPWPSSLRAQTNLSRVLARVPEGEVRLSVCDLARDPGRAERDNVLFSPTLVKVWPEPKMWILGDLTETEALADLL-----
401
+ >MGYP000592549691 85 0.345 1.520E-16 2 85 91 141 224 234
402
+ --KIELVLYVSAASPASMQAQGNMERVLASFNRDEVAYSVCDLQQNPETADHDRVVFTPTLVKRHPSPRLWIIGDLRDGDIVADLL-----
403
+ >BarGraIncu01122A_1022018.scaffolds.fasta_scaffold128416_1 83 0.290 7.416E-16 2 87 91 153 238 246
404
+ --KVELVLYVSSASPASLQARRNLEQVLSRFAAGQVRWTIRDLGREPLAGEDDRIAFTPTLVKRFPEPRMWVLGNLRDTDILADMLRI---
405
+ >3300006028.a:Ga0070717_10004647_5 86 0.395 8.062E-17 5 85 91 13 93 103
406
+ -----LRLYVAGDAPNSVAALVHLRAALAELPADRVDLEIIDVLQEPERGLRDDVLMTPMLVRHRPAPERRVLGNLGAARALRNVL-----
407
+ >SRR6185436_10889035 74 0.400 1.090E-12 5 76 91 30 104 105
408
+ -----LRLYVTGHTSSADRARAALRDVerrLAQQGEAQVIAEVIDVLDDPESAGRDQVFATPTLIRLTPAPQIRLFGDLS--------------
409
+ >SRR5438132_14018022 99 0.386 1.685E-21 0 87 91 7 94 105
410
+ RESFVVRLYVADREATAVRAIANLEALCREFLPRGCELEIIDILREPQRGLDDSIMVTPTLINLAPQPVRRICGDLGHPGRLRDGLGL---
411
+ >SRR5438552_3472945 79 0.369 1.287E-14 2 85 91 378 461 478
412
+ --KVELALYVTLPWPSSLRARSNPPRVLNGVPEGEVRLDVCDLAREPDRAERDNVLFSPTLVKVWPEPKLWILGDLSEPAVLTDLL-----
413
+ >GraSoiStandDraft_28_1057319.scaffolds.fasta_scaffold3853791_1 78 0.306 3.331E-14 0 87 91 124 211 219
414
+ RHKVELVLYVSSASPASIQARRNLEMLLSRFATNQVQWSVRDLGRDPLAGVEDRITFTPTLVKRFPEPRMWVLGNLRETDLLADMLRL---
415
+ >SRR5688572_14330442 104 0.411 3.763E-23 3 87 91 54 138 144
416
+ ---FKLRLYVAGNTPNSAQARTNLRALCRTHLSGRHEIEIVDVTREPNRALTDGIYMTPSLLKLAPSPVRMIVGTLSHPESLMDALGL---
417
+ >UniRef100_A0A512HC53 79 0.382 1.327E-14 5 85 91 3 83 91
418
+ -----LDLYLAGRSRNSMRALHNLKEWLAQYGGEGVELRVIDVLEHPDRALDEGVLVTPTVIRREPDPIRIVVGTLDLPDDVTVLL-----
419
+ >SRR5687768_18073040 77 0.329 8.623E-14 4 85 91 30 110 117
420
+ ----QLVLYATAGSTSSSRARRNLEAVLERFDPATYELAICDPSVEPLRAEDDRVVFAPTLVRRGPQPG-WFLGDLSNTAALHDML-----
421
+ >SRR5215510_1611124 79 0.297 2.426E-14 2 85 91 156 239 241
422
+ --RVEFVLYVSASSPASGQARRNLEQLLDRFDAAQVKYAICDLGRDPMAGEHDRVAFTPTLVKRYPPPRMWLIGSLRETETIADIL-----
423
+ >SRR4029453_630509 82 0.329 1.920E-15 0 90 91 63 153 157
424
+ KPSIELVLYVSSDSPHSVAALRNLRRTLAQYAGDAVRLTVCDLSKDPSLAERDGVHFTPSLVTAGRGPRTWIVGHLGNPQVLQAFLESALE
425
+ >SRR5678816_1688620 78 0.325 4.573E-14 2 87 91 85 170 177
426
+ --RVELVLYISPASFPSRAAERELRTILSQYDAGRVSLRIADVSRETSDAARDHVIFTPTLVKRRPEPLVWVVGDLTHVEVVHDLLQL---
427
+ >SRR5262252_1015889 79 0.311 1.287E-14 1 90 91 52 141 145
428
+ -PSIELVLYVSSVSPHSMAALRNLRRTLAQYGGDAVRLTVCDLSKDPSLADRDGVHFTPSLVTTGHGPRTWIVGHLGNPQVLQAFLESALE
429
+ >SRR5687768_1317515 79 0.309 1.287E-14 2 85 91 203 286 295
430
+ --KIELILYVSSESPASLQALRNLDRALARFDASQVKLTVHDLARTPEAGAADRVAFSPTLVKTFPEPRMWVIGNLRDPEVLEDLL-----
431
+ >UniRef100_A0A958FYA2 80 0.382 9.667E-15 2 82 91 1 81 90
432
+ --KISLRLYYTGGSPVSELARVALKALQSRHSSVQFEIEEIDVVLYPDAAEADGILATPTVIKFSPLPIAKIVGDISSLEQVL--------
433
+ >MGYP001204866127 86 0.357 5.872E-17 4 87 91 50 133 142
434
+ ----HLRLFVTGGTSLSAAAVARLKELEEKLPADFLSMEIVDVLEDPDSAENNRVLATPTLIRMSPLPMIRVVGDVESVDRLMQLLDL---
435
+ >SRR6185503_18435269 106 0.418 5.624E-24 4 89 91 38 123 130
436
+ ----QLRLYVAGDSPRSEQAIRSIRRLDGTRLAGRYDLEVIDVLTQPERAESDHVLATPTLLRLSPGPCRRILGDLGDLDRLITALVPPP-
437
+ >UniRef100_A0A964QAK1 95 0.414 4.135E-20 4 85 91 7 88 99
438
+ ----QLRLYVAGDSPRSQLAIRSLRRLDGTPLAGRYDLEVVDVHDQPYRAEVDHVLATPTLLRLSPGPCRRILGDMGDLDLLMRNL-----
439
+ >SRR3954462_9021407 91 0.455 1.797E-18 3 90 91 18 107 115
440
+ ---YALRLYVAGGTKHDGAAVRAVERLRERLGakGATIELEVIDVLAAPDRAEQDRILATPTLVRVTPQPARKIVGDLGDVERVSRMLDLGPE
441
+ >MGYP001286529226 86 0.373 5.872E-17 5 87 91 9 91 108
442
+ -----LRLFVTGGSLYSRRALVTIAELTRRMPDLHCEGEVIDLLEQPERAGLERIMATPTLIRLEPEPARRIVGDLRDADSLMSVLEL---
443
+ >SRR5688572_18846285 104 0.436 2.741E-23 4 90 91 42 127 133
444
+ ----QLRLYVAGASPRSEQAIKHLQRLDGTELAGRYDLEVVDVFREPDRAEADRVMATPTLLRIAPGPCRRILGDLGDLDALLRALG-PAE
445
+ >SRR5580700_4777658 89 0.452 6.385E-18 3 75 91 46 118 119
446
+ ---YQLTLFVSGASELSARAIVDARRLCEMGGPGRYQLVVVDVHDEPDAALANDILATPTLIKHRPLPVRRLVGDL---------------
447
+ >UniRef100_K9VYB0 94 0.421 1.070E-19 3 85 91 53 135 141
448
+ ---YIFRLFVSGHNLDTERTLQILHRLLEQSLGHPYTLKVIDIFKHPEQAEANSISATPTLIRISPQPIKRIVGELDDVERVLKLL-----
449
+ >14399|Ga0335069_12962689_1|-154|01 106 0.528 1.060E-23 1 89 91 8 96 109
450
+ -EHLSLKLYVAGHTARSECAVAQARRLSELQFGGRCTLEIVDVVENPEIAEEERILATPTLVKVSPPPVRRIIGDLTRLDDVLAGLGLLP-
451
+ >A0A1Z4NNA9 93 0.518 1.955E-19 3 85 91 166 248 256
452
+ ---YVLRLFIAGHTLHTERILQTLHELLEKHLSHPYTLKVVDVLTHPDQAEINQVSATPTLVKVFPPPMRRIIGNLESAERILQML-----
453
+ >UniRef100_A0A1Z4NNA9 95 0.518 5.677E-20 3 85 91 166 248 256
454
+ ---YVLRLFIAGHTLHTERILQTLHELLEKHLSHPYTLKVVDVLTHPDQAEINQVSATPTLVKVFPPPMRRIIGNLESAERILQML-----
455
+ >UniRef100_A0A0V7ZRD6 98 0.505 6.176E-21 4 86 91 166 248 256
456
+ ----VFRLFIAGHNPATEHILQTLHEILEKYLGHPYTLKVIDVLSHPEQAEANQVTATPTLVKVWPHPIRRIVGDLNNIGKILQNLG----
457
+ >UniRef100_K9TAR4 93 0.493 2.017E-19 3 83 91 162 242 247
458
+ ---YVLRLFVAGNDLTTKRTLETLHQVLEQQLQHPYTLKVIDILKHPELAETNQVSATPTLVRVWPRPVRRIVGELEDLQRAIQ-------
459
+ >SRR5262249_2417559 100 0.447 6.515E-22 3 87 91 15 99 103
460
+ ---YRFHLYVAGASMQSRRAIMRINEIGRRYLDGSYELQVVDILQNPEKVAEAGVVATPTLIKTSPPPVRYFVGDLSDTKKIVTGLAI---
461
+ >UniRef100_UPI002021B723 99 0.518 2.387E-21 3 85 91 209 291 297
462
+ ---YVLRLFVAGHSATTERILQTLHQLLEQYLHHAYTLKVIDVFKHPEQAEADQVSATPTLVKVWPQPIRRLVGELDNLEKLLQIL-----
463
+ >HubBroStandDraft_5_1064220.scaffolds.fasta_scaffold442189_2 95 0.505 5.502E-20 3 85 91 192 274 280
464
+ ---YVLRLFVSGSNPNTEHTLVTVHQLLEQSLNHPYTLKVIDVFKHPEQAESDQISATPTLIKIWPKPVRRIVGELNDAEKIRRLL-----
465
+ >S7VCH7 116 0.523 2.810E-27 3 86 91 1 84 90
466
+ ---YSLTLFITGNGPASARAEQNLRRICDHAMDGQVRLEIVDVLQSPELAEEEGILATPTLIKRAPPPIRRLIGDLSDEAQVLAGLD----
467
+ >UniRef100_S7VCH7 113 0.523 1.939E-26 3 86 91 1 84 90
468
+ ---YSLTLFITGNGPASARAEQNLRRICDHAMDGQVRLEIVDVLQSPELAEEEGILATPTLIKRAPPPIRRLIGDLSDEAQVLAGLD----
469
+ >CoawatStandDraft_6_1074263.scaffolds.fasta_scaffold645439_1 82 0.439 1.398E-15 4 85 91 165 246 250
470
+ ----VLRLFVSGHSAMTEQILTTLQGVLESSRYQPYTLQMVDVSKHPEQAEADQVAATPTLVRVSPRPVRRLVGDLDNPRAILSLL-----
471
+ >UniRef100_UPI00045E5B31 63 0.308 8.088E-09 5 84 91 137 217 233
472
+ -----LTLYVVSLNSETRRLVEQITvALAKLYDPGHWVLDVVEVLGMPEKALEKDVFATPMLVRDVPEPVLKLLGDLSRVPSVIAA------
473
+ >UniRef100_A0A969VCT1 102 0.530 2.598E-22 3 85 91 181 263 269
474
+ ---YVLRLYVSGSNPSTERTLVTIHQLLEQSLHHPYTLKVIDVFKHPEQAEEDQISATPTLIKIWPKPVRRIVGELNDAEKIMRLL-----
475
+ >UniRef100_A0A517P709 83 0.341 5.573E-16 3 81 91 19 97 153
476
+ ---YRFQLFVTGNSLLSRRAREHVERHLVGPLGNRAEVEIVDLIADPIAARRERIVATPTLIRLEPSPVVRLIGDLTDFDRV---------
477
+ >SRR3954454_16723622 97 0.457 8.218E-21 5 87 91 31 112 116
478
+ -----LTLYVVRGTPASERAIATIEQLRAA-LPGTVKIEVIDVADQPEVAETERIVATPMLVRVAPAPVRRIVGDLSDLDRVRWGLGL---
479
+ >SRR5581483_7974966 86 0.344 5.872E-17 1 87 91 35 121 126
480
+ -PPLHLRLFVSGSSTTSLHARAAVDRLQRDGFVVAESVEIVDVLAEPERAAADRVLVTPTLLRVAPAPSRRVLGDLSDLAAVARALGL---
481
+ >SRR4030095_4975553 78 0.320 4.573E-14 5 85 91 40 120 126
482
+ -----LVLYISANSRYASVARCNCQRLLDRFDPRQVRFEVCDIGAHPERAEEDSVCYTPMLVKRHPLPRAYVLGDLSNGEPLVHLL-----
483
+ >UniRef100_D0LT55 65 0.315 1.206E-09 0 87 91 141 232 237
484
+ KERVELCLYLHSGTSASAAAETNLQNALADFDTTRLQLESIDLARSPRAAhPEDKVVFTPTLVRRGPGSRLALVGDLGDRallDSVLLAAGL---
485
+ >22902|Ga0257122_1006421_4|-3359|00 108 0.510 1.154E-24 2 85 91 1 84 99
486
+ --KYYLTLYVTGETPNSQRAIANLEKLSEECDADEFDIQIIDLLKHPDLAAEDEIIAVPTLVKKLPKPMQKIVGDLSNCEEVLLGL-----
487
+ >UniRef100_A0A6L9ZJT6 103 0.530 1.004E-22 3 85 91 168 250 262
488
+ ---YVLRLFVSGNSIGTERAMKSLHQILEQSLSHPYTLKVIDVLQHPEQAEADQITATPTLIRVWPLPVRRIVGEFNDVEKILTLL-----
489
+ >A0A0M1JMY3 85 0.421 1.520E-16 3 85 91 182 264 269
490
+ ---YVFHLFVSGRSAITQRTMEILHQILEDSLGMTYTLKVIDISRHPEQTEIYQITATPTLVKIWPLPMRKIVGDLENLDKLRQVL-----
491
+ >UniRef100_UPI001E41D742 88 0.421 1.705E-17 3 85 91 178 260 265
492
+ ---YVFHLFVSGRSAITQRTMEILHQILEDSLGMTYTLKVIDISRHPEQTEIYQITATPTLVKIWPLPMRKIVGDLENLDKLRQVL-----
493
+ >SRR6185503_14955318 93 0.400 1.955E-19 3 87 91 17 101 116
494
+ ---YEFDLFVVGGSEKAKRAEENLRRLGDEVLGGAYELRIIDVLENGEAAEAANIVATPALVRRAPLPVRMIVGDLSEPTWLAHGLGL---
495
+ >SRR6185312_15773949 92 0.407 5.058E-19 2 82 91 38 118 119
496
+ --KYELELFVVGGSVKAQRAEQNLRRLCDALLAGRYELRITDVLDNADAAEEANIVATPALLRRAPLPVRMVVGDLSERDALL--------
497
+ >SRR6476619_1219783 96 0.411 1.549E-20 3 87 91 58 142 149
498
+ ---YKLELFVVGGSVKAQRAEQNLRRLCDSALAGRYELRITDVLDNADAAEEANIVATPALVRRAPLPVRMVVGDLSERDALAYGLGL---
499
+ >SRR5690606_38631219 98 0.395 5.986E-21 2 87 91 31 116 122
500
+ --KYQLELFVVGHSSKAQRAEHNLRRLCDARIAGRYELQITDVLENADAAEAANIVATPALVRRAPLPVRMVVGDLSERSALVYGLGL---
501
+ >SRR5688572_23129716 89 0.360 4.650E-18 2 87 91 11 96 109
502
+ --KYDLELFVVGGSARGRHAEDNLRRLCDASIAGHYTLRVTDVLENAEAADAANVIATPAVLRHSPLPRRMIVGDLSRRDALVHGLGL---
503
+ >SRR5688572_20232672 92 0.412 5.058E-19 2 81 91 74 153 154
504
+ --KYELELFVVGHSSMARRAEHNLRRLCDQTIAGRYELRVTDVLENAEAAEAANIIATPTLVRRAPLPVRMVVGDLSRRDAL---------
505
+ >SRR6187401_742178 94 0.400 1.037E-19 3 87 91 21 105 118
506
+ ---FQLELFVVGRSAKAQKAEQNLRRLCEAKLAGRYELRVTDVLENADAAEAANIVATPALVRRAPLPVRMVVGDLSERSALAYGLGL---
507
+ >SRR5687768_3218532 105 0.395 1.997E-23 0 90 91 9 99 111
508
+ RPTYQLELFVVGHSAKAQRAEHNLRRLCDEKLAGQYELRITDVLENADRAEAANVVATPALIRRAPLPVRMVVGDLSERDALAYGLGLEAE
509
+ >SRR6187401_2120497 94 0.383 7.553E-20 2 87 91 11 96 111
510
+ --KYELELFVVGHSSKAQKSKHNLRRLCEAMLAGRYELRVTDVLENADAAEAANIVATPALVRRAPLPIRMVVGDLSERKALVFGLGL---
511
+ >SRR5262245_48067131 116 0.445 2.047E-27 5 87 91 16 98 109
512
+ -----LRLYMTGTRSRSVRALENIRKICEEFLPGEYELEVVDLYQQPEKAAQEQIVAAPTLVKYYPLPARRVIGDMSDSDRVLHGLEL---
513
+ >UniRef100_A0A934ZFS7 93 0.383 2.017E-19 2 87 91 3 88 103
514
+ --RYQLELFVVAHSTKAQRAEHNLRRLCDAKLAGQYDLRITDVLEDAAAAEAANIVATPSLVRRAPLPVRVVVGDLSRRESLLYALGL---
515
+ >UniRef100_A0A852ZR84 106 0.390 1.093E-23 2 88 91 35 121 136
516
+ --PYVLTLFVFGPDESSRRAATHLRRLCDELVGGQYRLEVVDVGEDPELAEEFGIFVTPTVVRTQPLPQFRVIGDLSDDARTAAALGFP--
517
+ >ERR1700733_14812660 121 0.482 4.579E-29 3 87 91 67 151 163
518
+ ---YRLRLIVAGRTTRSQRAIENLRRICDEHLGGQVDLEVIDIYQQPELAEKYQVIAAPTLIKLLPLPIRRVIGDLSEKERVLRGLEI---
519
+ >MGYP001377341570 91 0.400 1.309E-18 3 87 91 3 87 96
520
+ ---YQFKLFVTGETVHAAAARATVDRLCDALQIEAHAVRIIDVLEEPDLAAADRIIATPTLIRTSPQPERRVIGDLSDLDTLLKTMRL---
521
+ >SRR5690242_13802588 108 0.441 1.154E-24 5 90 91 20 105 118
522
+ -----LELYVMGDSPKSRAALDNLRRICERRLAGRYDLQVIDVIEQPDAAEAANVVATPALIRRGPAPVVRVVGDLSDRTALVHALGLDAE
523
+ >UniRef100_UPI001FB8EED0 112 0.404 5.013E-26 3 86 91 13 96 110
524
+ ---YQLRLFIAGTSPRSQRTIENLRRICREHLADRHSLVIVDIYQQPELAEAAQVVAAPTLLKLTPEPLRRIVGDLSDEARVLRGLG----
525
+ >UniRef100_A0A521U920 100 0.441 6.722E-22 2 87 91 3 88 109
526
+ --RYELELFVVGHSAKAERAESNLRRLCEARLAGRYDLRVTDVLEDAEAAEEANIIATPTLLRRAPLPVRMVVGDLSHRGALLRGLGL---
527
+ >UniRef100_A0A933USY1 86 0.416 6.058E-17 4 87 91 8 90 97
528
+ ----VLRLYIAGHSPNSVVALANLDWLRRAHFQDA-TVDIVDILREPERAMADRVITAPALFKVAPSPPRMLLGNLSDATKVLQGLGL---
529
+ >SRR5471030_2470326 96 0.480 1.549E-20 3 79 91 115 191 193
530
+ ---WQLRLYVVDQTVKAVTAYTNLKKIYESRLKGRYRITVIDLLKHPQLAKGDQILAIPTVVRKLPVPIRTIIGNLSDTD-----------
531
+ >UniRef100_UPI001F066344 109 0.411 6.318E-25 3 87 91 14 98 109
532
+ ---FHLRLYVAGQTPRSALAQANLYALCEARLPGRHHIEIVDLMDEPARARTDGVIAVPTLIRVSPTPVRRVVGDLSDTARLLAGLEL---
533
+ >14341|Ga0209698_10565634_1|+23|00 112 0.348 6.670E-26 1 89 91 33 121 134
534
+ -PRYEFRLYIAGTNLNSVRAIENVRRLRKSLRPSRCKLEIIDLYQQPALAKRDQVVAAPALVKLYPLPRRTFVGDLSDSARVVAGLGIIT-
535
+ >UniRef100_UPI001BEB70D6 108 0.465 1.191E-24 2 87 91 10 95 99
536
+ --RYRLRLVIAGNSERSRRAIENLQHLCAEHLSGQVDLEVVDIYQRPELAEEYQVIAAPTLVKLLPLPVRRIIGDLSQEDRVLHGLEI---
537
+ >SRR5215218_6711373 87 0.404 2.269E-17 4 87 91 12 95 102
538
+ ----VLRLYIAGNSSSSRRAEQNVMRLRDHMTADAWKIEIIDVLATPELAEQASILATPTLSYDNAGRPRRIVGDLSDTKRILDYLGI---
539
+ >SRR6188768_3491203 81 0.366 2.636E-15 2 72 91 41 111 112
540
+ --KYELELFVVGRSSKAQKAEHNLRRLCEARLAGRYELRITDVLENADAAEAANIVATPALVRRAPLPVRMVV------------------
541
+ >SRR6187402_177262 96 0.470 2.126E-20 4 88 91 0 84 99
542
+ ----ELTLFVAGDTAKSALAATKLRHICESLARGNYTLAIVDVLKDSAAAEREKILVTPTLIKRSPPPTRRLLGDLTATAKVVETLGLP--
543
+ >SRR6059058_1539947 101 0.404 4.746E-22 3 86 91 13 96 103
544
+ ---FHFRLYVAGDTPNSERARVNLGALCRKHLVGRYKIQIVDVFKDPNRAMIEGIFMTPTLIKVAPSPIRMVVGTLSQSAALMEALG----
545
+ >SRR3984957_851882 121 0.476 3.336E-29 2 87 91 14 99 105
546
+ --RWLLRLYVAGQSPKSLQAFANLMRIRDEHLGSEYEIEIVDLLENPQLAEGDEIVAIPTLVRRLPHPMRKIIGDLSDTDRVLVGLQL---
547
+ >SRR3954470_18350552 106 0.447 5.624E-24 3 87 91 19 103 112
548
+ ---YQFRLYVAGDTPNSERARVNLGALCRKYLVGRYKIQIVDVFKDPDRAMVEGILMTPTLIKLAPSPVRMVVGTLSPSESLMDALGL---
549
+ >MGYP001434196702 99 0.388 2.314E-21 3 87 91 34 118 123
550
+ ---YKFRLFVADDTLNSAQASVNLAALCRAHLPGRHEIEIVDVLLEPKRALAEGVFLTPTLIKFSPLPVRRIVGTLSEPLTVLRALGL---
551
+ >UniRef100_A0A4V1DI91 40 0.289 5.717E-01 0 74 91 10 85 373
552
+ QRRYLKLLLVAAPHHRATPDLRGLVAFLEnQDFGFDVSLEIADPAERPELLELHRLVATPALIKLDPTPKQVFAGN----------------
553
+ >MGYP000867160857 47 0.297 3.483E-03 2 74 91 29 102 379
554
+ --RYLKLLLVAAPHHRANPDLRGLVAFLENQDFGfDVTLEIADPAERPELLELHRLVATPALIKLEPTPKQVFAGN----------------
555
+ >UniRef100_UPI002001187B 44 0.311 4.536E-02 0 75 91 9 85 372
556
+ RRPHLKLLLVAGTRHRASADVRSLVAFLEKEDFGfEVSLELADPAQRPELLELHRLVATPALIKLEPAPKQVFAGNM---------------
557
+ >MGYP001295239604 51 0.328 1.461E-04 7 75 91 50 119 406
558
+ -------LLVAGTRHRASADVRSLVAFLEKEDFGfEVSLELADPAQRPELLELHRLVATPALIKLEPAPKQVFAGNM---------------
559
+ >MGYP000025237920 48 0.358 9.797E-04 7 81 91 40 117 121
560
+ -------LLVASPHHRATPDLRGLMAFLEHEDFGfDVQLDVVDPALRPELLELHRLVATPALIKLEPSPRQVFaeIGRASCRERV---------
561
+ >UniRef100_A0A560LTW2 39 0.277 2.028E+00 5 74 91 17 88 381
562
+ -----LTLLIVATSQHlSSPGLRGVLQFLESHDYGfELNLQIADPAKRPELLELYRLVATPAVVKLHPAPRQVFAGN----------------
563
+ >UniRef100_A0A076H9F6 41 0.329 4.166E-01 7 84 91 14 90 383
564
+ -------LLVAARHHLSGQDLRSLVQYLEREDVGfEVTLQLADPSQQPELLELHRLVVTPALIKLSPSPKQVFAG--SNIHQQLKG------
565
+ >MGYP000311810024 47 0.306 2.536E-03 0 73 91 19 92 206
566
+ RQPLKL-LLVAARHHLSGQDLRGLVQFLErEDLGFEVTLQVADPSQQPELLELHRLVVTPALIKLAPNPKQVFAG-----------------
567
+ >UniRef100_A0A968YHU9 46 0.295 4.933E-03 4 73 91 22 91 390
568
+ ----QLLLFVDKRAT-AKEQIQKISQYLETlEPQCDFELHVVEVAEQPYLVEHYKLVATPALVKIRPEPRHILAG-----------------
569
+ >UniRef100_UPI000B35C714 42 0.269 1.611E-01 1 74 91 13 88 385
570
+ -ERQVLHLLLV--ATRQQLAGQDLRTLLqllrREDLGFEVSLEVADPRRQPELLELHRLLATPALVKLAPAPKQVFAGN----------------
571
+ >MGYP001088475876 43 0.310 8.285E-02 1 73 91 12 84 189
572
+ -KKLELIL-VAGRKHLSRKDISEMLKFLEsKECNFEVSIQLSDPTKQPELLELHRLVAIPALIKIFPEPKQIFAG-----------------
573
+ >UniRef100_A0A7Y3TRI1 56 0.281 2.441E-06 4 73 91 15 84 443
574
+ ----QLLLFIDKR-PSSREQVQQVRMALKELReECDFELQIVDVSEQPYLAEYFRLIATPALVKLHPEPRQILAG-----------------
575
+ >UniRef100_A0A352XFA5 54 0.309 8.682E-06 4 73 91 15 84 396
576
+ ----QLLLFIDER-PTSRKHIHRIRSYLETLRADyPFELMLISVGEHPYLAEHFRLVATPALIKIHPPPRQTLAG-----------------
577
+ >UniRef100_A0A5J6Q9E6 42 0.306 2.211E-01 0 73 91 3 76 394
578
+ RPELRLLL-VASKAHAASQDVRSMMALLEQDDCGfQVTLKLADPRQQPELLELHRLVATPALVKLLPLPRQTFVG-----------------
579
+ >UniRef100_A0A139WUD4 60 0.328 7.451E-08 2 73 91 13 84 395
580
+ --PLQLLLFVDGR-PKSRQQVQRILSYLEELQADcKFELQIVDVGQKPYLAEHFKLVATPALIKIHPEPRQILAG-----------------
581
+ >UniRef100_A0A926Y835 54 0.352 8.682E-06 4 73 91 15 84 391
582
+ ----QLLLFV-DDRPSSRKQLQQIYSYLEQIKADNsFELQVVEVGEEPYLAEHFKIVATPALIKIHPAPRQALAG-----------------
583
+ >UniRef100_UPI0020A7ECEA 56 0.295 2.441E-06 4 73 91 15 84 397
584
+ ----QLLLFIDER-PSSRKHIHRIRSYLETLRADyPFELMVVSVGEHPYLAEHFRLVATPALIKIHPLPRQTLAG-----------------
585
+ >UniRef100_A0A068MZV1 50 0.260 2.071E-04 2 73 91 13 84 383
586
+ --PLQFLLFI-DDRPSSQDSVQEISQCLGTLVDGHsYDLQILQISKHPHLVEHFRLVATPSLIKLQPEPRQVLAG-----------------
587
+ >UniRef100_UPI0016873C22 61 0.352 3.951E-08 4 73 91 16 85 382
588
+ ----QLLLFV-DQRPSSQEHIRQVRQFLEELNaQDEFELQIIDVGEQPYLAEHFKLIATPTLIKIHPEPRQVLAG-----------------
589
+ >UniRef100_A0A0M1JTJ0 57 0.347 9.426E-07 4 74 91 15 85 394
590
+ ----QLLLFVDSR-PHSAEQIQEIRNYLKQWRTEfPYNLEIINVVEEPYLAEHYKLIATPTLLKLYPEPRQVLTGN----------------
591
+ >UniRef100_A0A261KTE7 53 0.309 3.088E-05 4 73 91 30 99 405
592
+ ----QLLLFINKR-PGSQEQIQAIRKSLSKLKTDyPFEFNVIDVGEQPYMAEHFKLIATPALLKIHPEPRQTLTG-----------------
593
+ >UniRef100_A0A3C0N9T6 52 0.323 4.240E-05 4 73 91 15 84 398
594
+ ----QLLLFVDER-LSSRKHIQRIRNYLKTLRIDyPFELMVVDVGEQPYLAEHFKLVATPALIKIHPKPRQILAG-----------------
595
+ >UniRef100_A0A6J4I559 51 0.323 1.098E-04 4 73 91 15 84 398
596
+ ----QLLLFVDER-LSSQKDLQQISSYLETLRAEyPFELMIVDVGEHPYLAEHFKLVATPSLIKIHPKPRQILAG-----------------
597
+ >UniRef100_A0A6P0QL46 59 0.352 1.930E-07 4 73 91 15 84 397
598
+ ----QLLLFVDER-PSSRKHIQRIRSYLETLKADyPFELTVVDVGEQPYLAEHFKLVATPALIKIHPNPRQTIAG-----------------
599
+ >UniRef100_A0A846D500 52 0.309 4.240E-05 4 73 91 21 90 392
600
+ ----QLLLFVDER-PSSQENIQQIHSYLESLKADyPFELQVIEIAEQPHLVEHFRLLATPALVKIFPAPRQTLAG-----------------
601
+ >UniRef100_A0A350Y740 56 0.309 2.441E-06 4 73 91 15 84 386
602
+ ----QLLLF-TDERPSSRKHIHRVRSYLETLRANyPFELRIVDVGEQPYLAEHFKVVATPALIKIHPLPRQTLAG-----------------
603
+ >UniRef100_A0YTH4 56 0.380 1.778E-06 4 73 91 15 84 394
604
+ ----QLLLF-ADQRPSSKEQIGEIRQFLEKLNcEEAYELQVIDVGQQPYLAEYFKLVATPALVKIFPEPRHILAG-----------------
605
+ >UniRef100_L8M9H5 47 0.295 2.617E-03 4 73 91 33 102 405
606
+ ----QLLLFVDKR-PGYRKKIQRVQAYLDDLkLEQDFQLEVIEIDKQPHLVEYFKLVATPALVKISPQPRQVLAG-----------------
607
+ >UniRef100_UPI001D02CB6E 63 0.315 5.889E-09 2 73 91 13 84 395
608
+ --PLHLLLFVDGR-PKSRQQVQRIRAYLKELQAEySFELEIIDVGQQPYLAEHFRLVATPALIKIHPEPRQILAG-----------------
609
+ >UniRef100_A0A6P0XB78 55 0.338 4.604E-06 4 73 91 15 84 390
610
+ ----QLLLF-TDERLSSRKTIQQIRHYLESlRMEYPFELKVVDVGKQPDLAEHFKLVATPALIKIHPQPRQTLAG-----------------
611
+ >UniRef100_A0A6M0G3C9 53 0.338 3.088E-05 4 73 91 15 84 390
612
+ ----QLLLF-TDERLSSRKNIQKIRHYLESLkMEYPFELKVVDVGKQPDLAEHFKLVATPALIKIHPQPRQVLTG-----------------
613
+ >UniRef100_A0A937N7H3 72 0.290 3.998E-12 2 87 91 2 87 396
614
+ --KFTLTLYIVGKGTDWASVVERLEAICKTDLAGHYGIESVDVANGQDLSGDMRILAPDAVMLWLPAPLQAPMNDLVNAKPGLVGLDL---
615
+ >23040|scaffold_1553752_c1_1|+3|10 100 0.448 1.228E-21 1 87 91 18 104 123
616
+ -PRFKFSVYIAGQTRQSELALARLRKICDEEIPANYEIEIIDLAKNPHLAKEHQILATPSIFRTLPAPVRKSIGNLSKADKTLLGLDL---
617
+ >ERR1700732_1595525 94 0.402 1.037E-19 0 86 91 15 101 108
618
+ KPRFKFRVYIARPTRKSDLALARLRALCAEAFPDDYDIEMIDLAKSPHLAKEHQILATPAVFRTLPAPVRKSIGDLSKTDKSRLGLD----
619
+ >SRR5882762_5740304 94 0.397 1.424E-19 0 87 91 36 123 136
620
+ KAQFKFQVYIARPTRQSDLALARLRAICDEAIPDDYDIEIIDLAKRPGLAGKFQIVATPTILRTLPAPIRKSIGDLSKTDKALLGLDL---
621
+ >SRR4030081_2822547 95 0.420 4.008E-20 1 88 91 31 118 121
622
+ -PRYKFSVYIASPTRASESALARLRKICEEQIPNEYEIEVFHLSKNPQLARDHNIIATPAIFRTLPAPVLKSIGDLSRTDQALLGLDLL--
623
+ >SRR6185295_12007002 113 0.471 1.879E-26 1 89 91 177 264 270
624
+ -PGWVLRLYVAGMNRTSARAVERVHAICDEYLAGRYELEVIDIYQLPALARGHQIVATPTLIRLLPAPLRRYIGDLSN-ENLVFGLDLKP-
625
+ >SRR5215831_4900645 114 0.558 9.973E-27 1 86 91 71 156 163
626
+ -PGLVLRLFICGTSPRAASAVKNLRFICESELHGAYSLEIIDVLEQPDLAEEAKVLATPTLIKLLPLPLRRIIGDLSDKEKLLIGLE----
627
+ >SRR5712672_2296523 88 0.409 1.652E-17 0 87 91 11 98 106
628
+ KTRFKFSLCIARVTEKSKAALARLRAICDETIPKKYDIKVIDLSKNPELARDHNIIATPAVFRTLPTPVRRSIADLSRNDRALLGLNL---
629
+ >12123|Ga0209625_1018534_1|-121|00 95 0.415 5.502E-20 1 89 91 15 103 115
630
+ -PNFRFQLYIGGPTRASDEVLGRLQAICDEAIPNDYAIEIIDLSKNPQLAKDHQIIATPSVFRTLPEPMRKSIGDLSLKQRTIIGLDLLT-
631
+ >SRR4051812_48803690 106 0.404 5.624E-24 1 84 91 45 128 135
632
+ -EHYSLALYITGSSPRSALAISAIRKICDTHLLGCYSLEIIDLTQQPLRARSEQIVATPTLIRRLPFPIRRFIGDMSLVERQLLG------
633
+ >UniRef100_A0A932SCN4 121 0.579 6.484E-29 0 87 91 15 102 111
634
+ QKVWTLRLYVAGQTPKSVTALSNLERICEAHLEGKYRIEVVDLLKSPQLARGDQIIATPTLVRRLPPPVKKIIGDLSNADRVLVGLDL---
635
+ >SRR5579871_1513814 128 0.540 2.104E-31 1 87 91 59 145 153
636
+ -ETWNLRLYVAGQSPKSLTAFSNLKRICETYLPGKYHIEVLDLLKNPQLAEGDQVVAIPTLVRRLPEPLRKIIGDLSNTERTLVGLDL---
637
+ >UniRef100_A0A517YME2 122 0.505 2.508E-29 1 87 91 17 103 120
638
+ -ETWELRLYIAGQTPKSVAAFRNLKKLCEEHLPGRYQIEVIDLMQHPQLAAGDQIVAIPTLVRRLPEPLRRIVGDLSNTERTLVGLQL---
639
+ >ERR1035437_1623514 104 0.453 2.741E-23 2 87 91 28 113 122
640
+ --KFVLRLFVAGATPRSRHAVRRVRELCETELKGNCELEVIDIYQQPGLARENQIVATPTLIIAFPPPLRRFIGNRTNITGLFVELDL---
641
+ >SRR4030081_12347 93 0.397 3.684E-19 0 87 91 29 116 123
642
+ KTRFKFAVYIARPSAESDAALARLRKICDETIPKNYDIRVIDLSKNPELARDHQIVATPAVFRTLPTTVRRTIGDLSNNDRALLGLNL---
643
+ >UniRef100_A0A062V6S3 115 0.505 7.496E-27 1 87 91 18 104 111
644
+ -EVWELRLYIAGQTARSDAALANLKRICEEHLAGKYRIEVIDLLKNPQIARDHQILATPTVIRKLPEPLKKTIGDLSQTERVLVGLDL---
645
+ >ERR1700676_5527870 97 0.393 1.128E-20 1 89 91 58 146 156
646
+ -PHFKFQLYIGRSTRASDAAITRLQAICDETIPDDYAIEIIDLSKNPQLAKDHQIIATPSVFRTLPEPIRKSIGDLSLKHKAIVGLDLPT-
647
+ >15488|Ga0208981_1210027_1|+2|10 90 0.363 3.387E-18 0 87 91 24 111 119
648
+ KTRFKFAVYIARPSAESDAALARLRKICDETIPKNYDIRVIDLSKNPELARDHQIVATPAVFRTLPTPVRKSVGELSSKDRTLLGLNL---
649
+ >SRR5258708_39684100 96 0.400 1.549E-20 0 89 91 38 127 150
650
+ KAHFEFCVYIANHTLRSDLALKRLKKICEENVPGDYEIEVVDIAKPPDIAKDRQIVATPAVFRTLPAPFRRLIGDLPHEERSLLGLDLFT-
651
+ >MGYP000303408308 113 0.574 3.540E-26 1 87 91 1 87 102
652
+ -KKFELKLYVTGQTARTETAMGNLKDLFDKELAEQYDLEVIDVLERPQLAEDERILATPTLIRKLPVPIRRIVGDLSNREQVLLGLDL---
653
+ >SRR5258708_26871038 87 0.363 3.115E-17 0 87 91 32 119 129
654
+ KHRFKFQVFIGKPSQKSDLAVARLREVCEAEIPGEYDIEIIDLSRTPELAGENNIVATPAVFRTLPAPVRKSIGDLVEKHKVLLALDL---
655
+ >ERR1044072_7554525 109 0.529 8.406E-25 2 86 91 123 207 214
656
+ --RYILKLYVTGRTSRAERAIANLRRLCEDELEGCYQLEGIDIVEHPQLAEDERVRATPTLVKQLPPPLRRGVGDLSSRAKGLFGLD----
657
+ >18065|scaffold45210_3|+2239|01 117 0.465 1.491E-27 0 87 91 45 132 140
658
+ RPYWNLRLYVAGSSPRSLAAVTNLTKVWEEHLPGPYSIEVVDLLEHPNLARADQILATPARVGALPSPIRRVIGDLSSRDRVLVGLEI---
659
+ >SRR5262249_21381219 117 0.534 1.491E-27 1 86 91 16 101 117
660
+ -EPFVLKLFICGASPRANSAVANLRHICEHDLQGHFTLEIIDVLEQPDLAEESKVLATPTLIKLLPPPLRRIIGDLSDKQKLLVGLD----
661
+ >SRR4051812_10739495 120 0.593 1.184E-28 2 87 91 18 102 110
662
+ --KFVLKLYIAGSSPRSQRAIANLHRICAEELPGS-EVDVIDVLQQPHLAEGARIMATPTLIKELPPPVRRIIGDLSDAEQVLLGLDL---
663
+ >SRR6266404_266030 86 0.417 8.062E-17 9 87 91 0 78 113
664
+ ---------IASPTRESQLALTRLRKICDEQIPNEYEIEVFDLRKHPELATRYDIVATPAICRTLPAPLRKSVGDLSKTEKALLGLDL---
665
+ >SRR5260370_32467315 102 0.422 1.834E-22 0 89 91 34 123 146
666
+ KPHYKFSVYIANHTLRSNSALERLKKICEENVPGDYEIEVIDIAKSPGLATDHQIVATPAVFRTLPAPLRKSIGDLSQKDKALLGLDLFT-
667
+ >UniRef100_V4JFK2 110 0.529 3.353E-25 3 87 91 3 87 94
668
+ ---YLLRLYIVGSTLQSERAIRNLRSICNKALHNRYRLEIIDVIEHPEAAQDAHIIATPTLIKELPPPLMRIIGDMSNQEKVLVGLDL---
669
+ >SRR5258708_27969647 100 0.448 1.228E-21 3 89 91 14 100 104
670
+ ---FKLRVYIGGEALESDRAVARLRKICDEAAPNDYEIEVVDLSKNPQLASRYQIVATPTVIRTLPSPVRKTIGHMSKREKVLLGLDLVP-
671
+ >UniRef100_UPI001565DCEC 109 0.482 8.673E-25 1 87 91 15 101 115
672
+ -EVWELRLYVAGQTARSMTAFANLKRIAEQHLRGRYRIEVIDLKADPQRADEDGILAMPTVVCKLPPPLRKVVGDLSDTEKALVGLKL---
673
+ >SRR5262245_2800084 119 0.541 1.625E-28 3 87 91 29 113 122
674
+ ---YVLRLYVAGMTARSMDAISRLKAICEEHLGEHYKLETIDIHQQPGLARDQQIVAAPTLIKELPPPVRRLVGDLTNRERVLVGLDL---
675
+ >ERR1039458_2802747 93 0.413 3.684E-19 1 87 91 15 101 123
676
+ -ESVELCLFVAGDAGPSARARRELEGLLVELGGGAWSIEVVDVLVRPDLAERARIVATPVLIRLAPLPRRSIIGDLSDWQVVAEVLEL---
677
+ >ERR1700677_35954 83 0.416 5.401E-16 4 87 91 12 95 102
678
+ ----ELCLFVAGNTGPSARARRELEWLRVELEEGGWSIEGIDVTERPDLAERARILATPVLTRLAPLPRLSMIGDLSDWKVVAEVLEL---
679
+ >SRR5450755_1004340 90 0.448 2.467E-18 4 90 91 126 212 214
680
+ ----ELCLFVAGEAGPSVRARRELDRLRMGLEGGGGRVDVIDVMERPDLAEQAGILATPVLIRLAPLPRRSIIGDLSDWEVVADVLELALE
681
+ >MGYP000397992808 97 0.447 1.128E-20 3 87 91 3 87 95
682
+ ---FSFQLFVAGDTPRSHLAASNLRDLLDRVAPDDYDLEVIDVLERPDLAEKERILATPFVLKISPPPTRRVVGDLTDLALAARALDL---
683
+ >SRR5437763_3353870 89 0.418 8.766E-18 2 87 91 37 122 129
684
+ --PVELCLFVVGESGPSVRARRELEAFRVARGGDGWRVVVIDVLERPDVAERERILATPVLIRMAPLPRRGIIGDLSDWEAVAEVLEL---
685
+ >SRR5450755_2832987 95 0.372 5.502E-20 2 87 91 20 105 112
686
+ --RVSLRMYVASDTAPSADARRQLAALCERLGGERWEVEVVDVFERPALAEADRIVATPVLIRLFPAPRLSVIGDFSDLDAVAAALDL---
687
+ >MGYP000274140882 87 0.419 2.269E-17 7 87 91 19 98 105
688
+ -------LYVAGRSERSALAEQNLRAV-TQRLHGPVQIEVIDLTRRPDLAEELDIVATPMVLRVLPEPPRRVVGDLSDQALLAQALDL---
689
+ >ERR1039458_272673 74 0.400 1.090E-12 14 88 91 24 97 113
690
+ --------------PRSLTA-PQLERLRPELEGGGLGVEVVDVMQRPDLAERARILATPVLMRLAPLPRRSIIGDLSDWRLVTEVLELP--
691
+ >ERR1051325_10531525 81 0.305 4.970E-15 1 85 91 28 112 122
692
+ -PRIELVLYVTAASSHSAAATRNCEALLSRFDRRSVVFEICDISLHPERAEVDGICFTPVLMKRMPLPRAYVIGDLSNTAALVDLL-----
693
+ >SRR5688500_6194403 81 0.329 3.619E-15 2 86 91 52 134 135
694
+ --RLSLRLFVAGDSPDSETAIANLEALFPN--GSEAEIEIVDIQREPARAARESIMLTPTLLKLAPSPACRILGNLKNRDALLELLG----
695
+ >SRR5438477_4004941 90 0.360 2.467E-18 0 85 91 467 552 558
696
+ RERVALRLYVSPASPPSVKARRNMEKLLERIGPVNVDFEVLDLALEPLRAETDNVVFTPTLVKHWPEPRVWILGDLSDPVVVGDLL-----
697
+ >SRR3954468_23038545 80 0.409 9.369E-15 5 87 91 32 114 129
698
+ -----LRLYVAGEGPNSARARANLQRLLADVDSSRYVLEVVDCLDEPLRALSDGVPPPQTLMPVTPPPQRTIVGSLSAMDHVADALEI---
699
+ >SRR5271165_1490500 102 0.383 1.336E-22 2 87 91 3 88 112
700
+ --KFKFRIYVAGDALNSAQALANLDAICREYLPDRHEIEVVDVFREPKRALTDGVFMTPTLVKVAPFPTRRIVGTLSQTRLVLQAVGL---
701
+ >UniRef100_A0A142HLX2 109 0.482 4.603E-25 3 87 91 14 98 109
702
+ ---YELVLYVAGATPNSTRAVRNIKAICEEYLPGRYALRILDIYQQPELAQQAQLVALPTLVRLRPLPQRRLVGDLSNRPVVLSVLGL---
703
+ >A0A142HLX2 117 0.482 1.087E-27 3 87 91 14 98 109
704
+ ---YELVLYVAGATPNSTRAVRNIKAICEEYLPGRYALRILDIYQQPELAQQAQLVALPTLVRLRPLPQRRLVGDLSNRPVVLSVLGL---
705
+ >SRR5512142_3015464 83 0.397 7.416E-16 0 87 91 23 110 118
706
+ RSEIRLCLYVAGNAPNSVAARANLSAALAALDNVSAAVEIVDVFERPDLAVQNEVYVTPMLLRLAPPPKCRIVGSLSDRDAIVNILDI---
707
+ >SRR5215208_4287972 81 0.285 3.619E-15 2 85 91 19 102 112
708
+ --PIELVLYISAASAHTAAARRNCEALLARFDQRRVHFEICDVSQHPDRADTDGICFTPVLMKKLPLPRTYVVGDLSNTTALVDLL-----
709
+ >SRR3954469_15268427 97 0.441 1.128E-20 2 87 91 18 103 115
710
+ --RLVLRLYVAGDAPNSALARANLKRLLDSLDRDQYALEIIDCLDEPLRALNDGVLVTPTLLRLSPEPGRTIVGSLSAIDHVADALDL---
711
+ >SRR5450432_1605686 81 0.348 2.636E-15 2 87 91 153 238 249
712
+ --KIEITLYLTLPWPSSARAQANLQRVLARVPDGHVRLNTCDLAQEPGRAEKDNVLFSPTLVKVWPAPKMWILGDLSEAGVLTDLLEL---
713
+ >SRR5688572_14714411 99 0.418 2.314E-21 2 87 91 50 135 150
714
+ --PVLLRLYVAGDAPNSSRARANLRRLLADVDPAKYDLEIIDCLDEPLRALNDGVLVTPTLVRVQPEPQRTVVGTLSALDHVADALDI---
715
+ >SRR5687768_10781478 82 0.305 1.398E-15 2 86 91 172 256 262
716
+ --PIELVLYISAHSPRSAAAIENIKRVLARFSSSRVSLTICDLSLEPHKGEADSVAFTPTLVKRSPGPRTYILGHLANPEVLVELLD----
717
+ >SRR5438105_8210914 83 0.302 7.416E-16 0 85 91 54 139 149
718
+ KTRIELVLYVSAASSHTATARRNCEALLARFDQRRVRFEVCDISRHPDRAESDGICFTPVLMKKRPLPRAYVIGDLSNTAALMDLL-----
719
+ >UniRef100_A0A2V8GSN4 79 0.302 2.503E-14 0 85 91 17 102 112
720
+ KTRIELVLYVSAASSHTATARRNCEALLARFDQRRVRFEVCDISRHPDRAESDGICFTPVLMKKRPLPRAYVIGDLSNTAALMDLL-----
721
+ >12689|scaffold1791547_1|-2|11 80 0.317 9.369E-15 1 85 91 85 169 187
722
+ -PSVELVLYVSTASSYAASATRNCEALLARFDRRAVRLEICDVSEHPDRAETDGICFTPVLLKKQPLPRTYILGDLSNTAALVDLL-----
723
+ >UniRef100_A0A948CES1 102 0.388 1.379E-22 3 87 91 56 140 151
724
+ ---YRFQLFVSGSSPRSTLARANLTKVCDETVPGNYTIEVVDVLLRPDLAEESSILATPLVVRVSPHPPRRAVGDFTDLERLAAAMGL---
725
+ >SRR6185295_9098939 82 0.395 1.920E-15 2 87 91 339 424 440
726
+ --RIELALYVTMPWPSSLRAKANLGRVLARVPEGLVHLNVCDLAREPLRAELDNVLFSPTLVKVWPAPKMWILGDLSEPEVLTDLLAL---
727
+ >SRR4051812_7426335 87 0.392 3.115E-17 5 88 91 15 98 101
728
+ -----LRLYVAGNAPNSTKARRNLDALLASFEPSSYQLEVIDCLSEAGRTLADGVIVTPTLVKFEPAPAATVIGTLSDADAVRAILRGP--
examples/data/KaiB_seq_ids.txt ADDED
@@ -0,0 +1,364 @@
1
+ 101
2
+ MGYP000886600007
3
+ UniRef100_A0A971TK21
4
+ ERR1044071_8151622
5
+ UniRef100_A0A1F2SDQ0
6
+ SRR5580692_7392111
7
+ K9S6Z6
8
+ SRR3954470_14000739
9
+ UniRef100_UPI0018DCCDB9
10
+ SRR5688500_3073545
11
+ MGYP000149628109
12
+ 3740|scaffold08918_4|-3862|00
13
+ SRR6187401_1816177
14
+ SRR5512147_2964451
15
+ 26123|scaffold_438712_c1_1|+1|10
16
+ 12684|Ga0207652_11722284_1|+1|10
17
+ SRR5580704_16544830
18
+ UniRef100_A0A3N5XMK1
19
+ UniRef100_A0A7W0IU42
20
+ ERR1043166_4517172
21
+ MGYP001057477004
22
+ 26195|Ga0315277_11393154_2|-158|01
23
+ UniRef100_A0A8J7LTM0
24
+ 12918|scaffold901211_1|+1|10
25
+ SRR5215510_15873647
26
+ SRR5512142_679953
27
+ UniRef100_UPI00190A34C5
28
+ SRR4051812_34338196
29
+ UniRef100_A0A3N5LPZ6
30
+ UniRef100_A0A950DFQ4
31
+ 5937|scaffold842798_1|-17|01
32
+ SRR4030095_15264891
33
+ UniRef100_K9UBR2
34
+ UniRef100_UPI001E525CD0
35
+ UniRef100_A0A845X7U1
36
+ 3300017444.a:Ga0185300_10001144_2
37
+ UniRef100_A0A349JMI1
38
+ UniRef100_K9SDH8
39
+ UniRef100_A0A3M1PH87
40
+ UniRef100_A0A1C0V439
41
+ UniRef100_UPI001E4DBDD3
42
+ UniRef100_A0A969FMT9
43
+ UniRef100_A0A0M2Q0K7
44
+ UniRef100_A0A978SUS5
45
+ UniRef100_A0A930TQ40
46
+ UniRef100_A0A2W4XV74
47
+ UniRef100_A0A939KSU8
48
+ UniRef100_A0A8K1ZWU9
49
+ UniRef100_UPI001C72C3C0
50
+ UniRef100_A0A6P0TIA3
51
+ UniRef100_A0A5B8NIC7
52
+ UniRef100_UPI002012D1AC
53
+ UniRef100_UPI001C0312E0
54
+ UniRef100_A0A928Z921
55
+ UniRef100_A0A351L2B1
56
+ UniRef100_A0A8J7E209
57
+ UniRef100_UPI00232B6744
58
+ UniRef100_UPI0018EFA1D6
59
+ UniRef100_U5DJK1
60
+ UniRef100_A0A832M402
61
+ UniRef100_A0A1C0VJA3
62
+ UniRef100_A0A969K2I2
63
+ UniRef100_K9X4T7
64
+ UniRef100_W7Q8H4
65
+ W7Q8H4
66
+ A0A1H7MTT5
67
+ UniRef100_A0A1H7MTT5
68
+ A0A1P8R863
69
+ UniRef100_A0A1P8R863
70
+ UniRef100_UPI001CD124AC
71
+ UniRef100_A0A2N5Y751
72
+ UniRef100_A0A2N7UCY5
73
+ MGYP001039280987
74
+ UniRef100_U5T3B0
75
+ UniRef100_A0A540VSD6
76
+ UniRef100_A0A2S6G6T8
77
+ UniRef100_UPI00124EBEF7
78
+ UniRef100_UPI001439CAB5
79
+ MGYP000294044467
80
+ UniRef100_A0A4Q8CZA3
81
+ 3104|Ga0306908_1123748_1|-11|01
82
+ UniRef100_UPI00133045FA
83
+ UniRef100_UPI00201FFA50
84
+ UniRef100_A0A845V233
85
+ UniRef100_UPI00082AD250
86
+ MGYP001134272031
87
+ UniRef100_UPI00037CF37C
88
+ UniRef100_UPI0003674D18
89
+ UniRef100_UPI00047687D0
90
+ UniRef100_A0A3S0W7L6
91
+ UniRef100_UPI001903F48C
92
+ SRR4051794_37995438
93
+ UniRef100_A0A318VX16
94
+ UniRef100_A0A372BWN2
95
+ 3300017992.a:Ga0180435_10008823_6
96
+ UniRef100_UPI000401CEFF
97
+ 16161|scaffold59688_2|+220|00
98
+ SRR5690606_35087643
99
+ UniRef100_A0A0R3M5F3
100
+ UniRef100_A0A969HBU9
101
+ MGYP000847580960
102
+ A0A0R3M5F3
103
+ SRR5262249_39096779
104
+ SRR5690348_3124064
105
+ MGYP000666260026
106
+ UniRef100_A0A838IZY5
107
+ MGYP001366537082
108
+ A0A1Q2HNV8
109
+ UniRef100_UPI002011AF87
110
+ 23258|scaffold4609030_1|+1|11
111
+ SRR5579871_6120579
112
+ MGYP000105995723
113
+ SRR3954468_6301146
114
+ MGYP001433622665
115
+ UniRef100_A0A3D6C093
116
+ A0A1W6LJH8
117
+ SRR5687767_11767969
118
+ SRR4051794_40104329
119
+ SRR3954454_22706284
120
+ MGYP000010225417
121
+ 2271|Ga0209795_10171170_2|-245|01
122
+ MGYP000738482073
123
+ SRR6202012_1017248
124
+ 26133|Ga0268298_10010625_3|-7238|00
125
+ SRR5262249_6174883
126
+ UniRef100_UPI001904043C
127
+ MGYP001146183833
128
+ UniRef100_A0A2U2N9L6
129
+ UniRef100_A0A127EN01
130
+ A0A127EN01
131
+ U2E7T8
132
+ UniRef100_UPI00190730C9
133
+ UniRef100_UPI000D3E5BC6
134
+ UniRef100_A0A1Y6FIV8
135
+ SRR5919109_3969706
136
+ MGYP001077603090
137
+ SRR3954463_15126113
138
+ ERR1039457_666370
139
+ UniRef100_UPI0005BD3569
140
+ 10876|scaffold_592705_c1_2|-157|01
141
+ UniRef100_A0A7V9DA05
142
+ MGYP000147404972
143
+ SRR5207249_9234194
144
+ UniRef100_A0A8T3N6J2
145
+ SRR3954451_13963006
146
+ UniRef100_A0A1T4Y342
147
+ UniRef100_A0A2V7TZK6
148
+ UniRef100_A0A831PRG7
149
+ SRR6185295_5137210
150
+ SRR5687767_10342612
151
+ SRR5881409_182087
152
+ UniRef100_A0A4Q3W5A8
153
+ UniRef100_A0A934QHJ2
154
+ 4460|scaffold_415991_c1_2|-159|00
155
+ MGYP000603749840
156
+ UPI0003E01BF4
157
+ 10796|Ga0318514_12978242_1|+2|11
158
+ UniRef100_UPI0021E125E4
159
+ SRR3712207_3115678
160
+ SRR6185295_1711097
161
+ UniRef100_UPI00214A28C1
162
+ UniRef100_UPI00193BA132
163
+ SRR5688500_8380976
164
+ ETNmetMinimDraft_32_1059908.scaffolds.fasta_scaffold895325_1
165
+ SRR5512132_2975018
166
+ 13960|scaffold210726_2|+957|01
167
+ 12613|JGI10216J12902_106548506_1|+3|10
168
+ SRR5688572_30282090
169
+ SRR5919106_6413091
170
+ SRR5688572_2308330
171
+ 3300018984.a:Ga0193605_1004274_5
172
+ GraSoiStandDraft_41_1057321.scaffolds.fasta_scaffold894400_3
173
+ UniRef100_A0A7X0U868
174
+ SRR5690242_21852208
175
+ SRR4051812_14954157
176
+ SRR5687768_5535903
177
+ GraSoiStandDraft_2_1057267.scaffolds.fasta_scaffold1451175_1
178
+ SRR6185437_3447168
179
+ SRR4051794_7772101
180
+ SRR5829696_6517560
181
+ MGYP000536513446
182
+ SRR5687768_1210620
183
+ SRR6059058_1924189
184
+ SRR5688500_14284300
185
+ SRR5688572_7102903
186
+ SRR4029453_1913048
187
+ SRR6185503_893473
188
+ SRR5262245_8274589
189
+ SoimicMinimDraft_3_1059731.scaffolds.fasta_scaffold2181823_1
190
+ 23892|Ga0310888_10269646_2|-120|01
191
+ SRR3954471_22207506
192
+ SRR5581483_9829629
193
+ Cyp2metagenome_2_1107375.scaffolds.fasta_scaffold1445432_1
194
+ UPI00034656C1
195
+ UniRef100_A0A2V7Y706
196
+ SRR3954454_15902270
197
+ SRR5438045_7802661
198
+ AntDryMetagUQ889_1029465.scaffolds.fasta_scaffold07537_1
199
+ SRR5678816_3727367
200
+ 18084|scaffold527937_2|-1158|01
201
+ MGYP000592549691
202
+ BarGraIncu01122A_1022018.scaffolds.fasta_scaffold128416_1
203
+ 3300006028.a:Ga0070717_10004647_5
204
+ SRR6185436_10889035
205
+ SRR5438132_14018022
206
+ SRR5438552_3472945
207
+ GraSoiStandDraft_28_1057319.scaffolds.fasta_scaffold3853791_1
208
+ SRR5688572_14330442
209
+ UniRef100_A0A512HC53
210
+ SRR5687768_18073040
211
+ SRR5215510_1611124
212
+ SRR4029453_630509
213
+ SRR5678816_1688620
214
+ SRR5262252_1015889
215
+ SRR5687768_1317515
216
+ UniRef100_A0A958FYA2
217
+ MGYP001204866127
218
+ SRR6185503_18435269
219
+ UniRef100_A0A964QAK1
220
+ SRR3954462_9021407
221
+ MGYP001286529226
222
+ SRR5688572_18846285
223
+ SRR5580700_4777658
224
+ UniRef100_K9VYB0
225
+ 14399|Ga0335069_12962689_1|-154|01
226
+ A0A1Z4NNA9
227
+ UniRef100_A0A1Z4NNA9
228
+ UniRef100_A0A0V7ZRD6
229
+ UniRef100_K9TAR4
230
+ SRR5262249_2417559
231
+ UniRef100_UPI002021B723
232
+ HubBroStandDraft_5_1064220.scaffolds.fasta_scaffold442189_2
233
+ S7VCH7
234
+ UniRef100_S7VCH7
235
+ CoawatStandDraft_6_1074263.scaffolds.fasta_scaffold645439_1
236
+ UniRef100_UPI00045E5B31
237
+ UniRef100_A0A969VCT1
238
+ UniRef100_A0A517P709
239
+ SRR3954454_16723622
240
+ SRR5581483_7974966
241
+ SRR4030095_4975553
242
+ UniRef100_D0LT55
243
+ 22902|Ga0257122_1006421_4|-3359|00
244
+ UniRef100_A0A6L9ZJT6
245
+ A0A0M1JMY3
246
+ UniRef100_UPI001E41D742
247
+ SRR6185503_14955318
248
+ SRR6185312_15773949
249
+ SRR6476619_1219783
250
+ SRR5690606_38631219
251
+ SRR5688572_23129716
252
+ SRR5688572_20232672
253
+ SRR6187401_742178
254
+ SRR5687768_3218532
255
+ SRR6187401_2120497
256
+ SRR5262245_48067131
257
+ UniRef100_A0A934ZFS7
258
+ UniRef100_A0A852ZR84
259
+ ERR1700733_14812660
260
+ MGYP001377341570
261
+ SRR5690242_13802588
262
+ UniRef100_UPI001FB8EED0
263
+ UniRef100_A0A521U920
264
+ UniRef100_A0A933USY1
265
+ SRR5471030_2470326
266
+ UniRef100_UPI001F066344
267
+ 14341|Ga0209698_10565634_1|+23|00
268
+ UniRef100_UPI001BEB70D6
269
+ SRR5215218_6711373
270
+ SRR6188768_3491203
271
+ SRR6187402_177262
272
+ SRR6059058_1539947
273
+ SRR3984957_851882
274
+ SRR3954470_18350552
275
+ MGYP001434196702
276
+ UniRef100_A0A4V1DI91
277
+ MGYP000867160857
278
+ UniRef100_UPI002001187B
279
+ MGYP001295239604
280
+ MGYP000025237920
281
+ UniRef100_A0A560LTW2
282
+ UniRef100_A0A076H9F6
283
+ MGYP000311810024
284
+ UniRef100_A0A968YHU9
285
+ UniRef100_UPI000B35C714
286
+ MGYP001088475876
287
+ UniRef100_A0A7Y3TRI1
288
+ UniRef100_A0A352XFA5
289
+ UniRef100_A0A5J6Q9E6
290
+ UniRef100_A0A139WUD4
291
+ UniRef100_A0A926Y835
292
+ UniRef100_UPI0020A7ECEA
293
+ UniRef100_A0A068MZV1
294
+ UniRef100_UPI0016873C22
295
+ UniRef100_A0A0M1JTJ0
296
+ UniRef100_A0A261KTE7
297
+ UniRef100_A0A3C0N9T6
298
+ UniRef100_A0A6J4I559
299
+ UniRef100_A0A6P0QL46
300
+ UniRef100_A0A846D500
301
+ UniRef100_A0A350Y740
302
+ UniRef100_A0YTH4
303
+ UniRef100_L8M9H5
304
+ UniRef100_UPI001D02CB6E
305
+ UniRef100_A0A6P0XB78
306
+ UniRef100_A0A6M0G3C9
307
+ UniRef100_A0A937N7H3
308
+ 23040|scaffold_1553752_c1_1|+3|10
309
+ ERR1700732_1595525
310
+ SRR5882762_5740304
311
+ SRR4030081_2822547
312
+ SRR6185295_12007002
313
+ SRR5215831_4900645
314
+ SRR5712672_2296523
315
+ 12123|Ga0209625_1018534_1|-121|00
316
+ SRR4051812_48803690
317
+ UniRef100_A0A932SCN4
318
+ SRR5579871_1513814
319
+ UniRef100_A0A517YME2
320
+ ERR1035437_1623514
321
+ SRR4030081_12347
322
+ UniRef100_A0A062V6S3
323
+ ERR1700676_5527870
324
+ 15488|Ga0208981_1210027_1|+2|10
325
+ SRR5258708_39684100
326
+ MGYP000303408308
327
+ SRR5258708_26871038
328
+ ERR1044072_7554525
329
+ 18065|scaffold45210_3|+2239|01
330
+ SRR5262249_21381219
331
+ SRR4051812_10739495
332
+ SRR6266404_266030
333
+ SRR5260370_32467315
334
+ UniRef100_V4JFK2
335
+ SRR5258708_27969647
336
+ UniRef100_UPI001565DCEC
337
+ SRR5262245_2800084
338
+ ERR1039458_2802747
339
+ ERR1700677_35954
340
+ SRR5450755_1004340
341
+ MGYP000397992808
342
+ SRR5437763_3353870
343
+ SRR5450755_2832987
344
+ MGYP000274140882
345
+ ERR1039458_272673
346
+ ERR1051325_10531525
347
+ SRR5688500_6194403
348
+ SRR5438477_4004941
349
+ SRR3954468_23038545
350
+ SRR5271165_1490500
351
+ UniRef100_A0A142HLX2
352
+ A0A142HLX2
353
+ SRR5512142_3015464
354
+ SRR5215208_4287972
355
+ SRR3954469_15268427
356
+ SRR5450432_1605686
357
+ SRR5688572_14714411
358
+ SRR5687768_10781478
359
+ SRR5438105_8210914
360
+ UniRef100_A0A2V8GSN4
361
+ 12689|scaffold1791547_1|-2|11
362
+ UniRef100_A0A948CES1
363
+ SRR6185295_9098939
364
+ SRR4051812_7426335
examples/data/provenance.md ADDED
@@ -0,0 +1,57 @@
+ # KaiB demo data: provenance
+
+ This directory contains a concatenated demo asset for the SF-Cluster Colab
+ notebook. It is derived from the SF-Cluster Phase II benchmark's KaiB
+ `diverse_sf` arm and the FrustrAI-Seq per-residue Frustration Index (FI)
+ outputs.
+
+ ## Files
+
+ | File                 | Shape / size      | Description |
+ |----------------------|-------------------|-------------|
+ | `KaiB_filtered.a3m`  | 364 records, L=91 | Subset of the KaiB filtered MSA. Query (`>101`, UniProt Q79V61 residues 5–95) is row 0. Lowercase insertion-state letters preserved. |
+ | `KaiB_fi_matrix.npy` | (364, 91) float32 | Per-residue FI matrix. Row `i` corresponds to record `i` in the A3M. |
+ | `KaiB_seq_ids.txt`   | 364 lines         | One short sequence ID per line, in the same order as the A3M / FI matrix. |
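
The row-for-row correspondence described in the table can be checked before the bundle is used. The following is a minimal sketch in plain Python, not part of the `sf_cluster` package: the function names `read_a3m`, `match_columns`, and `check_bundle` are hypothetical, and it assumes that a record's short ID is the first whitespace-delimited token of its A3M header, which is what the listed IDs suggest.

```python
def read_a3m(text):
    """Parse A3M text into (short_id, sequence) pairs, in file order."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            # Assumption: short ID = first whitespace-delimited header token.
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def match_columns(seq):
    """Drop lowercase insertion-state letters; keep uppercase residues and gaps."""
    return "".join(c for c in seq if not c.islower())


def check_bundle(a3m_text, seq_ids, n_fi_rows, n_fi_cols):
    """Raise ValueError if A3M records, ID list, and FI shape disagree."""
    records = read_a3m(a3m_text)
    if not (len(records) == len(seq_ids) == n_fi_rows):
        raise ValueError("record, ID, and FI row counts differ")
    for i, ((rec_id, seq), expected) in enumerate(zip(records, seq_ids)):
        if rec_id != expected:
            raise ValueError(f"row {i}: A3M has {rec_id!r}, ID list has {expected!r}")
        if len(match_columns(seq)) != n_fi_cols:
            raise ValueError(f"row {i}: aligned length differs from FI columns")
    return len(records)


# Tiny synthetic demo (not real KaiB data): the lowercase 'c' is an insertion.
demo_a3m = ">q\nABCD\n>hit1 10 0.5\n-BcCD\n"
n_checked = check_bundle(demo_a3m, ["q", "hit1"], n_fi_rows=2, n_fi_cols=4)
```

For the real bundle, the same check would take the text of `KaiB_filtered.a3m`, the lines of `KaiB_seq_ids.txt`, and the shape of the FI array (for example from `numpy.load("KaiB_fi_matrix.npy").shape`), expecting 364 rows and 91 columns.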
+
+ ## Source paths (private dev repo, read-only)
+
+ - Filtered MSA:
+ `/data1/hanqun/SF-Design/SF-Cluster/data/processed/msa/KaiB/KaiB/KaiB_KaiBTE_91aa_UniProt_Q79V61_5to95_2QKE_chainB.filtered.a3m`
+ (depth 6821, L=91)
+ - FI artifacts (per-subset):
+ `/data1/hanqun/SF-Design/SF-Cluster/results/frustai_artifacts/KaiB/diverse_sf/KaiB/{000..011}/`
+ with files `fi_matrix.npy` ((32, 91) float32), `metadata.json`,
+ `fi_residual_matrix.npy`, `entropy_matrix.npy`.
+ - Source subset A3Ms (used to map FI rows → sequence IDs):
+ `/data1/hanqun/SF-Design/SF-Cluster/results/baseline_p8/diverse_sf/KaiB/KaiB/screen/diversesf_KaiB_KaiB_seed{000..011}.a3m`.
+
+ ## Construction recipe
+
+ 1. For each of the 12 `diverse_sf` subsets, load `fi_matrix.npy` ((32, 91)
+ float32) and the corresponding `diversesf_KaiB_KaiB_seed{NNN}.a3m`.
+ 2. Concatenate rows in subset-index order; track the parallel sequence-ID
+ list from the A3M records.
+ 3. **Deduplication policy**: first occurrence wins. A sequence ID seen in an
+ earlier subset is skipped (both in the FI matrix and the ID list). This
+ reduces 12 × 32 = 384 raw rows to 364 unique rows.
+ 4. Extract the corresponding sequences (with their full headers and
+ lowercase insertion states) from the filtered MSA, preserving the order
+ established in step 3. The query (`>101`) is always row 0.
+
+ All 364 unique IDs were found in the filtered MSA (0 missing).
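
The recipe above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the script that produced the bundle: `concat_first_wins` and `extract_in_order` are hypothetical names, FI rows are represented as plain lists rather than NumPy arrays, and each subset is assumed to arrive as a pair of parallel lists (IDs, FI rows).

```python
def concat_first_wins(subsets):
    """Concatenate (ids, fi_rows) pairs in order, keeping first occurrences.

    `subsets` is an iterable of (ids, fi_rows) pairs in subset-index order;
    ids[i] labels fi_rows[i]. A repeated ID is skipped in both outputs.
    """
    seen = set()
    ids_out, rows_out = [], []
    for ids, rows in subsets:
        if len(ids) != len(rows):
            raise ValueError("ID list and FI rows have different lengths")
        for seq_id, row in zip(ids, rows):
            if seq_id in seen:
                continue  # first occurrence wins
            seen.add(seq_id)
            ids_out.append(seq_id)
            rows_out.append(row)
    return ids_out, rows_out


def extract_in_order(msa_records, ids):
    """Pull (id, sequence) records from the MSA in the order given by `ids`.

    Returns the ordered records plus the list of IDs not found in the MSA.
    """
    by_id = {}
    for rec_id, seq in msa_records:
        by_id.setdefault(rec_id, seq)  # keep the first record per ID
    missing = [i for i in ids if i not in by_id]
    ordered = [(i, by_id[i]) for i in ids if i in by_id]
    return ordered, missing


# Synthetic demo (hypothetical IDs and values, not KaiB data).
subset_0 = (["q", "a", "b"], [[0.1], [0.2], [0.3]])
subset_1 = (["q", "c", "a"], [[9.1], [0.4], [9.2]])
ids, fi_rows = concat_first_wins([subset_0, subset_1])

msa = [("a", "AAc"), ("q", "QQ"), ("c", "CC"), ("b", "BB")]
ordered, missing = extract_in_order(msa, ids)
```

In the demo, six raw rows collapse to four unique rows, mirroring how 384 raw rows become 364 in the real bundle; an empty `missing` list corresponds to the "0 missing" statement above.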
+
+ ## Models
+
+ - **FrustrAI-Seq weights**: HF repo `leuschj/FrustrAI-Seq`, commit
+ `ee5a01a29fde00630f4a1157f0e6cb8343ac434b`. Inference in fp16 with LoRA
+ adapters merged.
+
+ ## License
+
+ This demo asset is released under MIT alongside the SF-Cluster OSS package.
+ The KaiB sequence (UniProt Q79V61, *Thermosynechococcus elongatus*) and its
+ MSA neighbors are public-domain sequence records via UniRef100 / MGnify;
+ no proprietary structures are included. FrustrAI-Seq outputs are derived
+ features (floating-point FI values) and are released by the FrustrAI-Seq
+ authors under their own license; see
+ https://huggingface.co/leuschj/FrustrAI-Seq.