Add Colab demo notebook + KaiB demo data
- examples/SF_Cluster_Demo.ipynb +290 -0
- examples/data/KaiB_fi_matrix.npy +3 -0
- examples/data/KaiB_filtered.a3m +728 -0
- examples/data/KaiB_seq_ids.txt +364 -0
- examples/data/provenance.md +57 -0
examples/SF_Cluster_Demo.ipynb
ADDED
|
@@ -0,0 +1,290 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SF-Cluster \u2014 frustration-guided MSA subset builder\n",
    "\n",
    "**What this notebook does.** Installs the open-source `sf_cluster` package, downloads a small KaiB demo bundle (a 364-sequence MSA + a per-residue Frustration Index matrix from FrustrAI-Seq), and builds two flavours of stratified MSA subsets (`mosaic` and `gradient`) using the contrast-HV/LV score. Everything runs on CPU in roughly two minutes.\n",
    "\n",
    "**Who it is for.** Biologists who want reproducible, frustration-stratified MSA slices to feed into an AF-Cluster-style multi-conformer prediction loop.\n",
    "\n",
    "**What you do next.** Take the 12 mosaic or 12 gradient A3M subsets emitted at the end of this notebook, run each through ColabFold AF2, and aggregate per the SF-Cluster \u00a79.1 hit criterion.\n",
    "\n",
    "---\n",
    "\n",
    "> ## LIMITATIONS \u2014 please read\n",
    "> A controlled comparison on the Main-21 cases shows that **uniform random subsampling performs equivalently on most cases**. The frustration signal is **not** the active ingredient here \u2014 depth reduction is. See the OSS README for the full ablation.\n",
    ">\n",
    "> Use this tool when you want **stratified, reproducible MSA subsets** with a clear provenance story \u2014 not as a guaranteed conformational diversity engine. It is a research baseline, not a turnkey accuracy improvement."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Install the package\n",
    "\n",
    "Pulls the OSS release from Hugging Face. Pure-Python; only depends on `numpy` and `scipy`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "!pip install -q git+https://huggingface.co/ChatterjeeLab/SF-Cluster"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Download the KaiB demo bundle\n",
    "\n",
    "Three files, ~200 KB total: a filtered MSA, a per-residue FI matrix from FrustrAI-Seq, and the parallel sequence-ID list."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "from huggingface_hub import hf_hub_download\n",
    "from pathlib import Path\n",
    "import os\n",
    "\n",
    "REPO = 'ChatterjeeLab/SF-Cluster'\n",
    "FILES = ['examples/data/KaiB_filtered.a3m',\n",
    "         'examples/data/KaiB_fi_matrix.npy',\n",
    "         'examples/data/KaiB_seq_ids.txt']\n",
    "\n",
    "local = {}\n",
    "for fname in FILES:\n",
    "    p = hf_hub_download(repo_id=REPO, filename=fname, repo_type='model')\n",
    "    local[fname] = p\n",
    "    print(f'{fname:50s} {os.path.getsize(p)/1024:7.1f} KB -> {p}')\n",
    "\n",
    "A3M = local['examples/data/KaiB_filtered.a3m']\n",
    "FI = local['examples/data/KaiB_fi_matrix.npy']\n",
    "IDS = local['examples/data/KaiB_seq_ids.txt']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Build the pool and stratified subsets\n",
    "\n",
    "The `pool_msa` call ties the MSA records to their per-residue FI vectors. `contrast_hvlv` computes the per-sequence high-variance / low-variance FI contrast (see README for the formula). `method_mosaic` and `method_gradient` then deterministically draw 12 subsets of 32 sequences each."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sf_cluster import pool_msa, contrast_hvlv, method_mosaic, method_gradient\n",
    "\n",
    "pool = pool_msa(A3M, FI)\n",
    "print(f'pool: N_seq={pool.n_seq}, L={pool.n_cols}, query={pool.headers[0]!r}')\n",
    "\n",
    "score = contrast_hvlv(pool.fi_matrix)\n",
    "print(f'contrast_hvlv: shape={score.shape}, '\n",
    "      f'min={score.min():+.3f}, median={np.median(score):+.3f}, max={score.max():+.3f}')\n",
    "\n",
    "mosaic_subsets = method_mosaic(score)\n",
    "gradient_subsets = method_gradient(score)\n",
    "\n",
    "def summarize(name, subsets):\n",
    "    print(f'\\n[{name}] {len(subsets)} subsets')\n",
    "    print(f'{\"subset_id\":>10} {\"n_seqs\":>7} {\"mean_contrast\":>14}')\n",
    "    for i, sub in enumerate(subsets):\n",
    "        m = float(np.mean(score[sub]))\n",
    "        print(f'{i:>10d} {len(sub):>7d} {m:>+14.4f}')\n",
    "\n",
    "summarize('mosaic', mosaic_subsets)\n",
    "summarize('gradient', gradient_subsets)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Visualise\n",
    "\n",
    "Three plots: the contrast score distribution with tercile / quartile boundaries marked, the per-subset mean contrast score for both methods, and the pairwise sequence-overlap heatmap between mosaic and gradient subsets."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "fig, axes = plt.subplots(1, 3, figsize=(15, 4))\n",
    "\n",
    "# (a) score histogram with tercile + quartile lines\n",
    "ax = axes[0]\n",
    "ax.hist(score, bins=40, color='#4477AA', edgecolor='white', alpha=0.85)\n",
    "sorted_s = np.sort(score)\n",
    "N = len(sorted_s)\n",
    "terciles = [sorted_s[N//3], sorted_s[2*N//3]]\n",
    "quartiles = [sorted_s[N//4], sorted_s[N//2], sorted_s[3*N//4]]\n",
    "# label only the first line of each family so the legend gets one entry per family\n",
    "for k, t in enumerate(terciles):\n",
    "    ax.axvline(t, color='#CC6677', linestyle='--', label='tercile (mosaic)' if k == 0 else None)\n",
    "for k, q in enumerate(quartiles):\n",
    "    ax.axvline(q, color='#117733', linestyle=':', label='quartile (gradient)' if k == 0 else None)\n",
    "ax.set_xlabel('contrast_hvlv')\n",
    "ax.set_ylabel('count')\n",
    "ax.set_title('(a) per-sequence contrast score')\n",
    "ax.legend(fontsize=8)\n",
    "\n",
    "# (b) per-subset mean contrast\n",
    "ax = axes[1]\n",
    "x = np.arange(len(mosaic_subsets))  # assumes both methods return the same number of subsets\n",
    "m_means = np.array([score[s].mean() for s in mosaic_subsets])\n",
    "g_means = np.array([score[s].mean() for s in gradient_subsets])\n",
    "w = 0.4\n",
    "ax.bar(x - w/2, m_means, width=w, label='mosaic', color='#4477AA')\n",
    "ax.bar(x + w/2, g_means, width=w, label='gradient', color='#CC6677')\n",
    "ax.axhline(0, color='black', lw=0.5)\n",
    "ax.set_xlabel('subset id')\n",
    "ax.set_ylabel('mean contrast_hvlv')\n",
    "ax.set_title('(b) per-subset mean score')\n",
    "ax.legend(fontsize=8)\n",
    "\n",
    "# (c) pairwise overlap heatmap (mosaic x gradient)\n",
    "ax = axes[2]\n",
    "M = np.zeros((len(mosaic_subsets), len(gradient_subsets)), dtype=int)\n",
    "for i, si in enumerate(mosaic_subsets):\n",
    "    set_i = set(si)\n",
    "    for j, sj in enumerate(gradient_subsets):\n",
    "        M[i, j] = len(set_i & set(sj))\n",
    "im = ax.imshow(M, cmap='magma', aspect='auto')\n",
    "ax.set_xlabel('gradient subset')\n",
    "ax.set_ylabel('mosaic subset')\n",
    "ax.set_title('(c) sequence overlap (count)')\n",
    "plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Write subsets to A3M files\n",
    "\n",
    "Each subset is written as a ColabFold-compatible A3M with the query as the first record. Downstream you would feed one A3M per AF2 run."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "from sf_cluster import build_subsets\n",
    "\n",
    "out_mosaic = Path('./subsets_mosaic')\n",
    "out_gradient = Path('./subsets_gradient')\n",
    "\n",
    "_, _, _, mosaic_paths = build_subsets(A3M, FI, method='mosaic', out_dir=out_mosaic)\n",
    "_, _, _, gradient_paths = build_subsets(A3M, FI, method='gradient', out_dir=out_gradient)\n",
    "\n",
    "print(f'mosaic -> {len(mosaic_paths):2d} files in {out_mosaic}/')\n",
    "print(f'gradient -> {len(gradient_paths):2d} files in {out_gradient}/')\n",
    "\n",
    "sample = mosaic_paths[0]\n",
    "print(f'\\nFirst 3 records of {sample.name}:')\n",
    "with open(sample) as f:\n",
    "    lines = f.read().splitlines()\n",
    "shown = 0\n",
    "i = 0\n",
    "while i < len(lines) and shown < 3:\n",
    "    if lines[i].startswith('>'):\n",
    "        print(' ', lines[i])\n",
    "        if i+1 < len(lines):\n",
    "            seq = lines[i+1]\n",
    "            print(' ', seq[:80] + ('...' if len(seq) > 80 else ''))\n",
    "        shown += 1\n",
    "        i += 2\n",
    "    else:\n",
    "        i += 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Bring your own protein\n",
    "\n",
    "The demo bundle is tiny and CPU-friendly. For your own target:\n",
    "\n",
    "1. **Build an MSA.** Use the official [ColabFold notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) (`mmseqs2_uniref_env` mode) to generate a deep `.a3m`, then filter it (e.g. 25%-gap filter) to obtain `your_msa.a3m`.\n",
    "2. **Compute the FI matrix.** Run [FrustrAI-Seq](https://huggingface.co/leuschj/FrustrAI-Seq) on `your_msa.a3m` to obtain a per-residue Frustration Index matrix `your_fi.npy` of shape `(N_seq, L)`. **A GPU is required for this step.** See the FrustrAI-Seq model card for inference details.\n",
    "3. **Re-run the cells above.** Just point `A3M` and `FI` at your files and re-execute from \u00a73 onward. The package will raise a `ValueError` if `N_seq` disagrees between the two."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Next: run AF2 on each subset\n",
    "\n",
    "Feed each subset A3M into the official [ColabFold AlphaFold2 notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) \u2014 one subset per AF2 run. Aggregate per the SF-Cluster \u00a79.1 hit criterion:\n",
    "\n",
    "- C\u03b1 RMSD \u2264 3.0 \u00c5 on the `common_core` residues vs. each reference state,\n",
    "- mean pLDDT \u2265 70 overall,\n",
    "- mean pLDDT \u2265 70 inside the `switch_region`.\n",
    "\n",
    "**Compute budget disclosure (per `docs/protocol_lock.md`).** The SF-Cluster paper locks AF2 at 3 recycles \u00d7 4 seeds \u00d7 5 models for KaiB / Mpt53, and 0 recycles \u00d7 8 seeds \u00d7 5 models for the GA/GB cases. The GA/GB row was further trimmed to **4 subsets per case** during refinement to stay within the compute envelope. Global seed: `20260422`. Per-case seed = `hash(case_name) mod 2^31`; per-subset seed = `base_seed + subset_index`. All inference uses `templates=OFF`, `relax=OFF`, `dropout=OFF`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Citation, license, companion repo\n",
    "\n",
    "```bibtex\n",
    "@misc{sf_cluster_2026,\n",
    "  title = {SF-Cluster: frustration-guided MSA subset builders for AF2 multi-conformer prediction},\n",
    "  author = {Cao, Hanqun and {Chatterjee Lab}},\n",
    "  year = {2026},\n",
    "  note = {Workshop release. Companion code: https://huggingface.co/ChatterjeeLab/SF-Cluster},\n",
    "  url = {https://huggingface.co/ChatterjeeLab/SF-Cluster}\n",
    "}\n",
    "```\n",
    "\n",
    "**License:** MIT. See `LICENSE` in the OSS repo.\n",
    "\n",
    "**Companion private dev repo.** Full Phase II benchmark code (DBSCAN baselines, all four arms, evaluation harness, region partition ablation) lives in the SF-Cluster private dev repository. The OSS release here is a slim, dependency-light subset \u2014 only the `mosaic` and `gradient` arms and their scoring function \u2014 intended for reuse, not full reproduction of the benchmark."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
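The hit criterion listed in section 7 of the notebook (Cα RMSD ≤ 3.0 Å on the common core, mean pLDDT ≥ 70 overall, mean pLDDT ≥ 70 inside the switch region) can be sketched as a small predicate. This is only an illustration and not part of `sf_cluster`: the function name `is_hit`, its arguments, and the toy residue ranges are hypothetical, and the RMSD against a given reference state is assumed to be computed elsewhere.

```python
import numpy as np

def is_hit(rmsd_core, plddt, switch_idx, rmsd_max=3.0, plddt_min=70.0):
    """Check one prediction against one reference state.

    rmsd_core  : C-alpha RMSD in angstrom on the common-core residues (computed elsewhere)
    plddt      : per-residue pLDDT values for the prediction, shape (L,)
    switch_idx : 0-based residue indices of the switch region (hypothetical input)
    """
    plddt = np.asarray(plddt, dtype=float)
    switch = plddt[np.asarray(switch_idx, dtype=int)]
    return bool(
        rmsd_core <= rmsd_max            # structural agreement on the core
        and plddt.mean() >= plddt_min    # overall confidence
        and switch.mean() >= plddt_min   # confidence inside the switch region
    )

# Toy inputs, not real AF2 output: a confident model with one low-confidence stretch
plddt = np.full(91, 85.0)
plddt[50:70] = 60.0
print(is_hit(2.1, plddt, range(50, 70)))               # switch-region mean is 60 -> False
print(is_hit(2.1, np.full(91, 85.0), range(50, 70)))   # all three thresholds met -> True
```

How the per-reference results are combined across subsets is left to the SF-Cluster protocol; the sketch only evaluates a single prediction.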
|
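The compute-budget note in section 7 of the notebook gives the per-case seed as `hash(case_name) mod 2^31`. Python's built-in `hash()` on strings is randomised per process unless `PYTHONHASHSEED` is fixed, so a reproducible derivation needs a stable hash. The sketch below uses SHA-256 as a stand-in; that choice is an assumption made for illustration, may differ from the hash used in `docs/protocol_lock.md`, and is not guaranteed to reproduce the paper's seeds. It also assumes `base_seed` refers to the per-case seed. The helper names are hypothetical.

```python
import hashlib

def case_seed(case_name: str) -> int:
    """Stable per-case seed in [0, 2**31), independent of PYTHONHASHSEED."""
    digest = hashlib.sha256(case_name.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer, then reduce mod 2^31
    return int.from_bytes(digest[:8], "big") % (2 ** 31)

def subset_seed(case_name: str, subset_index: int) -> int:
    """Per-subset seed = per-case seed + subset index (assumed reading of base_seed)."""
    return case_seed(case_name) + subset_index

base = case_seed("KaiB")
print(base, [subset_seed("KaiB", i) for i in range(3)])
```

Running this twice in separate interpreter sessions prints the same numbers, which is the property the built-in `hash()` does not give by default.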
examples/data/KaiB_fi_matrix.npy
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d7db0c13d574732a3465d209e6d542ce9737105c4e586a1fd53c976c2b79cfd
size 132624
|
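The LFS pointer records a 132,624-byte payload. That size is consistent with a `float32` array of shape `(364, 91)`, that is 364 sequences by the 91 query columns reported in the A3M headers, saved with `np.save`: 364 × 91 × 4 = 132,496 data bytes plus the `.npy` header. The dtype and shape are inferred from the byte count, not read from the file, so treat them as an assumption. The sketch below only checks the arithmetic by serialising a zero array of that shape in memory.

```python
import io
import numpy as np

N_SEQ, N_COLS = 364, 91      # sequences in the MSA, columns in the query
POINTER_SIZE = 132624        # "size" field of the Git LFS pointer

# Serialise a placeholder array with the assumed dtype and shape
buf = io.BytesIO()
np.save(buf, np.zeros((N_SEQ, N_COLS), dtype=np.float32))

total = buf.getbuffer().nbytes
data_bytes = N_SEQ * N_COLS * 4          # float32 is 4 bytes per value
print(total, data_bytes, total - data_bytes)   # file size, payload, header bytes
print(total == POINTER_SIZE)
```

The same idea applies to your own FI matrix: comparing `np.load(path).shape[0]` with the number of `>` records in the A3M is a quick way to catch the `N_seq` mismatch the notebook warns about.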
examples/data/KaiB_filtered.a3m
ADDED
|
@@ -0,0 +1,728 @@
>101
RKTYVLKLYVAGNTPNSVRALKTLNNILEKEFKGVYALKVIDVLKNPQLAEEDKILATPTLAKVLPPPVRRIIGDLSNREKVLIGLDLLYE
>MGYP000886600007 114 0.454 1.369E-26 2 89 91 23 110 116
--PYVLRLYIAGKTERSMHAIEQIRSVLEQRLPGRYELEVIDVHQHPEMVRADQVIAVPTLVKKLPEPLRKIIGSMADQDRLLIGLDLLP-
>UniRef100_A0A971TK21 109 0.459 4.603E-25 2 88 91 23 109 116
--PYVLRLYIAGKTERSMHAIEQIRSVLEQRLPGRYELEVIDVHQHPEMVRADQVIAVPTLVKKLPEPLRKIIGSMADQDRLLIGLDLL--
>ERR1044071_8151622 121 0.516 4.579E-29 1 89 91 15 103 111
-EEWILRLYVAGHSARSAAALRNLTMICEEHLAGRYRIELIDLLKQPQLARGDQIVAVPALVPHLPPPMKKIIGDLSNEERVLVGLDLLP-
>UniRef100_A0A1F2SDQ0 122 0.597 1.827E-29 1 87 91 15 101 106
-KAYVLRLYVAGQTPKSVLAFTNLKQICEDHLQGRYEIEIIDLLKNPQLARGDQILAVPTLVRRLPEPIKKIIGDLSNTERVLVGLDL---
>SRR5580692_7392111 122 0.483 2.431E-29 1 89 91 38 126 146
-PDYILRLYIAGATRQSATAIQNIRSICEERLRGRYELEVVDVYQEPAAAREDQVLALPTLIKRLPLPLRQLIGDLSNTKRVLLGLDLKP-
>K9S6Z6 113 0.505 2.579E-26 3 89 91 65 151 153
---YCLRLYIAGGTSRSMSALQRLKEICETYLQGRYELEVIDVYQASPAVLTDNVVAIPTLIKQLPLPLRRVIGDLSDTEKVLLGLDLVP-
>SRR3954470_14000739 130 0.510 3.149E-32 2 90 91 13 102 125
--RYVLRLYVTGSTPRSSRAIQNIRAICEEHLRGRYDLEVIDIHQQPVLARGEQIIAAPTLIKTLPAPLRKVVGDLSNTERVLMGLDLrPAE
>UniRef100_UPI0018DCCDB9 121 0.528 3.442E-29 0 88 91 12 100 116
RDHWTLRLYVAGLTPRSITAFGNLKRLCETYLAGKYTIEVIDLVDHPERAREDQILAIPTLVRKLPEPVRRIIGDLANTERVLVGLELL--
>SRR5688500_3073545 124 0.528 4.991E-30 1 87 91 7 93 101
-ETFRLRLYIAGQTPRSVGALGNLKKICEEHLQGRYELEIIDLMQNPGLARGDQILAVPTLVRRLPEPIKKIIGDLANSERVIVGLDL---
>MGYP000149628109 121 0.500 3.336E-29 3 90 91 33 120 125
---YHLRLFVTGSTLRSQQAIQNLRQICEEHLQGRYKVEVIDVSKDPAQARQHQILAVPTLLKELPPPFRKIVGDLSEKEKVLEGLDIQPQ
>3740|scaffold08918_4|-3862|00 120 0.563 1.184E-28 2 88 91 23 109 145
--KWVLRLYVAGQTPKAIAAFNNLKLICEEQLKGIYHIEVIDLLKKPQLARDNQILAVPTLVRKLPLPVKNIIGDLSNTERVLVGLDLI--
>SRR6187401_1816177 121 0.494 3.336E-29 1 87 91 130 216 223
-ERWVLKLYVAGQTARSAAALENLKAICDGHLGGKYTIEVIDLAQNPRLARTDQIVAVPTLVRKVPEPMRKIIGDLSNQQRVIVGLRL---
>SRR5512147_2964451 121 0.471 3.336E-29 3 90 91 82 170 172
---YLLRLFVAGTTPRSARAIQNIRAICEERLHGSFALEVVDIYQHPEQAKPEQIIVAPTLVKELPLPVRKLIGDLSDKERVLVGLDIvPRE
>26123|scaffold_438712_c1_1|+1|10 123 0.505 6.850E-30 1 87 91 48 134 139
-ENWDLRLYVAGESERSRLAIRNLKKICETHLHASYTIEVIDLLKNPGLARGDQIVAVPTLVRKLPQPMRKIIGDLSNEDRVLVGLDL---
>12684|Ga0207652_11722284_1|+1|10 125 0.494 1.406E-30 1 90 91 60 150 151
-ERYILKLYVTGLTTRSARAIENLQVLCQKHLPGRYELQVIDVYQQPELARTEQVVAIPTLIKKLPLPLRRLIGDMSDEERVLVGLDiLPHE
>SRR5580704_16544830 124 0.453 4.991E-30 3 88 91 25 110 128
---YILRLYITGFSPRSARAISNIRKICEAHLEGRYDLEVVDISQEPALAQSEQILAAPTLIKKWPLPARRFIGDMSQSDRILLGLDLP--
>UniRef100_A0A3N5XMK1 120 0.528 1.222E-28 1 87 91 17 103 121
-ERYVLRLYTTGMTPRSMRAVESIKAICEEHLKGRYELEIIDIHEQPVLARGDQIIAAPTLIKRLPEPLRRLIGDLSDSERVLLGLDL---
>UniRef100_A0A7W0IU42 121 0.627 3.442E-29 3 88 91 24 109 113
---FVLRLYVAGQTPKSMAAFANLKKICEEHLAGQYQIEVIDLLENPHLARGDQILAIPTLVKKLPPPVRKIIGDLSNTERVLIGLDLL--
>ERR1043166_4517172 119 0.482 1.625E-28 3 89 91 8 94 120
---YLLRLYVTGSTLKSARAIQNIQAICEEKLQGRYSLEVIDIYQHPEQVKPEQIVVAPTLVKKLPLPVRKIIGDLSNTERVLVGLDIKP-
>MGYP001057477004 125 0.563 1.930E-30 3 89 91 19 105 171
---YRLRLYIAGQTPNSVAAITNLRQICRDKLEGRYRIEVIDLLEKPQLAKGDQILAVPTLVRKLPEPLKRIIGDLSNEERVLVGLDILT-
>26195|Ga0315277_11393154_2|-158|01 124 0.550 3.636E-30 3 90 91 48 136 149
---WVLRLYVAGQTPKSLTAFANLKNLCEEHLKGKYKIEVFDLLQHPQLARGDQILAIPTIVRKLPAPVRKIIGDCSNTEHVLIGLGLrPRE
>UniRef100_A0A8J7LTM0 115 0.518 3.979E-27 3 85 91 20 102 113
---WLLTLYVAGQTPRSVTAFSNLKQICEEHLPGRYDIEIVDVVTKPELATRDQVVALPTLVRKLPEPVRKVIGDLSNKEKVLVGL-----
>12918|scaffold901211_1|+1|10 124 0.528 3.636E-30 1 89 91 71 159 176
-PQYVLRLYVAGVTPRSVQAIETIKRICERNLQGRYHLEVIDIHQQPTLAKGDQIIAVPTLIRQLPAPLRTLIGDMSNEDRVLIGLDLKP-
>SRR5215510_15873647 112 0.430 4.859E-26 1 86 91 26 111 116
-KRYLLKLFIIGTRPNSARAIVNVRKLCDEYLAGRYMLEVVDISKHPERVKEEQVIAAPTLIKELPAPLRRFIGSMSDTEKLLVGLE----
>SRR5512142_679953 123 0.528 1.290E-29 1 89 91 15 103 105
-EVWRLRLYIAGQTARAAAAVANLRTICEKHLEGRYALEVVDLLETPQLARGDQILAIPTLVRRLPPPMKKIIGDLSNEERVLVGLDLQP-
>UniRef100_UPI00190A34C5 123 0.540 9.699E-30 1 87 91 12 98 103
-KGWCLRLYVAGQSPRSMSALQNLKAICETHLAGHYDIEVIDLMEDPKLARGDQIVAVPTLVRKLPEPVRKIIGDLSNTERVLVGLDL---
>SRR4051812_34338196 126 0.550 1.025E-30 1 89 91 49 137 141
-ENWELKLYVAGRTPKSVLALKNLRKYCEEHLEGRYKIEVIDLLEKPQLAEGDQIFAVPTLVRKVPVPIRKIIGDLSNEEKVLVGLNIVP-
>UniRef100_A0A3N5LPZ6 111 0.465 1.297E-25 0 87 91 20 107 114
RPKYVLRLYVAGISPRSERAIRSVKEVCEQRLKNRYELEIVDVYQHPESLKDGQVLAVPTLIKQLPLPLRRLIGDMSDKEKLIVGLDL---
>UniRef100_A0A950DFQ4 111 0.602 9.446E-26 5 87 91 23 105 118
-----LRLYVAGQTSRSLAAIDNLRRICEKNLKGRYTIEVIDLMQAPQLARTDQIVAIPTLVKKLPPPLRRIIGDLSNSERVLIGLDI---
>5937|scaffold842798_1|-17|01 115 0.453 7.266E-27 2 87 91 22 107 122
--KWDLRLYVAGPTTKSLAAFRNLEQLCKDHLPGKYHIEVVDLVKTPQLAKGEQILALPALVRQLPIPIRKVVGDLSDTERVLVGLDL---
>SRR4030095_15264891 128 0.611 2.104E-31 3 87 91 44 128 134
---YQLRLYVAGQTPKSVLALKNLEQICEEHLQGRYEIEVIDLLQNPQLARGDQILALPTLVRRLPEPIKKIIGDLSNKERVLVGLDL---
>UniRef100_K9UBR2 49 0.287 3.904E-04 1 73 91 13 85 398
-KSLQLLLFLDERMTSYVQNQEIRDKLAILNAQDAFELQVVDVGKQPDLAEHYRVIATPALLKLYPTPRQILAG-----------------
>UniRef100_UPI001E525CD0 52 0.315 4.240E-05 2 73 91 19 90 437
--PLQLVLFI-DRRPSSFQKLREIRNyLLKLHEQYPFDLQIVDVAEQPYLAEYFKLVATPALLKIHPEPRQILAG-----------------
>UniRef100_A0A845X7U1 48 0.275 1.011E-03 5 73 91 22 90 388
-----LLLFVDERSGSQQHTQQILDYLNYLEQEHLFELEVLEVGEHPDLAEHFRLIATPSLVRIYPQPKHVLAG-----------------
>3300017444.a:Ga0185300_10001144_2 52 0.315 4.110E-05 2 73 91 13 84 457
--PLQLVLFI-DRRPSSFQKLREIRNyLLMLHEQYPFDLQIVDVAEQPYLAEYFKLVATPALLKIHPSPRQVLAG-----------------
>UniRef100_A0A349JMI1 49 0.324 3.904E-04 4 76 91 36 109 410
----QLLLFVDKRLSSREQTRHIRKSLQDLKAEWDFELQIIDVGEQPDLAEHFKLLATPSLLKIHPDPRQTLAGsDLS--------------
>UniRef100_K9SDH8 56 0.356 1.778E-06 2 73 91 16 87 393
--PLQLLLFV-DRRPSSREQIRTIRnRLKNLEAETPFALEVIDVGEQPYLAEHFKLVATPALLKICPEPRQTLAG-----------------
>UniRef100_A0A3M1PH87 54 0.283 8.682E-06 1 73 91 12 84 391
-PSLQLLLFVDKR-PSSIEKIRQIRnRLKELEADYPFDLQVVDVGEQPHMAEHFRLIATPAIIKIHPEPRQTLAG-----------------
>UniRef100_A0A1C0V439 54 0.272 1.192E-05 0 73 91 11 84 398
RAPLQLLLFVDHRSSSWEQAsliQGRLQALKARY---PFAFDIVDVAEQPYVAEHFRLIATPALVKVHPHPRQTLTG-----------------
>UniRef100_UPI001E4DBDD3 44 0.295 3.304E-02 4 73 91 3 72 369
----QLLLFVDER-PSSSEYVRLIRDYIEiIKKSCPCQLEVIEIRKQPHLVEHFRLVATPALVKVSPGQKQILAG-----------------
>UniRef100_A0A969FMT9 63 0.364 8.088E-09 0 73 91 20 92 421
QATIQLLLFV-DRRPSSWEQMRYIRRYLENTDEQNFSLEVVDVSKQPYLAEHFKLIATPTLLKLYPEPRQMLAG-----------------
>UniRef100_A0A0M2Q0K7 49 0.281 5.361E-04 4 73 91 18 87 393
----QLLLFVDNRS-SSQEQMWQVRRTLETLsDPQTFQLEVMNVTEQPHLTEHFKLIATPALIRLYPTPQQILAG-----------------
>UniRef100_A0A978SUS5 52 0.356 4.240E-05 2 73 91 13 84 383
--PLQLLLFIDKR-PSSLEQSRQIQRYLqQTKAKHAFALQVIEVGEQPYLAEHFKIVATPALIKIYPEPRQIIAG-----------------
>UniRef100_A0A930TQ40 52 0.293 5.823E-05 2 73 91 13 86 408
--PLQLLLFVDER-PLSRERTRQIRKRLQElalNSPHAYTLQVVKVAEQPHLVEHFKLVATPSLIKISPEPRQILAG-----------------
>UniRef100_A0A2W4XV74 47 0.282 3.593E-03 2 73 91 13 89 395
--PFKLLLFIDKR-PHSSEQIRQiryhLKELMSRELmnKNQISLEIVDVTHQPDLAEHFKLVATPALVKVSADQHQTLTG-----------------
>UniRef100_A0A939KSU8 61 0.328 5.425E-08 2 73 91 13 84 391
--PLHLLLFVDKR-PISGEQIGQIsRQLKELHGDCDYELMVVDVGEQPYLAEHFKLVVTPTLIKIYPEPRQTLTG-----------------
>UniRef100_A0A8K1ZWU9 47 0.338 3.593E-03 4 73 91 19 88 385
----QLILFVDKR-AASKRQNQQVQDYLKSiEPSHSWNLQVADVEEQPYLAEYYKLVATPALIKLYPEPRQILTG-----------------
>UniRef100_UPI001C72C3C0 62 0.342 2.095E-08 2 73 91 19 90 401
--PLQLLLFVDGR-PQSRQQVQRIRSYLRElETEYSFELQIIDVGQQPYLAEHFKLVATPALIKIHPEPQQTLAG-----------------
>UniRef100_A0A6P0TIA3 54 0.305 8.682E-06 4 74 91 11 81 119
----QFLLFV-DRRPASSEKIRQIRQYLERKqGSSSIGLQVIDVGEQPSLVEHFRLVATPALVKIYPEPQQVFTGE----------------
>UniRef100_A0A5B8NIC7 51 0.252 7.996E-05 2 73 91 1 72 365
--PLQLLLFIDERSSsqEHLQGIQNyLEALKEDY---PFELQMVNVAEQPHLAENFRLVATPALVKIAPQPRQTIAG-----------------
>UniRef100_UPI002012D1AC 61 0.405 3.951E-08 2 73 91 13 84 388
--PLQLLLFVDER-PSSQEQLLRLHN-CIQELKTDYpfELEVVDVGEQPYLAEHFKLVATPALIKIHPPPQQTIAG-----------------
>UniRef100_UPI001C0312E0 49 0.310 3.904E-04 2 73 91 6 77 384
--PLKLLLFIDKR-PSSVEQVRQIRQHVEG-LEGEIDpvLEVVDVSEQPYLAEHFKLVVTPTLIKVSAAGRQTLTG-----------------
>UniRef100_A0A928Z921 53 0.319 3.088E-05 2 73 91 13 84 183
--PLQLLLFVDGRPSSWEQLRQVLAYLKEKNNEVDWDLKTIKVSEKPYLVEHFKLVATPALIKIHPEPQQTLAG-----------------
>UniRef100_A0A351L2B1 59 0.356 1.930E-07 2 73 91 13 84 168
--PLQLLLFVDER-PSSTEQLQQLRQYLERLKADyPLEFEVVDVGAQPYLAEHFKLVATPALIKINPPPRHVLAG-----------------
>UniRef100_A0A8J7E209 50 0.292 2.071E-04 1 77 91 18 95 383
-KSLQLLLFVDER-PSSQehikRIQSHLQSLQTEYL---FELEAINVGEHPDLVEHFRLIATPALVKIHPQPRQILAGsDLID-------------
>UniRef100_UPI00232B6744 50 0.283 2.843E-04 1 73 91 19 91 385
-KQLQLLLFVDERISTQNHA-QQIQSYLEELkQRDAFDLEILEISENPDLAEHFRLVATPSLVKIYPSPRQVLAG-----------------
>UniRef100_UPI0018EFA1D6 63 0.342 1.111E-08 2 73 91 13 84 395
--PLQLLLFVDGR-PKSRQQVQRIRAYLTELKTGyQFELQIIDVGQQPYMAEHFKLVATPALVKIHPEPRQIIAG-----------------
>UniRef100_U5DJK1 51 0.315 7.996E-05 2 73 91 19 90 404
--PLQLLLFVDERS-RTQEPMRRIQNYLLRLQQDNaFGLQVIGVEKQPHLAEHYRIVATPSLVRIWPPPRQTLAG-----------------
>UniRef100_A0A832M402 56 0.301 1.778E-06 2 73 91 13 84 431
--PLQLLLFVDKR-PSSREQVRQVRSALkERKEECDFEVQFIDVTEQPYLAEYFRLIATPALIKIHPEPRQTLAG-----------------
>UniRef100_A0A1C0VJA3 49 0.338 3.904E-04 4 73 91 37 106 411
----QLLMFVDKR-PGSHEQIRLLRKSLETLKTDfAFDLQIIDVGEQPYLAEHFKLMATPCLLKIHPSPRQMLAG-----------------
>UniRef100_A0A969K2I2 52 0.323 5.823E-05 4 73 91 15 84 397
----QLVLFVDKR-PSSNQKVRQIRNHLKDLRADYvFDLQIVDVGEQPHLAEYFKLVATPALIKIYPEPRQTLAG-----------------
>UniRef100_K9X4T7 62 0.328 2.095E-08 2 73 91 13 84 395
--PLQLLLFIDGR-PKSRQQVQRLRAHLKELEAEySFELQIIDVGQQPHLAEHFKLVATPALIKIHPEPRQVLAG-----------------
>UniRef100_W7Q8H4 65 0.365 1.206E-09 5 85 91 15 96 100
-----LILFVTGEAPRSRRAHLNLTAALEASGIGDVQPREIDLLVEPQEAIDFGIFATPALMHIDASGTRRVlYGDLSDEHSLRDFL-----
>W7Q8H4 68 0.365 1.743E-10 5 85 91 15 96 100
-----LILFVTGEAPRSRRAHLNLTAALEASGIGDVQPREIDLLVEPQEAIDFGIFATPALMHIDASGTRRVlYGDLSDEHSLRDFL-----
>A0A1H7MTT5 63 0.341 7.839E-09 5 85 91 16 97 101
-----LILFVTGEAPRSHRAQRNLAAALDASGISDAPAREIDLLHEPQQAIHFGIFATPALMHIDASGNRRVlYGDLSDEHRLKDFL-----
>UniRef100_A0A1H7MTT5 62 0.341 2.095E-08 5 88 91 16 100 101
-----LILFVTGEAPRSHRAQRNLAAALDASGISDAPAREIDLLHEPQQAIHFGIFATPALMHIDASGNRRVlYGDLSDEHRLKDFLSVL--
>A0A1P8R863 62 0.317 1.478E-08 5 85 91 9 90 94
-----LILFVTGEAPRSRRARYNLSSALKAAGLDKSPAHQIDLLRDPEQAISFGIFATPALMHIDAAGNRSVlYGDLSDQARLARFL-----
>UniRef100_A0A1P8R863 62 0.317 2.095E-08 5 85 91 9 90 94
-----LILFVTGEAPRSRRARYNLSSALKAAGLDKSPAHQIDLLRDPEQAISFGIFATPALMHIDAAGNRSVlYGDLSDQARLARFL-----
>UniRef100_UPI001CD124AC 61 0.357 3.951E-08 8 90 91 1 84 88
--------FVTGEAPRSRRAHQNLTTALAASGIGDAAPREIDLLGDPQEAINFGIFATPALMLVDASGRRTVlYGDLSDEHSLKTFLSPLRE
>UniRef100_A0A2N5Y751 59 0.341 1.405E-07 5 85 91 15 96 100
-----LILFVTGEAPRSRRAHLNLTAALDETGIDVAPPREIDLLREPQEAISFDIFATPALLhIEGSGHRRVLYGDLSDKQSLKDFL-----
>UniRef100_A0A2N7UCY5 58 0.337 4.998E-07 5 77 91 15 88 105
-----LILFITGEAPRSRRAQQHLKTALAASGADLAPAREIDLLSAPQEAIDFGIFATPALMIIDASGKRQVlYGDLSD-------------
>MGYP001039280987 63 0.333 7.839E-09 5 90 91 14 100 104
-----LMLFVTGNAPRSVRARRNLAGALDSLDLDDVKPMEIDLLSQPEQTVAYSVFATPALLKTEARGKMSVlYGDLSDEGKLQDFLQNLPE
>UniRef100_U5T3B0 61 0.341 5.425E-08 5 85 91 3 84 91
-----LILFVTGNAPRTQRARANLAKMLEQIGRGDLGPHEIDLLKQPQEGLTYSVFATPSLLKTDDgGGGSLLYGDLSDSDRLHRFL-----
>UniRef100_A0A540VSD6 64 0.304 2.274E-09 5 85 91 3 84 96
-----LILFVTGNAPRTVRARANLSRVLREMGLDSIKPQEVDLLETPQEGLTYSIFATPSLLRTdSGGDEGLLYGDLSDTDRLRRFL-----
|
| 151 |
+
>UniRef100_A0A2S6G6T8 61 0.337 2.877E-08 4 85 91 13 95 102
|
| 152 |
+
----VLLLFVTGTAPRSQRARVNLARMLEQVGRDDIQPHEIDLLEKPKEGVKHSVFATPALMKVGQGGdVSVLYGDLSETDRLQQFL-----
|
| 153 |
+
>UniRef100_UPI00124EBEF7 65 0.337 1.656E-09 4 85 91 13 95 103
|
| 154 |
+
----VLMLFVTGTAPRSQRARTNIAKMLEQLNCTDIQPYEIDLLEQPEQGIKHSVFATPSLLKVSPtGGVSVLYGDLSEEDRLRRFL-----
|
| 155 |
+
>UniRef100_UPI001439CAB5 64 0.350 4.289E-09 5 80 91 15 91 100
|
| 156 |
+
-----LVLFLAGDAPRSRRAHRNLSAALAATCPDLAPVHEIDLLREPQQAIDFGIFATPALLHIDAEGNRRlLYGDLSDEGR----------
|
| 157 |
+
>MGYP000294044467 63 0.317 1.077E-08 5 85 91 3 84 96
|
| 158 |
+
-----LILFVTGDAPRSRRARANLSNMLERLGRSDLTPREVDLLDQPQAGLSYSVFATPSLLKaDQQSDGALLYGDLSDEQRLERFL-----
|
| 159 |
+
>UniRef100_A0A4Q8CZA3 61 0.317 3.951E-08 5 85 91 14 95 104
|
| 160 |
+
-----LILFVTGNAPRTQRARANLARMLEEIGRDDLTPYEIDLLQQPQEGLTYSVFATPSLLKTDDEhGGSLLYGDLSDDDRLYRFL-----
|
| 161 |
+
>3104|Ga0306908_1123748_1|-11|01 68 0.329 9.241E-11 5 85 91 17 98 106
|
| 162 |
+
-----LMLFVTGTAPRSQRARTNLVHMLEQLNRTDVQPYEIDLLEQPEQGIKHSVFATPSLLKVSPTGEVSVlYGDLSEEDRLRRFL-----
|
| 163 |
+
>UniRef100_UPI00133045FA 64 0.329 4.289E-09 5 85 91 14 95 107
|
| 164 |
+
-----LILFVTGDAPRSRRARANLASMLERLGRTDLSPQEIDLLDQPQAGLSYAVFATPSLLKREPErDGALLYGDLSDSDRLERFL-----
|
| 165 |
+
>UniRef100_UPI00201FFA50 61 0.317 2.877E-08 5 85 91 13 94 99
|
| 166 |
+
-----LILFITGTAPRSQRARSNLGKMLDRLNLNDVKPFEVDLLEQPDQGIEHGIFATPSLLKFDSSGEVSIlYGDLSVEERLQKFL-----
|
| 167 |
+
>UniRef100_A0A845V233 60 0.312 1.023E-07 7 85 91 1 80 86
|
| 168 |
+
-------LFVTGSAPRSRRARKNLAAALKSLGLDSVKAMEIDLIDRPEKTVTYSVFATPALLRMDEAGEMRVlYGDLSDESKLLEFL-----
|
| 169 |
+
>UniRef100_UPI00082AD250 59 0.317 1.930E-07 5 85 91 14 95 105
|
| 170 |
+
-----LVLFVTGNSPRSLRARANLAKAVEETASDHITVRHVDLLEDTGGITEYGIFATPALVHVRDGGePAVLYGDLSNEAELQRFL-----
|
| 171 |
+
>MGYP001134272031 63 0.321 5.708E-09 5 90 91 14 100 104
|
| 172 |
+
-----LMLFLTGDAPRSVRARKNLSGALDKLELDEVTPMEIDLLDQPEQTVAYSVFATPALLRTDALGEMSVlYGDLSDEDKLHDFLQNLPE
|
| 173 |
+
>UniRef100_UPI00037CF37C 66 0.337 8.781E-10 4 85 91 13 95 103
|
| 174 |
+
----VLTLFVTGSAPRSQRARANLARMLEQIGRSDMQPQEIDLLEQPEQGITQSVFATPSLLKTDTHGEVSVlYGDLSEEEKLRRFL-----
|
| 175 |
+
>UniRef100_UPI0003674D18 66 0.325 6.394E-10 4 85 91 13 95 103
|
| 176 |
+
----VLTLFVTGSAPRSQRARANLARMLEQIGRSDLQPQEVDLLEHPQQGITQSVFATPSLLKTDANGEVSVlYGDLSEEEQLRRFL-----
|
| 177 |
+
>UniRef100_UPI00047687D0 61 0.313 2.877E-08 4 85 91 13 95 102
|
| 178 |
+
----VLTLFLTGTAPRSQRARANLAHMLEQIGRSDLRPYEIDLLEQPEESITHSVFATPSLLKTSDtGGVLMLYGDLSDEDTLHRFL-----
|
| 179 |
+
>UniRef100_A0A3S0W7L6 59 0.324 1.405E-07 5 80 91 3 79 92
|
| 180 |
+
-----LTLFVTGTAPRSQRARVNLAQMLNRIGRSDIQPYEIDLLEQPGQGITHSVFATPSLVKANeTGEVSVLYGDLSDEER----------
|
| 181 |
+
>UniRef100_UPI001903F48C 63 0.329 1.111E-08 5 85 91 14 95 99
|
| 182 |
+
-----LVLLVTGDAPRSRRARQNLARALEQLGLGDIATREVDLAADPAQTLSYGIFATPALLRPGPnGQPDVLYGDLSERDMLERFL-----
|
| 183 |
+
>SRR4051794_37995438 82 0.354 1.920E-15 7 85 91 61 139 143
|
| 184 |
+
-------LFIAGDGPNSTAAVANLRAFLAQRSASHVEVEIIDVFMEPQRGVSASVFVTPMLVRVEPTPERRILGNLSDRTVLASVL-----
|
| 185 |
+
>UniRef100_A0A318VX16 61 0.301 2.877E-08 4 85 91 14 96 104
|
| 186 |
+
----VLTLFVTGSAPRSQRARANLARMLEQLGRADLRPREVDLLEQPLQGITQSVFATPSLLKTDTNGEVSVlYGDLAEEEQLRRFL-----
|
| 187 |
+
>UniRef100_A0A372BWN2 67 0.337 2.469E-10 5 86 91 14 96 101
|
| 188 |
+
-----LTLFVTGDAPRSRRARRHLNAALKKLGQDSIKPLEIDLLEHPEQSINHSVFATPALLRARNDGEISVIyGDLSDESKLLDFLG----
|
| 189 |
+
>3300017992.a:Ga0180435_10008823_6 74 0.379 7.937E-13 0 86 91 10 96 103
|
| 190 |
+
RSRYVVRLFVAGDAHNSRIARENLNQLRDLLNDTELSIQVIDVEENPQLAIEHSIYVTPALQIVEPKPPTLVYGNLRDKETLLALFE----
|
| 191 |
+
>UniRef100_UPI000401CEFF 70 0.325 2.681E-11 7 86 91 18 97 110
|
| 192 |
+
-------LYIAGDAPNSRIALQNIKMIQENISQWNLKVAIVDVVATPEVALEKGIYLTPALEIEAHGMESLVYGNLSDKEKILALFG----
|
| 193 |
+
>16161|scaffold59688_2|+220|00 84 0.353 3.934E-16 0 81 91 11 92 102
|
| 194 |
+
KRRYVLRLFVAGDAPNSRIARENLRRLQESVAECDFEVEIVDVMENPQSALDHGVFVTPALQIIEPGPEKLIFGNLTNKEAL---------
|
| 195 |
+
>SRR5690606_35087643 76 0.305 3.066E-13 3 87 91 39 123 125
|
| 196 |
+
---YVMRLFISDNAVNSRIARENLSNFLAEFPQHSFQIEIVDLYLQPEMALQNGIFITPTLQILAPQPGGIIYGNLSDRNALERILQI---
|
| 197 |
+
>UniRef100_A0A0R3M5F3 93 0.404 2.017E-19 4 87 91 10 93 100
|
| 198 |
+
----VLRLYIAGNSAGSRRAEQNLHDLRALLDNQAWRIEIIDVMRRPELAEQAGIIATPTLSCEHSGRPRRIVGDLSDKKRVLEFLGI---
|
| 199 |
+
>UniRef100_A0A969HBU9 81 0.352 2.720E-15 1 85 91 12 96 101
|
| 200 |
+
-KHYVIRLFVAGNAPNSRLARENLDRFQAGFPEHEFKVEIIDLDIQPELALENGVFITPTLVVLEPAPGGMIYGNLSDQKVLAQVL-----
|
| 201 |
+
>MGYP000847580960 74 0.284 1.090E-12 0 87 91 10 97 98
|
| 202 |
+
KKRYALRLFVAGNATNSRIARENLEQLLARHPEHEFEVEIVDLNVQPEFALDQGVFISPALQILEPSSGGIVYGNLSQKEVLEKVLNL---
|
| 203 |
+
>A0A0R3M5F3 91 0.395 1.797E-18 4 89 91 10 95 100
|
| 204 |
+
----VLRLYIAGNSAGSRRAEQNLHDLRALLDNQAWRIEIIDVMRRPELAEQAGIIATPTLSCEHSGRPRRIVGDLSDKKRVLEFLGIET-
|
| 205 |
+
>SRR5262249_39096779 91 0.469 9.533E-19 5 87 91 44 126 134
|
| 206 |
+
-----LRLYIAGNSAISRRAEQNLLHLQSLVKPGAWEVHVIDVLPKPELAEQAGILATPTLSYEHPVRPRRIIGDLSDTTRVLDFLGI---
|
| 207 |
+
>SRR5690348_3124064 88 0.481 1.652E-17 5 87 91 48 130 138
|
| 208 |
+
-----LHLYVAGNTASSRRAQQNLLRLREIMKEPQCEVRIIDVLVEPQLAEEAGILATPTLSYEHPQRPRRIVGDLGESKRILEFLGL---
|
| 209 |
+
>MGYP000666260026 97 0.404 8.218E-21 4 87 91 8 91 97
|
| 210 |
+
----VLRLYVAGEGPNSVRARANIVDLCDRHLQGAYSLEIVDVFDEPGRALEEGVLMTPMLVVASASPPRRVVGTLDETSVVLTALGL---
|
| 211 |
+
>UniRef100_A0A838IZY5 88 0.392 1.705E-17 5 86 91 11 94 101
|
| 212 |
+
-----LRLYVAGEGPNSRQARENLRVICEAHLAGRHVIEVLDVFEEPERALDDGVYLTPQLLVLVlpPATPRTVVGNLSEREVVLRALG----
|
| 213 |
+
>MGYP001366537082 74 0.301 7.937E-13 4 86 91 16 98 121
|
| 214 |
+
----VIRFFVAGEAPNSIIARDNLRRLRESLPEIHFEIEIVDVNVNPEIALQKGVFVTPALEVLEPPPGGIFYGNLSNSDPIRRLIE----
|
| 215 |
+
>A0A1Q2HNV8 77 0.394 8.623E-14 2 77 91 6 81 93
|
| 216 |
+
--KMVLRLYVAGKGLNSAMAIENLKQICRTCNSYDYDLKIVDVLKEPQTALDKGIFVTPALEIIEPAPGGMVYGTLAD-------------
|
| 217 |
+
>UniRef100_UPI002011AF87 91 0.471 1.854E-18 5 90 91 36 122 127
|
| 218 |
+
-----LRLYIAGPSATSRRAEQNLRRLRDvAKARDGLAVEIIDVLKNPELAEQAAIIATPTLALEHPVRPRRIIGDLSDVERVLDFLGIESE
|
| 219 |
+
>23258|scaffold4609030_1|+1|11 75 0.337 5.780E-13 6 85 91 20 99 104
|
| 220 |
+
------RLFVCGDALNSRRARENLQRLREMFPHVEFKVEVIDVGETPQAALDQGIFVNPALQVLEPGPGMLIYGDLSDLQALAAML-----
|
| 221 |
+
>SRR5579871_6120579 90 0.435 3.387E-18 5 89 91 48 132 138
|
| 222 |
+
-----LRLYIAGNSASSRRAEHNLEHLRKFMNAEGWKIEVIDVLARPELAEEASILATPTLSYEYSGRPRRIVGDLSDTKRVLKFLGIEP-
|
| 223 |
+
>MGYP000105995723 87 0.428 4.277E-17 4 87 91 2 85 93
|
| 224 |
+
----VLRLYIAGNSPSSRLAQQNLKHLRMLMKGGNEQVEVVDVLANPELAEKASILATPTLCYEHSGRQRRIVGDLGDPKRILAFLGI---
|
| 225 |
+
>SRR3954468_6301146 102 0.431 2.518E-22 0 87 91 46 133 148
|
| 226 |
+
KAPMVLRLYVAGDAPNSTRARANLRRLLSAVDPSRYNLEVIDFLTEPLRALDDGVLVTPTLMRVDPPPPQVVVGTLSALDRVADALDI---
|
| 227 |
+
>MGYP001433622665 94 0.494 1.037E-19 1 87 91 3 89 94
|
| 228 |
+
-KPYQLMLFVAQGQPNSVRAQKNLRQICEEVIPGKYHLKVIDVVKEPELAVENGIYLTPMLVVSDPPPPASITGDMAERKTVLAALKI---
|
| 229 |
+
>UniRef100_A0A3D6C093 96 0.494 2.194E-20 1 87 91 3 89 94
|
| 230 |
+
-KPYQLMLFVAQGQPNSVRAQKNLRQICEEVIPGKYHLKVIDVVKEPELAVENGIYLTPMLVVSDPPPPASITGDMAERKTVLAALKI---
|
| 231 |
+
>A0A1W6LJH8 78 0.418 6.280E-14 4 77 91 8 81 93
|
| 232 |
+
----VLRLYVAGKNVNSTLAIENLEKLCRRCNSFEYDLKIVDVLKNPETALEKGIFVTPALEILEPAPGGMVYGTLSD-------------
|
| 233 |
+
>SRR5687767_11767969 91 0.404 1.797E-18 4 87 91 38 121 122
|
| 234 |
+
----VLRLYIAGNSASSRRAEQNLHALRASLAQNAWEVEIIDVLSKPELAEQAGVIATPTLSYEHSGRSRRIIGDLSDKKRILEFLGI---
|
| 235 |
+
>SRR4051794_40104329 92 0.459 5.058E-19 5 90 91 41 127 130
|
| 236 |
+
-----LRLYIAGPSATSRRAEQNLLRLRDvAKAPNGLEVEVIDVLENPELAEQAAIIATPTLAFEHPVRPRRIIGDLSDVERVLDFLGIESE
|
| 237 |
+
>SRR3954454_22706284 106 0.441 1.060E-23 2 87 91 11 96 109
|
| 238 |
+
--PLVLRLYVAGDAPNSARARANLTRLLSDLDSSRYTLEIIDCLDEPARALGDGVFVTPTLVRLGPPPQRTIVGTLSATDRVADALDL---
|
| 239 |
+
>MGYP000010225417 75 0.407 4.209E-13 5 85 91 104 183 188
|
| 240 |
+
-----LVLYVAGDGPYSRRARANLQALMRE-AGIAAEVTVVDVLKSPDRALEHGIFATPALIVVHGKHETLIMGDLSERDTALEAL-----
|
| 241 |
+
>2271|Ga0209795_10171170_2|-245|01 90 0.406 2.467E-18 4 89 91 30 115 121
|
| 242 |
+
----VLRLYIAGNSASSRRAEQNLHRMQAFIKSEAWDVEIIDVLSKPELAEKAGIIATPTLSFEHSARPRRIVGDLSDTKRVLEFLGIET-
|
| 243 |
+
>MGYP000738482073 88 0.329 1.652E-17 1 90 91 15 105 111
|
| 244 |
+
-PIYRLRLFIAGDEPNSVRAREALARLRNERLGPQCEVEVVDVFQDYQAAITHGVSVVPTLKIEGPRGGRTIVGSLRDEAVVLAALGLsPTE
|
| 245 |
+
>SRR6202012_1017248 94 0.447 1.037E-19 3 87 91 12 96 104
|
| 246 |
+
---FSLRLYIAGDSITSRRARQQLARIREILKQHKFDVETIDVLAQPQLAEQERILATPTLASEHGGPPKRIVGDLSDTKRVLEFLGI---
|
| 247 |
+
>26133|Ga0268298_10010625_3|-7238|00 74 0.317 1.090E-12 3 87 91 15 99 100
|
| 248 |
+
---YALRLFVAGNAANSQIARENLERLRARYPDYEFEVEVIDLNIDPEVALTHGIFISPALQVIDPPTGGVIYGNLSDERVLERVLKL---
|
| 249 |
+
>SRR5262249_6174883 86 0.452 5.872E-17 4 87 91 14 97 102
|
| 250 |
+
----VLRLYIARNSPSSRRAEQNLDYLRRLMKADGGRVEVIDVLANPELAERESILATPTLCYEHSGHRRRIIGDLGDPERILAFLGI---
|
| 251 |
+
>UniRef100_UPI001904043C 77 0.383 8.896E-14 2 87 91 2 86 90
|
| 252 |
+
--PYQLRLFVSGPNPLCRKAERAIRELLIERGV-AYELDVIDVLADPDAAEEYALVATPTLECTAPPPVRRVVGYYEHYAEVFDALGI---
|
| 253 |
+
>MGYP001146183833 101 0.583 3.457E-22 4 87 91 2 85 90
|
| 254 |
+
----ELRLYVIGKTPSAIKATEHLRALLEDQYKDEYALEVVDVLENPILASDDKILATPTVVRRLPHPIRKVIGDLSEREKVLLGLDL---
|
| 255 |
+
>UniRef100_A0A2U2N9L6 76 0.329 2.303E-13 4 85 91 7 88 93
|
| 256 |
+
----QLTLFVAGDSPRSRHAREVLRRALAERGLDPGALELVDVLAEPERTLEHGVFATPALVLRADGATRSLYGDLSDEQGLQQFL-----
|
| 257 |
+
>UniRef100_A0A127EN01 85 0.433 1.568E-16 5 87 91 11 92 99
|
| 258 |
+
-----LRLYIAGRSAISQRAESHLRQ-LHRSIKLECNIEIIDVLKSPELAEQAGVLATPTLSYEHPSRSRRIIGDLSDTKRIVEFLGI---
|
| 259 |
+
>A0A127EN01 87 0.421 4.277E-17 5 87 91 11 92 99
|
| 260 |
+
-----LRLYIAGRSAISQRAESHLRQL-HRSIKLECNIEIIDVLKSPELAEQAGVLATPTLSYEHPSRSRRIIGDLSDTKRIVEFLGI---
|
| 261 |
+
>U2E7T8 67 0.387 3.286E-10 7 85 91 13 92 100
|
| 262 |
+
-------LFVAGDSPSSRRARRALESLIGSQSNEQkAQFEVVDVLREPERALESNLLATPTLLIERGGHVSRYVGDLHEREDVREEL-----
|
| 263 |
+
>UniRef100_UPI00190730C9 67 0.308 3.391E-10 5 85 91 15 95 97
|
| 264 |
+
-----LTVFIAGDAPSSRQAMAHLTGVLDSIGIPPERLQVVDVLTDPGAALDAGALVTPSLQIKRGERARWFLGDLTDQRDLLAFL-----
|
| 265 |
+
>UniRef100_UPI000D3E5BC6 66 0.347 6.394E-10 2 73 91 3 74 103
|
| 266 |
+
--RFRVNVYVVGGSNHASRAVALLRDVADTHFGGDAEITVIDVTSEPALADAAGVITTPTYDLLAPLPRRRIIG-----------------
|
| 267 |
+
>UniRef100_A0A1Y6FIV8 61 0.308 5.425E-08 5 85 91 11 91 95
|
| 268 |
+
-----LTLLVAGESQATRSARATLDSLIGDGLAEASHVRVIDILQQPDYALRYKAFFTPSLIVETPTTTTTIVGDLHELDEVRSLL-----
|
| 269 |
+
>SRR5919109_3969706 72 0.309 5.322E-12 2 85 91 7 90 100
|
| 270 |
+
--PVSLVLYVSAESPASQRARRHLESLLAQFDASQLAVEVCDVSADPVRGETDHVVFTPTLVARSGGLATWVLGDLADRSMLVDLL-----
|
| 271 |
+
>MGYP001077603090 119 0.505 1.625E-28 1 87 91 17 103 106
|
| 272 |
+
-EKWNFTLYVAGDNLSARRAKKNLQGICDEYLEGRYAIEIVDLVEHPEIAEEDQILAAPTLVRKLPLPLRRIIGDLSSREKVLIGLEI---
|
| 273 |
+
>SRR3954463_15126113 100 0.392 8.943E-22 3 86 91 17 100 101
|
| 274 |
+
---FKFRLYVASSTPNAAKATANLQQLCREHLPGRHAIEVVDVFKQPKRALADQIYLTPTLLRLAPMPVRKIVGNLSEASALLAALG----
|
| 275 |
+
>ERR1039457_666370 113 0.443 3.540E-26 2 89 91 12 99 108
|
| 276 |
+
--RYRFKLYVTDHTLRSRQALAQLRKLCDEQFPQQYELEVVDVLEHPDEAAAQHIFATPTVVRERPLPIRRVIGDLSDMGKVLAGLALPP-
|
| 277 |
+
>UniRef100_UPI0005BD3569 63 0.370 8.088E-09 7 84 91 7 86 98
|
| 278 |
+
-------LFVAGGAPRSAAALRNLTAAIAatGRPEGTFRIELIDVLRDPARALEAGLLATPSLALTaANGRRRWFIGDF-DRPELLAG------
|
| 279 |
+
>10876|scaffold_592705_c1_2|-157|01 111 0.425 9.155E-26 1 87 91 56 142 164
|
| 280 |
+
-PVWKLHLYVADTTPRSVLATENLHSFCDQYLPGQYRVTIIDIVKQPALAREHEILATPTLIRVFPGPERTVVGSLSDTARVARALEL---
|
| 281 |
+
>UniRef100_A0A7V9DA05 104 0.360 5.330E-23 3 88 91 3 88 99
|
| 282 |
+
---YSFRLYVTGETTLSREAEANLRALCKNRLVDDYEIEIVDILERSALAEEEQIVATPTIMRLAPLPRLRVIGDLSDHERAARAFGLP--
|
| 283 |
+
>MGYP000147404972 93 0.414 2.683E-19 7 88 91 17 98 99
|
| 284 |
+
-------LFVAGNAPNSVSAQANLRQVCEQRLKNGWELKIIDVLEDYGTALDHGILVTPALVILEPLPAVTVFGDLSDTDRLLKALRLI--
|
| 285 |
+
>SRR5207249_9234194 77 0.373 8.623E-14 3 85 91 34 115 125
|
| 286 |
+
---FELVLYVSPGSPACARAQRNIHELLGRLDRAQVDLDVRDVSEDAERAEADRILFTPTLVVRRPL-LTWIVGDLTNGEEVLRVL-----
|
| 287 |
+
>UniRef100_A0A8T3N6J2 109 0.500 4.603E-25 2 87 91 1 86 97
|
| 288 |
+
--KYRLRLFVTGHTPASLSAQKNLRKLCEGELRGWCEFEVVDVLKQPELAEEARIIATPTLVKLTPEPQRKVIGDLSNHDQLLHVLDM---
|
| 289 |
+
>SRR3954451_13963006 68 0.325 1.269E-10 5 87 91 69 149 159
|
| 290 |
+
-----FTLYVDGPE-QGRHVSRRLLELCQPWGIAP-DLSVVDVGDGPDQAEQANIIGTPTVVREAPAPRRRIIGALDDDRRVVEALGL---
|
| 291 |
+
>UniRef100_A0A1T4Y342 59 0.308 1.930E-07 5 85 91 11 91 95
|
| 292 |
+
-----LTLLVAGESSAARSARATLDTLIGDGLAEASHVRVIDVLQQPDYALRYKVYFTPSLIVETSTTTTTIVGDLHEIDEVRSLL-----
|
| 293 |
+
>UniRef100_A0A2V7TZK6 77 0.373 1.222E-13 3 85 91 21 102 112
|
| 294 |
+
---FELVLYVSPGSPACARAQRNIHELLGRLDRSQVDLDVRDVSEDAERAEADRILFTPTLVVRRPL-LTWIVGDLTNGEEVLRVL-----
|
| 295 |
+
>UniRef100_A0A831PRG7 79 0.301 1.823E-14 4 86 91 30 112 122
|
| 296 |
+
----HLRLYILGTSARASLARQRVEEFCGQFPPGRLRLEVIDLLVDGEVAERDRIIATPSLRRVMPLPVVSLVGDMGDEQQLVALVN----
|
| 297 |
+
>SRR6185295_5137210 81 0.372 2.636E-15 0 85 91 100 184 194
|
| 298 |
+
RRPVELVLYVSPASPASARAQHNIHELIARLGGSKVDLDVCDVSEDAERAEADRILFTPTLVVRRPL-LTWIVGDLTNGEEVLRIL-----
|
| 299 |
+
>SRR5687767_10342612 86 0.380 8.062E-17 2 85 91 60 142 156
|
| 300 |
+
--PVELALYVTLPWPASLKAKRNLDRVLSGFSRSQVSLTVCDLAQEPERAEQDGIVFSPTLVKRMPEPRAWVMGDLSDR-KVLSNL-----
|
| 301 |
+
>SRR5881409_182087 64 0.290 4.157E-09 2 87 91 5 88 111
|
| 302 |
+
--PMVFTAYVDG-TEMGGQVRTRLLELCASRDVTA-EIRVVDVLSEPAAAETGNVVGVPTVVREQPHPRRRVIGVLDDTRRVAEALGL---
|
| 303 |
+
>UniRef100_A0A4Q3W5A8 102 0.459 1.893E-22 1 87 91 12 98 102
|
| 304 |
+
-KEFVLRLFVTGASPNSLKALNNIREICENHAKGNYSLEVIDVYQNAELVQQEQIIALPLLVRKNPLPERKLIGDLSEKEKVIKYLGL---
|
| 305 |
+
>UniRef100_A0A934QHJ2 66 0.320 8.781E-10 5 85 91 16 96 98
|
| 306 |
+
-----LTVFIAGDAPSSRQAMTNLTGVLDSLDIPPERLEVVDVLTNPAAALKAGALVTPSLQVKRGQQVYWFLGDLTEQRDLLAFL-----
|
| 307 |
+
>4460|scaffold_415991_c1_2|-159|00 113 0.448 2.579E-26 0 86 91 24 110 118
|
| 308 |
+
KARYTLRLYVAGFRQSSRSAIANIRRICDKHFEGSANLEVIDIYQQPELAAAQQIIASPTLIKEAPAPFRRVIGDLSDTTKVLAALG----
|
| 309 |
+
>MGYP000603749840 111 0.443 1.725E-25 2 89 91 69 156 158
|
| 310 |
+
--RFLLQLYVAGNSHRCVNARKNLREICEEHLPDSYTLEIIDIVENPEAAEEADIVAVPTLVKRSPSPVRKVVGDMSRTQNVLSGLNIEP-
|
| 311 |
+
>UPI0003E01BF4 75 0.317 4.209E-13 2 85 91 167 251 261
|
| 312 |
+
--RIELALYISASSPSSLKALRNLTRLLADHDAAQVRFTTYDLSkEHIAAAQEDRIAFTPTLVKRWPEPKVWILGDLDDIRVVSDLL-----
|
| 313 |
+
>10796|Ga0318514_12978242_1|+2|11 63 0.291 5.708E-09 0 77 91 12 89 92
|
| 314 |
+
QPRVELSLYTSSGSPSSLKAVRNLMSLLSNYDPLQVRLSVRDLsREAHEQAAEDRIAFTPTLVKRN-EPKVWVLGSLDD-------------
|
| 315 |
+
>UniRef100_UPI0021E125E4 71 0.313 1.035E-11 1 85 91 139 224 234
|
| 316 |
+
-KRIAFTLYISEASTASLRALRNLQKLLDGYDASQIDLRVVDLSkERPASFDEDRITFTPTLVKRSPEPRVYLLGTLEHIQSVADLL-----
|
| 317 |
+
>SRR3712207_3115678 67 0.311 2.393E-10 2 77 91 61 137 142
|
| 318 |
+
--RLELFFYVSSASACSLKALRNLDRFLADYQGAQVRLRVFDLSQDyPAEAEEDRIAFTPSLVRRYPTPKTWLLGSLDD-------------
|
| 319 |
+
>SRR6185295_1711097 83 0.388 1.018E-15 2 85 91 94 178 188
|
| 320 |
+
--RVQLTLYISTSSPSSLKALRNLQKLLNDYDPGTVSLTVCDLSRDtTGSAEEDRIVFTPTLVKRVPEPKVWILGDLENAEIVSDLL-----
|
| 321 |
+
>UniRef100_UPI00214A28C1 63 0.337 8.088E-09 5 86 91 139 221 230
|
| 322 |
+
-----LMLYISEASPASLRALRKLEKLLAGYERSQVRLTVVDLAkERPPSFDEDRIAFTPTLVKRYPTPRAYYLGALDQLQAVTDLLN----
|
| 323 |
+
>UniRef100_UPI00193BA132 68 0.337 1.798E-10 5 86 91 139 221 230
|
| 324 |
+
-----LTLYISEASPASLRALRNLEKLLANYERSQVRLAVVDLSkERPPSFDEDRIAFTPTLVKRFPAPRAYYLGTLDQFQAVTDLLN----
|
| 325 |
+
>SRR5688500_8380976 82 0.345 1.398E-15 2 85 91 168 251 262
|
| 326 |
+
--PIELVLYYTPPWPSSMKARRNLEKILGKYEADAVRLTVRDLGEHPDLAEADGVVFSPTLLKKSPGAPVWMLGDLSDATAVTDLL-----
|
| 327 |
+
>ETNmetMinimDraft_32_1059908.scaffolds.fasta_scaffold895325_1 83 0.365 5.401E-16 4 85 91 130 211 221
|
| 328 |
+
----ELVLYYTPPWTSSLKALRNLEKILEGFDKDAVHLNVRDLAEHPEQAEADGVVFSPTLVKKAPGPPVWMLGDLSDSRAVTDLL-----
|
| 329 |
+
>SRR5512132_2975018 81 0.341 4.970E-15 4 85 91 33 114 125
|
| 330 |
+
----ELVLYYTPPWPSSMKARRNLEKILEGYEAEAVHLTLRDLGDHPDLAESDGVVFSPTLIKRSPGAPVWMLGDLSDGSAVTDLL-----
|
| 331 |
+
>13960|scaffold210726_2|+957|01 84 0.341 2.865E-16 1 85 91 155 239 249
|
| 332 |
+
-PKAEFVLYISSASPSSLKALRNMQRLLGEYQASQVRFTVCDLLKEPGCFEEDHVAFTPTLVKRLPGPKTWIVGDLQDSSMVTDLL-----
|
| 333 |
+
>12613|JGI10216J12902_106548506_1|+3|10 64 0.365 4.157E-09 5 85 91 84 164 178
|
| 334 |
+
-----LTLYVS-DSMLSLRAAKNLRMVLARYRDEQVALTVINLSHDVDhHAEEDRIVVTPTLLRTFPAPRVWLVGNLDKRDLVERLL-----
|
| 335 |
+
>SRR5688572_30282090 83 0.341 7.416E-16 4 85 91 58 139 150
|
| 336 |
+
----ELILYYTPPWPSSIKARRNLEKILEGYDADAVHLTVRDIAEQPDLAEADGVVFSPTLIKKSPGAPVWMLGDLSDASGITELL-----
|
| 337 |
+
>SRR5919106_6413091 63 0.320 7.839E-09 5 81 91 24 100 104
|
| 338 |
+
-----FVMYVNG-SNRSRRALRKVRALFDEYDAAQLTWSTIDVSSDtASRVEQDRIVVTPTLLKTYPSPAVWITGELENTDVV---------
|
| 339 |
+
>SRR5688572_2308330 79 0.317 2.426E-14 4 85 91 137 218 229
|
| 340 |
+
----ELVFYYTPPWPSSMKARRNLEKILSGYAADAVHLTVRDLGEHPDLAEADGVVFSPTLIKKSPGAPVWMLGDLSDASGITELL-----
|
| 341 |
+
>3300018984.a:Ga0193605_1004274_5 73 0.318 2.055E-12 0 86 91 168 255 265
|
| 342 |
+
QKRIGFVLYISEASSSSLRALRNLQRLLDEYETSQIDLKVVDLSkERPASFDEDRITFTPTLVKRNPEPRVHLLGTLEHIQSVAELLG----
|
| 343 |
+
>GraSoiStandDraft_41_1057321.scaffolds.fasta_scaffold894400_3 78 0.346 6.280E-14 5 79 91 90 164 168
|
| 344 |
+
-----LVLYVSAASPASIQARRNLERMLSRFEPGQVRWTVRDLEREPLAGEEDRIAFTPTLVKRFPEPRMWVLGNLREAD-----------
|
| 345 |
+
>UniRef100_A0A7X0U868 87 0.345 2.341E-17 4 87 91 21 104 113
|
| 346 |
+
----HFRLYVSTTSPISLRAIANARRILQEAYPGAHRLTVLNIAEHVALARTDQIIVSPTLLRLAPLPQRRFIGDLSDLNRLRRALGM---
|
| 347 |
+
>SRR5690242_21852208 83 0.395 1.018E-15 2 86 91 14 99 108
|
| 348 |
+
--RIELVLYTAGRSPASMRALRQMKNLLAQYESAQVDFKVFDLAEgRPASAEEDHILLTPTLVQRSPPPRTWVVGDLEDTTLIADLLD----
|
| 349 |
+
>SRR4051812_14954157 76 0.301 1.626E-13 2 83 91 38 120 132
|
| 350 |
+
--RLELVLYISAQSPASLRAIRNFQAILNEFDADEVDYSVCDLATDiSGAADEDRIAFTPSLVKRHPLPNEWFLGDLTNTDPIRA-------
|
| 351 |
+
>SRR5687768_5535903 80 0.329 6.824E-15 4 85 91 139 220 230
|
| 352 |
+
----ELLLYYTPPWPSSMKARRNLEKILKAYEADAVHLTLRDLGEHPDLAEQDGVVFSPTLIKRSPGAPVWMLGDLSDSTAVTDLL-----
|
| 353 |
+
>GraSoiStandDraft_2_1057267.scaffolds.fasta_scaffold1451175_1 77 0.317 8.623E-14 4 85 91 155 236 246
|
| 354 |
+
----EFVLYVSASSAASSQARRNFEQLLEGYDATQVRYTVCDLGRDPLAGDEDRVAFTPTLVKRYPPPRMWLIGNLRELEIVADIL-----
|
| 355 |
+
>SRR6185437_3447168 72 0.307 5.322E-12 2 79 91 62 139 141
|
| 356 |
+
--PIELMLYVSRHSAHSAQAIRNITSVLSRFKATQVKLTVCDLSADPGAGAKDNITYTPTLVRSGPGPRTYILGHISNPE-----------
|
| 357 |
+
>SRR4051794_7772101 82 0.376 1.920E-15 2 85 91 91 175 188
|
| 358 |
+
--RIELVLYVSGSSPSSLKAMRNLDGVLRQFDLACIQLEICDLSRgYPATAEEDRIAFTPTLVKRGPAPRAWVIGNLENRDLVADLL-----
|
| 359 |
+
>SRR5829696_6517560 79 0.294 1.767E-14 2 86 91 36 120 128
|
| 360 |
+
--KVELVLYISSASPSSIVARRNLEKVLDRFDSAQIHVTVCDLVADPLAGERDRIAFTPTLVKTYPAPKMWVLGNLRDPAIVEDLLG----
|
| 361 |
+
>MGYP000536513446 72 0.362 3.875E-12 5 81 91 139 218 236
|
| 362 |
+
-----LKLYFSGVSTESRRAVRNLRRVLKEFDRHRIRLDVHDISDRAAPVaplEQDRIVVTPTLVRKHPLPKLWILGDLSKLEVV---------
|
| 363 |
+
>SRR5687768_1210620 76 0.358 2.232E-13 5 81 91 136 216 217
|
| 364 |
+
-----LALYVSGTSPSSRKALRNLTQVLRNVEPHRVAVTVHDISEsdHPwvEAAEDDRVVVIPTLVRHAPLPRVWIAGDLSEIDTV---------
|
| 365 |
+
>SRR6059058_1924189 81 0.378 2.636E-15 2 75 91 28 101 104
|
| 366 |
+
--RIELVLYVSASSPPSVRAVANLRRILQRYKSDRIRCSICDLTANPEEGDVDQIAFTPTLCKRQPEPPMWILGDL---------------
|
| 367 |
+
>SRR5688500_14284300 87 0.383 4.277E-17 2 86 91 26 111 119
|
| 368 |
+
--RIELVLYVAGPSPASTRALRQMKNLLAQYDAAQVDFRVIDLAQgRPASAEEDRVLLTPALVRRSPLPRTWVVGDLEDTTLVADLLD----
|
| 369 |
+
>SRR5688572_7102903 68 0.373 1.269E-10 5 75 91 47 121 122
|
| 370 |
+
-----LVLYVSGDSQASRRALRNVRRALEGVDPAAIELDVRDVSSSdgaaIAAADADRIVVTPTLVRAAPSPKVWIAGDL---------------
|
| 371 |
+
>SRR4029453_1913048 75 0.261 5.780E-13 2 85 91 54 137 147
|
| 372 |
+
--RVEFVLYVSPNSAASLQARRNFDKLLARFDATQVKYSICDLIRDPLAGDSDRVAFTPTLVKRYPAPRMWLIGNLRETEVLADIL-----
|
| 373 |
+
>SRR6185503_893473 70 0.360 2.599E-11 2 87 91 34 118 126
|
| 374 |
+
--PIELVLYVAPNSPACVRARANLEAALDAYDRSRIRLTVCDVSRDFEDAERDRIVFTPTLL-LRGAEAGCVVGDLSLGDAVDALLSL---
|
| 375 |
+
>SRR5262245_8274589 79 0.285 1.767E-14 2 85 91 152 235 245
|
| 376 |
+
--RVELVLYVSPSSPACAQARHNLERVLDHFDPSQIKYSVFDLVRDPLAGEDDRVAFTPTLVKRYPAPRTWVLGNLRDTQIVGDLL-----
|
| 377 |
+
>SoimicMinimDraft_3_1059731.scaffolds.fasta_scaffold2181823_1 76 0.292 3.066E-13 4 85 91 147 228 238
|
| 378 |
+
----ELVLYVSSASPACIQARRNLEQLLEKFDVSQVRFSICDLGREPTAGDADRIAFTPTLVKRYPEPKMWVLGNLREPQIIADLL-----
|
| 379 |
+
>23892|Ga0310888_10269646_2|-120|01 75 0.292 4.209E-13 4 85 91 189 270 280
|
| 380 |
+
----ELVLYVSSASPASTQARRNLELVLDGFDRSQIKYTICDLGRDPMAGEIDRVAFTPTLVKRYPEPRMWLLGTLRETDLVADLL-----
|
| 381 |
+
>SRR3954471_22207506 105 0.388 1.455E-23 3 87 91 77 161 171
|
| 382 |
+
---FRFRLYVAGTTPNSVQARANLSALCRRHLPGRYKIEIVDVSKQPDRALIEGIFMTPSLMKISPSPTRMIVGTLSPSDALMRALGL---
|
| 383 |
+
>SRR5581483_9829629 105 0.383 1.455E-23 3 88 91 11 96 111
|
| 384 |
+
---YSFRLYLAGGTARAMQAEAQLRHLCETRLPEGFELEVLDVTDHPDRAEEDRVLVTPTVIRLSPPPARRVLGSLSDEHRVGLALGLP--
|
| 385 |
+
>Cyp2metagenome_2_1107375.scaffolds.fasta_scaffold1445432_1 88 0.363 1.652E-17 0 87 91 148 235 244
|
| 386 |
+
RPRVQLVLYVDAVWVTSVRAQENLEKVLAGFDRSQVHLRICDVAREPLDAEKDQIVFTPTLVKRSPAPRAWVVGDLSDHDVVTALLEM---
|
| 387 |
+
>UPI00034656C1 83 0.383 7.416E-16 2 87 91 153 238 252
|
| 388 |
+
--KLELALYVTMPWPSSLRAKANLSRVLARVPEGHVRLAVCDLAREPERAEIDNVLFSPTLVKVWPAPKMWILGDLSEANVLTDLLSL---
|
| 389 |
+
>UniRef100_A0A2V7Y706 73 0.413 2.912E-12 2 76 91 165 239 260
|
| 390 |
+
--KIELALYVTLPWPSSLRAQSNLSRVLSGVPDGEVRLSVCDLAREPERAERDNVLFSPTLVKVWPEPKMWILGDLS--------------
|
| 391 |
+
>SRR3954454_15902270 80 0.309 6.824E-15 2 85 91 154 237 247
|
| 392 |
+
--RVELVLYVSSASAASVQARRNLEQVLERFERSQIKCSVCDLVRDPLAGTDDRVAFTPTLVKRFPEPRMWVIGNFRDPEVVADLL-----
|
| 393 |
+
>SRR5438045_7802661 70 0.270 2.599E-11 0 84 91 42 125 126
|
| 394 |
+
RRKVELVLYTSAASEKCQRAIRSIQQVLERYERDQVSFTICDLCCDPEAGDADAVIFTPTLVKRGAEPKRWIVGSL-DRPRLGAG------
|
| 395 |
+
>AntDryMetagUQ889_1029465.scaffolds.fasta_scaffold07537_1 79 0.297 1.287E-14 2 85 91 66 149 159
|
| 396 |
+
--RVELVLYVSSASPASVQARRNLERLLAGFDGSQVKFTVCDLVRDPTAGDSDRVAFTPTLVKRYPEPRMWVLGNLREPQIVADML-----
|
| 397 |
+
>SRR5678816_3727367 82 0.333 1.920E-15 0 86 91 82 168 182
|
| 398 |
+
RPPIELVLYVSSLSPHSIAALRNLRQTLAQYGGGAVKLTVCDLSKDPSLADRDGVHFTPSLVTTGHGPRTWIVGHLGNPQVLQAFLE----
|
| 399 |
+
>18084|scaffold527937_2|-1158|01 84 0.380 3.934E-16 2 85 91 186 269 284
|
| 400 |
+
--KVELALYVTLPWPSSLRAQTNLSRVLARVPEGEVRLSVCDLARDPGRAERDNVLFSPTLVKVWPEPKMWILGDLTETEALADLL-----
|
| 401 |
+
>MGYP000592549691 85 0.345 1.520E-16 2 85 91 141 224 234
|
| 402 |
+
--KIELVLYVSAASPASMQAQGNMERVLASFNRDEVAYSVCDLQQNPETADHDRVVFTPTLVKRHPSPRLWIIGDLRDGDIVADLL-----
|
| 403 |
+
>BarGraIncu01122A_1022018.scaffolds.fasta_scaffold128416_1 83 0.290 7.416E-16 2 87 91 153 238 246
|
| 404 |
+
--KVELVLYVSSASPASLQARRNLEQVLSRFAAGQVRWTIRDLGREPLAGEDDRIAFTPTLVKRFPEPRMWVLGNLRDTDILADMLRI---
|
| 405 |
+
>3300006028.a:Ga0070717_10004647_5 86 0.395 8.062E-17 5 85 91 13 93 103
|
| 406 |
+
-----LRLYVAGDAPNSVAALVHLRAALAELPADRVDLEIIDVLQEPERGLRDDVLMTPMLVRHRPAPERRVLGNLGAARALRNVL-----
|
| 407 |
+
>SRR6185436_10889035 74 0.400 1.090E-12 5 76 91 30 104 105
|
| 408 |
+
-----LRLYVTGHTSSADRARAALRDVerrLAQQGEAQVIAEVIDVLDDPESAGRDQVFATPTLIRLTPAPQIRLFGDLS--------------
|
| 409 |
+
>SRR5438132_14018022 99 0.386 1.685E-21 0 87 91 7 94 105
|
| 410 |
+
RESFVVRLYVADREATAVRAIANLEALCREFLPRGCELEIIDILREPQRGLDDSIMVTPTLINLAPQPVRRICGDLGHPGRLRDGLGL---
|
| 411 |
+
>SRR5438552_3472945 79 0.369 1.287E-14 2 85 91 378 461 478
|
| 412 |
+
--KVELALYVTLPWPSSLRARSNPPRVLNGVPEGEVRLDVCDLAREPDRAERDNVLFSPTLVKVWPEPKLWILGDLSEPAVLTDLL-----
|
| 413 |
+
>GraSoiStandDraft_28_1057319.scaffolds.fasta_scaffold3853791_1 78 0.306 3.331E-14 0 87 91 124 211 219
|
| 414 |
+
RHKVELVLYVSSASPASIQARRNLEMLLSRFATNQVQWSVRDLGRDPLAGVEDRITFTPTLVKRFPEPRMWVLGNLRETDLLADMLRL---
|
| 415 |
+
>SRR5688572_14330442 104 0.411 3.763E-23 3 87 91 54 138 144
|
| 416 |
+
---FKLRLYVAGNTPNSAQARTNLRALCRTHLSGRHEIEIVDVTREPNRALTDGIYMTPSLLKLAPSPVRMIVGTLSHPESLMDALGL---
|
| 417 |
+
>UniRef100_A0A512HC53 79 0.382 1.327E-14 5 85 91 3 83 91
|
| 418 |
+
-----LDLYLAGRSRNSMRALHNLKEWLAQYGGEGVELRVIDVLEHPDRALDEGVLVTPTVIRREPDPIRIVVGTLDLPDDVTVLL-----
|
| 419 |
+
>SRR5687768_18073040 77 0.329 8.623E-14 4 85 91 30 110 117
|
| 420 |
+
----QLVLYATAGSTSSSRARRNLEAVLERFDPATYELAICDPSVEPLRAEDDRVVFAPTLVRRGPQPG-WFLGDLSNTAALHDML-----
|
| 421 |
+
>SRR5215510_1611124 79 0.297 2.426E-14 2 85 91 156 239 241
|
| 422 |
+
--RVEFVLYVSASSPASGQARRNLEQLLDRFDAAQVKYAICDLGRDPMAGEHDRVAFTPTLVKRYPPPRMWLIGSLRETETIADIL-----
|
| 423 |
+
>SRR4029453_630509 82 0.329 1.920E-15 0 90 91 63 153 157
|
| 424 |
+
KPSIELVLYVSSDSPHSVAALRNLRRTLAQYAGDAVRLTVCDLSKDPSLAERDGVHFTPSLVTAGRGPRTWIVGHLGNPQVLQAFLESALE
|
| 425 |
+
>SRR5678816_1688620 78 0.325 4.573E-14 2 87 91 85 170 177
|
| 426 |
+
--RVELVLYISPASFPSRAAERELRTILSQYDAGRVSLRIADVSRETSDAARDHVIFTPTLVKRRPEPLVWVVGDLTHVEVVHDLLQL---
|
| 427 |
+
>SRR5262252_1015889 79 0.311 1.287E-14 1 90 91 52 141 145
|
| 428 |
+
-PSIELVLYVSSVSPHSMAALRNLRRTLAQYGGDAVRLTVCDLSKDPSLADRDGVHFTPSLVTTGHGPRTWIVGHLGNPQVLQAFLESALE
|
| 429 |
+
>SRR5687768_1317515 79 0.309 1.287E-14 2 85 91 203 286 295
|
| 430 |
+
--KIELILYVSSESPASLQALRNLDRALARFDASQVKLTVHDLARTPEAGAADRVAFSPTLVKTFPEPRMWVIGNLRDPEVLEDLL-----
|
| 431 |
+
>UniRef100_A0A958FYA2 80 0.382 9.667E-15 2 82 91 1 81 90
|
| 432 |
+
--KISLRLYYTGGSPVSELARVALKALQSRHSSVQFEIEEIDVVLYPDAAEADGILATPTVIKFSPLPIAKIVGDISSLEQVL--------
|
| 433 |
+
>MGYP001204866127 86 0.357 5.872E-17 4 87 91 50 133 142
|
| 434 |
+
----HLRLFVTGGTSLSAAAVARLKELEEKLPADFLSMEIVDVLEDPDSAENNRVLATPTLIRMSPLPMIRVVGDVESVDRLMQLLDL---
|
| 435 |
+
>SRR6185503_18435269 106 0.418 5.624E-24 4 89 91 38 123 130
|
| 436 |
+
----QLRLYVAGDSPRSEQAIRSIRRLDGTRLAGRYDLEVIDVLTQPERAESDHVLATPTLLRLSPGPCRRILGDLGDLDRLITALVPPP-
|
| 437 |
+
>UniRef100_A0A964QAK1 95 0.414 4.135E-20 4 85 91 7 88 99
|
| 438 |
+
----QLRLYVAGDSPRSQLAIRSLRRLDGTPLAGRYDLEVVDVHDQPYRAEVDHVLATPTLLRLSPGPCRRILGDMGDLDLLMRNL-----
|
| 439 |
+
>SRR3954462_9021407 91 0.455 1.797E-18 3 90 91 18 107 115
|
| 440 |
+
---YALRLYVAGGTKHDGAAVRAVERLRERLGakGATIELEVIDVLAAPDRAEQDRILATPTLVRVTPQPARKIVGDLGDVERVSRMLDLGPE
|
| 441 |
+
>MGYP001286529226 86 0.373 5.872E-17 5 87 91 9 91 108
|
| 442 |
+
-----LRLFVTGGSLYSRRALVTIAELTRRMPDLHCEGEVIDLLEQPERAGLERIMATPTLIRLEPEPARRIVGDLRDADSLMSVLEL---
|
| 443 |
+
>SRR5688572_18846285 104 0.436 2.741E-23 4 90 91 42 127 133
|
| 444 |
+
----QLRLYVAGASPRSEQAIKHLQRLDGTELAGRYDLEVVDVFREPDRAEADRVMATPTLLRIAPGPCRRILGDLGDLDALLRALG-PAE
|
| 445 |
+
>SRR5580700_4777658 89 0.452 6.385E-18 3 75 91 46 118 119
|
| 446 |
+
---YQLTLFVSGASELSARAIVDARRLCEMGGPGRYQLVVVDVHDEPDAALANDILATPTLIKHRPLPVRRLVGDL---------------
|
| 447 |
+
>UniRef100_K9VYB0 94 0.421 1.070E-19 3 85 91 53 135 141
|
| 448 |
+
---YIFRLFVSGHNLDTERTLQILHRLLEQSLGHPYTLKVIDIFKHPEQAEANSISATPTLIRISPQPIKRIVGELDDVERVLKLL-----
|
| 449 |
+
>14399|Ga0335069_12962689_1|-154|01 106 0.528 1.060E-23 1 89 91 8 96 109
|
| 450 |
+
-EHLSLKLYVAGHTARSECAVAQARRLSELQFGGRCTLEIVDVVENPEIAEEERILATPTLVKVSPPPVRRIIGDLTRLDDVLAGLGLLP-
|
| 451 |
+
>A0A1Z4NNA9 93 0.518 1.955E-19 3 85 91 166 248 256
|
| 452 |
+
---YVLRLFIAGHTLHTERILQTLHELLEKHLSHPYTLKVVDVLTHPDQAEINQVSATPTLVKVFPPPMRRIIGNLESAERILQML-----
|
| 453 |
+
>UniRef100_A0A1Z4NNA9 95 0.518 5.677E-20 3 85 91 166 248 256
|
| 454 |
+
---YVLRLFIAGHTLHTERILQTLHELLEKHLSHPYTLKVVDVLTHPDQAEINQVSATPTLVKVFPPPMRRIIGNLESAERILQML-----
|
| 455 |
+
>UniRef100_A0A0V7ZRD6 98 0.505 6.176E-21 4 86 91 166 248 256
|
| 456 |
+
----VFRLFIAGHNPATEHILQTLHEILEKYLGHPYTLKVIDVLSHPEQAEANQVTATPTLVKVWPHPIRRIVGDLNNIGKILQNLG----
|
| 457 |
+
>UniRef100_K9TAR4 93 0.493 2.017E-19 3 83 91 162 242 247
|
| 458 |
+
---YVLRLFVAGNDLTTKRTLETLHQVLEQQLQHPYTLKVIDILKHPELAETNQVSATPTLVRVWPRPVRRIVGELEDLQRAIQ-------
|
| 459 |
+
>SRR5262249_2417559 100 0.447 6.515E-22 3 87 91 15 99 103
|
| 460 |
+
---YRFHLYVAGASMQSRRAIMRINEIGRRYLDGSYELQVVDILQNPEKVAEAGVVATPTLIKTSPPPVRYFVGDLSDTKKIVTGLAI---
|
| 461 |
+
>UniRef100_UPI002021B723 99 0.518 2.387E-21 3 85 91 209 291 297
|
| 462 |
+
---YVLRLFVAGHSATTERILQTLHQLLEQYLHHAYTLKVIDVFKHPEQAEADQVSATPTLVKVWPQPIRRLVGELDNLEKLLQIL-----
|
| 463 |
+
>HubBroStandDraft_5_1064220.scaffolds.fasta_scaffold442189_2 95 0.505 5.502E-20 3 85 91 192 274 280
|
| 464 |
+
---YVLRLFVSGSNPNTEHTLVTVHQLLEQSLNHPYTLKVIDVFKHPEQAESDQISATPTLIKIWPKPVRRIVGELNDAEKIRRLL-----
|
| 465 |
+
>S7VCH7 116 0.523 2.810E-27 3 86 91 1 84 90
|
| 466 |
+
---YSLTLFITGNGPASARAEQNLRRICDHAMDGQVRLEIVDVLQSPELAEEEGILATPTLIKRAPPPIRRLIGDLSDEAQVLAGLD----
|
| 467 |
+
>UniRef100_S7VCH7 113 0.523 1.939E-26 3 86 91 1 84 90
|
| 468 |
+
---YSLTLFITGNGPASARAEQNLRRICDHAMDGQVRLEIVDVLQSPELAEEEGILATPTLIKRAPPPIRRLIGDLSDEAQVLAGLD----
|
| 469 |
+
>CoawatStandDraft_6_1074263.scaffolds.fasta_scaffold645439_1 82 0.439 1.398E-15 4 85 91 165 246 250
|
| 470 |
+
----VLRLFVSGHSAMTEQILTTLQGVLESSRYQPYTLQMVDVSKHPEQAEADQVAATPTLVRVSPRPVRRLVGDLDNPRAILSLL-----
|
| 471 |
+
>UniRef100_UPI00045E5B31 63 0.308 8.088E-09 5 84 91 137 217 233
|
| 472 |
+
-----LTLYVVSLNSETRRLVEQITvALAKLYDPGHWVLDVVEVLGMPEKALEKDVFATPMLVRDVPEPVLKLLGDLSRVPSVIAA------
|
| 473 |
+
>UniRef100_A0A969VCT1 102 0.530 2.598E-22 3 85 91 181 263 269
|
| 474 |
+
---YVLRLYVSGSNPSTERTLVTIHQLLEQSLHHPYTLKVIDVFKHPEQAEEDQISATPTLIKIWPKPVRRIVGELNDAEKIMRLL-----
|
| 475 |
+
>UniRef100_A0A517P709 83 0.341 5.573E-16 3 81 91 19 97 153
|
| 476 |
+
---YRFQLFVTGNSLLSRRAREHVERHLVGPLGNRAEVEIVDLIADPIAARRERIVATPTLIRLEPSPVVRLIGDLTDFDRV---------
|
| 477 |
+
>SRR3954454_16723622 97 0.457 8.218E-21 5 87 91 31 112 116
|
| 478 |
+
-----LTLYVVRGTPASERAIATIEQLRAA-LPGTVKIEVIDVADQPEVAETERIVATPMLVRVAPAPVRRIVGDLSDLDRVRWGLGL---
|
| 479 |
+
>SRR5581483_7974966 86 0.344 5.872E-17 1 87 91 35 121 126
|
| 480 |
+
-PPLHLRLFVSGSSTTSLHARAAVDRLQRDGFVVAESVEIVDVLAEPERAAADRVLVTPTLLRVAPAPSRRVLGDLSDLAAVARALGL---
|
| 481 |
+
>SRR4030095_4975553 78 0.320 4.573E-14 5 85 91 40 120 126
|
| 482 |
+
-----LVLYISANSRYASVARCNCQRLLDRFDPRQVRFEVCDIGAHPERAEEDSVCYTPMLVKRHPLPRAYVLGDLSNGEPLVHLL-----
|
| 483 |
+
>UniRef100_D0LT55 65 0.315 1.206E-09 0 87 91 141 232 237
|
| 484 |
+
KERVELCLYLHSGTSASAAAETNLQNALADFDTTRLQLESIDLARSPRAAhPEDKVVFTPTLVRRGPGSRLALVGDLGDRallDSVLLAAGL---
|
| 485 |
+
>22902|Ga0257122_1006421_4|-3359|00 108 0.510 1.154E-24 2 85 91 1 84 99
|
| 486 |
+
--KYYLTLYVTGETPNSQRAIANLEKLSEECDADEFDIQIIDLLKHPDLAAEDEIIAVPTLVKKLPKPMQKIVGDLSNCEEVLLGL-----
|
| 487 |
+
>UniRef100_A0A6L9ZJT6 103 0.530 1.004E-22 3 85 91 168 250 262
|
| 488 |
+
---YVLRLFVSGNSIGTERAMKSLHQILEQSLSHPYTLKVIDVLQHPEQAEADQITATPTLIRVWPLPVRRIVGEFNDVEKILTLL-----
|
| 489 |
+
>A0A0M1JMY3 85 0.421 1.520E-16 3 85 91 182 264 269
|
| 490 |
+
---YVFHLFVSGRSAITQRTMEILHQILEDSLGMTYTLKVIDISRHPEQTEIYQITATPTLVKIWPLPMRKIVGDLENLDKLRQVL-----
|
| 491 |
+
>UniRef100_UPI001E41D742 88 0.421 1.705E-17 3 85 91 178 260 265
|
| 492 |
+
---YVFHLFVSGRSAITQRTMEILHQILEDSLGMTYTLKVIDISRHPEQTEIYQITATPTLVKIWPLPMRKIVGDLENLDKLRQVL-----
|
| 493 |
+
>SRR6185503_14955318 93 0.400 1.955E-19 3 87 91 17 101 116
|
| 494 |
+
---YEFDLFVVGGSEKAKRAEENLRRLGDEVLGGAYELRIIDVLENGEAAEAANIVATPALVRRAPLPVRMIVGDLSEPTWLAHGLGL---
|
| 495 |
+
>SRR6185312_15773949 92 0.407 5.058E-19 2 82 91 38 118 119
|
| 496 |
+
--KYELELFVVGGSVKAQRAEQNLRRLCDALLAGRYELRITDVLDNADAAEEANIVATPALLRRAPLPVRMVVGDLSERDALL--------
|
| 497 |
+
>SRR6476619_1219783 96 0.411 1.549E-20 3 87 91 58 142 149
|
| 498 |
+
---YKLELFVVGGSVKAQRAEQNLRRLCDSALAGRYELRITDVLDNADAAEEANIVATPALVRRAPLPVRMVVGDLSERDALAYGLGL---
|
| 499 |
+
>SRR5690606_38631219 98 0.395 5.986E-21 2 87 91 31 116 122
|
| 500 |
+
--KYQLELFVVGHSSKAQRAEHNLRRLCDARIAGRYELQITDVLENADAAEAANIVATPALVRRAPLPVRMVVGDLSERSALVYGLGL---
|
| 501 |
+
>SRR5688572_23129716 89 0.360 4.650E-18 2 87 91 11 96 109
|
| 502 |
+
--KYDLELFVVGGSARGRHAEDNLRRLCDASIAGHYTLRVTDVLENAEAADAANVIATPAVLRHSPLPRRMIVGDLSRRDALVHGLGL---
|
| 503 |
+
>SRR5688572_20232672 92 0.412 5.058E-19 2 81 91 74 153 154
|
| 504 |
+
--KYELELFVVGHSSMARRAEHNLRRLCDQTIAGRYELRVTDVLENAEAAEAANIIATPTLVRRAPLPVRMVVGDLSRRDAL---------
|
| 505 |
+
>SRR6187401_742178 94 0.400 1.037E-19 3 87 91 21 105 118
|
| 506 |
+
---FQLELFVVGRSAKAQKAEQNLRRLCEAKLAGRYELRVTDVLENADAAEAANIVATPALVRRAPLPVRMVVGDLSERSALAYGLGL---
|
| 507 |
+
>SRR5687768_3218532 105 0.395 1.997E-23 0 90 91 9 99 111
|
| 508 |
+
RPTYQLELFVVGHSAKAQRAEHNLRRLCDEKLAGQYELRITDVLENADRAEAANVVATPALIRRAPLPVRMVVGDLSERDALAYGLGLEAE
|
| 509 |
+
>SRR6187401_2120497 94 0.383 7.553E-20 2 87 91 11 96 111
|
| 510 |
+
--KYELELFVVGHSSKAQKSKHNLRRLCEAMLAGRYELRVTDVLENADAAEAANIVATPALVRRAPLPIRMVVGDLSERKALVFGLGL---
|
| 511 |
+
>SRR5262245_48067131 116 0.445 2.047E-27 5 87 91 16 98 109
|
| 512 |
+
-----LRLYMTGTRSRSVRALENIRKICEEFLPGEYELEVVDLYQQPEKAAQEQIVAAPTLVKYYPLPARRVIGDMSDSDRVLHGLEL---
|
| 513 |
+
>UniRef100_A0A934ZFS7 93 0.383 2.017E-19 2 87 91 3 88 103
|
| 514 |
+
--RYQLELFVVAHSTKAQRAEHNLRRLCDAKLAGQYDLRITDVLEDAAAAEAANIVATPSLVRRAPLPVRVVVGDLSRRESLLYALGL---
|
| 515 |
+
>UniRef100_A0A852ZR84 106 0.390 1.093E-23 2 88 91 35 121 136
|
| 516 |
+
--PYVLTLFVFGPDESSRRAATHLRRLCDELVGGQYRLEVVDVGEDPELAEEFGIFVTPTVVRTQPLPQFRVIGDLSDDARTAAALGFP--
|
| 517 |
+
>ERR1700733_14812660 121 0.482 4.579E-29 3 87 91 67 151 163
|
| 518 |
+
---YRLRLIVAGRTTRSQRAIENLRRICDEHLGGQVDLEVIDIYQQPELAEKYQVIAAPTLIKLLPLPIRRVIGDLSEKERVLRGLEI---
|
| 519 |
+
>MGYP001377341570 91 0.400 1.309E-18 3 87 91 3 87 96
|
| 520 |
+
---YQFKLFVTGETVHAAAARATVDRLCDALQIEAHAVRIIDVLEEPDLAAADRIIATPTLIRTSPQPERRVIGDLSDLDTLLKTMRL---
|
| 521 |
+
>SRR5690242_13802588 108 0.441 1.154E-24 5 90 91 20 105 118
|
| 522 |
+
-----LELYVMGDSPKSRAALDNLRRICERRLAGRYDLQVIDVIEQPDAAEAANVVATPALIRRGPAPVVRVVGDLSDRTALVHALGLDAE
|
| 523 |
+
>UniRef100_UPI001FB8EED0 112 0.404 5.013E-26 3 86 91 13 96 110
|
| 524 |
+
---YQLRLFIAGTSPRSQRTIENLRRICREHLADRHSLVIVDIYQQPELAEAAQVVAAPTLLKLTPEPLRRIVGDLSDEARVLRGLG----
|
| 525 |
+
>UniRef100_A0A521U920 100 0.441 6.722E-22 2 87 91 3 88 109
|
| 526 |
+
--RYELELFVVGHSAKAERAESNLRRLCEARLAGRYDLRVTDVLEDAEAAEEANIIATPTLLRRAPLPVRMVVGDLSHRGALLRGLGL---
|
| 527 |
+
>UniRef100_A0A933USY1 86 0.416 6.058E-17 4 87 91 8 90 97
|
| 528 |
+
----VLRLYIAGHSPNSVVALANLDWLRRAHFQDA-TVDIVDILREPERAMADRVITAPALFKVAPSPPRMLLGNLSDATKVLQGLGL---
|
| 529 |
+
>SRR5471030_2470326 96 0.480 1.549E-20 3 79 91 115 191 193
|
| 530 |
+
---WQLRLYVVDQTVKAVTAYTNLKKIYESRLKGRYRITVIDLLKHPQLAKGDQILAIPTVVRKLPVPIRTIIGNLSDTD-----------
|
| 531 |
+
>UniRef100_UPI001F066344 109 0.411 6.318E-25 3 87 91 14 98 109
|
| 532 |
+
---FHLRLYVAGQTPRSALAQANLYALCEARLPGRHHIEIVDLMDEPARARTDGVIAVPTLIRVSPTPVRRVVGDLSDTARLLAGLEL---
|
| 533 |
+
>14341|Ga0209698_10565634_1|+23|00 112 0.348 6.670E-26 1 89 91 33 121 134
|
| 534 |
+
-PRYEFRLYIAGTNLNSVRAIENVRRLRKSLRPSRCKLEIIDLYQQPALAKRDQVVAAPALVKLYPLPRRTFVGDLSDSARVVAGLGIIT-
|
| 535 |
+
>UniRef100_UPI001BEB70D6 108 0.465 1.191E-24 2 87 91 10 95 99
|
| 536 |
+
--RYRLRLVIAGNSERSRRAIENLQHLCAEHLSGQVDLEVVDIYQRPELAEEYQVIAAPTLVKLLPLPVRRIIGDLSQEDRVLHGLEI---
|
| 537 |
+
>SRR5215218_6711373 87 0.404 2.269E-17 4 87 91 12 95 102
|
| 538 |
+
----VLRLYIAGNSSSSRRAEQNVMRLRDHMTADAWKIEIIDVLATPELAEQASILATPTLSYDNAGRPRRIVGDLSDTKRILDYLGI---
|
| 539 |
+
>SRR6188768_3491203 81 0.366 2.636E-15 2 72 91 41 111 112
|
| 540 |
+
--KYELELFVVGRSSKAQKAEHNLRRLCEARLAGRYELRITDVLENADAAEAANIVATPALVRRAPLPVRMVV------------------
|
| 541 |
+
>SRR6187402_177262 96 0.470 2.126E-20 4 88 91 0 84 99
|
| 542 |
+
----ELTLFVAGDTAKSALAATKLRHICESLARGNYTLAIVDVLKDSAAAEREKILVTPTLIKRSPPPTRRLLGDLTATAKVVETLGLP--
|
| 543 |
+
>SRR6059058_1539947 101 0.404 4.746E-22 3 86 91 13 96 103
|
| 544 |
+
---FHFRLYVAGDTPNSERARVNLGALCRKHLVGRYKIQIVDVFKDPNRAMIEGIFMTPTLIKVAPSPIRMVVGTLSQSAALMEALG----
|
| 545 |
+
>SRR3984957_851882 121 0.476 3.336E-29 2 87 91 14 99 105
|
| 546 |
+
--RWLLRLYVAGQSPKSLQAFANLMRIRDEHLGSEYEIEIVDLLENPQLAEGDEIVAIPTLVRRLPHPMRKIIGDLSDTDRVLVGLQL---
|
| 547 |
+
>SRR3954470_18350552 106 0.447 5.624E-24 3 87 91 19 103 112
|
| 548 |
+
---YQFRLYVAGDTPNSERARVNLGALCRKYLVGRYKIQIVDVFKDPDRAMVEGILMTPTLIKLAPSPVRMVVGTLSPSESLMDALGL---
|
| 549 |
+
>MGYP001434196702 99 0.388 2.314E-21 3 87 91 34 118 123
|
| 550 |
+
---YKFRLFVADDTLNSAQASVNLAALCRAHLPGRHEIEIVDVLLEPKRALAEGVFLTPTLIKFSPLPVRRIVGTLSEPLTVLRALGL---
|
| 551 |
+
>UniRef100_A0A4V1DI91 40 0.289 5.717E-01 0 74 91 10 85 373
|
| 552 |
+
QRRYLKLLLVAAPHHRATPDLRGLVAFLEnQDFGFDVSLEIADPAERPELLELHRLVATPALIKLDPTPKQVFAGN----------------
|
| 553 |
+
>MGYP000867160857 47 0.297 3.483E-03 2 74 91 29 102 379
|
| 554 |
+
--RYLKLLLVAAPHHRANPDLRGLVAFLENQDFGfDVTLEIADPAERPELLELHRLVATPALIKLEPTPKQVFAGN----------------
|
| 555 |
+
>UniRef100_UPI002001187B 44 0.311 4.536E-02 0 75 91 9 85 372
|
| 556 |
+
RRPHLKLLLVAGTRHRASADVRSLVAFLEKEDFGfEVSLELADPAQRPELLELHRLVATPALIKLEPAPKQVFAGNM---------------
|
| 557 |
+
>MGYP001295239604 51 0.328 1.461E-04 7 75 91 50 119 406
|
| 558 |
+
-------LLVAGTRHRASADVRSLVAFLEKEDFGfEVSLELADPAQRPELLELHRLVATPALIKLEPAPKQVFAGNM---------------
|
| 559 |
+
>MGYP000025237920 48 0.358 9.797E-04 7 81 91 40 117 121
|
| 560 |
+
-------LLVASPHHRATPDLRGLMAFLEHEDFGfDVQLDVVDPALRPELLELHRLVATPALIKLEPSPRQVFaeIGRASCRERV---------
|
| 561 |
+
>UniRef100_A0A560LTW2 39 0.277 2.028E+00 5 74 91 17 88 381
|
| 562 |
+
-----LTLLIVATSQHlSSPGLRGVLQFLESHDYGfELNLQIADPAKRPELLELYRLVATPAVVKLHPAPRQVFAGN----------------
|
| 563 |
+
>UniRef100_A0A076H9F6 41 0.329 4.166E-01 7 84 91 14 90 383
|
| 564 |
+
-------LLVAARHHLSGQDLRSLVQYLEREDVGfEVTLQLADPSQQPELLELHRLVVTPALIKLSPSPKQVFAG--SNIHQQLKG------
|
| 565 |
+
>MGYP000311810024 47 0.306 2.536E-03 0 73 91 19 92 206
|
| 566 |
+
RQPLKL-LLVAARHHLSGQDLRGLVQFLErEDLGFEVTLQVADPSQQPELLELHRLVVTPALIKLAPNPKQVFAG-----------------
|
| 567 |
+
>UniRef100_A0A968YHU9 46 0.295 4.933E-03 4 73 91 22 91 390
|
| 568 |
+
----QLLLFVDKRAT-AKEQIQKISQYLETlEPQCDFELHVVEVAEQPYLVEHYKLVATPALVKIRPEPRHILAG-----------------
|
| 569 |
+
>UniRef100_UPI000B35C714 42 0.269 1.611E-01 1 74 91 13 88 385
|
| 570 |
+
-ERQVLHLLLV--ATRQQLAGQDLRTLLqllrREDLGFEVSLEVADPRRQPELLELHRLLATPALVKLAPAPKQVFAGN----------------
|
| 571 |
+
>MGYP001088475876 43 0.310 8.285E-02 1 73 91 12 84 189
|
| 572 |
+
-KKLELIL-VAGRKHLSRKDISEMLKFLEsKECNFEVSIQLSDPTKQPELLELHRLVAIPALIKIFPEPKQIFAG-----------------
|
| 573 |
+
>UniRef100_A0A7Y3TRI1 56 0.281 2.441E-06 4 73 91 15 84 443
|
| 574 |
+
----QLLLFIDKR-PSSREQVQQVRMALKELReECDFELQIVDVSEQPYLAEYFRLIATPALVKLHPEPRQILAG-----------------
|
| 575 |
+
>UniRef100_A0A352XFA5 54 0.309 8.682E-06 4 73 91 15 84 396
|
| 576 |
+
----QLLLFIDER-PTSRKHIHRIRSYLETLRADyPFELMLISVGEHPYLAEHFRLVATPALIKIHPPPRQTLAG-----------------
|
| 577 |
+
>UniRef100_A0A5J6Q9E6 42 0.306 2.211E-01 0 73 91 3 76 394
|
| 578 |
+
RPELRLLL-VASKAHAASQDVRSMMALLEQDDCGfQVTLKLADPRQQPELLELHRLVATPALVKLLPLPRQTFVG-----------------
|
| 579 |
+
>UniRef100_A0A139WUD4 60 0.328 7.451E-08 2 73 91 13 84 395
|
| 580 |
+
--PLQLLLFVDGR-PKSRQQVQRILSYLEELQADcKFELQIVDVGQKPYLAEHFKLVATPALIKIHPEPRQILAG-----------------
|
| 581 |
+
>UniRef100_A0A926Y835 54 0.352 8.682E-06 4 73 91 15 84 391
|
| 582 |
+
----QLLLFV-DDRPSSRKQLQQIYSYLEQIKADNsFELQVVEVGEEPYLAEHFKIVATPALIKIHPAPRQALAG-----------------
|
| 583 |
+
>UniRef100_UPI0020A7ECEA 56 0.295 2.441E-06 4 73 91 15 84 397
|
| 584 |
+
----QLLLFIDER-PSSRKHIHRIRSYLETLRADyPFELMVVSVGEHPYLAEHFRLVATPALIKIHPLPRQTLAG-----------------
|
| 585 |
+
>UniRef100_A0A068MZV1 50 0.260 2.071E-04 2 73 91 13 84 383
|
| 586 |
+
--PLQFLLFI-DDRPSSQDSVQEISQCLGTLVDGHsYDLQILQISKHPHLVEHFRLVATPSLIKLQPEPRQVLAG-----------------
|
| 587 |
+
>UniRef100_UPI0016873C22 61 0.352 3.951E-08 4 73 91 16 85 382
|
| 588 |
+
----QLLLFV-DQRPSSQEHIRQVRQFLEELNaQDEFELQIIDVGEQPYLAEHFKLIATPTLIKIHPEPRQVLAG-----------------
|
| 589 |
+
>UniRef100_A0A0M1JTJ0 57 0.347 9.426E-07 4 74 91 15 85 394
|
| 590 |
+
----QLLLFVDSR-PHSAEQIQEIRNYLKQWRTEfPYNLEIINVVEEPYLAEHYKLIATPTLLKLYPEPRQVLTGN----------------
|
| 591 |
+
>UniRef100_A0A261KTE7 53 0.309 3.088E-05 4 73 91 30 99 405
|
| 592 |
+
----QLLLFINKR-PGSQEQIQAIRKSLSKLKTDyPFEFNVIDVGEQPYMAEHFKLIATPALLKIHPEPRQTLTG-----------------
|
| 593 |
+
>UniRef100_A0A3C0N9T6 52 0.323 4.240E-05 4 73 91 15 84 398
|
| 594 |
+
----QLLLFVDER-LSSRKHIQRIRNYLKTLRIDyPFELMVVDVGEQPYLAEHFKLVATPALIKIHPKPRQILAG-----------------
|
| 595 |
+
>UniRef100_A0A6J4I559 51 0.323 1.098E-04 4 73 91 15 84 398
|
| 596 |
+
----QLLLFVDER-LSSQKDLQQISSYLETLRAEyPFELMIVDVGEHPYLAEHFKLVATPSLIKIHPKPRQILAG-----------------
|
| 597 |
+
>UniRef100_A0A6P0QL46 59 0.352 1.930E-07 4 73 91 15 84 397
|
| 598 |
+
----QLLLFVDER-PSSRKHIQRIRSYLETLKADyPFELTVVDVGEQPYLAEHFKLVATPALIKIHPNPRQTIAG-----------------
|
| 599 |
+
>UniRef100_A0A846D500 52 0.309 4.240E-05 4 73 91 21 90 392
|
| 600 |
+
----QLLLFVDER-PSSQENIQQIHSYLESLKADyPFELQVIEIAEQPHLVEHFRLLATPALVKIFPAPRQTLAG-----------------
|
| 601 |
+
>UniRef100_A0A350Y740 56 0.309 2.441E-06 4 73 91 15 84 386
|
| 602 |
+
----QLLLF-TDERPSSRKHIHRVRSYLETLRANyPFELRIVDVGEQPYLAEHFKVVATPALIKIHPLPRQTLAG-----------------
|
| 603 |
+
>UniRef100_A0YTH4 56 0.380 1.778E-06 4 73 91 15 84 394
|
| 604 |
+
----QLLLF-ADQRPSSKEQIGEIRQFLEKLNcEEAYELQVIDVGQQPYLAEYFKLVATPALVKIFPEPRHILAG-----------------
|
| 605 |
+
>UniRef100_L8M9H5 47 0.295 2.617E-03 4 73 91 33 102 405
|
| 606 |
+
----QLLLFVDKR-PGYRKKIQRVQAYLDDLkLEQDFQLEVIEIDKQPHLVEYFKLVATPALVKISPQPRQVLAG-----------------
|
| 607 |
+
>UniRef100_UPI001D02CB6E 63 0.315 5.889E-09 2 73 91 13 84 395
|
| 608 |
+
--PLHLLLFVDGR-PKSRQQVQRIRAYLKELQAEySFELEIIDVGQQPYLAEHFRLVATPALIKIHPEPRQILAG-----------------
|
| 609 |
+
>UniRef100_A0A6P0XB78 55 0.338 4.604E-06 4 73 91 15 84 390
|
| 610 |
+
----QLLLF-TDERLSSRKTIQQIRHYLESlRMEYPFELKVVDVGKQPDLAEHFKLVATPALIKIHPQPRQTLAG-----------------
|
| 611 |
+
>UniRef100_A0A6M0G3C9 53 0.338 3.088E-05 4 73 91 15 84 390
|
| 612 |
+
----QLLLF-TDERLSSRKNIQKIRHYLESLkMEYPFELKVVDVGKQPDLAEHFKLVATPALIKIHPQPRQVLTG-----------------
|
| 613 |
+
>UniRef100_A0A937N7H3 72 0.290 3.998E-12 2 87 91 2 87 396
|
| 614 |
+
--KFTLTLYIVGKGTDWASVVERLEAICKTDLAGHYGIESVDVANGQDLSGDMRILAPDAVMLWLPAPLQAPMNDLVNAKPGLVGLDL---
|
| 615 |
+
>23040|scaffold_1553752_c1_1|+3|10 100 0.448 1.228E-21 1 87 91 18 104 123
|
| 616 |
+
-PRFKFSVYIAGQTRQSELALARLRKICDEEIPANYEIEIIDLAKNPHLAKEHQILATPSIFRTLPAPVRKSIGNLSKADKTLLGLDL---
|
| 617 |
+
>ERR1700732_1595525 94 0.402 1.037E-19 0 86 91 15 101 108
|
| 618 |
+
KPRFKFRVYIARPTRKSDLALARLRALCAEAFPDDYDIEMIDLAKSPHLAKEHQILATPAVFRTLPAPVRKSIGDLSKTDKSRLGLD----
|
| 619 |
+
>SRR5882762_5740304 94 0.397 1.424E-19 0 87 91 36 123 136
|
| 620 |
+
KAQFKFQVYIARPTRQSDLALARLRAICDEAIPDDYDIEIIDLAKRPGLAGKFQIVATPTILRTLPAPIRKSIGDLSKTDKALLGLDL---
|
| 621 |
+
>SRR4030081_2822547 95 0.420 4.008E-20 1 88 91 31 118 121
|
| 622 |
+
-PRYKFSVYIASPTRASESALARLRKICEEQIPNEYEIEVFHLSKNPQLARDHNIIATPAIFRTLPAPVLKSIGDLSRTDQALLGLDLL--
|
| 623 |
+
>SRR6185295_12007002 113 0.471 1.879E-26 1 89 91 177 264 270
|
| 624 |
+
-PGWVLRLYVAGMNRTSARAVERVHAICDEYLAGRYELEVIDIYQLPALARGHQIVATPTLIRLLPAPLRRYIGDLSN-ENLVFGLDLKP-
|
| 625 |
+
>SRR5215831_4900645 114 0.558 9.973E-27 1 86 91 71 156 163
|
| 626 |
+
-PGLVLRLFICGTSPRAASAVKNLRFICESELHGAYSLEIIDVLEQPDLAEEAKVLATPTLIKLLPLPLRRIIGDLSDKEKLLIGLE----
|
| 627 |
+
>SRR5712672_2296523 88 0.409 1.652E-17 0 87 91 11 98 106
|
| 628 |
+
KTRFKFSLCIARVTEKSKAALARLRAICDETIPKKYDIKVIDLSKNPELARDHNIIATPAVFRTLPTPVRRSIADLSRNDRALLGLNL---
|
| 629 |
+
>12123|Ga0209625_1018534_1|-121|00 95 0.415 5.502E-20 1 89 91 15 103 115
|
| 630 |
+
-PNFRFQLYIGGPTRASDEVLGRLQAICDEAIPNDYAIEIIDLSKNPQLAKDHQIIATPSVFRTLPEPMRKSIGDLSLKQRTIIGLDLLT-
|
| 631 |
+
>SRR4051812_48803690 106 0.404 5.624E-24 1 84 91 45 128 135
|
| 632 |
+
-EHYSLALYITGSSPRSALAISAIRKICDTHLLGCYSLEIIDLTQQPLRARSEQIVATPTLIRRLPFPIRRFIGDMSLVERQLLG------
|
| 633 |
+
>UniRef100_A0A932SCN4 121 0.579 6.484E-29 0 87 91 15 102 111
|
| 634 |
+
QKVWTLRLYVAGQTPKSVTALSNLERICEAHLEGKYRIEVVDLLKSPQLARGDQIIATPTLVRRLPPPVKKIIGDLSNADRVLVGLDL---
|
| 635 |
+
>SRR5579871_1513814 128 0.540 2.104E-31 1 87 91 59 145 153
|
| 636 |
+
-ETWNLRLYVAGQSPKSLTAFSNLKRICETYLPGKYHIEVLDLLKNPQLAEGDQVVAIPTLVRRLPEPLRKIIGDLSNTERTLVGLDL---
|
| 637 |
+
>UniRef100_A0A517YME2 122 0.505 2.508E-29 1 87 91 17 103 120
|
| 638 |
+
-ETWELRLYIAGQTPKSVAAFRNLKKLCEEHLPGRYQIEVIDLMQHPQLAAGDQIVAIPTLVRRLPEPLRRIVGDLSNTERTLVGLQL---
|
| 639 |
+
>ERR1035437_1623514 104 0.453 2.741E-23 2 87 91 28 113 122
|
| 640 |
+
--KFVLRLFVAGATPRSRHAVRRVRELCETELKGNCELEVIDIYQQPGLARENQIVATPTLIIAFPPPLRRFIGNRTNITGLFVELDL---
|
| 641 |
+
>SRR4030081_12347 93 0.397 3.684E-19 0 87 91 29 116 123
|
| 642 |
+
KTRFKFAVYIARPSAESDAALARLRKICDETIPKNYDIRVIDLSKNPELARDHQIVATPAVFRTLPTTVRRTIGDLSNNDRALLGLNL---
|
| 643 |
+
>UniRef100_A0A062V6S3 115 0.505 7.496E-27 1 87 91 18 104 111
|
| 644 |
+
-EVWELRLYIAGQTARSDAALANLKRICEEHLAGKYRIEVIDLLKNPQIARDHQILATPTVIRKLPEPLKKTIGDLSQTERVLVGLDL---
|
| 645 |
+
>ERR1700676_5527870 97 0.393 1.128E-20 1 89 91 58 146 156
|
| 646 |
+
-PHFKFQLYIGRSTRASDAAITRLQAICDETIPDDYAIEIIDLSKNPQLAKDHQIIATPSVFRTLPEPIRKSIGDLSLKHKAIVGLDLPT-
|
| 647 |
+
>15488|Ga0208981_1210027_1|+2|10 90 0.363 3.387E-18 0 87 91 24 111 119
|
| 648 |
+
KTRFKFAVYIARPSAESDAALARLRKICDETIPKNYDIRVIDLSKNPELARDHQIVATPAVFRTLPTPVRKSVGELSSKDRTLLGLNL---
|
| 649 |
+
>SRR5258708_39684100 96 0.400 1.549E-20 0 89 91 38 127 150
|
| 650 |
+
KAHFEFCVYIANHTLRSDLALKRLKKICEENVPGDYEIEVVDIAKPPDIAKDRQIVATPAVFRTLPAPFRRLIGDLPHEERSLLGLDLFT-
|
| 651 |
+
>MGYP000303408308 113 0.574 3.540E-26 1 87 91 1 87 102
|
| 652 |
+
-KKFELKLYVTGQTARTETAMGNLKDLFDKELAEQYDLEVIDVLERPQLAEDERILATPTLIRKLPVPIRRIVGDLSNREQVLLGLDL---
|
| 653 |
+
>SRR5258708_26871038 87 0.363 3.115E-17 0 87 91 32 119 129
|
| 654 |
+
KHRFKFQVFIGKPSQKSDLAVARLREVCEAEIPGEYDIEIIDLSRTPELAGENNIVATPAVFRTLPAPVRKSIGDLVEKHKVLLALDL---
|
| 655 |
+
>ERR1044072_7554525 109 0.529 8.406E-25 2 86 91 123 207 214
|
| 656 |
+
--RYILKLYVTGRTSRAERAIANLRRLCEDELEGCYQLEGIDIVEHPQLAEDERVRATPTLVKQLPPPLRRGVGDLSSRAKGLFGLD----
|
| 657 |
+
>18065|scaffold45210_3|+2239|01 117 0.465 1.491E-27 0 87 91 45 132 140
|
| 658 |
+
RPYWNLRLYVAGSSPRSLAAVTNLTKVWEEHLPGPYSIEVVDLLEHPNLARADQILATPARVGALPSPIRRVIGDLSSRDRVLVGLEI---
|
| 659 |
+
>SRR5262249_21381219 117 0.534 1.491E-27 1 86 91 16 101 117
|
| 660 |
+
-EPFVLKLFICGASPRANSAVANLRHICEHDLQGHFTLEIIDVLEQPDLAEESKVLATPTLIKLLPPPLRRIIGDLSDKQKLLVGLD----
|
| 661 |
+
>SRR4051812_10739495 120 0.593 1.184E-28 2 87 91 18 102 110
|
| 662 |
+
--KFVLKLYIAGSSPRSQRAIANLHRICAEELPGS-EVDVIDVLQQPHLAEGARIMATPTLIKELPPPVRRIIGDLSDAEQVLLGLDL---
|
| 663 |
+
>SRR6266404_266030 86 0.417 8.062E-17 9 87 91 0 78 113
|
| 664 |
+
---------IASPTRESQLALTRLRKICDEQIPNEYEIEVFDLRKHPELATRYDIVATPAICRTLPAPLRKSVGDLSKTEKALLGLDL---
|
| 665 |
+
>SRR5260370_32467315 102 0.422 1.834E-22 0 89 91 34 123 146
|
| 666 |
+
KPHYKFSVYIANHTLRSNSALERLKKICEENVPGDYEIEVIDIAKSPGLATDHQIVATPAVFRTLPAPLRKSIGDLSQKDKALLGLDLFT-
|
| 667 |
+
>UniRef100_V4JFK2 110 0.529 3.353E-25 3 87 91 3 87 94
|
| 668 |
+
---YLLRLYIVGSTLQSERAIRNLRSICNKALHNRYRLEIIDVIEHPEAAQDAHIIATPTLIKELPPPLMRIIGDMSNQEKVLVGLDL---
|
| 669 |
+
>SRR5258708_27969647 100 0.448 1.228E-21 3 89 91 14 100 104
|
| 670 |
+
---FKLRVYIGGEALESDRAVARLRKICDEAAPNDYEIEVVDLSKNPQLASRYQIVATPTVIRTLPSPVRKTIGHMSKREKVLLGLDLVP-
|
| 671 |
+
>UniRef100_UPI001565DCEC 109 0.482 8.673E-25 1 87 91 15 101 115
|
| 672 |
+
-EVWELRLYVAGQTARSMTAFANLKRIAEQHLRGRYRIEVIDLKADPQRADEDGILAMPTVVCKLPPPLRKVVGDLSDTEKALVGLKL---
|
| 673 |
+
>SRR5262245_2800084 119 0.541 1.625E-28 3 87 91 29 113 122
|
| 674 |
+
---YVLRLYVAGMTARSMDAISRLKAICEEHLGEHYKLETIDIHQQPGLARDQQIVAAPTLIKELPPPVRRLVGDLTNRERVLVGLDL---
|
| 675 |
+
>ERR1039458_2802747 93 0.413 3.684E-19 1 87 91 15 101 123
|
| 676 |
+
-ESVELCLFVAGDAGPSARARRELEGLLVELGGGAWSIEVVDVLVRPDLAERARIVATPVLIRLAPLPRRSIIGDLSDWQVVAEVLEL---
|
| 677 |
+
>ERR1700677_35954 83 0.416 5.401E-16 4 87 91 12 95 102
|
| 678 |
+
----ELCLFVAGNTGPSARARRELEWLRVELEEGGWSIEGIDVTERPDLAERARILATPVLTRLAPLPRLSMIGDLSDWKVVAEVLEL---
|
| 679 |
+
>SRR5450755_1004340 90 0.448 2.467E-18 4 90 91 126 212 214
|
| 680 |
+
----ELCLFVAGEAGPSVRARRELDRLRMGLEGGGGRVDVIDVMERPDLAEQAGILATPVLIRLAPLPRRSIIGDLSDWEVVADVLELALE
|
| 681 |
+
>MGYP000397992808 97 0.447 1.128E-20 3 87 91 3 87 95
|
| 682 |
+
---FSFQLFVAGDTPRSHLAASNLRDLLDRVAPDDYDLEVIDVLERPDLAEKERILATPFVLKISPPPTRRVVGDLTDLALAARALDL---
|
| 683 |
+
>SRR5437763_3353870 89 0.418 8.766E-18 2 87 91 37 122 129
|
| 684 |
+
--PVELCLFVVGESGPSVRARRELEAFRVARGGDGWRVVVIDVLERPDVAERERILATPVLIRMAPLPRRGIIGDLSDWEAVAEVLEL---
|
| 685 |
+
>SRR5450755_2832987 95 0.372 5.502E-20 2 87 91 20 105 112
|
| 686 |
+
--RVSLRMYVASDTAPSADARRQLAALCERLGGERWEVEVVDVFERPALAEADRIVATPVLIRLFPAPRLSVIGDFSDLDAVAAALDL---
|
| 687 |
+
>MGYP000274140882 87 0.419 2.269E-17 7 87 91 19 98 105
|
| 688 |
+
-------LYVAGRSERSALAEQNLRAV-TQRLHGPVQIEVIDLTRRPDLAEELDIVATPMVLRVLPEPPRRVVGDLSDQALLAQALDL---
|
| 689 |
+
>ERR1039458_272673 74 0.400 1.090E-12 14 88 91 24 97 113
|
| 690 |
+
--------------PRSLTA-PQLERLRPELEGGGLGVEVVDVMQRPDLAERARILATPVLMRLAPLPRRSIIGDLSDWRLVTEVLELP--
|
| 691 |
+
>ERR1051325_10531525 81 0.305 4.970E-15 1 85 91 28 112 122
|
| 692 |
+
-PRIELVLYVTAASSHSAAATRNCEALLSRFDRRSVVFEICDISLHPERAEVDGICFTPVLMKRMPLPRAYVIGDLSNTAALVDLL-----
|
| 693 |
+
>SRR5688500_6194403 81 0.329 3.619E-15 2 86 91 52 134 135
|
| 694 |
+
--RLSLRLFVAGDSPDSETAIANLEALFPN--GSEAEIEIVDIQREPARAARESIMLTPTLLKLAPSPACRILGNLKNRDALLELLG----
|
| 695 |
+
>SRR5438477_4004941 90 0.360 2.467E-18 0 85 91 467 552 558
|
| 696 |
+
RERVALRLYVSPASPPSVKARRNMEKLLERIGPVNVDFEVLDLALEPLRAETDNVVFTPTLVKHWPEPRVWILGDLSDPVVVGDLL-----
|
| 697 |
+
>SRR3954468_23038545 80 0.409 9.369E-15 5 87 91 32 114 129
|
| 698 |
+
-----LRLYVAGEGPNSARARANLQRLLADVDSSRYVLEVVDCLDEPLRALSDGVPPPQTLMPVTPPPQRTIVGSLSAMDHVADALEI---
|
| 699 |
+
>SRR5271165_1490500 102 0.383 1.336E-22 2 87 91 3 88 112
|
| 700 |
+
--KFKFRIYVAGDALNSAQALANLDAICREYLPDRHEIEVVDVFREPKRALTDGVFMTPTLVKVAPFPTRRIVGTLSQTRLVLQAVGL---
|
| 701 |
+
>UniRef100_A0A142HLX2 109 0.482 4.603E-25 3 87 91 14 98 109
|
| 702 |
+
---YELVLYVAGATPNSTRAVRNIKAICEEYLPGRYALRILDIYQQPELAQQAQLVALPTLVRLRPLPQRRLVGDLSNRPVVLSVLGL---
|
| 703 |
+
>A0A142HLX2 117 0.482 1.087E-27 3 87 91 14 98 109
|
| 704 |
+
---YELVLYVAGATPNSTRAVRNIKAICEEYLPGRYALRILDIYQQPELAQQAQLVALPTLVRLRPLPQRRLVGDLSNRPVVLSVLGL---
|
| 705 |
+
>SRR5512142_3015464 83 0.397 7.416E-16 0 87 91 23 110 118
|
| 706 |
+
RSEIRLCLYVAGNAPNSVAARANLSAALAALDNVSAAVEIVDVFERPDLAVQNEVYVTPMLLRLAPPPKCRIVGSLSDRDAIVNILDI---
|
| 707 |
+
>SRR5215208_4287972 81 0.285 3.619E-15 2 85 91 19 102 112
|
| 708 |
+
--PIELVLYISAASAHTAAARRNCEALLARFDQRRVHFEICDVSQHPDRADTDGICFTPVLMKKLPLPRTYVVGDLSNTTALVDLL-----
|
| 709 |
+
>SRR3954469_15268427 97 0.441 1.128E-20 2 87 91 18 103 115
|
| 710 |
+
--RLVLRLYVAGDAPNSALARANLKRLLDSLDRDQYALEIIDCLDEPLRALNDGVLVTPTLLRLSPEPGRTIVGSLSAIDHVADALDL---
|
| 711 |
+
>SRR5450432_1605686 81 0.348 2.636E-15 2 87 91 153 238 249
|
| 712 |
+
--KIEITLYLTLPWPSSARAQANLQRVLARVPDGHVRLNTCDLAQEPGRAEKDNVLFSPTLVKVWPAPKMWILGDLSEAGVLTDLLEL---
|
| 713 |
+
>SRR5688572_14714411 99 0.418 2.314E-21 2 87 91 50 135 150
|
| 714 |
+
--PVLLRLYVAGDAPNSSRARANLRRLLADVDPAKYDLEIIDCLDEPLRALNDGVLVTPTLVRVQPEPQRTVVGTLSALDHVADALDI---
|
| 715 |
+
>SRR5687768_10781478 82 0.305 1.398E-15 2 86 91 172 256 262
|
| 716 |
+
--PIELVLYISAHSPRSAAAIENIKRVLARFSSSRVSLTICDLSLEPHKGEADSVAFTPTLVKRSPGPRTYILGHLANPEVLVELLD----
|
| 717 |
+
>SRR5438105_8210914 83 0.302 7.416E-16 0 85 91 54 139 149
|
| 718 |
+
KTRIELVLYVSAASSHTATARRNCEALLARFDQRRVRFEVCDISRHPDRAESDGICFTPVLMKKRPLPRAYVIGDLSNTAALMDLL-----
|
| 719 |
+
>UniRef100_A0A2V8GSN4 79 0.302 2.503E-14 0 85 91 17 102 112
|
| 720 |
+
KTRIELVLYVSAASSHTATARRNCEALLARFDQRRVRFEVCDISRHPDRAESDGICFTPVLMKKRPLPRAYVIGDLSNTAALMDLL-----
|
| 721 |
+
>12689|scaffold1791547_1|-2|11 80 0.317 9.369E-15 1 85 91 85 169 187
|
| 722 |
+
-PSVELVLYVSTASSYAASATRNCEALLARFDRRAVRLEICDVSEHPDRAETDGICFTPVLLKKQPLPRTYILGDLSNTAALVDLL-----
|
| 723 |
+
>UniRef100_A0A948CES1 102 0.388 1.379E-22 3 87 91 56 140 151
|
| 724 |
+
---YRFQLFVSGSSPRSTLARANLTKVCDETVPGNYTIEVVDVLLRPDLAEESSILATPLVVRVSPHPPRRAVGDFTDLERLAAAMGL---
|
| 725 |
+
>SRR6185295_9098939 82 0.395 1.920E-15 2 87 91 339 424 440
|
| 726 |
+
--RIELALYVTMPWPSSLRAKANLGRVLARVPEGLVHLNVCDLAREPLRAELDNVLFSPTLVKVWPAPKMWILGDLSEPEVLTDLLAL---
|
| 727 |
+
>SRR4051812_7426335 87 0.392 3.115E-17 5 88 91 15 98 101
|
| 728 |
+
-----LRLYVAGNAPNSTKARRNLDALLASFEPSSYQLEVIDCLSEAGRTLADGVIVTPTLVKFEPAPAATVIGTLSDADAVRAILRGP--
|
examples/data/KaiB_seq_ids.txt
ADDED
|
@@ -0,0 +1,364 @@
|
| 1 |
+
101
|
| 2 |
+
MGYP000886600007
|
| 3 |
+
UniRef100_A0A971TK21
|
| 4 |
+
ERR1044071_8151622
|
| 5 |
+
UniRef100_A0A1F2SDQ0
|
| 6 |
+
SRR5580692_7392111
|
| 7 |
+
K9S6Z6
|
| 8 |
+
SRR3954470_14000739
|
| 9 |
+
UniRef100_UPI0018DCCDB9
|
| 10 |
+
SRR5688500_3073545
|
| 11 |
+
MGYP000149628109
|
| 12 |
+
3740|scaffold08918_4|-3862|00
|
| 13 |
+
SRR6187401_1816177
|
| 14 |
+
SRR5512147_2964451
|
| 15 |
+
26123|scaffold_438712_c1_1|+1|10
|
| 16 |
+
12684|Ga0207652_11722284_1|+1|10
|
| 17 |
+
SRR5580704_16544830
|
| 18 |
+
UniRef100_A0A3N5XMK1
|
| 19 |
+
UniRef100_A0A7W0IU42
|
| 20 |
+
ERR1043166_4517172
|
| 21 |
+
MGYP001057477004
|
| 22 |
+
26195|Ga0315277_11393154_2|-158|01
|
| 23 |
+
UniRef100_A0A8J7LTM0
|
| 24 |
+
12918|scaffold901211_1|+1|10
|
| 25 |
+
SRR5215510_15873647
|
| 26 |
+
SRR5512142_679953
|
| 27 |
+
UniRef100_UPI00190A34C5
|
| 28 |
+
SRR4051812_34338196
|
| 29 |
+
UniRef100_A0A3N5LPZ6
|
| 30 |
+
UniRef100_A0A950DFQ4
|
| 31 |
+
5937|scaffold842798_1|-17|01
|
| 32 |
+
SRR4030095_15264891
|
| 33 |
+
UniRef100_K9UBR2
|
| 34 |
+
UniRef100_UPI001E525CD0
|
| 35 |
+
UniRef100_A0A845X7U1
|
| 36 |
+
3300017444.a:Ga0185300_10001144_2
|
| 37 |
+
UniRef100_A0A349JMI1
|
| 38 |
+
UniRef100_K9SDH8
|
| 39 |
+
UniRef100_A0A3M1PH87
|
| 40 |
+
UniRef100_A0A1C0V439
|
| 41 |
+
UniRef100_UPI001E4DBDD3
|
| 42 |
+
UniRef100_A0A969FMT9
|
| 43 |
+
UniRef100_A0A0M2Q0K7
|
| 44 |
+
UniRef100_A0A978SUS5
|
| 45 |
+
UniRef100_A0A930TQ40
|
| 46 |
+
UniRef100_A0A2W4XV74
|
| 47 |
+
UniRef100_A0A939KSU8
|
| 48 |
+
UniRef100_A0A8K1ZWU9
|
| 49 |
+
UniRef100_UPI001C72C3C0
|
| 50 |
+
UniRef100_A0A6P0TIA3
|
| 51 |
+
UniRef100_A0A5B8NIC7
|
| 52 |
+
UniRef100_UPI002012D1AC
|
| 53 |
+
UniRef100_UPI001C0312E0
|
| 54 |
+
UniRef100_A0A928Z921
|
| 55 |
+
UniRef100_A0A351L2B1
|
| 56 |
+
UniRef100_A0A8J7E209
|
| 57 |
+
UniRef100_UPI00232B6744
|
| 58 |
+
UniRef100_UPI0018EFA1D6
|
| 59 |
+
UniRef100_U5DJK1
|
| 60 |
+
UniRef100_A0A832M402
|
| 61 |
+
UniRef100_A0A1C0VJA3
|
| 62 |
+
UniRef100_A0A969K2I2
|
| 63 |
+
UniRef100_K9X4T7
|
| 64 |
+
UniRef100_W7Q8H4
|
| 65 |
+
W7Q8H4
|
| 66 |
+
A0A1H7MTT5
|
| 67 |
+
UniRef100_A0A1H7MTT5
|
| 68 |
+
A0A1P8R863
|
| 69 |
+
UniRef100_A0A1P8R863
|
| 70 |
+
UniRef100_UPI001CD124AC
|
| 71 |
+
UniRef100_A0A2N5Y751
|
| 72 |
+
UniRef100_A0A2N7UCY5
|
| 73 |
+
MGYP001039280987
|
| 74 |
+
UniRef100_U5T3B0
|
| 75 |
+
UniRef100_A0A540VSD6
|
| 76 |
+
UniRef100_A0A2S6G6T8
|
| 77 |
+
UniRef100_UPI00124EBEF7
|
| 78 |
+
UniRef100_UPI001439CAB5
|
| 79 |
+
MGYP000294044467
|
| 80 |
+
UniRef100_A0A4Q8CZA3
|
| 81 |
+
3104|Ga0306908_1123748_1|-11|01
|
| 82 |
+
UniRef100_UPI00133045FA
|
| 83 |
+
UniRef100_UPI00201FFA50
|
| 84 |
+
UniRef100_A0A845V233
|
| 85 |
+
UniRef100_UPI00082AD250
|
| 86 |
+
MGYP001134272031
|
| 87 |
+
UniRef100_UPI00037CF37C
|
| 88 |
+
UniRef100_UPI0003674D18
|
| 89 |
+
UniRef100_UPI00047687D0
|
| 90 |
+
UniRef100_A0A3S0W7L6
|
| 91 |
+
UniRef100_UPI001903F48C
|
| 92 |
+
SRR4051794_37995438
|
| 93 |
+
UniRef100_A0A318VX16
|
| 94 |
+
UniRef100_A0A372BWN2
|
| 95 |
+
3300017992.a:Ga0180435_10008823_6
|
| 96 |
+
UniRef100_UPI000401CEFF
|
| 97 |
+
16161|scaffold59688_2|+220|00
|
| 98 |
+
SRR5690606_35087643
|
| 99 |
+
UniRef100_A0A0R3M5F3
|
| 100 |
+
UniRef100_A0A969HBU9
|
| 101 |
+
MGYP000847580960
|
| 102 |
+
A0A0R3M5F3
|
| 103 |
+
SRR5262249_39096779
|
| 104 |
+
SRR5690348_3124064
|
| 105 |
+
MGYP000666260026
|
| 106 |
+
UniRef100_A0A838IZY5
|
| 107 |
+
MGYP001366537082
|
| 108 |
+
A0A1Q2HNV8
|
| 109 |
+
UniRef100_UPI002011AF87
|
| 110 |
+
23258|scaffold4609030_1|+1|11
|
| 111 |
+
SRR5579871_6120579
|
| 112 |
+
MGYP000105995723
|
| 113 |
+
SRR3954468_6301146
|
| 114 |
+
MGYP001433622665
|
| 115 |
+
UniRef100_A0A3D6C093
|
| 116 |
+
A0A1W6LJH8
|
| 117 |
+
SRR5687767_11767969
|
| 118 |
+
SRR4051794_40104329
|
| 119 |
+
SRR3954454_22706284
|
| 120 |
+
MGYP000010225417
|
| 121 |
+
2271|Ga0209795_10171170_2|-245|01
|
| 122 |
+
MGYP000738482073
|
| 123 |
+
SRR6202012_1017248
|
| 124 |
+
26133|Ga0268298_10010625_3|-7238|00
|
| 125 |
+
SRR5262249_6174883
|
| 126 |
+
UniRef100_UPI001904043C
|
| 127 |
+
MGYP001146183833
|
| 128 |
+
UniRef100_A0A2U2N9L6
|
| 129 |
+
UniRef100_A0A127EN01
|
| 130 |
+
A0A127EN01
|
| 131 |
+
U2E7T8
|
| 132 |
+
UniRef100_UPI00190730C9
|
| 133 |
+
UniRef100_UPI000D3E5BC6
|
| 134 |
+
UniRef100_A0A1Y6FIV8
|
| 135 |
+
SRR5919109_3969706
|
| 136 |
+
MGYP001077603090
|
| 137 |
+
SRR3954463_15126113
|
| 138 |
+
ERR1039457_666370
|
| 139 |
+
UniRef100_UPI0005BD3569
|
| 140 |
+
10876|scaffold_592705_c1_2|-157|01
|
| 141 |
+
UniRef100_A0A7V9DA05
|
| 142 |
+
MGYP000147404972
|
| 143 |
+
SRR5207249_9234194
|
| 144 |
+
UniRef100_A0A8T3N6J2
|
| 145 |
+
SRR3954451_13963006
|
| 146 |
+
UniRef100_A0A1T4Y342
|
| 147 |
+
UniRef100_A0A2V7TZK6
|
| 148 |
+
UniRef100_A0A831PRG7
|
| 149 |
+
SRR6185295_5137210
|
| 150 |
+
SRR5687767_10342612
|
| 151 |
+
SRR5881409_182087
|
| 152 |
+
UniRef100_A0A4Q3W5A8
|
| 153 |
+
UniRef100_A0A934QHJ2
|
| 154 |
+
4460|scaffold_415991_c1_2|-159|00
|
| 155 |
+
MGYP000603749840
|
| 156 |
+
UPI0003E01BF4
|
| 157 |
+
10796|Ga0318514_12978242_1|+2|11
|
| 158 |
+
UniRef100_UPI0021E125E4
|
| 159 |
+
SRR3712207_3115678
|
| 160 |
+
SRR6185295_1711097
|
| 161 |
+
UniRef100_UPI00214A28C1
|
| 162 |
+
UniRef100_UPI00193BA132
|
| 163 |
+
SRR5688500_8380976
|
| 164 |
+
ETNmetMinimDraft_32_1059908.scaffolds.fasta_scaffold895325_1
|
| 165 |
+
SRR5512132_2975018
|
| 166 |
+
13960|scaffold210726_2|+957|01
|
| 167 |
+
12613|JGI10216J12902_106548506_1|+3|10
|
| 168 |
+
SRR5688572_30282090
|
| 169 |
+
SRR5919106_6413091
|
| 170 |
+
SRR5688572_2308330
|
| 171 |
+
3300018984.a:Ga0193605_1004274_5
|
| 172 |
+
GraSoiStandDraft_41_1057321.scaffolds.fasta_scaffold894400_3
|
| 173 |
+
UniRef100_A0A7X0U868
|
| 174 |
+
SRR5690242_21852208
|
| 175 |
+
SRR4051812_14954157
|
| 176 |
+
SRR5687768_5535903
|
| 177 |
+
GraSoiStandDraft_2_1057267.scaffolds.fasta_scaffold1451175_1
|
| 178 |
+
SRR6185437_3447168
|
| 179 |
+
SRR4051794_7772101
|
| 180 |
+
SRR5829696_6517560
|
| 181 |
+
MGYP000536513446
|
| 182 |
+
SRR5687768_1210620
|
| 183 |
+
SRR6059058_1924189
|
| 184 |
+
SRR5688500_14284300
|
| 185 |
+
SRR5688572_7102903
|
| 186 |
+
SRR4029453_1913048
|
| 187 |
+
SRR6185503_893473
|
| 188 |
+
SRR5262245_8274589
|
| 189 |
+
SoimicMinimDraft_3_1059731.scaffolds.fasta_scaffold2181823_1
|
| 190 |
+
23892|Ga0310888_10269646_2|-120|01
|
| 191 |
+
SRR3954471_22207506
|
| 192 |
+
SRR5581483_9829629
|
| 193 |
+
Cyp2metagenome_2_1107375.scaffolds.fasta_scaffold1445432_1
|
| 194 |
+
UPI00034656C1
|
| 195 |
+
UniRef100_A0A2V7Y706
|
| 196 |
+
SRR3954454_15902270
|
| 197 |
+
SRR5438045_7802661
|
| 198 |
+
AntDryMetagUQ889_1029465.scaffolds.fasta_scaffold07537_1
|
| 199 |
+
SRR5678816_3727367
|
| 200 |
+
18084|scaffold527937_2|-1158|01
|
| 201 |
+
MGYP000592549691
|
| 202 |
+
BarGraIncu01122A_1022018.scaffolds.fasta_scaffold128416_1
|
| 203 |
+
3300006028.a:Ga0070717_10004647_5
|
| 204 |
+
SRR6185436_10889035
|
| 205 |
+
SRR5438132_14018022
|
| 206 |
+
SRR5438552_3472945
|
| 207 |
+
GraSoiStandDraft_28_1057319.scaffolds.fasta_scaffold3853791_1
|
| 208 |
+
SRR5688572_14330442
|
| 209 |
+
UniRef100_A0A512HC53
|
| 210 |
+
SRR5687768_18073040
|
| 211 |
+
SRR5215510_1611124
|
| 212 |
+
SRR4029453_630509
|
| 213 |
+
SRR5678816_1688620
|
| 214 |
+
SRR5262252_1015889
|
| 215 |
+
SRR5687768_1317515
|
| 216 |
+
UniRef100_A0A958FYA2
|
| 217 |
+
MGYP001204866127
|
| 218 |
+
SRR6185503_18435269
|
| 219 |
+
UniRef100_A0A964QAK1
|
| 220 |
+
SRR3954462_9021407
|
| 221 |
+
MGYP001286529226
|
| 222 |
+
SRR5688572_18846285
|
| 223 |
+
SRR5580700_4777658
|
| 224 |
+
UniRef100_K9VYB0
|
| 225 |
+
14399|Ga0335069_12962689_1|-154|01
|
| 226 |
+
A0A1Z4NNA9
|
| 227 |
+
UniRef100_A0A1Z4NNA9
|
| 228 |
+
UniRef100_A0A0V7ZRD6
|
| 229 |
+
UniRef100_K9TAR4
|
| 230 |
+
SRR5262249_2417559
|
| 231 |
+
UniRef100_UPI002021B723
|
| 232 |
+
HubBroStandDraft_5_1064220.scaffolds.fasta_scaffold442189_2
|
| 233 |
+
S7VCH7
|
| 234 |
+
UniRef100_S7VCH7
|
| 235 |
+
CoawatStandDraft_6_1074263.scaffolds.fasta_scaffold645439_1
|
| 236 |
+
UniRef100_UPI00045E5B31
|
| 237 |
+
UniRef100_A0A969VCT1
|
| 238 |
+
UniRef100_A0A517P709
|
| 239 |
+
SRR3954454_16723622
|
| 240 |
+
SRR5581483_7974966
|
| 241 |
+
SRR4030095_4975553
|
| 242 |
+
UniRef100_D0LT55
|
| 243 |
+
22902|Ga0257122_1006421_4|-3359|00
|
| 244 |
+
UniRef100_A0A6L9ZJT6
|
| 245 |
+
A0A0M1JMY3
|
| 246 |
+
UniRef100_UPI001E41D742
|
| 247 |
+
SRR6185503_14955318
|
| 248 |
+
SRR6185312_15773949
|
| 249 |
+
SRR6476619_1219783
|
| 250 |
+
SRR5690606_38631219
|
| 251 |
+
SRR5688572_23129716
|
| 252 |
+
SRR5688572_20232672
|
| 253 |
+
SRR6187401_742178
|
| 254 |
+
SRR5687768_3218532
|
| 255 |
+
SRR6187401_2120497
|
| 256 |
+
SRR5262245_48067131
|
| 257 |
+
UniRef100_A0A934ZFS7
|
| 258 |
+
UniRef100_A0A852ZR84
|
| 259 |
+
ERR1700733_14812660
|
| 260 |
+
MGYP001377341570
|
| 261 |
+
SRR5690242_13802588
|
| 262 |
+
UniRef100_UPI001FB8EED0
|
| 263 |
+
UniRef100_A0A521U920
|
| 264 |
+
UniRef100_A0A933USY1
|
| 265 |
+
SRR5471030_2470326
|
| 266 |
+
UniRef100_UPI001F066344
|
| 267 |
+
14341|Ga0209698_10565634_1|+23|00
|
| 268 |
+
UniRef100_UPI001BEB70D6
|
| 269 |
+
SRR5215218_6711373
|
| 270 |
+
SRR6188768_3491203
|
| 271 |
+
SRR6187402_177262
|
| 272 |
+
SRR6059058_1539947
|
| 273 |
+
SRR3984957_851882
|
| 274 |
+
SRR3954470_18350552
|
| 275 |
+
MGYP001434196702
|
| 276 |
+
UniRef100_A0A4V1DI91
|
| 277 |
+
MGYP000867160857
|
| 278 |
+
UniRef100_UPI002001187B
|
| 279 |
+
MGYP001295239604
|
| 280 |
+
MGYP000025237920
|
| 281 |
+
UniRef100_A0A560LTW2
|
| 282 |
+
UniRef100_A0A076H9F6
|
| 283 |
+
MGYP000311810024
|
| 284 |
+
UniRef100_A0A968YHU9
|
| 285 |
+
UniRef100_UPI000B35C714
|
| 286 |
+
MGYP001088475876
|
| 287 |
+
UniRef100_A0A7Y3TRI1
|
| 288 |
+
UniRef100_A0A352XFA5
|
| 289 |
+
UniRef100_A0A5J6Q9E6
|
| 290 |
+
UniRef100_A0A139WUD4
|
| 291 |
+
UniRef100_A0A926Y835
|
| 292 |
+
UniRef100_UPI0020A7ECEA
|
| 293 |
+
UniRef100_A0A068MZV1
|
| 294 |
+
UniRef100_UPI0016873C22
|
| 295 |
+
UniRef100_A0A0M1JTJ0
|
| 296 |
+
UniRef100_A0A261KTE7
|
| 297 |
+
UniRef100_A0A3C0N9T6
|
| 298 |
+
UniRef100_A0A6J4I559
|
| 299 |
+
UniRef100_A0A6P0QL46
|
| 300 |
+
UniRef100_A0A846D500
|
| 301 |
+
UniRef100_A0A350Y740
|
| 302 |
+
UniRef100_A0YTH4
|
| 303 |
+
UniRef100_L8M9H5
|
| 304 |
+
UniRef100_UPI001D02CB6E
|
| 305 |
+
UniRef100_A0A6P0XB78
|
| 306 |
+
UniRef100_A0A6M0G3C9
|
| 307 |
+
UniRef100_A0A937N7H3
|
| 308 |
+
23040|scaffold_1553752_c1_1|+3|10
|
| 309 |
+
ERR1700732_1595525
|
| 310 |
+
SRR5882762_5740304
|
| 311 |
+
SRR4030081_2822547
|
| 312 |
+
SRR6185295_12007002
|
| 313 |
+
SRR5215831_4900645
|
| 314 |
+
SRR5712672_2296523
|
| 315 |
+
12123|Ga0209625_1018534_1|-121|00
|
| 316 |
+
SRR4051812_48803690
|
| 317 |
+
UniRef100_A0A932SCN4
|
| 318 |
+
SRR5579871_1513814
|
| 319 |
+
UniRef100_A0A517YME2
|
| 320 |
+
ERR1035437_1623514
|
| 321 |
+
SRR4030081_12347
|
| 322 |
+
UniRef100_A0A062V6S3
|
| 323 |
+
ERR1700676_5527870
|
| 324 |
+
15488|Ga0208981_1210027_1|+2|10
|
| 325 |
+
SRR5258708_39684100
|
| 326 |
+
MGYP000303408308
|
| 327 |
+
SRR5258708_26871038
|
| 328 |
+
ERR1044072_7554525
|
| 329 |
+
18065|scaffold45210_3|+2239|01
|
| 330 |
+
SRR5262249_21381219
|
| 331 |
+
SRR4051812_10739495
|
| 332 |
+
SRR6266404_266030
|
| 333 |
+
SRR5260370_32467315
|
| 334 |
+
UniRef100_V4JFK2
|
| 335 |
+
SRR5258708_27969647
|
| 336 |
+
UniRef100_UPI001565DCEC
|
| 337 |
+
SRR5262245_2800084
|
| 338 |
+
ERR1039458_2802747
|
| 339 |
+
ERR1700677_35954
|
| 340 |
+
SRR5450755_1004340
|
| 341 |
+
MGYP000397992808
|
| 342 |
+
SRR5437763_3353870
|
| 343 |
+
SRR5450755_2832987
|
| 344 |
+
MGYP000274140882
|
| 345 |
+
ERR1039458_272673
|
| 346 |
+
ERR1051325_10531525
|
| 347 |
+
SRR5688500_6194403
|
| 348 |
+
SRR5438477_4004941
|
| 349 |
+
SRR3954468_23038545
|
| 350 |
+
SRR5271165_1490500
|
| 351 |
+
UniRef100_A0A142HLX2
|
| 352 |
+
A0A142HLX2
|
| 353 |
+
SRR5512142_3015464
|
| 354 |
+
SRR5215208_4287972
|
| 355 |
+
SRR3954469_15268427
|
| 356 |
+
SRR5450432_1605686
|
| 357 |
+
SRR5688572_14714411
|
| 358 |
+
SRR5687768_10781478
|
| 359 |
+
SRR5438105_8210914
|
| 360 |
+
UniRef100_A0A2V8GSN4
|
| 361 |
+
12689|scaffold1791547_1|-2|11
|
| 362 |
+
UniRef100_A0A948CES1
|
| 363 |
+
SRR6185295_9098939
|
| 364 |
+
SRR4051812_7426335
|
examples/data/provenance.md
ADDED
|
@@ -0,0 +1,57 @@
|
| 1 |
+
# KaiB demo data — provenance
|
| 2 |
+
|
| 3 |
+
This directory contains a concatenated demo asset for the SF-Cluster Colab
|
| 4 |
+
notebook. It is derived from the SF-Cluster Phase II benchmark's KaiB
|
| 5 |
+
`diverse_sf` arm and the FrustrAI-Seq per-residue Frustration Index (FI)
|
| 6 |
+
outputs.
|
| 7 |
+
|
| 8 |
+
## Files
|
| 9 |
+
|
| 10 |
+
| File | Shape / size | Description |
|
| 11 |
+
|-----------------------|------------------------|-------------|
|
| 12 |
+
| `KaiB_filtered.a3m` | 364 records, L=91 | Subset of the KaiB filtered MSA. Query (`>101`, UniProt Q79V61 residues 5–95) is row 0. Lowercase insertion-state letters preserved. |
|
| 13 |
+
| `KaiB_fi_matrix.npy` | (364, 91) float32 | Per-residue FI matrix. Row `i` corresponds to record `i` in the A3M. |
|
| 14 |
+
| `KaiB_seq_ids.txt` | 364 lines | One short sequence ID per line, in the same order as the A3M / FI matrix. |
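The shapes in the table above can be sanity-checked before running the notebook. The following is a minimal sketch, not part of the SF-Cluster package: the helper names `parse_a3m`, `match_length`, and `check_a3m` are hypothetical, and it assumes the usual A3M convention that lowercase letters are insertion states and do not count towards the aligned length.

```python
def parse_a3m(text):
    """Return a list of (header, sequence) pairs from A3M-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def match_length(sequence):
    """Count match-state columns: every character except lowercase insertions."""
    return sum(1 for ch in sequence if not ch.islower())


def check_a3m(text, expected_records, expected_length):
    """Raise ValueError if the record count or any aligned length is off."""
    records = parse_a3m(text)
    if len(records) != expected_records:
        raise ValueError(
            f"expected {expected_records} records, found {len(records)}"
        )
    for header, seq in records:
        if match_length(seq) != expected_length:
            raise ValueError(f"record {header!r} has unexpected aligned length")
    return records


# Toy alignment: two records, aligned length 5, one lowercase insertion.
toy = ">101\nMKT-A\n>seqB some description\nMKaT-A\n"
toy_records = check_a3m(toy, expected_records=2, expected_length=5)
```

For the demo files, the same check would be called with the contents of `KaiB_filtered.a3m`, `expected_records=364`, and `expected_length=91`. The number of lines in `KaiB_seq_ids.txt` and the first dimension of `KaiB_fi_matrix.npy` should then both equal `len(records)`.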
|
| 15 |
+
|
| 16 |
+
## Source paths (private dev repo, read-only)
|
| 17 |
+
|
| 18 |
+
- Filtered MSA:
|
| 19 |
+
`/data1/hanqun/SF-Design/SF-Cluster/data/processed/msa/KaiB/KaiB/KaiB_KaiBTE_91aa_UniProt_Q79V61_5to95_2QKE_chainB.filtered.a3m`
|
| 20 |
+
(depth 6821, L=91)
|
| 21 |
+
- FI artifacts (per-subset):
|
| 22 |
+
`/data1/hanqun/SF-Design/SF-Cluster/results/frustai_artifacts/KaiB/diverse_sf/KaiB/{000..011}/`
|
| 23 |
+
with files `fi_matrix.npy` ((32, 91) float32), `metadata.json`,
|
| 24 |
+
`fi_residual_matrix.npy`, `entropy_matrix.npy`.
|
| 25 |
+
- Source subset A3Ms (used to map FI rows → sequence IDs):
|
| 26 |
+
`/data1/hanqun/SF-Design/SF-Cluster/results/baseline_p8/diverse_sf/KaiB/KaiB/screen/diversesf_KaiB_KaiB_seed{000..011}.a3m`.
|
| 27 |
+
|
| 28 |
+
## Construction recipe
|
| 29 |
+
|
| 30 |
+
1. For each of the 12 `diverse_sf` subsets, load `fi_matrix.npy` ((32, 91)
|
| 31 |
+
float32) and the corresponding `diversesf_KaiB_KaiB_seed{NNN}.a3m`.
|
| 32 |
+
2. Concatenate rows in subset-index order; track the parallel sequence-ID
|
| 33 |
+
list from the A3M records.
|
| 34 |
+
3. **Deduplication policy**: first occurrence wins. A sequence ID seen in an
|
| 35 |
+
earlier subset is skipped (both in the FI matrix and the ID list). This
|
| 36 |
+
reduces 12 × 32 = 384 raw rows to 364 unique rows.
|
| 37 |
+
4. Extract the corresponding sequences (with their full headers and
|
| 38 |
+
lowercase insertion states) from the filtered MSA, preserving the order
|
| 39 |
+
established in step 3. The query (`>101`) is always row 0.
|
| 40 |
+
|
| 41 |
+
All 364 unique IDs were found in the filtered MSA (0 missing).
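Steps 2 and 3 of the recipe can be sketched as follows. This is an illustration under stated assumptions rather than the script that produced the asset: the function name `concat_first_wins` is hypothetical, FI rows are represented as plain Python lists instead of NumPy arrays, and each subset is assumed to arrive as an `(ids, rows)` pair in subset-index order.

```python
def concat_first_wins(subsets):
    """Concatenate (ids, rows) subsets, keeping the first occurrence of each ID.

    subsets: iterable of (ids, rows) pairs in subset-index order, where
    ids[i] is the sequence ID for the FI row rows[i].
    Returns the deduplicated (ids, rows) as two parallel lists.
    """
    seen = set()
    kept_ids, kept_rows = [], []
    for ids, rows in subsets:
        if len(ids) != len(rows):
            raise ValueError("each subset needs one ID per FI row")
        for seq_id, row in zip(ids, rows):
            if seq_id in seen:
                # Seen in an earlier subset: skip both the ID and its row.
                continue
            seen.add(seq_id)
            kept_ids.append(seq_id)
            kept_rows.append(row)
    return kept_ids, kept_rows


# Toy data: two subsets of three rows that share the query and one neighbor.
subset_0 = (["101", "seqA", "seqB"], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
subset_1 = (["101", "seqC", "seqA"], [[0.1, 0.2], [0.7, 0.8], [0.3, 0.4]])

ids, rows = concat_first_wins([subset_0, subset_1])
print(ids)  # ['101', 'seqA', 'seqB', 'seqC']
```

In the toy example 2 × 3 = 6 raw rows reduce to 4 unique rows, mirroring the 384 to 364 reduction described above. Because the query is the first record of the first subset, it stays at row 0. With NumPy, the kept rows could then be stacked into a single matrix, for example with `np.vstack`.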
|
| 42 |
+
|
| 43 |
+
## Models
|
| 44 |
+
|
| 45 |
+
- **FrustrAI-Seq weights**: HF repo `leuschj/FrustrAI-Seq`, commit
|
| 46 |
+
`ee5a01a29fde00630f4a1157f0e6cb8343ac434b`. Inference in fp16 with LoRA
|
| 47 |
+
adapters merged.
|
| 48 |
+
|
| 49 |
+
## License
|
| 50 |
+
|
| 51 |
+
This demo asset is released under MIT alongside the SF-Cluster OSS package.
|
| 52 |
+
The KaiB sequence (UniProt Q79V61, *Thermosynechococcus elongatus*) and its
|
| 53 |
+
MSA neighbors are public-domain sequence records via UniRef100 / MGnify;
|
| 54 |
+
no proprietary structures are included. FrustrAI-Seq outputs are derived
|
| 55 |
+
features (floating-point FI values) and are released by the FrustrAI-Seq
|
| 56 |
+
authors under their own license — see
|
| 57 |
+
https://huggingface.co/leuschj/FrustrAI-Seq.
|