Commit de47ae5 by the-puzzler
Parent(s): 44b0b79

Fix Space README metadata

README.md CHANGED
```diff
@@ -1,74 +1,18 @@
-
-
-
-
-
-
-
-
-
-
--
-
-
-
-
-
-
-
-| `small-notext.pt` | small | DNA-only | 20 | 3 | 80 | 5 |
-| `small-text.pt` | small | DNA + text | 20 | 3 | 80 | 5 |
-| `large-notext.pt` | large | DNA-only | 100 | 5 | 400 | 5 |
-| `large-text.pt` | large | DNA + text | 100 | 5 | 400 | 5 |
-
-Shared dimensions:
-- `OTU_EMB = 384`
-- `TXT_EMB = 1536`
-- `DROPOUT = 0.1`
-
-## Input expectations
-
-1. Build a set of OTU embeddings (ProkBERT vectors) per sample.
-2. Optionally build a set of text embeddings (metadata) per sample for text-enabled variants.
-3. Feed both sets as:
-- `embeddings_type1`: shape `(B, N_otu, 384)`
-- `embeddings_type2`: shape `(B, N_txt, 1536)`
-- `mask`: shape `(B, N_otu + N_txt)` with valid positions as `True`
-- `type_indicators`: shape `(B, N_otu + N_txt)` (0 for OTU tokens, 1 for text tokens)
-
-## Minimal loading example
-
-```python
-import torch
-from model import MicrobiomeTransformer
-
-ckpt_path = "large-notext.pt"  # or small-notext.pt / small-text.pt / large-text.pt
-checkpoint = torch.load(ckpt_path, map_location="cpu")
-state_dict = checkpoint.get("model_state_dict", checkpoint)
-
-is_small = "small" in ckpt_path
-model = MicrobiomeTransformer(
-    input_dim_type1=384,
-    input_dim_type2=1536,
-    d_model=20 if is_small else 100,
-    nhead=5,
-    num_layers=3 if is_small else 5,
-    dim_feedforward=80 if is_small else 400,
-    dropout=0.1,
-)
-model.load_state_dict(state_dict, strict=False)
-model.eval()
-```
-
-## Intended use
-
-- Microbiome representation learning from OTU sets
-- Stability-style scoring of community members
-- Downstream analyses such as dropout/colonization prediction and rollout trajectory experiments
-
-## Limitations
-
-- This is a research model and not a clinical diagnostic tool.
-- Outputs depend strongly on upstream OTU mapping, embedding resolution, and cohort preprocessing.
-- Text-enabled checkpoints expect compatible metadata embedding pipelines.
-
+---
+title: Microbiome Space
+emoji: 🧬
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: "5.0.0"
+python_version: "3.10"
+app_file: app.py
+pinned: false
+---
+
+# Microbiome Gene Scoring Explorer
+
+Upload a FASTA of genes, embed with `prokbert-mini-long` (mean pooling), score with `large-notext`, and inspect:
+- UMAP of input embeddings
+- UMAP of final embeddings
+- Logit distribution
```
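The checkpoint table removed in this commit can be expressed as an explicit lookup keyed by file name, instead of inferring the size from `"small" in ckpt_path` as the removed loading example does. A minimal sketch, assuming the column order is `d_model`, `num_layers`, `dim_feedforward`, `nhead` (inferred from the removed loading example, since the table header is not captured in this diff); `CHECKPOINTS` and `model_kwargs` are hypothetical names.

```python
# Hypothetical lookup transcribed from the removed checkpoint table.
# Assumed column order (d_model, num_layers, dim_feedforward, nhead) is
# inferred from the removed loading example; the table header was not captured.
CHECKPOINTS = {
    "small-notext.pt": {"uses_text": False, "d_model": 20, "num_layers": 3, "dim_feedforward": 80, "nhead": 5},
    "small-text.pt": {"uses_text": True, "d_model": 20, "num_layers": 3, "dim_feedforward": 80, "nhead": 5},
    "large-notext.pt": {"uses_text": False, "d_model": 100, "num_layers": 5, "dim_feedforward": 400, "nhead": 5},
    "large-text.pt": {"uses_text": True, "d_model": 100, "num_layers": 5, "dim_feedforward": 400, "nhead": 5},
}

# Shared dimensions listed in the removed README.
OTU_EMB = 384
TXT_EMB = 1536
DROPOUT = 0.1


def model_kwargs(ckpt_name: str) -> dict:
    """Build constructor keyword arguments for a checkpoint file name (hypothetical helper)."""
    if ckpt_name not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {ckpt_name!r}")
    cfg = CHECKPOINTS[ckpt_name]
    # Multi-head attention splits d_model evenly across heads.
    if cfg["d_model"] % cfg["nhead"] != 0:
        raise ValueError("d_model must be divisible by nhead")
    return {
        "input_dim_type1": OTU_EMB,
        "input_dim_type2": TXT_EMB,
        "d_model": cfg["d_model"],
        "nhead": cfg["nhead"],
        "num_layers": cfg["num_layers"],
        "dim_feedforward": cfg["dim_feedforward"],
        "dropout": DROPOUT,
    }
```

With the removed README's `from model import MicrobiomeTransformer`, this would be called as `MicrobiomeTransformer(**model_kwargs("large-notext.pt"))`; a full path such as `checkpoints/small-text.pt` needs `os.path.basename` first. An unknown name fails loudly, whereas the substring test silently falls back to the large configuration. Because the removed example loads with `strict=False`, mismatched parameter names are not raised as errors; the `missing_keys` and `unexpected_keys` fields returned by `load_state_dict` are worth checking.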
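The `Input expectations` section removed in this commit describes padded batch tensors plus a validity mask and a token-type vector. A minimal standard-library sketch of that batching, assuming (the README does not say) that OTU positions precede text positions in `mask` and `type_indicators` and that padded slots hold zero vectors; `pad_sets` is a hypothetical name. For a DNA-only checkpoint, passing empty text sets gives `N_txt = 0`; whether the model accepts a zero-length `embeddings_type2` is not stated.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def pad_sets(
    otu_sets: Sequence[Sequence[Vector]],
    txt_sets: Sequence[Sequence[Vector]],
    otu_dim: int = 384,
    txt_dim: int = 1536,
) -> Tuple[list, list, list, list]:
    """Hypothetical helper: pad per-sample sets to the removed README's batch layout.

    Returns (embeddings_type1, embeddings_type2, mask, type_indicators) as nested
    lists. Assumes OTU positions come before text positions in mask/type_indicators.
    """
    if len(otu_sets) != len(txt_sets):
        raise ValueError("need one OTU set and one text set per sample")
    n_otu = max((len(s) for s in otu_sets), default=0)
    n_txt = max((len(s) for s in txt_sets), default=0)
    emb1, emb2, mask, types = [], [], [], []
    for otus, txts in zip(otu_sets, txt_sets):
        if any(len(v) != otu_dim for v in otus) or any(len(v) != txt_dim for v in txts):
            raise ValueError("embedding has the wrong dimension")
        # Zero vectors fill padded slots; the mask marks them invalid (False).
        emb1.append([list(v) for v in otus] + [[0.0] * otu_dim for _ in range(n_otu - len(otus))])
        emb2.append([list(v) for v in txts] + [[0.0] * txt_dim for _ in range(n_txt - len(txts))])
        mask.append(
            [True] * len(otus) + [False] * (n_otu - len(otus))
            + [True] * len(txts) + [False] * (n_txt - len(txts))
        )
        # 0 marks OTU tokens and 1 marks text tokens, per the removed README.
        types.append([0] * n_otu + [1] * n_txt)
    return emb1, emb2, mask, types
```

The nested lists convert with `torch.tensor(...)`, for example `torch.tensor(mask, dtype=torch.bool)`. The README marks valid positions as `True`, which is the opposite of `src_key_padding_mask` in `torch.nn.TransformerEncoder` (there `True` means padding), so such a mask would need inverting before being passed to that argument directly.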
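The point of this commit is the Space metadata: Hugging Face Spaces read their configuration from the YAML block between the two `---` lines at the top of `README.md`. A minimal standard-library sketch for checking that block; it handles only flat `key: value` lines like the ones added here, not general YAML (nested keys, lists, multi-line values), for which a real parser such as PyYAML's `yaml.safe_load` is the safer choice. `read_front_matter` is a hypothetical helper, and every value comes back as a string, so `pinned: false` is returned as `"false"`.

```python
def read_front_matter(text: str) -> dict:
    """Hypothetical helper: return flat `key: value` pairs from a leading `---` block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with a '---' line")
    meta = {}
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and YAML comments
        # Split on the first colon only, so values may contain colons.
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        value = value.strip()
        # Remove one pair of matching quotes, e.g. sdk_version: "5.0.0".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        meta[key.strip()] = value
    raise ValueError("front matter is not closed by a second '---' line")
```

The quotes around `python_version: "3.10"` matter for a real YAML parser: unquoted, `3.10` is read as the float `3.1`. A string-only reader like this sketch cannot catch that difference, which is another reason to validate with an actual YAML library.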
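The new README's workflow starts from an uploaded FASTA of genes. The Space's own parsing code (`app.py`) is not part of this diff, so this is only a minimal standard-library sketch of reading multi-line FASTA records; `read_fasta` is a hypothetical name, and Biopython's `SeqIO.parse(handle, "fasta")` is an established alternative.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (header, sequence) pairs from FASTA lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)
```

An open file handle is itself an iterable of lines, so `list(read_fasta(fh))` works directly on it. The header keeps any description text after the identifier; splitting it on whitespace is left to the caller.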
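The new README states that genes are embedded with `prokbert-mini-long` using mean pooling. Mean pooling usually means averaging the model's per-token hidden states over non-padding positions, giving one fixed-size vector per gene. A minimal standard-library sketch for a single sequence; whether the Space also excludes special tokens, or which hidden layer it pools, is not stated in this diff, and `mean_pool` is a hypothetical name.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """Hypothetical helper: average token vectors whose attention-mask entry is nonzero."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is needed per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    # Padding positions (mask 0) are excluded from both the sum and the count.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```

For a batched PyTorch hidden-state tensor `hidden` of shape `(B, T, D)` and an attention mask `mask` of shape `(B, T)` converted to the same dtype, the equivalent is `(hidden * mask.unsqueeze(-1)).sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1)`; the clamp guards against division by zero for an all-padding row.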