the-puzzler committed on
Commit de47ae5 · 1 parent: 44b0b79

Fix Space README metadata

Files changed (1)
  1. README.md +18 -74
README.md CHANGED
@@ -1,74 +1,18 @@
- # Microbiome Transformer (Set-Based OTU Stability Model)
-
- This repository provides Transformer checkpoints for microbiome set modeling using SSU rRNA OTU embeddings (ProkBERT-derived vectors) and optional text metadata embeddings.
-
- Please see https://github.com/the-puzzler/microbiome-model for more information and relevant code.
-
- ## Model summary
-
- - **Architecture:** `MicrobiomeTransformer` (see `model.py`)
- - **Input type 1 (DNA/OTU):** 384-d embeddings
- - **Input type 2 (text metadata):** 1536-d embeddings
- - **Core behavior:** permutation-invariant set encoding via Transformer encoder (no positional encodings)
- - **Output:** per-token scalar logits (used as stability scores)
-
- ## Available checkpoints
-
- | Filename | Size variant | Metadata variant | `d_model` | `num_layers` | `dim_feedforward` | `nhead` |
- |---|---|---|---:|---:|---:|---:|
- | `small-notext.pt` | small | DNA-only | 20 | 3 | 80 | 5 |
- | `small-text.pt` | small | DNA + text | 20 | 3 | 80 | 5 |
- | `large-notext.pt` | large | DNA-only | 100 | 5 | 400 | 5 |
- | `large-text.pt` | large | DNA + text | 100 | 5 | 400 | 5 |
-
- Shared dimensions:
- - `OTU_EMB = 384`
- - `TXT_EMB = 1536`
- - `DROPOUT = 0.1`
-
- ## Input expectations
-
- 1. Build a set of OTU embeddings (ProkBERT vectors) per sample.
- 2. Optionally build a set of text embeddings (metadata) per sample for text-enabled variants.
- 3. Feed both sets as:
- - `embeddings_type1`: shape `(B, N_otu, 384)`
- - `embeddings_type2`: shape `(B, N_txt, 1536)`
- - `mask`: shape `(B, N_otu + N_txt)` with valid positions as `True`
- - `type_indicators`: shape `(B, N_otu + N_txt)` (0 for OTU tokens, 1 for text tokens)
-
- ## Minimal loading example
-
- ```python
- import torch
- from model import MicrobiomeTransformer
-
- ckpt_path = "large-notext.pt" # or small-notext.pt / small-text.pt / large-text.pt
- checkpoint = torch.load(ckpt_path, map_location="cpu")
- state_dict = checkpoint.get("model_state_dict", checkpoint)
-
- is_small = "small" in ckpt_path
- model = MicrobiomeTransformer(
-     input_dim_type1=384,
-     input_dim_type2=1536,
-     d_model=20 if is_small else 100,
-     nhead=5,
-     num_layers=3 if is_small else 5,
-     dim_feedforward=80 if is_small else 400,
-     dropout=0.1,
- )
- model.load_state_dict(state_dict, strict=False)
- model.eval()
- ```
-
- ## Intended use
-
- - Microbiome representation learning from OTU sets
- - Stability-style scoring of community members
- - Downstream analyses such as dropout/colonization prediction and rollout trajectory experiments
-
- ## Limitations
-
- - This is a research model and not a clinical diagnostic tool.
- - Outputs depend strongly on upstream OTU mapping, embedding resolution, and cohort preprocessing.
- - Text-enabled checkpoints expect compatible metadata embedding pipelines.
-
 
+ ---
+ title: Microbiome Space
+ emoji: 🧬
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: "5.0.0"
+ python_version: "3.10"
+ app_file: app.py
+ pinned: false
+ ---
+
+ # Microbiome Gene Scoring Explorer
+
+ Upload a FASTA of genes, embed with `prokbert-mini-long` (mean pooling), score with `large-notext`, and inspect:
+ - UMAP of input embeddings
+ - UMAP of final embeddings
+ - Logit distribution
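The new README states that each uploaded gene is embedded with `prokbert-mini-long` using mean pooling. Below is a minimal sketch of masked mean pooling as the term is commonly used. It assumes the model returns one hidden-state vector per token and that a 0/1 attention mask marks the real (non-padding) tokens. This commit does not show how `app.py` actually pools, including whether special tokens are excluded. The helper name `mean_pool` is hypothetical, and plain lists stand in for tensors so the sketch runs without PyTorch:

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose attention_mask entry is 1.

    Hypothetical helper. Assumption: a single sequence, with token_embeddings
    shaped (num_tokens, hidden_dim) and attention_mask of length num_tokens.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("token_embeddings and attention_mask must have the same length")
    # Keep only the positions the mask marks as real tokens.
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Sum each hidden dimension over the kept tokens, then divide by their count.
    totals = [0.0] * dim
    for vec in kept:
        for i, value in enumerate(vec):
            totals[i] += value
    return [t / len(kept) for t in totals]


# Usage: three tokens, where the last one is padding and is ignored.
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(hidden, mask))  # [2.0, 3.0]
```

With batched PyTorch tensors, the same computation is commonly written as `(hidden * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True)`. Masking before averaging matters because padded positions would otherwise pull every gene vector toward the padding embedding.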