---
library_name: pytorch
tags:
  - vae
  - genomics
  - genome-minimization
  - e-coli
---

# Genome Minimizer 2

VAE-powered pipeline for generating minimal *E. coli* genomes. Models are trained on a binary gene presence/absence matrix of ~10,000 *E. coli* strains across ~55,000 genes.
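
Each genome is a binary presence/absence vector over the gene set, so a generated sample becomes a gene list once it is binarized. The helper below is a minimal, hypothetical sketch of that step. It assumes the decoder emits one probability per gene and uses an assumed cutoff of 0.5. The name `binarize_genome` and the five-gene toy input are illustrative only and are not taken from the repository.

```python
# Hypothetical helper (not from the repo): turn per-gene probabilities,
# e.g. from a VAE decoder, into a binary presence/absence vector plus the
# names of the genes that are kept. The 0.5 cutoff is an assumption.
def binarize_genome(probs, gene_names, threshold=0.5):
    """Return (presence vector, names of genes present) for one sample."""
    if len(probs) != len(gene_names):
        raise ValueError("probs and gene_names must have the same length")
    presence = [1 if p >= threshold else 0 for p in probs]
    kept = [name for name, bit in zip(gene_names, presence) if bit]
    return presence, kept


# Toy example with 5 genes instead of ~55,000
probs = [0.91, 0.12, 0.67, 0.49, 0.99]
genes = ["dnaA", "lacZ", "rpoB", "fliC", "ftsZ"]
presence, kept = binarize_genome(probs, genes)
print(presence)   # [1, 0, 1, 0, 1]
print(len(kept))  # 3
```

The genome size of a sample is simply the number of ones in its presence vector.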

## Model Variants

Each preset is stored on its own branch:

| Branch | Architecture (input → hidden → latent) | Loss Functions | Description |
|--------|---|---|---|
| [`v0`](https://huggingface.co/McClain/genome-minimizer-2/tree/v0) | 55,039 → 1024 → 64 | Recon + KL (linear) | Baseline VAE |
| [`v1`](https://huggingface.co/McClain/genome-minimizer-2/tree/v1) | 55,039 → 512 → 32 | Recon + KL (linear) + Abundance + L1 | + gene frequency control |
| [`v2`](https://huggingface.co/McClain/genome-minimizer-2/tree/v2) | 55,039 → 512 → 32 | Recon + KL (cosine) + Abundance + L1 | Improved convergence |
| [`v3`](https://huggingface.co/McClain/genome-minimizer-2/tree/v3) | 55,039 → 512 → 32 | Recon + KL (cosine) + Weighted Abundance + L1 | Best minimal genomes |
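
The Loss Functions column distinguishes linear from cosine KL annealing. This card does not spell out the exact schedules, so the sketch below assumes a common form: the KL weight β ramps from 0 to a maximum over a hypothetical `warmup_steps` and then stays flat. `kl_weight_linear`, `kl_weight_cosine`, `total_loss` and its `extra` argument (a stand-in for the Abundance and L1 terms) are all hypothetical names. The repository's actual schedule may differ, for example in warm-up length or cycling.

```python
import math


# Assumed linear KL annealing: beta grows proportionally with the step.
def kl_weight_linear(step, warmup_steps, max_beta=1.0):
    """Ramp the KL weight linearly from 0 to max_beta, then hold it."""
    if warmup_steps <= 0:
        return max_beta
    return max_beta * min(step / warmup_steps, 1.0)


# Assumed cosine KL annealing: beta follows a half-cosine from 0 to max_beta.
def kl_weight_cosine(step, warmup_steps, max_beta=1.0):
    """Ramp the KL weight from 0 to max_beta along a half-cosine, then hold it."""
    if warmup_steps <= 0:
        return max_beta
    t = min(step / warmup_steps, 1.0)
    return max_beta * 0.5 * (1.0 - math.cos(math.pi * t))


# Hypothetical combination: Recon + beta(step) * KL + other terms (e.g. abundance, L1).
def total_loss(recon, kl, step, warmup_steps, schedule=kl_weight_cosine, extra=0.0):
    return recon + schedule(step, warmup_steps) * kl + extra


print(kl_weight_linear(25, 100))  # 0.25
print(kl_weight_cosine(25, 100))  # about 0.146
```

Both ramps reach the same weight at the midpoint and at the end of the warm-up. The cosine ramp starts more slowly, so the KL term stays weak while reconstruction is still poor.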

## Quick Start

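```python
from huggingface_hub import hf_hub_download
import torch
# VAE is defined in the GitHub repository (see Links); run this from its root
from src.genome_minimizer_2.training.model import VAE

# Download v3 (best for minimal genomes)
path = hf_hub_download("McClain/genome-minimizer-2", "final.pt", revision="v3")
checkpoint = torch.load(path, map_location="cpu")

# v3 uses the 55,039 → 512 → 32 architecture (see Model Variants)
model = VAE(input_dim=55039, hidden_dim=512, latent_dim=32)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # switch to inference mode before generating genomes
```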
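
Other branches use different dimensions, so the constructor arguments must match the checkpoint. The sketch below keeps the sizes from the Model Variants table next to the branch names. It assumes every branch stores its weights as `final.pt` under the `model_state_dict` key, as `v3` does above. `PRESETS`, `model_kwargs` and `load_variant` are hypothetical names, not part of the repository.

```python
# Hypothetical helper: constructor arguments per branch, copied from the
# Model Variants table (input 55,039 genes -> hidden -> latent).
PRESETS = {
    "v0": {"hidden_dim": 1024, "latent_dim": 64},
    "v1": {"hidden_dim": 512, "latent_dim": 32},
    "v2": {"hidden_dim": 512, "latent_dim": 32},
    "v3": {"hidden_dim": 512, "latent_dim": 32},
}
INPUT_DIM = 55039


def model_kwargs(branch):
    """Return VAE constructor arguments for a branch ("v0" to "v3")."""
    if branch not in PRESETS:
        raise ValueError(f"unknown branch {branch!r}; expected one of {sorted(PRESETS)}")
    return {"input_dim": INPUT_DIM, **PRESETS[branch]}


def load_variant(branch, device="cpu"):
    """Download final.pt from `branch` (assumed filename) and build the matching VAE."""
    # Imported lazily so model_kwargs() works without these packages installed.
    import torch
    from huggingface_hub import hf_hub_download
    from src.genome_minimizer_2.training.model import VAE  # needs the GitHub repo

    path = hf_hub_download("McClain/genome-minimizer-2", "final.pt", revision=branch)
    checkpoint = torch.load(path, map_location=device)
    model = VAE(**model_kwargs(branch))
    model.load_state_dict(checkpoint["model_state_dict"])
    return model.eval()  # nn.Module.eval() returns the module itself
```

For example, `load_variant("v0")` would build the 1024/64 baseline instead of the 512/32 model. Passing the wrong sizes would make `load_state_dict` fail with a shape mismatch.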

## Links

- [W&B Experiment Tracking](https://wandb.ai/mcclain/genome-minimizer-2)
- [GitHub Repository](https://github.com/ucl-cssb/genome-minimizer-2)