---
library_name: pytorch
tags:
- vae
- genomics
- genome-minimization
- e-coli
---
Genome Minimizer 2
A VAE-powered pipeline for generating minimal E. coli genomes. The models are trained on a binary gene presence/absence matrix covering ~10,000 E. coli strains and ~55,000 genes (55,039 input features, matching the architectures listed below).
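To make the training input concrete, here is a minimal sketch of how a binary presence/absence matrix can be built from per-strain gene sets. The function name `presence_absence_matrix`, the strain labels, and the toy gene sets are illustrative assumptions, not part of this repository or its dataset.

```python
def presence_absence_matrix(strain_genes, gene_order=None):
    """Build a 0/1 matrix with one row per strain and one column per gene.

    strain_genes: dict mapping a strain name to the set of genes it carries.
    gene_order:   optional fixed column order; defaults to all genes, sorted.
    """
    if gene_order is None:
        gene_order = sorted(set().union(*strain_genes.values()))
    column = {gene: j for j, gene in enumerate(gene_order)}
    strains = sorted(strain_genes)

    matrix = [[0] * len(gene_order) for _ in strains]
    for i, strain in enumerate(strains):
        for gene in strain_genes[strain]:
            if gene in column:  # genes outside gene_order are ignored
                matrix[i][column[gene]] = 1
    return strains, gene_order, matrix


# Toy data (hypothetical): three strains, four genes in total.
toy = {
    "strain_A": {"dnaA", "lacZ", "recA"},
    "strain_B": {"dnaA", "recA"},
    "strain_C": {"dnaA", "lacZ", "fimH"},
}
strains, genes, matrix = presence_absence_matrix(toy)

# Fraction of strains carrying each gene (column means).
gene_frequency = [sum(col) / len(strains) for col in zip(*matrix)]
```

In the real dataset each row would have 55,039 entries, one per gene, which is why every architecture in the table below starts from an input width of 55,039.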
Model Variants
Each preset is stored on its own branch:
| Branch | Architecture | Loss Functions | Description |
|---|---|---|---|
| v0 | 55,039 → 1024 → 64 | Recon + KL (linear) | Baseline VAE |
| v1 | 55,039 → 512 → 32 | Recon + KL (linear) + Abundance + L1 | + gene frequency control |
| v2 | 55,039 → 512 → 32 | Recon + KL (cosine) + Abundance + L1 | Improved convergence |
| v3 | 55,039 → 512 → 32 | Recon + KL (cosine) + Weighted Abundance + L1 | Best minimal genomes |
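The "linear" and "cosine" labels in the table most plausibly refer to how the weight on the KL term is ramped up during training (KL annealing). That reading is an assumption, and the sketch below only illustrates the two schedule shapes. The function names, `total_steps`, and `max_weight` are hypothetical and are not taken from the repository's training code.

```python
import math


def kl_weight_linear(step, total_steps, max_weight=1.0):
    """Ramp the KL weight linearly from 0 to max_weight over total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * frac


def kl_weight_cosine(step, total_steps, max_weight=1.0):
    """Ramp the KL weight from 0 to max_weight along a half-cosine curve."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * 0.5 * (1.0 - math.cos(math.pi * frac))


# Compare the two schedules at a few points of a 100-step warm-up.
for step in (0, 25, 50, 75, 100):
    print(step, round(kl_weight_linear(step, 100), 3), round(kl_weight_cosine(step, 100), 3))
```

Under these assumptions, both schedules start at 0 and end at `max_weight`, but the cosine ramp applies less KL pressure early in the warm-up and eases into the final value, which is one possible reason for the convergence difference the table reports between v1 and v2.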
Quick Start
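from huggingface_hub import hf_hub_download
import torch
# The VAE class comes from the project source, so run this from the repository root
from src.genome_minimizer_2.training.model import VAE

# Download v3 (best for minimal genomes)
path = hf_hub_download("McClain/genome-minimizer-2", "final.pt", revision="v3")
checkpoint = torch.load(path, map_location="cpu")

# Dimensions must match the chosen branch (v3: 55,039 → 512 → 32)
model = VAE(input_dim=55039, hidden_dim=512, latent_dim=32)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # switch to inference mode before generating genomes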
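Once a model is loaded, new genomes are typically generated by sampling latent vectors from a standard normal prior, decoding them, and thresholding the per-gene probabilities. The sketch below shows that pattern only. `ToyDecoder`, `sample_genomes`, and the 0.5 threshold are hypothetical stand-ins, and the toy decoder uses random weights and a reduced output width. How the repository's `VAE` class exposes its decoder is not shown here, so adapt the call accordingly.

```python
import torch
from torch import nn


class ToyDecoder(nn.Module):
    """Hypothetical stand-in for a trained decoder (latent vector -> gene logits)."""

    def __init__(self, latent_dim=32, hidden_dim=512, output_dim=55039):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, z):
        return self.net(z)


def sample_genomes(decoder, n_samples, latent_dim=32, threshold=0.5, seed=0):
    """Sample latent vectors from N(0, I) and decode them to 0/1 gene vectors."""
    generator = torch.Generator().manual_seed(seed)
    z = torch.randn(n_samples, latent_dim, generator=generator)
    with torch.no_grad():
        probs = torch.sigmoid(decoder(z))  # per-gene presence probabilities
    return (probs > threshold).to(torch.uint8)


# Small output width keeps the demo light; the real models use 55,039 genes.
decoder = ToyDecoder(output_dim=1000).eval()
genomes = sample_genomes(decoder, n_samples=4)
genome_sizes = genomes.sum(dim=1)  # number of genes present in each sample
```

With a trained decoder, `genome_sizes` gives a quick way to compare how small the generated genomes are across the v0 to v3 branches.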