---
library_name: pytorch
tags:
- vae
- genomics
- genome-minimization
- e-coli
---

# Genome Minimizer 2

A VAE-powered pipeline for generating minimal *E. coli* genomes. The models are trained on a binary gene presence/absence matrix covering ~10,000 *E. coli* strains and ~55,000 genes (55,039 input features).
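
To make the input format concrete, the sketch below builds a tiny presence/absence matrix from per-strain gene sets. The strain names and gene sets are made up for illustration, and the preprocessing used by the actual pipeline may differ.

```python
# Toy example: rows are strains, columns are genes, 1 = gene present.
# Strain names and gene sets are illustrative, not taken from the dataset.
strains = {
    "strain_A": {"dnaA", "recA", "lacZ"},
    "strain_B": {"dnaA", "recA"},
    "strain_C": {"dnaA", "lacZ", "fimH"},
}

# Fix a column order so every strain maps to a vector of the same length.
genes = sorted(set().union(*strains.values()))

# One binary row per strain, in the insertion order of the dict.
matrix = [[1 if gene in present else 0 for gene in genes] for present in strains.values()]

for name, row in zip(strains, matrix):
    print(name, row)
```

In the real data, each row would have 55,039 entries instead of four.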

## Model Variants

Each preset is stored on its own branch:

| Branch | Architecture (input → hidden → latent) | Loss Functions | Description |
|--------|---|---|---|
| [`v0`](https://huggingface.co/McClain/genome-minimizer-2/tree/v0) | 55,039 → 1024 → 64 | Recon + KL (linear) | Baseline VAE |
| [`v1`](https://huggingface.co/McClain/genome-minimizer-2/tree/v1) | 55,039 → 512 → 32 | Recon + KL (linear) + Abundance + L1 | Adds gene frequency control |
| [`v2`](https://huggingface.co/McClain/genome-minimizer-2/tree/v2) | 55,039 → 512 → 32 | Recon + KL (cosine) + Abundance + L1 | Improved convergence |
| [`v3`](https://huggingface.co/McClain/genome-minimizer-2/tree/v3) | 55,039 → 512 → 32 | Recon + KL (cosine) + Weighted Abundance + L1 | Best minimal genomes |
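
The "linear" and "cosine" labels in the table presumably describe how the KL term is weighted over the course of training. The exact schedules are not stated here, so the following is only a sketch of two common annealing shapes. It assumes the KL weight ramps from 0 to a maximum over a warm-up period; `warmup_steps` and `beta_max` are hypothetical names, not parameters of this repository.

```python
import math


def linear_kl_weight(step: int, warmup_steps: int, beta_max: float = 1.0) -> float:
    """Ramp the KL weight linearly from 0 to beta_max, then hold it."""
    progress = min(max(step / warmup_steps, 0.0), 1.0)
    return beta_max * progress


def cosine_kl_weight(step: int, warmup_steps: int, beta_max: float = 1.0) -> float:
    """Ramp the KL weight along a half-cosine: slow start, slow finish."""
    progress = min(max(step / warmup_steps, 0.0), 1.0)
    return beta_max * 0.5 * (1.0 - math.cos(math.pi * progress))


# Compare the two schedules at a few points of a 1000-step warm-up.
for step in (0, 250, 500, 750, 1000):
    print(step, round(linear_kl_weight(step, 1000), 3), round(cosine_kl_weight(step, 1000), 3))
```

Under this assumption, the cosine schedule keeps the KL penalty smaller early in the warm-up, which gives the reconstruction term more room before regularization takes hold.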

## Quick Start

```python
from huggingface_hub import hf_hub_download
import torch

# The VAE class comes from the GitHub repository (see Links); this import
# assumes the script is run from a clone of that repository.
from src.genome_minimizer_2.training.model import VAE

# Download the v3 checkpoint (best for minimal genomes)
path = hf_hub_download("McClain/genome-minimizer-2", "final.pt", revision="v3")
checkpoint = torch.load(path, map_location="cpu")

# Dimensions must match the chosen branch: v3 uses 55,039 → 512 → 32
model = VAE(input_dim=55039, hidden_dim=512, latent_dim=32)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # inference mode
```
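
Because the layer sizes differ between branches, loading another variant requires matching constructor arguments. The helper below is a small sketch that copies the dimensions from the Model Variants table. `PRESETS` and `vae_kwargs` are illustrative names and are not part of the repository.

```python
# Dimensions copied from the Model Variants table.
# PRESETS and vae_kwargs are hypothetical helpers, not repository code.
INPUT_DIM = 55039

PRESETS = {
    "v0": {"hidden_dim": 1024, "latent_dim": 64},
    "v1": {"hidden_dim": 512, "latent_dim": 32},
    "v2": {"hidden_dim": 512, "latent_dim": 32},
    "v3": {"hidden_dim": 512, "latent_dim": 32},
}


def vae_kwargs(branch: str) -> dict:
    """Return constructor keyword arguments for the given branch name."""
    if branch not in PRESETS:
        raise ValueError(f"unknown branch {branch!r}; expected one of {sorted(PRESETS)}")
    return {"input_dim": INPUT_DIM, **PRESETS[branch]}


print(vae_kwargs("v0"))
```

The result can be passed as `VAE(**vae_kwargs("v0"))`, with the same branch name given as `revision` to `hf_hub_download`. This assumes each branch stores its checkpoint under the same `final.pt` filename, which is only shown for `v3` above.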

## Links

- [W&B Experiment Tracking](https://wandb.ai/mcclain/genome-minimizer-2)
- [GitHub Repository](https://github.com/ucl-cssb/genome-minimizer-2)