Spaces:
Running
Running
| # JCVI Syn3.0 Unknown Genes | |
| This directory contains protein sequences from the JCVI Syn3.0 minimal bacterial genome that were annotated as "unknown function" or "generic". | |
| ## Source | |
| **JCVI Syn3.0** is the minimal bacterial genome created by the J. Craig Venter Institute: | |
| > Hutchison CA 3rd, et al. "Design and synthesis of a minimal bacterial genome." | |
| > Science. 2016 Mar 25;351(6280):aad6253. | |
| > DOI: [10.1126/science.aad6253](https://doi.org/10.1126/science.aad6253) | |
| The 473-gene genome was systematically reduced from *Mycoplasma mycoides* to identify the minimal set of genes required for life. | |
| ## Files | |
| | File | Description | | |
| |------|-------------| | |
| | `unknown_aa_seqs.fasta` | 149 protein sequences with unknown/generic function | | |
| | `unknown_aa_seqs.npy` | Pre-computed Protein-Vec embeddings (149 × 512) | | |
| ## Gene Naming | |
| - `MMSYN1_XXXX` - Gene identifier in Syn3.0 | |
| - `1=Unknown` - Gene with unknown function | |
| - `2=Generic` - Gene with generic/broad annotation | |
| ## Results | |
| Using conformal protein retrieval at 10% FDR (α=0.1): | |
| - **59/149 (39.6%)** of unknown genes can be confidently annotated | |
| - Results reproduced in `notebooks/pfam/genes_unknown.ipynb` | |
| - See paper Figure 2A for visualization | |
| ## Citation | |
| If using this data, please cite both the CPR paper and the original Syn3.0 paper: | |
| ```bibtex | |
| @article{boger2025conformal, | |
| title={Functional protein mining with conformal guarantees}, | |
| author={Boger, Ron S and Chithrananda, Seyone and Angelopoulos, Anastasios N and Yoon, Peter H and Jordan, Michael I and Doudna, Jennifer A}, | |
| journal={Nature Communications}, | |
| volume={16}, | |
| pages={85}, | |
| year={2025}, | |
| doi={10.1038/s41467-024-55676-y} | |
| } | |
| @article{hutchison2016design, | |
| title={Design and synthesis of a minimal bacterial genome}, | |
| author={Hutchison, Clyde A and Chuang, Ray-Yuan and Noskov, Vladimir N and others}, | |
| journal={Science}, | |
| volume={351}, | |
| number={6280}, | |
| pages={aad6253}, | |
| year={2016}, | |
| doi={10.1126/science.aad6253} | |
| } | |
| ``` | |