---
title: ColiFormer - E. coli Codon Optimization
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: mit
short_description: E. coli codon optimization with fine-tuned transformers
tags:
  - biology
  - codon-optimization
  - e-coli
  - protein-synthesis
  - bioinformatics
  - synthetic-biology
  - transformers
  - streamlit
---
# 🧬 ColiFormer - E. coli Codon Optimization

**ColiFormer** is a codon optimization tool fine-tuned for *Escherichia coli* sequences. It achieves a **6.2% higher CAI** than the base CodonTransformer model.
## Features

- **🎯 E. coli Specialized**: Fine-tuned on 4,300 high-CAI E. coli sequences
- **Advanced Metrics**: CAI, tAI, GC content, and codon frequency analysis
- **🤖 Auto-Loading**: Automatically downloads the model and reference data from Hugging Face
- **⚡ Real-time**: Interactive sequence optimization with live metrics
- **🔬 Research-Grade**: Based on the BigBird Transformer architecture
- **Performance**: 6.2% higher CAI and 6.0% higher tAI than the base model on E. coli
## Model Performance

| Metric | Base Model | ColiFormer | Change |
|--------|------------|------------|--------|
| CAI Score | 0.742 | 0.788 | **+6.2%** |
| tAI Score | 0.451 | 0.478 | **+6.0%** |
| GC Content | 52.1% | 51.8% | -0.3 pp |
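
The percentages in the table are relative changes with respect to the base model. As a quick check, the arithmetic can be reproduced as follows; the helper name `relative_improvement` is purely illustrative and is not part of ColiFormer.

```python
def relative_improvement(base: float, tuned: float) -> float:
    """Percentage change of `tuned` relative to `base`."""
    return (tuned - base) / base * 100.0


# Values copied from the performance table.
cai_gain = relative_improvement(0.742, 0.788)
tai_gain = relative_improvement(0.451, 0.478)

print(f"CAI: +{cai_gain:.1f}%")
print(f"tAI: +{tai_gain:.1f}%")
```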
## Related Resources

- **Model**: [saketh11/ColiFormer](https://huggingface.co/saketh11/ColiFormer)
- **Dataset**: [saketh11/ColiFormer-Data](https://huggingface.co/datasets/saketh11/ColiFormer-Data)
- **Base Model**: [adibvafa/CodonTransformer](https://huggingface.co/adibvafa/CodonTransformer)
- **Paper**: [CodonTransformer: The Global Translation of Genetic Code by Transformer](https://www.biorxiv.org/content/10.1101/2023.09.09.556981v1)
## 💡 How to Use

1. **Enter your protein sequence** in single-letter amino acid format
2. **Select optimization parameters** (temperature, max length, etc.)
3. **Click "Optimize Sequence"** to generate the optimized DNA sequence
4. **View comprehensive metrics** including CAI, tAI, GC content, and codon usage
5. **Download results** as FASTA or Excel files
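
Steps 1 and 5 can be sketched in plain Python for anyone preparing inputs or post-processing results outside the app. This is a minimal sketch, not the app's own code: the function names `clean_protein` and `to_fasta`, the record name, and the 60-column line width are all assumptions made for illustration.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_protein(raw: str) -> str:
    """Normalize a single-letter protein sequence and reject invalid letters."""
    # Drop whitespace and line breaks, uppercase, and strip a trailing stop "*".
    seq = "".join(raw.split()).upper().rstrip("*")
    if not seq:
        raise ValueError("protein sequence is empty")
    invalid = sorted(set(seq) - VALID_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"invalid amino acid letters: {', '.join(invalid)}")
    return seq


def to_fasta(name: str, dna: str, width: int = 60) -> str:
    """Format a DNA sequence as a FASTA record with fixed-width lines."""
    lines = [dna[i:i + width] for i in range(0, len(dna), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


protein = clean_protein("mkristtitt titittgngag\n")
print(protein)
# "example_optimized" is a hypothetical record name; the DNA is a placeholder.
print(to_fasta("example_optimized", "ATGAAACGTATTAGT"), end="")
```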
## 🧪 Example

**Input Protein**: `MKRISTTITTTITITTGNGAG`

**Optimized DNA**: `ATGAAACGTATTAGT...` (optimized for E. coli expression)

**Metrics**:

- CAI: 0.85 (High)
- tAI: 0.52 (Good)
- GC Content: 51.2% (Optimal)
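
A useful sanity check on any optimized sequence is that it still translates back to the input protein. The sketch below does this for the visible 15-nucleotide prefix of the example using the standard genetic code, and also shows how GC content is computed. The helper names are illustrative, and because the example DNA is truncated, the GC value of the prefix is not expected to match the 51.2% reported for the full sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons of a DNA string into single-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def gc_content(dna: str) -> float:
    """Percentage of G and C nucleotides in a DNA string."""
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


protein = "MKRISTTITTTITITTGNGAG"
dna_prefix = "ATGAAACGTATTAGT"  # only the part of the example that is shown

translated = translate(dna_prefix)
print(translated, protein.startswith(translated))
print(f"GC content of prefix: {gc_content(dna_prefix):.1f}%")
```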
## 🔬 Technical Details

- **Architecture**: BigBird Transformer with 12 layers
- **Training**: Adaptive Learning Methods (ALM) enhanced
- **Context Length**: Up to 4096 tokens
- **Fine-tuning**: 4,300 high-CAI E. coli sequences
- **Reference Data**: 50,000+ E. coli gene sequences for metrics
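
To make the role of the reference data concrete, here is a minimal sketch of the textbook Codon Adaptation Index: each codon gets a weight equal to its count in the reference genes divided by the count of the most frequent synonymous codon, and the CAI of a sequence is the geometric mean of its codon weights. This is an illustration under simplifying assumptions, not the app's implementation: the function names and the tiny reference set are hypothetical, codons never seen in the reference are skipped rather than given a pseudocount, and the single-codon amino acids Met and Trp are excluded.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(dna: str) -> list[str]:
    """Split a DNA string into complete codons."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def relative_adaptiveness(reference_genes: list[str]) -> dict[str, float]:
    """Weight each observed codon by count / max count among its synonyms."""
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    weights = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        top = max(counts[c] for c in family)
        for c in family:
            if counts[c]:  # unseen codons get no weight (simplification)
                weights[c] = counts[c] / top
    return weights


def cai(dna: str, weights: dict[str, float]) -> float:
    """Geometric mean of codon weights, skipping Met, Trp and unweighted codons."""
    logs = [math.log(weights[c]) for c in codons(dna)
            if c in weights and CODON_TABLE[c] not in "MW"]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0


# Toy reference set (hypothetical): AAA is preferred over AAG, CGT over CGC.
toy_weights = relative_adaptiveness(["AAAAAAAAG", "CGTCGTCGC"])
print(cai("AAACGT", toy_weights))  # both codons are the preferred ones
print(cai("AAGCGC", toy_weights))  # both codons are the less frequent ones
```

With a real reference set, the same weights would be computed once from highly expressed E. coli genes and reused for every sequence scored.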
## Citation

If you use ColiFormer in your research, please cite the CodonTransformer paper it builds on:

```bibtex
@article{codon_transformer_2023,
  title={CodonTransformer: The Global Translation of Genetic Code by Transformer},
  author={Adibvafa Fallahpour and Bartosz Grzybowski and Bogdan Gliwa and Bartosz Michalak},
  journal={bioRxiv},
  year={2023},
  doi={10.1101/2023.09.09.556981}
}
```

## License

This project is licensed under the MIT License.

---

**Built with ❤️ for the synthetic biology community**