---
library_name: transformers
tags:
- CodonTransformer
- Computational Biology
- Machine Learning
- Bioinformatics
- Synthetic Biology
license: apache-2.0
pipeline_tag: token-classification
---
**CodonTransformer** is a tool for codon optimization: it transforms protein sequences into DNA sequences optimized for your target organism. Whether you are a researcher or a practitioner in genetic engineering, CodonTransformer provides a suite of features to support your work. By combining the Transformer architecture with a user-friendly Jupyter notebook, it reduces the complexity of codon optimization, saving you time and effort.
<br>

**This is the pretrained model. For best results, please use the [finetuned model](https://huggingface.co/adibvafa/CodonTransformer).**

## Authors
Adibvafa Fallahpour<sup>1,2</sup>\*, Vincent Gureghian<sup>3</sup>\*, Guillaume J. Filion<sup>2</sup>‡, Ariel B. Lindner<sup>3</sup>‡, Amir Pandi<sup>3</sup>‡

<sup>1</sup> Vector Institute for Artificial Intelligence, Toronto ON, Canada <br>
<sup>2</sup> University of Toronto Scarborough; Department of Biological Science; Scarborough ON, Canada <br>
<sup>3</sup> Université Paris Cité, INSERM U1284, Center for Research and Interdisciplinarity, F-75006 Paris, France <br>
\* These authors contributed equally to this work. <br>
‡ To whom correspondence should be addressed: <br>
guillaume.filion@utoronto.ca, ariel.lindner@inserm.fr, amir.pandi@cri-paris.org
<br>

## Use Case
**For a guide on finetuning CodonTransformer, check out our [GitHub](https://github.com/Adibvafa/CodonTransformer/tree/main?tab=readme-ov-file#finetuning-codontransformer).**
<br>**For an interactive demo, check out our [Google Colab Notebook](https://adibvafa.github.io/CodonTransformer/GoogleColab).**
<br><br>
After installing CodonTransformer (available on [PyPI](https://pypi.org/project/CodonTransformer/)), you can run:
```python
import torch
from transformers import AutoTokenizer, BigBirdForMaskedLM
from CodonTransformer.CodonPrediction import predict_dna_sequence
from CodonTransformer.CodonJupyter import format_model_output

# Run on the GPU when one is available, otherwise fall back to the CPU
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer and the pretrained (base) model
tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer-base").to(DEVICE)

# Set your input data
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
organism = "Escherichia coli general"

# Predict with CodonTransformer
output = predict_dna_sequence(
    protein=protein,
    organism=organism,
    device=DEVICE,
    tokenizer=tokenizer,
    model=model,
    attention_type="original_full",
)
print(format_model_output(output))
```
The output is:
<br>

```text
-----------------------------
|          Organism         |
-----------------------------
Escherichia coli general

-----------------------------
|       Input Protein       |
-----------------------------
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG

-----------------------------
|      Processed Input      |
-----------------------------
M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK

-----------------------------
|       Predicted DNA       |
-----------------------------
ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
```
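
A quick way to sanity-check a prediction is to translate the predicted DNA back into protein and compare it with the input. The sketch below is not part of the CodonTransformer package: `translate` and `encodes` are hypothetical helper names, and it assumes the standard genetic code (NCBI translation table 1), which may not be appropriate for every target organism.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with bases ordered T, C, A, G.
# Assumption: the target organism uses this table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes(dna: str, protein: str) -> bool:
    """Return True if `dna` translates to `protein`, ignoring one trailing stop codon."""
    translated = translate(dna)
    if translated.endswith("*"):
        translated = translated[:-1]
    return translated == protein


# Sequences copied from the example output above
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
predicted_dna = (
    "ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCG"
    "TTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGC"
    "TTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA"
)
print(encodes(predicted_dna, protein))
```

This check only confirms that the DNA encodes the intended protein; it says nothing about how well the codon choices suit the target organism.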

## Additional Resources
- **Project Website** <br>
  https://adibvafa.github.io/CodonTransformer/

- **GitHub Repository** <br>
  https://github.com/Adibvafa/CodonTransformer

- **Google Colab Demo** <br>
  https://adibvafa.github.io/CodonTransformer/GoogleColab

- **PyPI Package** <br>
  https://pypi.org/project/CodonTransformer/

- **Paper** <br>
  https://www.biorxiv.org/content/10.1101/2024.09.13.612903