---
license: other
---

[![License](https://img.shields.io/badge/license-GenBio_AI_Community_License-orange)](https://github.com/genbio-ai/ModelGenerator/blob/main/LICENSE)

# How to use

```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is required because the wrapper ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
    base_model="genbio-ai/AIDO.RNA-650M-CDS",  # selects which AIDO.RNA variant to load
)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, add_special_tokens=True, return_tensors="pt")

embedding = model(**inputs).last_hidden_state  # [1, sequence_length, 1280]
```

# Model Variants

The following `base_model` options are available for embedding generation. Either the short name (key) or the full model name (value) can be passed as the `base_model` argument.

```python
VARIANTS = {
    "aido_rna_1m_mars": "genbio-ai/AIDO.RNA-1M-MARS",
    "aido_rna_25m_mars": "genbio-ai/AIDO.RNA-25M-MARS",
    "aido_rna_300m_mars": "genbio-ai/AIDO.RNA-300M-MARS",
    "aido_rna_650m": "genbio-ai/AIDO.RNA-650M",
    "aido_rna_650m_cds": "genbio-ai/AIDO.RNA-650M-CDS",
    "aido_rna_1b600m": "genbio-ai/AIDO.RNA-1.6B",
    "aido_rna_1b600m_cds": "genbio-ai/AIDO.RNA-1.6B-CDS",
}
```

# Performance Vs Original AIDO.RNA Models

The two snippets below check that the wrapper produces the same embeddings as the original AIDO.RNA models by comparing summary statistics of the output for the same input sequence.

Original AIDO.RNA code snippet:

```python
from modelgenerator.tasks import Embed
import torch

model = Embed.from_config({"model.backbone": "aido_rna_650m"}).eval()

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
transformed_batch = model.transform({"sequences": [dna]})
embedding = model(transformed_batch)  # [1, sequence_length, 1280]

# Mean over the sequence dimension, then over the hidden dimension.
embedding_mean = torch.mean(embedding, dim=1)
print(torch.mean(embedding_mean))  # Outputs tensor(0.0005, grad_fn=<MeanBackward0>)

# Max over the sequence dimension, then mean over the hidden dimension.
embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max))  # Outputs tensor(1.5583, grad_fn=<MeanBackward0>)
```

Modified code snippet using the wrapper:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
    base_model="genbio-ai/AIDO.RNA-650M",
)

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, add_special_tokens=True, return_tensors="pt")

embedding = model(**inputs).last_hidden_state  # [1, sequence_length, 1280]

# Same summary statistics as in the original snippet.
embedding_mean = torch.mean(embedding, dim=1)
print(torch.mean(embedding_mean))  # Outputs tensor(0.0005, grad_fn=<MeanBackward0>)

embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max))  # Outputs tensor(1.5583, grad_fn=<MeanBackward0>)
```

# License Notice

This repository contains modified versions of GenBio AI code. Modifications include:

- Removal of the dependency on the `modelgenerator` package
- Support for loading a specific AIDO.RNA model via the `base_model` argument

Not all of the original functionality may be preserved. These changes were made to integrate better with the mRNABench framework, which focuses on embedding generation for mRNA sequences. Most of the required code was copied directly from the original GenBio AI repository with minimal changes, so please refer to the original repository for full details on the implementation.

When using this repository, please adhere to the original license terms of the GenBio AI code. This license can be found in this directory as `LICENSE`.

# Original Repository

The original AIDO.RNA models and code are available at: https://github.com/genbio-ai/ModelGenerator
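
# Stricter Embedding Comparison

The performance check above compares two summary statistics. A stricter option is an element-wise comparison of the two embedding tensors. The sketch below is one way to do that and is not part of the wrapper or of `modelgenerator`: `embeddings_match` and `masked_mean_pool` are hypothetical helper names, the tolerance is an arbitrary choice, and the pooling helper assumes the tokenizer output contains an `attention_mask` entry, which this README does not confirm.

```python
import torch


def embeddings_match(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-4) -> bool:
    """Return True if two embedding tensors agree element-wise within `atol`.

    Hypothetical helper; `atol` is an assumed tolerance, not a documented one.
    """
    if a.shape != b.shape:
        return False
    return torch.allclose(a, b, atol=atol)


def masked_mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over positions where the mask is 1.

    hidden:         [batch, seq_len, dim]
    attention_mask: [batch, seq_len], 1 for real tokens and 0 for padding
                    (assumed to be provided by the tokenizer).
    """
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # [batch, seq_len, 1]
    summed = (hidden * mask).sum(dim=1)                   # [batch, dim]
    counts = mask.sum(dim=1).clamp(min=1.0)               # avoid division by zero
    return summed / counts
```

With `embedding` computed once by each snippet above (named, say, `embedding_original` and `embedding_wrapper`), `embeddings_match(embedding_original, embedding_wrapper)` would report whether they agree position by position. When batching sequences of different lengths, `masked_mean_pool(embedding, inputs["attention_mask"])` gives a per-sequence vector that ignores padding, whereas a plain `torch.mean(embedding, dim=1)` averages over every position.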