---
license: other
---
[![License](https://img.shields.io/badge/license-GenBio_AI_Community_License-orange)](https://github.com/genbio-ai/ModelGenerator/blob/main/LICENSE)
# How to use
```python
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
    base_model="genbio-ai/AIDO.RNA-650M-CDS",
)
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, add_special_tokens=True, return_tensors="pt")
embedding = model(**inputs).last_hidden_state # [1, sequence_length, 1280]
```
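The model returns one vector per token. If a single vector per sequence is needed, one common option is to average over the token dimension while ignoring padding. The sketch below is not part of the wrapper: `masked_mean_pool` is a hypothetical helper, and it assumes the tokenizer output includes an `attention_mask` (1 for real tokens, 0 for padding), which is typical for Hugging Face tokenizers but is not confirmed here.

```python
import torch


def masked_mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over positions where attention_mask == 1.

    hidden:         [batch, sequence_length, hidden_size]
    attention_mask: [batch, sequence_length]
    returns:        [batch, hidden_size]
    """
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # [batch, seq, 1]
    summed = (hidden * mask).sum(dim=1)                    # sum over unmasked tokens
    counts = mask.sum(dim=1).clamp(min=1.0)                # avoid division by zero
    return summed / counts


# Toy stand-in for model output: two real tokens and one padded position.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])
pooled = masked_mean_pool(hidden, mask)
print(pooled)  # tensor([[2., 3.]])
```

With the snippet above, this would be called as `masked_mean_pool(embedding, inputs["attention_mask"])`, assuming that key is present. Whether special tokens should be included in the average is a separate design choice that this sketch does not address.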
# Model Variants
The following `base_model` options are available for embedding generation. Pass either the short name (a key below) or the full model name (the corresponding value) as the `base_model` argument.
```python
VARIANTS = {
    "aido_rna_1m_mars": "genbio-ai/AIDO.RNA-1M-MARS",
    "aido_rna_25m_mars": "genbio-ai/AIDO.RNA-25M-MARS",
    "aido_rna_300m_mars": "genbio-ai/AIDO.RNA-300M-MARS",
    "aido_rna_650m": "genbio-ai/AIDO.RNA-650M",
    "aido_rna_650m_cds": "genbio-ai/AIDO.RNA-650M-CDS",
    "aido_rna_1b600m": "genbio-ai/AIDO.RNA-1.6B",
    "aido_rna_1b600m_cds": "genbio-ai/AIDO.RNA-1.6B-CDS",
}
```
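To illustrate how a short name and a full model name can both map to the same checkpoint, here is a minimal sketch. `resolve_base_model` is a hypothetical helper written for this example; it is not the wrapper's actual lookup code, which may behave differently (for instance in how it handles unknown names).

```python
# Copied from the VARIANTS table in this README.
VARIANTS = {
    "aido_rna_1m_mars": "genbio-ai/AIDO.RNA-1M-MARS",
    "aido_rna_25m_mars": "genbio-ai/AIDO.RNA-25M-MARS",
    "aido_rna_300m_mars": "genbio-ai/AIDO.RNA-300M-MARS",
    "aido_rna_650m": "genbio-ai/AIDO.RNA-650M",
    "aido_rna_650m_cds": "genbio-ai/AIDO.RNA-650M-CDS",
    "aido_rna_1b600m": "genbio-ai/AIDO.RNA-1.6B",
    "aido_rna_1b600m_cds": "genbio-ai/AIDO.RNA-1.6B-CDS",
}


def resolve_base_model(name: str) -> str:
    """Return the full model name for a short key or an already-full name.

    Hypothetical helper for illustration only.
    """
    if name in VARIANTS:           # short name, e.g. "aido_rna_650m"
        return VARIANTS[name]
    if name in VARIANTS.values():  # full name, e.g. "genbio-ai/AIDO.RNA-650M"
        return name
    raise ValueError(
        f"Unknown base_model {name!r}; expected one of {sorted(VARIANTS)}"
    )


print(resolve_base_model("aido_rna_650m_cds"))        # genbio-ai/AIDO.RNA-650M-CDS
print(resolve_base_model("genbio-ai/AIDO.RNA-1.6B"))  # genbio-ai/AIDO.RNA-1.6B
```

Validating the name before calling `from_pretrained` gives a clear error early instead of a failed download.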
# Performance Vs Original AIDO.RNA Models
The two snippets below check that the wrapper produces the same embeddings as the original AIDO.RNA models: both print the same summary statistics for the same input sequence.
Original AIDO.RNA code snippet:
```python
from modelgenerator.tasks import Embed
import torch
model = Embed.from_config({"model.backbone": "aido_rna_650m"}).eval()
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
transformed_batch = model.transform({"sequences": [dna]})
embedding = model(transformed_batch) # [1, sequence_length, 1280]
embedding_mean = torch.mean(embedding, dim=1)
print(torch.mean(embedding_mean)) # Outputs tensor(0.0005, grad_fn=<MeanBackward0>)
embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max)) # Outputs tensor(1.5583, grad_fn=<MeanBackward0>)
```
Modified code snippet using the wrapper:
```python
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "Taykhoom/AIDO-RNA-Wrapper",
    trust_remote_code=True,
    base_model="genbio-ai/AIDO.RNA-650M",
)
dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, add_special_tokens=True, return_tensors="pt")
embedding = model(**inputs).last_hidden_state # [1, sequence_length, 1280]
embedding_mean = torch.mean(embedding, dim=1)
print(torch.mean(embedding_mean)) # Outputs tensor(0.0005, grad_fn=<MeanBackward0>)
embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max)) # Outputs tensor(1.5583, grad_fn=<MeanBackward0>)
```
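The two snippets above compare summary statistics (the mean of the mean-pooled and max-pooled embeddings). A stricter check compares the tensors element by element. The sketch below shows one way to do that with `torch.allclose`; `embeddings_match` is a hypothetical helper, the tolerance is an arbitrary assumption, and random tensors stand in for the real model outputs so the example runs without downloading any weights.

```python
import torch


def embeddings_match(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-4) -> bool:
    """Return True if two embedding tensors have the same shape and values.

    Hypothetical helper; atol=1e-4 is an assumed tolerance, not a value
    taken from the original repository.
    """
    if a.shape != b.shape:
        return False
    return torch.allclose(a, b, atol=atol)


# Stand-in tensors shaped like [batch, sequence_length, hidden_size].
torch.manual_seed(0)
reference = torch.randn(1, 8, 1280)
candidate = reference.clone()

same = embeddings_match(reference, candidate)
shifted = embeddings_match(reference, candidate + 0.1)
print(same)     # True
print(shifted)  # False
```

To apply this to real outputs, run both models in the same process (for example under `torch.no_grad()`) and pass the two `embedding` tensors to `embeddings_match`.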
# License Notice
This repository contains modified versions of GenBio AI code.
Modifications include:
- Removing the dependency on the `modelgenerator` package
- Adding a `base_model` argument for loading a specific AIDO.RNA model
Some of the original functionality may not be preserved. These changes were made to better integrate with the mRNABench framework, which focuses on embedding generation for mRNA sequences. Most of the required code was copied directly from the original GenBio AI repository with minimal changes, so please refer to the original repository for full implementation details.
When using this repository, please adhere to the original license terms of the GenBio AI code. This license can be found in this directory as `LICENSE`.
# Original Repository
The original AIDO.RNA models and code are available at: https://github.com/genbio-ai/ModelGenerator