---
license: other
---

# How to use

```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is needed because the tokenizer and model classes
# are defined in this repository rather than in transformers itself.
tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/Helix-mRNA-Wrapper",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "Taykhoom/Helix-mRNA-Wrapper",
    trust_remote_code=True,
).eval()

# Example mRNA sequence (RNA alphabet: A, C, G, U)
rna = "ACGUAGCAUCGGAUCUAUCUAUCGACACUUGGUUAUCGAUCUACGAGCAUCUCGUUAGC"
inputs = tokenizer(
    rna,
    return_tensors="pt",
    truncation=True,
    padding="longest",
    max_length=tokenizer.model_max_length,
    return_special_tokens_mask=True,
)

# Mask out special tokens: 1 for sequence positions, 0 for special tokens
special_tokens_mask = inputs["special_tokens_mask"]
attention_mask = 1 - special_tokens_mask

embedding = model(
    input_ids=inputs["input_ids"],
    attention_mask=attention_mask,
).last_hidden_state  # [1, sequence_length, 256]
```

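The usage snippet above sets `attention_mask = 1 - special_tokens_mask`, so special-token positions get 0 and sequence positions get 1. If you need one fixed-size vector per sequence, the same mask can be reused to exclude special tokens from pooling. The following is a minimal sketch, assuming the model output has shape `[batch, sequence_length, 256]` as noted in the snippet; `masked_mean_pool` is a hypothetical helper, not part of the wrapper, and the toy tensors stand in for real model output.

```python
import torch


def masked_mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: average token embeddings where attention_mask == 1.

    hidden: [batch, seq_len, dim]; attention_mask: [batch, seq_len] of 0/1.
    Returns a [batch, dim] tensor.
    """
    # Broadcast the mask over the embedding dimension and match the dtype.
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
    # Sum embeddings at unmasked positions only.
    summed = (hidden * mask).sum(dim=1)
    # Count unmasked positions per sequence; clamping avoids division by zero.
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


# Toy example: 4 positions, where the first and last are special tokens.
special_tokens_mask = torch.tensor([[1, 0, 0, 1]])
attention_mask = 1 - special_tokens_mask
hidden = torch.tensor([[[100.0, 100.0], [1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
pooled = masked_mean_pool(hidden, attention_mask)
print(pooled)  # tensor([[2., 3.]])
```

This differs from the pooling in the comparison section, where `torch.mean(embedding, dim=1)` averages over every position returned by the model.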
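# Comparison with the Original Helix-mRNA Model

The two snippets below check that the modified code produces the same embeddings as the original Helix-mRNA model. Both embed the same sequence and print the mean of the mean-pooled and max-pooled embeddings, so matching values are consistent with the wrapper reproducing the original outputs.
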
Original Helix-mRNA code (requires the `helical` package):

```python
from helical.models.helix_mrna import HelixmRNA, HelixmRNAConfig
import torch

input_sequences = ["ACGUAGCAUCGGAUCUAUCUAUCGACACUUGGUUAUCGAUCUACGAGCAUCUCGUUAGC"]

helix_mrna_config = HelixmRNAConfig(batch_size=1)
helix_mrna = HelixmRNA(configurer=helix_mrna_config)

# Prepare data for input to the model
processed_input_data = helix_mrna.process_data(input_sequences)

# Generate the embeddings for the processed data
embedding = torch.Tensor(helix_mrna.get_embeddings(processed_input_data))

embedding_mean = torch.mean(embedding, dim=1)  # [1, 256]
print(torch.mean(embedding_mean))  # Outputs tensor(-0.0033)

# torch.max along a dim returns (values, indices); [0] keeps the values
embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max))  # Outputs tensor(0.0989)
```

Modified code using the wrapper:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Taykhoom/Helix-mRNA-Wrapper",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "Taykhoom/Helix-mRNA-Wrapper",
    trust_remote_code=True,
).eval()

rna = "ACGUAGCAUCGGAUCUAUCUAUCGACACUUGGUUAUCGAUCUACGAGCAUCUCGUUAGC"
inputs = tokenizer(
    rna,
    return_tensors="pt",
    truncation=True,
    padding="longest",
    max_length=tokenizer.model_max_length,
    return_special_tokens_mask=True,
)

# Mask out special tokens: 1 for sequence positions, 0 for special tokens
special_tokens_mask = inputs["special_tokens_mask"]
attention_mask = 1 - special_tokens_mask

embedding = model(
    input_ids=inputs["input_ids"],
    attention_mask=attention_mask,
).last_hidden_state  # [1, sequence_length, 256]

# grad_fn appears below because the forward pass is not run under torch.no_grad()
embedding_mean = torch.mean(embedding, dim=1)  # [1, 256]
print(torch.mean(embedding_mean))  # Outputs tensor(-0.0033, grad_fn=<MeanBackward0>)

embedding_max = torch.max(embedding, dim=1)[0]
print(torch.mean(embedding_max))  # Outputs tensor(0.0989, grad_fn=<MeanBackward0>)
```

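Comparing the means of the pooled embeddings is a coarse check, since different embeddings can share the same summary statistics. A stricter check compares the full tensors elementwise. The following is a minimal sketch, assuming both outputs are available in one session (for example, by running both snippets and storing the first `embedding` under a different name) and that both cover the same token positions. `embeddings_match` and its tolerance values are illustrative assumptions, not part of either package.

```python
import torch


def embeddings_match(reference, candidate, rtol: float = 1e-4, atol: float = 1e-5) -> bool:
    """Hypothetical helper: True if two embeddings agree elementwise within tolerance.

    Inputs may be tensors or array-likes; tensors that track gradients
    (like the wrapper's output) are detached before comparison.
    """
    ref = torch.as_tensor(reference).detach().to("cpu", torch.float32)
    cand = torch.as_tensor(candidate).detach().to("cpu", torch.float32)
    if ref.shape != cand.shape:
        # Different token counts (e.g. special tokens handled differently)
        # make an elementwise comparison meaningless.
        return False
    return torch.allclose(ref, cand, rtol=rtol, atol=atol)


# Toy stand-ins for the two [1, sequence_length, 256] outputs (smaller here).
original = torch.zeros(1, 3, 4)
wrapper = (original + 1e-7).requires_grad_(True)
print(embeddings_match(original, wrapper))  # True
print(embeddings_match(original, original + 0.1))  # False
```

The tolerances allow for small floating-point differences between implementations; tighten or loosen them depending on how closely the two code paths are expected to agree.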
# License Notice

This repository contains modified versions of Helical code. The modifications include:

- Removal of the dependency on the `helical` package
- Removal of some convenience code for embedding generation (to standardize usage) and of other checks (see the original repository for details)

Some of the original functionality may not be preserved. These changes were made to integrate better with the mRNABench framework, which focuses on embedding generation for mRNA sequences. Most of the required code was copied directly from the original Helical repository with minimal changes, so please refer to the original repository for full implementation details.

When using this repository, please adhere to the original license terms of the Helical code, which can be found in this directory as `LICENSE`.

# Original Repository

The original Helical repository can be found at: https://github.com/helicalAI/helical