# BerTELEO
A BERT-style model pre-trained on short DNA sequences of the teleo marker, initialized from zhihan1996/DNABERT-2-117M.

Use this model to compute teleo sequence embeddings.

The accompanying paper has not been released yet.

## How to use
```python
from transformers import AutoTokenizer, AutoModel
import torch

model_id = "gustoudu81/BerTeleo"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

inputs = tokenizer("ACGTACGTACGT", return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    hidden_states = model(**inputs)[0]  # [1, sequence_length, 768]

# embedding with mean pooling over the token dimension
embedding_mean = torch.mean(hidden_states[0], dim=0)
print(embedding_mean.shape)  # expect torch.Size([768])

# embedding with max pooling over the token dimension
embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape)  # expect torch.Size([768])
```
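
When embedding several sequences at once, padded positions should be excluded from the pooling. Below is a minimal sketch of mask-aware mean pooling over a batch, assuming the tokenizer and model loaded above and that the model's forward accepts an attention mask as in DNABERT-2; the sequences are made-up examples.

```python
# Batched embedding: pool only over real tokens, not padding.
sequences = ["ACGTACGTACGT", "ACGTACGTACGTACGTACGT"]  # hypothetical example sequences
batch = tokenizer(sequences, return_tensors="pt", padding=True)
batch = {k: v.to(device) for k, v in batch.items()}

with torch.no_grad():
    hidden_states = model(**batch)[0]  # [batch_size, sequence_length, 768]

# Zero out padded positions, then divide by each sequence's real token count.
mask = batch["attention_mask"].unsqueeze(-1)  # [batch_size, sequence_length, 1]
embeddings = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # expect torch.Size([2, 768])
```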