---
language:
- en
license: mit
pipeline_tag: feature-extraction
library_name: transformers
---

# BarcodeBERT for Taxonomic Classification

A pre-trained transformer model for inference on insect DNA barcoding data, as presented in the paper [BarcodeBERT: Transformers for Biodiversity Analysis](https://huggingface.co/papers/2311.02401).

Code: https://github.com/bioscan-ml/BarcodeBERT

Demo notebook: [Google Colab](https://colab.research.google.com/drive/1MUEQVHIOX2ks7tLsMoQtNlbvsbSuYgs1)

To use **BarcodeBERT** as a feature extractor:

```python
import torch
from transformers import AutoTokenizer, BertForTokenClassification

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "bioscan-ml/BarcodeBERT", trust_remote_code=True
)

# Load the model
model = BertForTokenClassification.from_pretrained(
    "bioscan-ml/BarcodeBERT", trust_remote_code=True
)

# Sample sequence
dna_seq = "ACGCGCTGACGCATCAGCATACGA"

# Tokenize (the tokenizer already returns a batch of size 1)
input_seq = tokenizer(dna_seq, return_tensors="pt")["input_ids"]

# Pass through the model and take the last hidden layer
with torch.no_grad():
    output = model(input_seq, output_hidden_states=True)["hidden_states"][-1]

# Global average pooling over the token dimension
features = output.mean(1)
```
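The pooled features can then be used for taxonomic classification, e.g. by nearest-neighbour lookup against a labelled reference set. The sketch below is a hypothetical illustration: the taxon labels are made up, and random vectors stand in for the BarcodeBERT embeddings (in practice, `ref_feats` and `query_feat` would come from the snippet above) so the example runs without downloading the model.

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings; in practice these are mean-pooled BarcodeBERT
# features produced by the snippet above (one row per barcode sequence).
torch.manual_seed(0)
ref_feats = torch.randn(5, 768)   # 5 labelled reference barcodes
ref_labels = ["Apis", "Bombus", "Vespa", "Apis", "Bombus"]  # made-up taxa
query_feat = torch.randn(1, 768)  # one unlabelled query barcode

# Cosine similarity between the query and every reference embedding
sims = F.cosine_similarity(query_feat, ref_feats)

# 1-nearest-neighbour taxonomic prediction
pred = ref_labels[int(sims.argmax())]
print(pred)
```

A single nearest neighbour keeps the example minimal; with a larger reference set, a k-NN vote or a linear probe trained on the frozen features are common alternatives.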

## Citation 

If you find BarcodeBERT useful in your research, please consider citing:

    @misc{arias2023barcodebert,
      title={{BarcodeBERT}: Transformers for Biodiversity Analysis},
      author={Pablo Millan Arias
        and Niousha Sadjadi
        and Monireh Safari
        and ZeMing Gong
        and Austin T. Wang
        and Scott C. Lowe
        and Joakim Bruslund Haurum
        and Iuliia Zarubiieva
        and Dirk Steinke
        and Lila Kari
        and Angel X. Chang
        and Graham W. Taylor
      },
      year={2023},
      eprint={2311.02401},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      doi={10.48550/arxiv.2311.02401},
    }