---
language:
- en
license: mit
pipeline_tag: feature-extraction
library_name: transformers
---
# BarcodeBERT for Taxonomic Classification
A pre-trained transformer model for inference on insect DNA barcoding data, as presented in the paper [BarcodeBERT: Transformers for Biodiversity Analysis](https://huggingface.co/papers/2311.02401).

Code: https://github.com/bioscan-ml/BarcodeBERT

[Colab](https://colab.research.google.com/drive/1MUEQVHIOX2ks7tLsMoQtNlbvsbSuYgs1)

To use **BarcodeBERT** as a feature extractor:
```python
import torch
from transformers import AutoTokenizer, BertForTokenClassification

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "bioscan-ml/BarcodeBERT", trust_remote_code=True
)

# Load the model
model = BertForTokenClassification.from_pretrained(
    "bioscan-ml/BarcodeBERT", trust_remote_code=True
)

# Sample sequence
dna_seq = "ACGCGCTGACGCATCAGCATACGA"

# Tokenize
input_seq = tokenizer(dna_seq, return_tensors="pt")["input_ids"]
# Ensure exactly one batch dimension: (1, seq_len)
if input_seq.dim() == 1:
    input_seq = input_seq.unsqueeze(0)

# Pass through the model and keep the last layer's hidden states: (1, seq_len, hidden_dim)
with torch.no_grad():
    output = model(input_seq, output_hidden_states=True)["hidden_states"][-1]

# Compute Global Average Pooling over the sequence dimension: (1, hidden_dim)
features = output.mean(1)
```
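`output.mean(1)` averages over every position, which is fine for a single unpadded sequence. When several barcodes of different lengths are embedded in one batch and padded to a common length, the padding positions would also be averaged in. The sketch below shows a mask-aware variant of the same global average pooling. It assumes the tokenizer returns an `attention_mask` (1 for real tokens, 0 for padding) when called on a list of sequences with `padding=True`, as standard Hugging Face tokenizers do; this has not been verified for BarcodeBERT's custom tokenizer. The helper name `masked_mean_pool` is hypothetical, and the tensors are toy values rather than real model outputs.

```python
import torch


def masked_mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: average token embeddings over non-padding positions only.

    hidden:         (batch, seq_len, dim) last-layer hidden states
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding (assumed format)
    returns:        (batch, dim), one feature vector per sequence
    """
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                    # padded positions contribute zero
    counts = mask.sum(dim=1).clamp(min=1.0)                # avoid division by zero for empty rows
    return summed / counts


# Toy batch: two sequences of length 3; the second has only one real token.
hidden = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 2.0], [9.0, 9.0], [9.0, 9.0]],
])
attention_mask = torch.tensor([[1, 1, 1], [1, 0, 0]])

features = masked_mean_pool(hidden, attention_mask)
print(features.tolist())  # [[3.0, 4.0], [2.0, 2.0]]
```

For a single sequence with no padding, the mask is all ones and this reduces to `output.mean(1)`.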
## Citation
If you find BarcodeBERT useful in your research, please consider citing:
```bibtex
@misc{arias2023barcodebert,
  title={{BarcodeBERT}: Transformers for Biodiversity Analysis},
  author={Pablo Millan Arias
    and Niousha Sadjadi
    and Monireh Safari
    and ZeMing Gong
    and Austin T. Wang
    and Scott C. Lowe
    and Joakim Bruslund Haurum
    and Iuliia Zarubiieva
    and Dirk Steinke
    and Lila Kari
    and Angel X. Chang
    and Graham W. Taylor
  },
  year={2023},
  eprint={2311.02401},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  doi={10.48550/arxiv.2311.02401},
}
```