---
language:
- en
tags:
- biology
- dna
- genomics
- metagenomics
- classifier
- awd-lstm
- transfer-learning
license: mit
pipeline_tag: text-classification
library_name: pytorch
---
# LookingGlass Functional Classifier
Classifies DNA reads into one of 1274 experimentally validated functional annotation classes with 81.5% accuracy.
This is a **pure PyTorch implementation** fine-tuned from the LookingGlass base model.
## Links
- **Paper**: [Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter](https://doi.org/10.1038/s41467-022-30070-8) (Nature Communications, 2022)
- **GitHub**: [ahoarfrost/LookingGlass](https://github.com/ahoarfrost/LookingGlass)
- **Base Model**: [HoarfrostLab/lookingglass-v1](https://huggingface.co/HoarfrostLab/lookingglass-v1)
## Citation
```bibtex
@article{hoarfrost2022deep,
  title={Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter},
  author={Hoarfrost, Adrienne and Aptekmann, Ariel and Farfanuk, Gaetan and Bromberg, Yana},
  journal={Nature Communications},
  volume={13},
  number={1},
  pages={2606},
  year={2022},
  publisher={Nature Publishing Group}
}
```
## Model
| Attribute | Value |
|---|---|
| Architecture | LookingGlass encoder + classification head |
| Encoder | AWD-LSTM (3-layer, unidirectional) |
| Classes | 1274 functional annotation classes |
| Parameters | ~17M |
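The table above can be illustrated with a minimal PyTorch sketch of the same shape: a recurrent encoder feeding a linear classification head over 1274 classes. Note this is an illustrative stand-in, not the released checkpoint: the embedding/hidden dimensions, vocabulary size, and the use of a plain `nn.LSTM` (instead of the regularized AWD-LSTM) are all assumptions, and the real model uses the LookingGlass tokenizer and pretrained encoder weights.

```python
import torch
import torch.nn as nn

# Illustrative dimensions only (assumptions, not the checkpoint's values)
VOCAB_SIZE = 65       # e.g. a small k-mer vocabulary plus padding
EMB_DIM = 104
HIDDEN_DIM = 512
N_CLASSES = 1274      # matches the classifier's 1274 annotation classes

class SketchClassifier(nn.Module):
    """Encoder + classification head, mirroring the architecture table."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        # 3-layer unidirectional LSTM standing in for the AWD-LSTM encoder
        self.encoder = nn.LSTM(EMB_DIM, HIDDEN_DIM, num_layers=3, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, N_CLASSES)

    def forward(self, input_ids):
        x = self.embedding(input_ids)            # (batch, seq) -> (batch, seq, emb)
        out, _ = self.encoder(x)                 # (batch, seq, hidden)
        # Use the final timestep's hidden state as the read representation
        return self.head(out[:, -1, :])          # (batch, n_classes)

model = SketchClassifier()
logits = model(torch.randint(0, VOCAB_SIZE, (2, 50)))
print(logits.shape)  # torch.Size([2, 1274])
```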
## Installation
```bash
pip install torch
git clone https://huggingface.co/HoarfrostLab/LGv1_FunctionalClassifier
cd LGv1_FunctionalClassifier
```
## Usage
```python
from lookingglass_classifier import LookingGlassClassifier, LookingGlassTokenizer
model = LookingGlassClassifier.from_pretrained('.')
tokenizer = LookingGlassTokenizer()
model.eval()
inputs = tokenizer(["GATTACA", "ATCGATCGATCG"], return_tensors=True)
# Get predictions
predictions = model.predict(inputs['input_ids'])
print(predictions) # tensor([class_idx, class_idx])
# Get probabilities
probs = model.predict_proba(inputs['input_ids'])
print(probs.shape) # torch.Size([2, 1274])
# Get raw logits
logits = model(inputs['input_ids'])
print(logits.shape) # torch.Size([2, 1274])
```
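To rank the most likely annotations for each read, the raw logits can be converted to probabilities and sorted with plain PyTorch. The sketch below uses random stand-in logits of the classifier's output shape; mapping class indices back to annotation names is repository-specific and not shown here.

```python
import torch

# Stand-in logits with the classifier's output shape: 2 reads x 1274 classes
logits = torch.randn(2, 1274)

probs = torch.softmax(logits, dim=-1)   # per-read probability distribution
top = torch.topk(probs, k=5, dim=-1)    # five most likely classes per read

for read_idx, (p, idx) in enumerate(zip(top.values, top.indices)):
    # idx holds class indices; the index-to-annotation mapping lives in the repo
    print(f"read {read_idx}: top classes {idx.tolist()}")
```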
## License
MIT License