---
language:
- en
tags:
- biology
- dna
- genomics
- metagenomics
- classifier
- awd-lstm
- transfer-learning
license: mit
pipeline_tag: text-classification
library_name: pytorch
---
# LookingGlass Optimal Temperature Classifier

Classifies a DNA read by the optimal temperature of the enzyme it originates from: psychrophilic (<15°C), mesophilic (20-40°C), or thermophilic (>50°C). The classifier reaches 70.1% accuracy.

This is a **pure PyTorch implementation** fine-tuned from the LookingGlass base model.
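
The class boundaries stated above leave two gaps (15-20°C and 40-50°C). The following is a minimal sketch of those ranges as a lookup. The function name, the inclusive mesophilic bounds, and returning `None` for temperatures in a gap are illustrative assumptions, not part of the LookingGlass code:

```python
from typing import Optional

# Class boundaries as stated in this model card (degrees Celsius).
PSYCHROPHILIC_MAX = 15.0          # psychrophilic: below 15
MESOPHILIC_RANGE = (20.0, 40.0)   # mesophilic: 20 to 40 (ASSUMPTION: inclusive)
THERMOPHILIC_MIN = 50.0           # thermophilic: above 50


def temperature_class(optimal_temp_c: float) -> Optional[str]:
    """Return the class label for an optimal temperature.

    Hypothetical helper: returns None when the temperature falls in a gap
    between the stated ranges, since the model card does not define a
    class for those values.
    """
    if optimal_temp_c < PSYCHROPHILIC_MAX:
        return "psychrophilic"
    if MESOPHILIC_RANGE[0] <= optimal_temp_c <= MESOPHILIC_RANGE[1]:
        return "mesophilic"
    if optimal_temp_c > THERMOPHILIC_MIN:
        return "thermophilic"
    return None


if __name__ == "__main__":
    for temp in (4.0, 17.0, 37.0, 45.0, 75.0):
        print(temp, temperature_class(temp))
```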

## Links

- **Paper**: [Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter](https://doi.org/10.1038/s41467-022-30070-8) (Nature Communications, 2022)
- **GitHub**: [ahoarfrost/LookingGlass](https://github.com/ahoarfrost/LookingGlass)
- **Base Model**: [HoarfrostLab/lookingglass-v1](https://huggingface.co/HoarfrostLab/lookingglass-v1)

## Citation

```bibtex
@article{hoarfrost2022deep,
  title={Deep learning of a bacterial and archaeal universal language of life
         enables transfer learning and illuminates microbial dark matter},
  author={Hoarfrost, Adrienne and Aptekmann, Ariel and Farfanuk, Gaetan and Bromberg, Yana},
  journal={Nature Communications},
  volume={13},
  number={1},
  pages={2606},
  year={2022},
  publisher={Nature Publishing Group}
}
```

## Model

| Property | Value |
|---|---|
| Architecture | LookingGlass encoder + classification head |
| Encoder | AWD-LSTM (3-layer, unidirectional) |
| Classes | 3 classes: psychrophilic, mesophilic, thermophilic |
| Parameters | ~17M |

## Installation

```bash
pip install torch
git clone https://huggingface.co/HoarfrostLab/LGv1_OptimalTempClassifier
cd LGv1_OptimalTempClassifier
```

## Usage

```python
from lookingglass_classifier import LookingGlassClassifier, LookingGlassTokenizer

# Load the model from the cloned repository (the current directory)
model = LookingGlassClassifier.from_pretrained('.')
tokenizer = LookingGlassTokenizer()
model.eval()  # inference mode: disables dropout

# Tokenize a batch of DNA reads
inputs = tokenizer(["GATTACA", "ATCGATCGATCG"], return_tensors=True)

# Get predictions
predictions = model.predict(inputs['input_ids'])
print(predictions)  # tensor([class_idx, class_idx])

# Get probabilities
probs = model.predict_proba(inputs['input_ids'])
print(probs.shape)  # torch.Size([2, 3])

# Get raw logits
logits = model(inputs['input_ids'])
print(logits.shape)  # torch.Size([2, 3])
```
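
The predictions above are class indices, so reading them requires a mapping from index to class name. This card does not state the index order, so the sketch below assumes `[psychrophilic, mesophilic, thermophilic]` purely for illustration; check the repository's configuration before relying on it. `label_reads` and `ASSUMED_LABELS` are hypothetical names, not part of `lookingglass_classifier`:

```python
# ASSUMPTION: this index-to-label order is illustrative only. The model card
# does not specify it, so verify it against the repository before use.
ASSUMED_LABELS = ["psychrophilic", "mesophilic", "thermophilic"]


def label_reads(prob_rows, labels=ASSUMED_LABELS):
    """Pair each row of class probabilities with its most likely label.

    prob_rows: nested list of floats, one row per read and one column per
    class (for example, the result of probs.tolist()).
    Returns a list of (label, probability) tuples.
    """
    results = []
    for row in prob_rows:
        if len(row) != len(labels):
            raise ValueError(
                f"expected {len(labels)} probabilities per read, got {len(row)}"
            )
        # Index of the largest probability in this row
        best = max(range(len(row)), key=row.__getitem__)
        results.append((labels[best], row[best]))
    return results


if __name__ == "__main__":
    # Made-up probabilities standing in for model output
    example = [[0.1, 0.7, 0.2], [0.05, 0.15, 0.8]]
    for label, confidence in label_reads(example):
        print(f"{label}\t{confidence:.2f}")
```

With the model loaded as in the usage example, this could be called as `label_reads(probs.tolist())`.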

## License

MIT License