---
language:
- en
tags:
- biology
- dna
- genomics
- metagenomics
- classifier
- awd-lstm
- transfer-learning
license: mit
pipeline_tag: text-classification
library_name: pytorch
---

# LookingGlass Reading Frame Classifier

Identifies the correct reading frame start position (1, 2, 3, -1, -2, or -3) for DNA reads. The model is currently intended only for prokaryotic sequences with a low proportion of noncoding DNA.

This is a **pure PyTorch implementation** of the classifier, which was fine-tuned from the LookingGlass base model.
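To make the six frame labels concrete, here is a minimal sketch of how a read splits into codons under each frame. It assumes the common convention that frames 1 to 3 are offsets 0 to 2 on the read as given, and frames -1 to -3 are the same offsets on its reverse complement; the model's training labels may define the negative frames differently. The helper names `reverse_complement` and `codons_in_frame` are hypothetical and are not part of the LookingGlass code.

```python
# Illustrative only: these helpers are NOT part of the LookingGlass repository.
# Assumed convention: frame k > 0 starts at offset k-1 on the read itself;
# frame k < 0 starts at offset |k|-1 on the reverse complement.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def codons_in_frame(seq: str, frame: int) -> list[str]:
    """Split `seq` into complete codons for one of the six reading frames."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError(f"frame must be one of 1, 2, 3, -1, -2, -3, got {frame}")
    strand = seq if frame > 0 else reverse_complement(seq)
    offset = abs(frame) - 1
    # Stop at len - 2 so that only full three-base codons are returned.
    return [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]


if __name__ == "__main__":
    read = "ATGGCCTAA"
    for frame in (1, 2, 3, -1, -2, -3):
        print(frame, codons_in_frame(read, frame))
```

Only one of these six codon splits corresponds to the true coding frame of a read drawn from a gene, and that frame is what the classifier predicts.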

## Links

- **Paper**: [Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter](https://doi.org/10.1038/s41467-022-30070-8) (Nature Communications, 2022)
- **GitHub**: [ahoarfrost/LookingGlass](https://github.com/ahoarfrost/LookingGlass)
- **Base Model**: [HoarfrostLab/lookingglass-v1](https://huggingface.co/HoarfrostLab/lookingglass-v1)

## Citation

```bibtex
@article{hoarfrost2022deep,
  title={Deep learning of a bacterial and archaeal universal language of life
         enables transfer learning and illuminates microbial dark matter},
  author={Hoarfrost, Adrienne and Aptekmann, Ariel and Farfanuk, Gaetan and Bromberg, Yana},
  journal={Nature Communications},
  volume={13},
  number={1},
  pages={2606},
  year={2022},
  publisher={Nature Publishing Group}
}
```

## Model

| Property | Value |
|---|---|
| Architecture | LookingGlass encoder + classification head |
| Encoder | AWD-LSTM (3-layer, unidirectional) |
| Classes | 6 classes: 1, 2, 3, -1, -2, -3 |
| Parameters | ~17M |

## Installation

```bash
pip install torch
git clone https://huggingface.co/HoarfrostLab/LGv1_ReadingFrameClassifier
cd LGv1_ReadingFrameClassifier
```

## Usage

```python
from lookingglass_classifier import LookingGlassClassifier, LookingGlassTokenizer

model = LookingGlassClassifier.from_pretrained('.')
tokenizer = LookingGlassTokenizer()
model.eval()

inputs = tokenizer(["GATTACA", "ATCGATCGATCG"], return_tensors=True)

# Get predictions
predictions = model.predict(inputs['input_ids'])
print(predictions)  # tensor([class_idx, class_idx])

# Get probabilities
probs = model.predict_proba(inputs['input_ids'])
print(probs.shape)  # torch.Size([2, 6])

# Get raw logits
logits = model(inputs['input_ids'])
print(logits.shape)  # torch.Size([2, 6])
```
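`model.predict` returns class indices rather than frame labels. The sketch below shows one way to turn those indices into frames. The index order in `FRAME_LABELS` is an assumption taken from the order in which the classes are listed in this card (1, 2, 3, -1, -2, -3); it is not confirmed by the repository, so check the model's own label mapping before relying on it. `indices_to_frames` is a hypothetical helper, not part of `lookingglass_classifier`.

```python
# Illustrative only: the index-to-label order below is an ASSUMPTION based on
# the order the classes are listed in this card; verify it against the
# repository before use.
FRAME_LABELS = (1, 2, 3, -1, -2, -3)


def indices_to_frames(class_indices, labels=FRAME_LABELS):
    """Map predicted class indices (plain ints) to reading frame labels."""
    frames = []
    for idx in class_indices:
        if not 0 <= idx < len(labels):
            raise ValueError(f"class index {idx} is out of range for {len(labels)} classes")
        frames.append(labels[idx])
    return frames


if __name__ == "__main__":
    # With the model loaded as in the usage example, the call would be:
    #   frames = indices_to_frames(predictions.tolist())
    print(indices_to_frames([0, 3, 5]))
```

Keeping the mapping in a single tuple means that a different label order only needs to be corrected in one place.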

## License

MIT License