---
language:
- en
tags:
- biology
- dna
- genomics
- metagenomics
- classifier
- awd-lstm
- transfer-learning
license: mit
pipeline_tag: text-classification
library_name: pytorch
---

# LookingGlass Functional Classifier

Classifies DNA reads into one of 1274 experimentally-validated functional annotations with 81.5% accuracy. This is a **pure PyTorch implementation** fine-tuned from the LookingGlass base model.

## Links

- **Paper**: [Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter](https://doi.org/10.1038/s41467-022-30070-8) (Nature Communications, 2022)
- **GitHub**: [ahoarfrost/LookingGlass](https://github.com/ahoarfrost/LookingGlass)
- **Base Model**: [HoarfrostLab/lookingglass-v1](https://huggingface.co/HoarfrostLab/lookingglass-v1)

## Citation

```bibtex
@article{hoarfrost2022deep,
  title={Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter},
  author={Hoarfrost, Adrienne and Aptekmann, Ariel and Farfanuk, Gaetan and Bromberg, Yana},
  journal={Nature Communications},
  volume={13},
  number={1},
  pages={2606},
  year={2022},
  publisher={Nature Publishing Group}
}
```

## Model

| | |
|---|---|
| Architecture | LookingGlass encoder + classification head |
| Encoder | AWD-LSTM (3-layer, unidirectional) |
| Classes | 1274 functional annotation classes |
| Parameters | ~17M |

## Installation

```bash
pip install torch
git clone https://huggingface.co/HoarfrostLab/LGv1_FunctionalClassifier
cd LGv1_FunctionalClassifier
```

## Usage

```python
from lookingglass_classifier import LookingGlassClassifier, LookingGlassTokenizer

model = LookingGlassClassifier.from_pretrained('.')
tokenizer = LookingGlassTokenizer()
model.eval()

inputs = tokenizer(["GATTACA", "ATCGATCGATCG"], return_tensors=True)

# Get predictions
predictions = model.predict(inputs['input_ids'])
print(predictions)  # tensor([class_idx, class_idx])

# Get probabilities
probs = model.predict_proba(inputs['input_ids'])
print(probs.shape)  # torch.Size([2, 1274])

# Get raw logits
logits = model(inputs['input_ids'])
print(logits.shape)  # torch.Size([2, 1274])
```

## License

MIT License
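
## Classifying reads from a FASTA file

The Usage example passes a short list of sequences straight to the tokenizer. Reads usually arrive in a FASTA file, so the sketch below shows one way to parse such a file and group the reads into batches before tokenizing. It uses only the Python standard library. The helper names `read_fasta` and `batched` are hypothetical and not part of this repository, and uppercasing the sequences is an assumption about what the tokenizer expects, not something the model card states.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper, not part of the LookingGlass repository.
    """
    read_id = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if read_id is not None:
                yield read_id, "".join(chunks)
            header_fields = line[1:].split()
            # Keep only the first whitespace-delimited token as the read ID.
            read_id = header_fields[0] if header_fields else ""
            chunks = []
        elif read_id is not None:
            # Assumption: the tokenizer expects uppercase nucleotides.
            chunks.append(line.upper())
    if read_id is not None:
        yield read_id, "".join(chunks)


def batched(
    records: Iterable[Tuple[str, str]], batch_size: int
) -> Iterator[List[Tuple[str, str]]]:
    """Group records into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Tuple[str, str]] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


if __name__ == "__main__":
    # In-memory stand-in for `open("reads.fasta")`; the file name is illustrative.
    example = [">read1 sample", "GATT", "ACA", ">read2", "atcgatcgatcg"]
    for batch in batched(read_fasta(example), batch_size=2):
        ids = [rid for rid, _ in batch]
        seqs = [seq for _, seq in batch]
        print(ids, seqs)
        # With the model and tokenizer loaded as in the Usage section:
        # inputs = tokenizer(seqs, return_tensors=True)
        # predictions = model.predict(inputs["input_ids"])
```

Batching keeps memory use bounded for large read sets. A suitable `batch_size` depends on read length and available memory, so treat any value as a starting point to tune.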