---
language:
- en
tags:
- biology
- dna
- genomics
- metagenomics
- classifier
- awd-lstm
- transfer-learning
license: mit
pipeline_tag: text-classification
library_name: pytorch
---
# LookingGlass Functional Classifier
Classifies DNA reads into one of 1274 experimentally validated functional annotation classes with 81.5% accuracy.
This is a **pure PyTorch implementation**, fine-tuned from the LookingGlass base model.
## Links
- **Paper**: [Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter](https://doi.org/10.1038/s41467-022-30070-8) (Nature Communications, 2022)
- **GitHub**: [ahoarfrost/LookingGlass](https://github.com/ahoarfrost/LookingGlass)
- **Base Model**: [HoarfrostLab/lookingglass-v1](https://huggingface.co/HoarfrostLab/lookingglass-v1)
## Citation
```bibtex
@article{hoarfrost2022deep,
  title={Deep learning of a bacterial and archaeal universal language of life
         enables transfer learning and illuminates microbial dark matter},
  author={Hoarfrost, Adrienne and Aptekmann, Ariel and Farfanuk, Gaetan and Bromberg, Yana},
  journal={Nature Communications},
  volume={13},
  number={1},
  pages={2606},
  year={2022},
  publisher={Nature Publishing Group}
}
```
## Model
| Property | Value |
|---|---|
| Architecture | LookingGlass encoder + classification head |
| Encoder | AWD-LSTM (3-layer, unidirectional) |
| Classes | 1274 functional annotation classes |
| Parameters | ~17M |
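The table above can be pictured as an encoder feeding a linear classification head. The sketch below is a minimal stand-in, not the actual implementation: the real model uses an AWD-LSTM (with weight-dropped recurrent connections and other regularization), and the vocabulary and layer sizes here are illustrative placeholders.

```python
import torch
import torch.nn as nn

class SketchFunctionalClassifier(nn.Module):
    """Illustrative encoder + classification head; sizes are placeholders."""

    def __init__(self, vocab_size=64, emb_dim=128, hidden_dim=256,
                 num_layers=3, num_classes=1274):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Plain 3-layer unidirectional LSTM standing in for the AWD-LSTM encoder
        self.encoder = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                               batch_first=True)
        # Linear head mapping the final hidden state to the 1274 classes
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, input_ids):
        emb = self.embedding(input_ids)          # [batch, seq] -> [batch, seq, emb]
        outputs, _ = self.encoder(emb)           # [batch, seq, hidden]
        return self.head(outputs[:, -1, :])      # logits from the last timestep

logits = SketchFunctionalClassifier()(torch.randint(0, 64, (2, 50)))
print(logits.shape)  # torch.Size([2, 1274])
```

The key design point is that only the final head is class-specific; the encoder is the pretrained LookingGlass language model, which is what makes transfer learning to new sequence tasks possible.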
## Installation
```bash
pip install torch
git clone https://huggingface.co/HoarfrostLab/LGv1_FunctionalClassifier
cd LGv1_FunctionalClassifier
```
## Usage
```python
from lookingglass_classifier import LookingGlassClassifier, LookingGlassTokenizer

# Load the fine-tuned weights from the current directory
model = LookingGlassClassifier.from_pretrained('.')
tokenizer = LookingGlassTokenizer()
model.eval()  # disable dropout for inference

# Tokenize raw DNA reads into input ID tensors
inputs = tokenizer(["GATTACA", "ATCGATCGATCG"], return_tensors=True)

# Get predicted class indices
predictions = model.predict(inputs['input_ids'])
print(predictions)  # tensor([class_idx, class_idx])

# Get per-class probabilities
probs = model.predict_proba(inputs['input_ids'])
print(probs.shape)  # torch.Size([2, 1274])

# Get raw logits
logits = model(inputs['input_ids'])
print(logits.shape)  # torch.Size([2, 1274])
```
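With 1274 classes it is often more useful to inspect the top few candidate annotations than the single argmax. Plain `torch.topk` works on any `[batch, 1274]` probability tensor; the snippet below uses random logits as a stand-in for `model.predict_proba(...)` so it is self-contained.

```python
import torch

torch.manual_seed(0)
# Stand-in for probs = model.predict_proba(inputs['input_ids'])
probs = torch.softmax(torch.randn(2, 1274), dim=-1)

# Top 5 annotation classes per read, highest probability first
top_probs, top_classes = torch.topk(probs, k=5, dim=-1)
for i in range(probs.size(0)):
    pairs = [f"class {c}: {p:.4f}"
             for c, p in zip(top_classes[i].tolist(), top_probs[i].tolist())]
    print(f"read {i}: " + ", ".join(pairs))
```

The returned indices are positions in the model's 1274-class label space; mapping them back to annotation names requires the label vocabulary shipped with the repository.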
## License
MIT License