---
license: mit
pipeline_tag: feature-extraction
tags:
- biology
- Gene
- Protein
- GO
- MLM
- Gene function
- Gene Ontology
- DAG
- Protein function
---

## Model Details

GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction.

### Model Description

GoBERT is the first encoder to capture relations among GO functions. It generates GO function embeddings that can be used in various biological applications related to genes or gene products.

For the Gene-GO function mapping database, please refer to our previous work, UniEntrezDB (UniEntrezGOA.zip at https://zenodo.org/records/13335548).

### Model Sources

- **Repository:** https://github.com/MM-YY-WW/GoBERT
- **Paper:** GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction. (AAAI-25)
- **Demo:** https://gobert.nasy.moe/

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, BertForPreTraining
import torch

repo_name = "MM-YY-WW/GoBERT"
# The tokenizer is defined in the model repository, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=False, trust_remote_code=True)
model = BertForPreTraining.from_pretrained(repo_name)

# Obtain function-level GoBERT embeddings.
# The input is a space-separated string of GO term IDs.
input_sequences = 'GO:0005739 GO:0005783 GO:0005829 GO:0006914 GO:0006915 GO:0006979 GO:0031966 GO:0051560'
tokenized_input = tokenizer(input_sequences)

# Add a batch dimension of size 1.
input_tensor = torch.tensor(tokenized_input['input_ids']).unsqueeze(0)
attention_mask = torch.tensor(tokenized_input['attention_mask']).unsqueeze(0)

model.eval()
with torch.no_grad():
    outputs = model(input_ids=input_tensor, attention_mask=attention_mask, output_hidden_states=True)
    # Last hidden layer, batch dimension removed: shape (sequence_length, hidden_size).
    embedding = outputs.hidden_states[-1].squeeze(0).cpu().numpy()
```

## Citation

**BibTeX:**

```bibtex
@inproceedings{miao2025gobert,
  title={GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction},
  author={Miao, Yuwei and Guo, Yuzhi and Ma, Hehuan and Yan, Jingquan and Jiang, Feng and Liao, Rui and Huang, Junzhou},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={1},
  pages={622--630},
  year={2025},
  doi={10.1609/aaai.v39i1.32043}
}
```
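
## Example: Pooling Function Embeddings

The getting-started snippet yields one embedding row per input position. For gene-level applications, one common option is to average those rows into a single vector and compare genes by cosine similarity. The sketch below illustrates that idea only; it is not a procedure prescribed by the GoBERT authors. The helper names `masked_mean_pool` and `cosine_similarity` are hypothetical, and the random array is a stand-in for the real `embedding` output, with an arbitrary hidden size of 16 so the example runs without downloading the model.

```python
import numpy as np


def masked_mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the rows of token_embeddings where attention_mask == 1.

    token_embeddings: array of shape (sequence_length, hidden_size)
    attention_mask:   array of shape (sequence_length,) with 0/1 entries
    """
    mask = attention_mask.astype(token_embeddings.dtype)[:, None]  # (sequence_length, 1)
    total = (token_embeddings * mask).sum(axis=0)
    count = max(float(mask.sum()), 1.0)  # guard against an all-zero mask
    return total / count


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors (0.0 if either has zero norm)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Stand-in data (assumption): 8 positions, hidden size 16, last 2 positions padded.
# With the real model, use `embedding` and `tokenized_input['attention_mask']` instead.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(8, 16))
attention_mask = np.array([1, 1, 1, 1, 1, 1, 0, 0])

gene_vector = masked_mean_pool(embedding, attention_mask)
other_vector = masked_mean_pool(rng.normal(size=(8, 16)), attention_mask)
similarity = cosine_similarity(gene_vector, other_vector)
```

Whether the tokenizer adds special tokens around the GO terms is not stated here; if it does, you may want to exclude those positions from the mask before pooling.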