---
title: README
emoji: π
colorFrom: pink
colorTo: blue
sdk: static
pinned: false
---
# Neural Bioinformatics Research Group - ProkBERT Models

Welcome to the official Hugging Face organization of the Neural Bioinformatics Research Group. Our main goal is to provide genomic language models for microbiome applications.
## Models

We provide a collection of pretrained and fine-tuned models from the ProkBERT family. These models are built on Local Context-Aware (LCA) tokenization, tailored specifically for DNA sequences to balance context size and performance.

ProkBERT models are designed for microbiome-related tasks such as prokaryotic promoter identification and phage detection. Despite their compact size, they are powerful and efficient.
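To illustrate the idea behind LCA tokenization, here is a minimal sketch: a sequence is split into overlapping k-mers, where the shift controls how far the window advances between tokens. The function name and signature are illustrative only, not the actual ProkBERT API; see the GitHub repository for the real tokenizer.

```python
def lca_tokenize(seq, k=6, shift=1):
    """Split a DNA sequence into overlapping k-mer tokens.

    Hypothetical sketch of Local Context-Aware (LCA) tokenization:
    windows of length `k` start every `shift` nucleotides, so
    consecutive tokens overlap by k - shift bases.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, shift)]

# 6-mer, shift=1 (mini-style): maximal overlap between tokens
print(lca_tokenize("ATGCGTAC", k=6, shift=1))  # ['ATGCGT', 'TGCGTA', 'GCGTAC']

# 6-mer, shift=2 (mini-long-style): fewer tokens, longer reach per token
print(lca_tokenize("ATGCGTAC", k=6, shift=2))  # ['ATGCGT', 'GCGTAC']
```

A larger shift covers the same sequence with fewer tokens, which is how `mini-long` trades per-base overlap for a longer effective context.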
## Model Overview

| Model       | Parameters | Tokenizer      | Layers | Attention Heads | Max. Context Size | Training Data (nucleotides) |
|-------------|------------|----------------|--------|-----------------|-------------------|-----------------------------|
| `mini`      | 20.6M      | 6-mer, shift=1 | 6      | 6               | 1027 nt           | 206.65 billion              |
| `mini-c`    | 24.9M      | 1-mer          | 6      | 6               | 1022 nt           | 206.65 billion              |
| `mini-long` | 26.6M      | 6-mer, shift=2 | 6      | 6               | 4096 nt           | 206.65 billion              |

_Overview of the ProkBERT model configurations._
## Resources

- [Read our paper](https://www.frontiersin.org/articles/10.3389/fmicb.2023.1331233/full)
- [Learn more about the model](https://github.com/nbrg-ppcu/prokbert)
- [Get started with code on GitHub](https://github.com/nbrg-ppcu/prokbert/tree/main?tab=readme-ov-file#tutorials-and-examples)
---

For more information or questions, please visit our [GitHub repository](https://github.com/nbrg-ppcu/prokbert) or contact us at [obalasz@gmail.com](mailto:obalasz@gmail.com).