---
license: mit
tags:
- biology
- genomics
- dna-sequence
- bacterial-classification
- bert
- transformers
---

# BBERT Pre-trained Models

Pre-trained models for [BBERT](https://github.com/AmirErez/BBERT) - BERT for Bacterial DNA Classification.

## Models Included

### 1. BBERT Transformer (`bbert_checkpoint-32500/`)
- Main BERT-based model trained on bacterial DNA sequences
- Hidden size: 768
- Trained on diverse bacterial genomes

### 2. Bacterial Classifier (`bacterial_classifier/epoch_80.pt`)
- Binary classifier for bacterial vs. non-bacterial sequences
- Input: BBERT embeddings (768-dim)
- Trained for 80 epochs on 3.9M sequences

### 3. Reading Frame Classifier (`frame_classifier/classifier_model_2000K_37e.pth`)
- 6-way classifier for reading frame prediction
- Frames: +1, +2, +3, -1, -2, -3
- Trained for 37 epochs on 2M sequences

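To make the six frame labels concrete, the sketch below lists the codons seen in each frame of a short DNA string. It assumes the common convention that +1, +2, +3 start at offsets 0, 1, 2 on the forward strand and -1, -2, -3 start at offsets 0, 1, 2 on the reverse complement. The helper names are illustrative and are not part of BBERT, and the order of the classifier's output classes is not specified here.

```python
# Illustrative only: these helpers are not part of the BBERT codebase.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def codons_by_frame(seq: str) -> dict[str, list[str]]:
    """Split a sequence into complete codons for each of the six frames.

    Assumed convention: +1/+2/+3 use offsets 0/1/2 on the forward strand,
    -1/-2/-3 use offsets 0/1/2 on the reverse complement.
    """
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Keep only complete triplets starting at this offset.
            usable = len(strand) - (len(strand) - offset) % 3
            codons = [strand[i:i + 3] for i in range(offset, usable, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames


if __name__ == "__main__":
    for frame, codons in codons_by_frame("ATGGCCTAA").items():
        print(frame, codons)
```

Only one of the six frames matches how a coding read is actually translated, which is why frame prediction is posed as a 6-way classification problem.
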
### 4. Coding Sequence Classifier (`coding_classifier/epoch_46.pt`)
- Binary classifier for coding vs. non-coding sequences
- Trained for 46 epochs on 3.9M sequences

## Usage

BBERT downloads these models for you. Run the download script once during setup, then use BBERT as usual:

```bash
# First time setup
pip install bbert  # or clone from GitHub
python source/download_models.py

# Then use normally
python bbert.py your_sequences.fasta --output_dir results
```

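After the download step, it can help to confirm that all four artifacts listed under Models Included are present. The sketch below checks for them with `pathlib`. The relative paths come from this README, while the root directory (`models` here) and the function name are assumptions, since the location used by `download_models.py` is not stated here.

```python
from pathlib import Path

# Relative paths as listed in this README.
EXPECTED_ARTIFACTS = [
    "bbert_checkpoint-32500",
    "bacterial_classifier/epoch_80.pt",
    "frame_classifier/classifier_model_2000K_37e.pth",
    "coding_classifier/epoch_46.pt",
]


def missing_artifacts(model_root: str) -> list[str]:
    """Return the expected artifacts that do not exist under model_root."""
    root = Path(model_root)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).exists()]


if __name__ == "__main__":
    # "models" is a hypothetical location; adjust it to your download directory.
    missing = missing_artifacts("models")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All model files found.")
```
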
## Citation

If you use BBERT, please cite:

[Add your citation here]

## License

MIT License. See the LICENSE file for details.