---
language:
- dna
tags:
- biology
- genomics
- transposable-elements
- dnabert
- bilstm
- sequence-classification
license: mit
---

# TE-GER: Order Classification

Part of the **TE-GER** (Transposable Elements Genomic Entity Recognition) toolkit.

This model classifies transposable elements (TEs) in genomic sequences by order: DIRS, HELITRON, LINE, LTR, PLE, SINE, or TIR. It combines a DNABERT-2 encoder with a BiLSTM head.

## Model Architecture

- **Base:** [DNABERT-2](https://huggingface.co/zhihan1996/DNABERT-2-117M) (DNA language model)
- **Head:** Bidirectional LSTM + Linear Classifier
- **Input:** 512 bp sliding windows over raw FASTA sequences
- **Task:** Sequence classification (token-level TE annotation)
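
The input step above can be illustrated with a minimal sketch of FASTA reading and 512 bp windowing. This is not the TE-GER implementation: the function names `read_fasta` and `sliding_windows`, the 256 bp stride, and the way the final window is aligned to the sequence end are all assumptions made for illustration. Only the 512 bp window length comes from this card.

```python
from typing import Iterable, Iterator, Tuple

WINDOW_BP = 512  # window length stated in this model card


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def sliding_windows(
    seq: str, window: int = WINDOW_BP, stride: int = 256
) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, subsequence) windows that cover seq.

    stride=256 is a hypothetical value, not a documented TE-GER setting.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not seq:
        return
    if len(seq) <= window:
        # A short sequence becomes a single, shorter window.
        yield 0, seq
        return
    start = 0
    while start + window < len(seq):
        yield start, seq[start:start + window]
        start += stride
    # Align the last window to the sequence end so the tail is covered.
    yield len(seq) - window, seq[-window:]
```

Overlapping windows let each position be seen in more than one context; how TE-GER itself chooses the stride and reconciles overlaps is not stated here.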

## Usage

Use this model via the [TE-GER CLI](https://github.com/johanpina/te-ger):

```bash
python Te_annotator.py genome.fasta output.gff3 --level order
```
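
The command writes its annotations to a GFF3 file. As a rough sketch of how that output could be inspected afterwards, the snippet below parses the nine standard tab-separated GFF3 columns and tallies features by the `type` column. Which column TE-GER uses for the order label is not stated in this card, so grouping by `type` is an assumption, and `parse_gff3` and `summarize` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# The nine columns defined by the GFF3 format.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per feature line, with start/end converted to int."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed lines
        record: Dict[str, object] = dict(zip(GFF3_COLUMNS, fields))
        record["start"] = int(fields[3])
        record["end"] = int(fields[4])
        yield record


def summarize(records: Iterable[Dict[str, object]]) -> Dict[str, Tuple[int, int]]:
    """Map each feature type to (feature count, total bases covered)."""
    counts: Counter = Counter()
    bases: Counter = Counter()
    for rec in records:
        counts[rec["type"]] += 1
        # GFF3 coordinates are 1-based and inclusive.
        bases[rec["type"]] += rec["end"] - rec["start"] + 1
    return {t: (counts[t], bases[t]) for t in counts}
```

With a real run, this could be called as `summarize(parse_gff3(open("output.gff3")))` to get a per-type count and base total.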

## Labels

- `0`: Background
- `1`: DIRS
- `2`: HELITRON
- `3`: LINE
- `4`: LTR
- `5`: PLE
- `6`: SINE
- `7`: TIR
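
The label ids above can be written as a lookup table. The sketch below also collapses a run of predicted ids into labeled intervals, skipping Background. It assumes one predicted id per position, which is a simplification: DNABERT-2 tokens can span several bases, and the post-processing TE-GER actually applies is not described in this card. `runs_to_intervals` is a hypothetical helper.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# Mapping copied from the label list in this card.
ID2LABEL = {
    0: "Background", 1: "DIRS", 2: "HELITRON", 3: "LINE",
    4: "LTR", 5: "PLE", 6: "SINE", 7: "TIR",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def runs_to_intervals(
    label_ids: Sequence[int], offset: int = 0
) -> List[Tuple[int, int, str]]:
    """Collapse consecutive identical ids into (start, end, label) tuples.

    Coordinates are 1-based and inclusive; `offset` is the 0-based position
    of the first id within the full sequence. Background runs are dropped.
    """
    intervals = []
    pos = offset
    for label_id, group in groupby(label_ids):
        length = sum(1 for _ in group)
        if label_id != LABEL2ID["Background"]:
            intervals.append((pos + 1, pos + length, ID2LABEL[label_id]))
        pos += length
    return intervals
```

For example, the ids `[0, 0, 4, 4, 4, 0, 3, 3]` yield an LTR interval at positions 3 to 5 and a LINE interval at positions 7 to 8.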

## Citation

Developed by Johan S. Piña, 2025