---
language:
- dna
tags:
- biology
- genomics
- transposable-elements
- dnabert
- bilstm
- sequence-classification
license: mit
---

# TE-GER — Order Classification

Part of the **TE-GER** (Transposable Elements Genomic Entity Recognition) toolkit.

This model classifies transposable elements (TEs) in genomic sequences by order (DIRS, HELITRON, LINE, LTR, PLE, SINE, TIR). It uses a hybrid DNABERT-2 + BiLSTM architecture.

## Model Architecture

- **Base:** [DNABERT-2](https://huggingface.co/zhihan1996/DNABERT-2-117M) (DNA language model)
- **Head:** Bidirectional LSTM + Linear Classifier
- **Input:** 512 bp sliding windows over raw FASTA sequences
- **Task:** Sequence classification (token-level TE annotation)

## Usage

Use this model via the [TE-GER CLI](https://github.com/johanpina/te-ger):

```bash
python Te_annotator.py genome.fasta output.gff3 --level order
```

## Labels

- `0`: Background
- `1`: DIRS
- `2`: HELITRON
- `3`: LINE
- `4`: LTR
- `5`: PLE
- `6`: SINE
- `7`: TIR

## Citation

Developed by Johan S. Piña — 2025
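
## Example: Windowing and Label Mapping

The model card states that input is cut into 512 bp sliding windows over raw FASTA sequences and that predictions use label ids `0` to `7`. The sketch below illustrates what that preprocessing and id-to-name mapping could look like in plain Python. It is not the TE-GER implementation: the function names `parse_fasta` and `sliding_windows` are hypothetical, the stride (and therefore any overlap between windows) is an assumption because the card does not specify it, and keeping a shorter final window is likewise an assumption.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Label ids exactly as listed in the Labels section of this card.
ID2LABEL: Dict[int, str] = {
    0: "Background", 1: "DIRS", 2: "HELITRON", 3: "LINE",
    4: "LTR", 5: "PLE", 6: "SINE", 7: "TIR",
}


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: uppercase sequence}."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def sliding_windows(
    seq: str, window: int = 512, stride: int = 512
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, subsequence) with 0-based, end-exclusive coordinates.

    Assumptions (not taken from the model card): the default stride equals
    the window size, and a final window shorter than `window` is kept.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    start = 0
    while start < len(seq):
        end = min(start + window, len(seq))
        yield start, end, seq[start:end]
        if end == len(seq):
            break
        start += stride


if __name__ == "__main__":
    demo = parse_fasta(">chr_demo example\n" + "ACGT" * 300 + "\n")
    for name, seq in demo.items():
        for start, end, _ in sliding_windows(seq):
            print(name, start, end)
    print(ID2LABEL[4])
```

Passing a stride smaller than the window (for example `stride=256`) produces overlapping windows. Whether the TE-GER CLI overlaps windows, and how it merges per-window predictions into GFF3 features, is not described here, so check the repository before relying on this behavior.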