TE-GER — Superfamily Classification

Part of the TE-GER (Transposable Elements Genomic Entity Recognition) toolkit.

This model performs fine-grained TE annotation of genomic sequences across 21 superfamilies (Gypsy, Copia, Mutator, hAT, and others). It combines a DNABERT-2 encoder with a BiLSTM head.

Model Architecture

  • Base: DNABERT-2 (DNA language model)
  • Head: Bidirectional LSTM + Linear Classifier
  • Input: 512 bp sliding windows over raw FASTA sequences
  • Task: Token-level classification (per-token TE annotation)
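
The 512 bp windowing listed above can be sketched in a few lines of Python. This is a minimal illustration, not the TE-GER implementation: the function name `sliding_windows`, the `stride` parameter, and the non-overlapping default are assumptions, since this card does not state the stride the toolkit uses.

```python
from typing import Iterator, Tuple


def sliding_windows(seq: str, window: int = 512, stride: int = 512) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, subsequence) pairs that cover seq.

    The final window may be shorter than `window`. The stride is an
    assumption; the value used by TE-GER is not documented in this card.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    for start in range(0, len(seq), stride):
        yield start, seq[start:start + window]
        # Stop once a window has reached the end of the sequence.
        if start + window >= len(seq):
            break


# Toy example: a 1,200 bp sequence split into 512 bp windows.
toy = "ACGT" * 300
chunks = list(sliding_windows(toy))
starts = [start for start, _ in chunks]
lengths = [len(sub) for _, sub in chunks]
```

Here `starts` is `[0, 512, 1024]` and `lengths` is `[512, 512, 176]`. Passing a smaller `stride` (for example 256) produces overlapping windows, which is a common way to avoid splitting an element at a window boundary.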

Usage

Run the model through the TE-GER CLI, passing the input FASTA file, the output GFF3 path, and the annotation level:

python Te_annotator.py genome.fasta output.gff3 --level superfamilies
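
The output file is GFF3, a tab-separated format with nine columns (seqid, source, type, start, end, score, strand, phase, attributes) and 1-based inclusive coordinates. The sketch below shows one way per-base label ids could be collapsed into GFF3 feature lines. It is an illustration only: the helper `labels_to_gff3`, the `TE-GER` source tag, the `transposable_element` type, and the `Name=` attribute are assumptions, and the lines the CLI actually writes may differ.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def labels_to_gff3(seqid: str, labels: Sequence[int], id2label: Dict[int, str],
                   background: int = 0) -> List[str]:
    """Collapse runs of identical non-background labels into GFF3 lines."""
    lines = []
    pos = 0  # 0-based position of the first base in the current run
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label != background:
            start, end = pos + 1, pos + length  # GFF3 is 1-based, inclusive
            lines.append("\t".join([
                seqid, "TE-GER", "transposable_element",  # hypothetical source/type
                str(start), str(end), ".", ".", ".",
                f"Name={id2label[label]}",
            ]))
        pos += length
    return lines


# Toy example: 3 background bases, 4 GYPSY, 2 background, 3 COPIA.
names = {0: "Background", 4: "COPIA", 8: "GYPSY"}
per_base = [0] * 3 + [8] * 4 + [0] * 2 + [4] * 3
rows = labels_to_gff3("chr1", per_base, names)
```

With this toy input, `rows` holds two features: GYPSY at positions 4 to 7 and COPIA at positions 10 to 12.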

Labels

  • 0: Background
  • 1: ACADEM-1
  • 2: BELPAO
  • 3: CACTA
  • 4: COPIA
  • 5: CR1
  • 6: DIRS
  • 7: ERV
  • 8: GYPSY
  • 9: HAT
  • 10: HELITRON
  • 11: I
  • 12: KOLOBOK
  • 13: L1
  • 14: LARD
  • 15: LINE
  • 16: LTR
  • 17: MULE
  • 18: P
  • 19: PIFHARBINGER
  • 20: PIGGYBAC
  • 21: PLE
  • 22: R1
  • 23: RTE
  • 24: SINE
  • 25: TC1MARINER
  • 26: TIR
  • 27: TRNA
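
For downstream scripts, the label table above can be held as a plain mapping. The names `LABELS`, `ID2LABEL`, and `LABEL2ID` are illustrative; whether the model's configuration stores the mapping under similar keys is an assumption.

```python
# Label names in id order, copied from the table above (index == label id).
LABELS = [
    "Background", "ACADEM-1", "BELPAO", "CACTA", "COPIA", "CR1", "DIRS",
    "ERV", "GYPSY", "HAT", "HELITRON", "I", "KOLOBOK", "L1", "LARD",
    "LINE", "LTR", "MULE", "P", "PIFHARBINGER", "PIGGYBAC", "PLE", "R1",
    "RTE", "SINE", "TC1MARINER", "TIR", "TRNA",
]

ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

# Decode a short run of predicted ids into label names.
predicted_ids = [0, 8, 8, 4, 27]
decoded = [ID2LABEL[i] for i in predicted_ids]
```

`decoded` is `["Background", "GYPSY", "GYPSY", "COPIA", "TRNA"]`.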

Citation

Developed by Johan S. Piña — 2025
