---
license: apache-2.0
tags:
  - biology
---

Gengram-10B-torch

This repository hosts the model weights for Gengram-10B-torch. For usage instructions and further details, please refer to the Gengram GitHub repository.

Gengram is a novel conditional memory module for genomic foundation models (GFMs) that adds explicit motif memory retrieval to Transformer-based DNA sequence modeling. Conventional GFMs rely on dense computation to infer multi-nucleotide motifs implicitly. Gengram instead looks up learned representations of these biological patterns through an efficient, genome-specific hashing scheme.
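The idea of a hash-based motif lookup can be illustrated with a minimal Python sketch. This is not Gengram's implementation: the base-4 encoding, the convention of taking the k-mer that ends at each position, the shared-table layout, and the modulo fallback are all assumptions made for illustration, and `kmer_ids` and `table_index` are hypothetical names.

```python
from typing import Dict, List, Optional

# Assumed encoding: A, C, G, T -> 0..3; any other symbol (e.g. N) breaks a k-mer.
BASE_CODE: Dict[str, int] = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_ids(seq: str, k: int) -> List[Optional[int]]:
    """Base-4 integer id of the k-mer ending at each position (hypothetical helper).

    Positions without k bases of upstream context, or whose k-mer contains
    a non-ACGT symbol, get None (no memory lookup).
    """
    ids: List[Optional[int]] = []
    for end in range(len(seq)):
        start = end - k + 1
        if start < 0:
            ids.append(None)  # not enough context for a full k-mer
            continue
        value = 0
        valid = True
        for base in seq[start:end + 1]:
            code = BASE_CODE.get(base.upper())
            if code is None:
                valid = False  # ambiguous base: skip this k-mer
                break
            value = value * 4 + code
        ids.append(value if valid else None)
    return ids


def table_index(kmer_id: int, k: int, table_size: int) -> int:
    """Hypothetical mapping of (k, kmer_id) to a row of one shared table.

    Offsetting by the number of shorter k-mers keeps ids collision-free;
    the modulo only causes collisions when table_size is smaller than
    the total number of k-mers.
    """
    offset = sum(4 ** j for j in range(1, k))  # rows used by k-mers shorter than k
    return (offset + kmer_id) % table_size
```

For example, `kmer_ids("ACGT", 2)` yields `[None, 1, 6, 11]` for the 2-mers AC, CG and GT. For k = 1 to 6 there are 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 distinct k-mers, so a 5460-row table needs no collisions; a smaller table trades collisions for memory.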

✨ Key Features

  • 🎯 Explicit Motif Memory: Stores and retrieves k-mers (k = 1–6) via hash-based lookup tables
  • 🧬 Local Window Aggregation: 21 bp window mechanism aligned with DNA helical structure (about two turns of the B-DNA double helix, at roughly 10.5 bp per turn)
  • ⚡ Computational Efficiency: Linear time complexity in sequence length, with minimal overhead
  • 🔧 Architecture Agnostic: Compatible with various attention mechanisms, including multi-head (MHA), grouped-query (GQA) and multi-head latent attention (MLA)
  • ⚖️ Stable Training: Improves load balancing in Mixture-of-Experts models
  • 🔍 Biological Interpretability: Learns meaningful motif representations
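The local window aggregation can be sketched in plain Python as follows. The sketch assumes a centered 21 bp window truncated at sequence edges, a simple mean over the retrieved memory vectors, and a hypothetical scalar gate (sigmoid of a scaled dot product) that controls how much aggregated memory is added to a hidden state. Gengram's actual windowing, weighting and gating may differ, and `window_average` and `gated_update` are illustrative names only.

```python
import math
from typing import List

Vector = List[float]


def window_average(memory: List[Vector], window: int = 21) -> List[Vector]:
    """Mean of memory vectors in a centered window around each position.

    Assumption: centered window, truncated at the sequence edges.
    """
    if not memory:
        return []
    half = window // 2
    dim = len(memory[0])
    out: List[Vector] = []
    for i in range(len(memory)):
        lo, hi = max(0, i - half), min(len(memory), i + half + 1)
        acc = [0.0] * dim
        for j in range(lo, hi):
            for d in range(dim):
                acc[d] += memory[j][d]
        out.append([x / (hi - lo) for x in acc])
    return out


def gated_update(hidden: Vector, mem: Vector) -> Vector:
    """Hypothetical gate: sigmoid(hidden . mem / sqrt(d)) scales the memory term."""
    score = sum(h * m for h, m in zip(hidden, mem)) / math.sqrt(len(hidden))
    gate = 1.0 / (1.0 + math.exp(-score))
    return [h + gate * m for h, m in zip(hidden, mem)]
```

For a fixed window w, `window_average` costs O(L * w * d) for L positions and d dimensions, which is linear in sequence length; a prefix-sum variant would remove the factor of w.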

✨ Biological Interpretability

  • Reverse-complement symmetry in memory embeddings
  • Context-dependent gating aligned with functional regions
  • Hierarchical representation from shallow to deep layers
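One way to quantify reverse-complement symmetry is to compare each k-mer's memory embedding with that of its reverse complement. The sketch below is a hypothetical analysis helper, not part of Gengram: it assumes the trained memory has been exported as a plain mapping from k-mer strings to vectors, and `rc_symmetry` is an illustrative name.

```python
import math
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse the sequence."""
    return kmer.translate(COMPLEMENT)[::-1]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rc_symmetry(embeddings: Dict[str, List[float]]) -> float:
    """Mean cosine similarity between k-mers and their reverse complements.

    Reverse-complement palindromes (e.g. "AT") are skipped because they
    would trivially score 1.0; returns NaN if no pairs are available.
    """
    scores = []
    for kmer, vec in embeddings.items():
        rc = reverse_complement(kmer)
        if rc != kmer and rc in embeddings:
            scores.append(cosine(vec, embeddings[rc]))
    return sum(scores) / len(scores) if scores else float("nan")
```

A score close to 1.0 would indicate that the memory assigns similar embeddings to a motif and its reverse complement, i.e. to the same site read from either strand.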

For full documentation, training details, and usage instructions, please visit the GitHub repository.