metadata
license: apache-2.0
tags:
- biology
Gengram-10B-torch
This repository hosts the model weights for Gengram-10B-torch. For instructions and details, see the Gengram GitHub repository.
Gengram is a conditional memory module for genomic foundation models (GFMs) that adds explicit motif memory retrieval to Transformer-based DNA sequence modeling. Traditional GFMs rely on dense computation to infer multi-nucleotide motifs implicitly; Gengram instead retrieves biological patterns directly through an efficient lookup built on a genomic-specific hashing scheme.
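To make the idea of a k-mer lookup concrete, here is a minimal sketch that turns each k-mer in a sequence into a table index. It assumes a plain base-4 positional encoding over A, C, G, T as a stand-in; the actual genomic-specific hashing scheme is not described in this card, and the names `kmer_index` and `kmer_indices` are hypothetical.

```python
from typing import List

# Assumption: a simple base-4 encoding, used here only for illustration.
# This is NOT the Gengram hashing scheme, which this card does not specify.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_index(kmer: str) -> int:
    """Map a k-mer to an integer in [0, 4**k) by reading it as a base-4 number."""
    index = 0
    for base in kmer:
        index = index * 4 + BASE_TO_CODE[base]
    return index


def kmer_indices(sequence: str, k: int) -> List[int]:
    """Return the index of every k-mer in the sequence, sliding by one base."""
    return [kmer_index(sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    seq = "ACGTAC"
    for k in range(1, 4):
        print(k, kmer_indices(seq, k))
```

Under this encoding a table for length k needs 4^k rows, so covering k = 1 to 6 exactly would take 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 rows in total. A real hashing scheme may use a different table size.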
✨ Key Features
- 🎯 Explicit Motif Memory: Stores and retrieves k-mers of length k = 1 to 6 via hash-based lookup tables
- 🧬 Local Window Aggregation: 21 bp window mechanism aligned with DNA helical structure (21 bp is roughly two turns of B-DNA at about 10.5 bp per turn)
- ⚡ Computational Efficiency: Time complexity linear in sequence length, with minimal overhead
- 🔧 Architecture Agnostic: Compatible with various attention mechanisms (MHA, GQA, MLA)
- ⚖️ Stable Training: Improves load balancing in Mixture-of-Experts models
- 🔍 Biological Interpretability: Learns meaningful motif representations
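The local window aggregation listed above can be pictured with a small sketch. It assumes a centered window, mean pooling, and truncation at the sequence ends; none of these choices is confirmed by this card, and `window_mean` is a hypothetical helper rather than part of the Gengram code.

```python
from typing import List

WINDOW = 21  # window width in base pairs, taken from the feature list


def window_mean(vectors: List[List[float]], window: int = WINDOW) -> List[List[float]]:
    """Average per-position vectors over a centered window.

    Assumptions (not from the source): the window is centered on each
    position, shrinks at the sequence ends, and uses an unweighted mean.
    """
    half = window // 2
    n = len(vectors)
    pooled = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        span = vectors[lo:hi]
        dim = len(span[0])
        pooled.append([sum(v[d] for v in span) / len(span) for d in range(dim)])
    return pooled


if __name__ == "__main__":
    # Toy one-dimensional "embeddings" for a 30-position sequence.
    toy = [[float(i)] for i in range(30)]
    print(window_mean(toy)[15])  # mean over positions 5..25
```

Because the window width is fixed, the work per position is constant, so the total cost of this sketch grows linearly with sequence length.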
✨ Biological Interpretability
- Reverse-complement symmetry in memory embeddings
- Context-dependent gating aligned with functional regions
- Hierarchical representation from shallow to deep layers
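One way to probe reverse-complement symmetry is to compare the vector stored for each k-mer with the vector stored for its reverse complement. The sketch below assumes the memory can be exported as a plain dictionary from k-mer string to vector, which is an assumption about the interface, not something this card states. The toy table is made up for illustration and is not model output.

```python
import math
from typing import Dict, List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rc_similarity(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Cosine similarity between each k-mer vector and its reverse complement's."""
    return {
        kmer: cosine(vec, table[reverse_complement(kmer)])
        for kmer, vec in table.items()
        if reverse_complement(kmer) in table
    }


if __name__ == "__main__":
    # Hypothetical toy table, symmetric by construction.
    toy_table = {"AC": [1.0, 0.0], "GT": [1.0, 0.0]}
    print(rc_similarity(toy_table))
```

Values close to 1 for most k-mers would be consistent with the symmetry described above; palindromic k-mers such as ACGT map to themselves and trivially score 1.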
For full documentation, training details, and usage instructions, please visit the GitHub repository.