
	---
	license: apache-2.0
	tags:
	- biology
	---
	# Gengram-10B-torch


	This repository hosts the model weights for Gengram-10B-torch. For instructions and details, please refer to the [Gengram GitHub](https://github.com/zhejianglab/Gengram).

	Gengram is a novel conditional memory module designed for genomic foundation models (GFMs) that introduces explicit motif memory retrieval to enhance Transformer-based DNA sequence modeling. Unlike traditional GFMs that rely on dense computation to implicitly infer multi-nucleotide motifs, Gengram provides an efficient lookup mechanism for biological patterns through a genomic-specific hashing scheme.

	### ✨ Key Features

	- 🎯 Explicit Motif Memory: Stores and retrieves k-mers (k = 1 to 6) via hash-based lookup tables
	- 🧬 Local Window Aggregation: 21 bp window mechanism aligned with DNA helical structure
	- ⚡ Computational Efficiency: Linear time complexity with minimal overhead
	- 🔧 Architecture Agnostic: Compatible with various attention mechanisms (MHA, GQA, MLA)
	- ⚖️ Stable Training: Improves load balancing in Mixture-of-Experts models
	- 🔍 Biological Interpretability: Learns meaningful motif representations
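
To make the first two features concrete, here is a minimal sketch of k-mer table lookup followed by 21 bp window aggregation. It is an illustration under stated assumptions, not the Gengram implementation: the base-4 index used as the "hash", the function names (`kmer_index`, `build_tables`, `lookup`, `window_mean`), the randomly initialised tables, the summation across k, and the mean over the window are all hypothetical choices. The actual hashing scheme and aggregation are defined in the GitHub repository.

```python
import random

BASES = "ACGT"
BASE_TO_CODE = {base: code for code, base in enumerate(BASES)}


def kmer_index(kmer: str) -> int:
    """Map a k-mer to an integer in [0, 4**k) by base-4 encoding (assumed scheme)."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + BASE_TO_CODE[base]
    return idx


def build_tables(max_k: int = 6, dim: int = 4, seed: int = 0) -> dict:
    """One table per k with 4**k rows; random values stand in for learned memory."""
    rng = random.Random(seed)
    return {
        k: [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(4 ** k)]
        for k in range(1, max_k + 1)
    }


def lookup(seq: str, pos: int, tables: dict) -> list:
    """Sum the table rows of every k-mer that ends at position `pos`."""
    dim = len(tables[1][0])
    out = [0.0] * dim
    for k, table in tables.items():
        start = pos - k + 1
        if start < 0:  # k-mer would run off the left edge of the sequence
            continue
        row = table[kmer_index(seq[start:pos + 1])]
        out = [a + b for a, b in zip(out, row)]
    return out


def window_mean(vectors: list, center: int, window: int = 21) -> list:
    """Average per-position vectors over a window centred on `center`."""
    half = window // 2
    lo, hi = max(0, center - half), min(len(vectors), center + half + 1)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors[lo:hi]) / (hi - lo) for d in range(dim)]


seq = "ACGTACGTACGTACGTACGTACGTACGT"
tables = build_tables()
per_position = [lookup(seq, i, tables) for i in range(len(seq))]
aggregated = window_mean(per_position, center=14)
```

Because 4^6 = 4096, every k-mer up to k = 6 can receive its own row under this toy encoding, so each lookup costs a fixed amount of work per position. That is one way to read the linear-time claim above.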

	### ✨ Biological Interpretability

	- Reverse-complement symmetry in memory embeddings
	- Context-dependent gating aligned with functional regions
	- Hierarchical representation from shallow to deep layers

	For full documentation, training details, and usage instructions, please visit the [GitHub](https://github.com/zhejianglab/Gengram) repository.