Add nano-Geneformer as a community reference implementation

#588

by nqhuya - opened 3 days ago

base: refs/heads/main ← from: refs/pr/588

Files changed: +12 -1

What is nano-Geneformer?

I recently built nano-Geneformer, a lightweight and faithful reimplementation of Geneformer designed to make the core implementation easier to read, reproduce, benchmark, and extend while preserving the original architecture and inference behavior.

Repository:
https://github.com/huynguyen250896/nano-Geneformer

Highlights

Supports all official Geneformer checkpoints (V1, V2-104M, V2-104M_CLcancer, and V2-316M)
Faithfully reproduces the original Geneformer architecture and inference pipeline
Cleaner, modern PyTorch implementation with simplified installation and dependency management
Suitable for learning, benchmarking, experimentation, fine-tuning, and future training from scratch

Validation

I benchmarked nano-Geneformer against the official implementation to verify that it can serve as a practical community reference implementation.

Compared with the official implementation, nano-Geneformer:

reduces peak GPU memory by up to 56.8% for the largest Geneformer model (V2-316M)
runs inference 1.06–1.15× faster
reproduces the official cell embeddings with mean cosine similarity ≈ 1.000000
preserves local/global representation geometry and pairwise distance structure across all official checkpoints
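
For anyone who wants to sanity-check the last two points on their own embeddings, here is a minimal sketch of the two metrics: the mean per-cell cosine similarity between two sets of embeddings, and the Pearson correlation of their pairwise Euclidean distances. This is not the code from the benchmark notebook. The function names and the toy vectors are hypothetical, and it uses only the Python standard library for portability; on real data a NumPy or PyTorch version would likely be preferable.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_cosine_similarity(reference, candidate):
    """Average cosine similarity over matching rows (one row per cell)."""
    if len(reference) != len(candidate):
        raise ValueError("both embedding sets must contain the same number of cells")
    sims = [cosine_similarity(r, c) for r, c in zip(reference, candidate)]
    return sum(sims) / len(sims)


def pairwise_distances(embeddings):
    """Euclidean distance for every unordered pair of cells."""
    return [math.dist(a, b) for a, b in combinations(embeddings, 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Toy stand-ins for cell embeddings (made-up values, not benchmark data):
# "candidate" is "reference" plus a tiny offset on every coordinate.
reference = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0], [2.0, 2.0, 0.0]]
candidate = [[x + 1e-7 for x in row] for row in reference]

mean_sim = mean_cosine_similarity(reference, candidate)
dist_corr = pearson(pairwise_distances(reference), pairwise_distances(candidate))

print(f"mean cosine similarity: {mean_sim:.6f}")
print(f"pairwise distance correlation: {dist_corr:.6f}")
```

A mean cosine similarity close to 1 indicates that each cell's embedding points in nearly the same direction in both implementations, while a distance correlation close to 1 indicates that the relative spacing between cells is preserved.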

The full benchmark notebook is available in the repository.

Why this PR?

The goal of nano-Geneformer is not to replace the official implementation, but to provide a lightweight community resource for users who want a smaller, easier-to-read implementation for learning, reproducibility, benchmarking, and research.

This PR only adds a link under Community Projects. It does not modify any code, pretrained models, datasets, checkpoints, or model behavior.

Thank you for your consideration.

Add nano-Geneformer as a community reference implementation (252a7bbf)

Ready to merge

This branch is ready to get merged automatically.
