JFLa/GF-CAB

Token Classification

transcriptomics

JFLa committed on Oct 16, 2025

Commit fd3e64f · verified · 1 Parent(s): c31a0a9

Update README.md

Files changed (1)

README.md +15 -1

@@ -14,4 +14,18 @@ tags:
 - biology
 - single-cell
 - transcriptomics
----

+---
+# Geneformer-CAB: Benchmarking Scale and Architecture in Foundation Models for Single-Cell Transcriptomics
+- Model Overview:
+Geneformer-CAB (Cumulative-Assignment-Blocking) is a benchmarked variant of the Geneformer architecture for modeling single-cell transcriptomic data.
+Rather than introducing an entirely new model, Geneformer-CAB systematically evaluates how data scale and architectural refinements interact to influence model generalization, predictive diversity, and robustness to batch effects.
+- This model integrates two architectural enhancements:
+1. Cumulative probability recalibration, which adjusts token-level prediction dynamics to reduce overconfident, frequency-driven outputs.
+2. Similarity-based regularization, which penalizes redundant token predictions to promote diversity and alignment with rank-ordered gene expression profiles.
+Together, these mechanisms provide insight into the limits of scale in single-cell foundation models: scaling up pretraining data does not always yield superior downstream performance.
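
The README does not give a formula for the similarity-based regularization, so the following is only an illustrative sketch of one plausible reading: a penalty equal to the mean positive cosine similarity between the embeddings of predicted tokens, which grows as predictions become redundant. The function names `cosine` and `redundancy_penalty`, the use of cosine similarity, and the clipping at zero are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def redundancy_penalty(embeddings):
    """Hypothetical regularizer: mean pairwise cosine similarity, clipped at 0.

    `embeddings` is a list of vectors, one per predicted token (assumed input).
    Identical predictions give a penalty near 1; orthogonal ones give 0.
    """
    n = len(embeddings)
    if n < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Only positive similarity counts as redundancy (an assumption).
            total += max(0.0, cosine(embeddings[i], embeddings[j]))
            pairs += 1
    return total / pairs
```

In a training loop, a term like this would typically be scaled by a small weight and added to the main prediction loss; the weight and the choice of embedding space are likewise left unspecified by the README.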