Update README.md
README.md
CHANGED
@@ -14,4 +14,18 @@ tags:
 - biology
 - single-cell
 - transcriptomics
----
+---
+
+# Geneformer-CAB: Benchmarking Scale and Architecture in Foundation Models for Single-Cell Transcriptomics
+- Model Overview:
+
+Geneformer-CAB (Cumulative-Assignment-Blocking) is a benchmarked variant of the Geneformer architecture for modeling single-cell transcriptomic data.
+Rather than introducing an entirely new model, Geneformer-CAB systematically evaluates how data scale and architectural refinements interact to influence model generalization, predictive diversity, and robustness to batch effects.
+
+- This model integrates two architectural enhancements:
+
+1. Cumulative probability recalibration, which adjusts token-level prediction dynamics to reduce overconfident, frequency-driven outputs.
+
+2. Similarity-based regularization, which penalizes redundant token predictions to promote diversity and alignment with rank-ordered gene expression profiles.
+
+Together, these mechanisms provide insight into the limits of scale in single-cell foundation models, revealing that scaling up pretraining data does not always yield superior downstream performance.
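
The README names the two enhancements but does not give their formulas. As a rough illustration only, the NumPy sketch below shows one possible reading of each idea. The function names `recalibrate_cumulative` and `similarity_penalty`, the `alpha` strength parameter, and both update rules are assumptions made for this example; they are not taken from the Geneformer-CAB code.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def recalibrate_cumulative(logits, alpha=1.0):
    """Hypothetical cumulative recalibration (assumed rule, not the authors').

    Walks positions left to right and subtracts alpha times the probability
    mass already assigned to each token, so tokens that dominated earlier
    positions are damped at later ones.
    logits: array of shape (seq_len, vocab_size).
    """
    seq_len, vocab_size = logits.shape
    cumulative = np.zeros(vocab_size)
    probs = np.empty((seq_len, vocab_size), dtype=float)
    for t in range(seq_len):
        probs[t] = softmax(logits[t] - alpha * cumulative)
        cumulative += probs[t]
    return probs


def similarity_penalty(probs):
    """Hypothetical similarity regularizer (assumed rule, not the authors').

    Returns the mean pairwise cosine similarity between the predicted
    distributions at different positions; 1.0 means every position
    predicts the same distribution. Requires at least two positions.
    """
    unit = probs / np.linalg.norm(probs, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = sim.shape[0]
    off_diagonal_sum = sim.sum() - np.trace(sim)
    return off_diagonal_sum / (n * (n - 1))
```

Under these assumptions, a sequence whose raw logits favor the same token at every position gets a penalty of 1.0 before recalibration and a lower penalty afterwards. In a training loop the penalty would typically be added to the main loss with a small weight, although the README does not say how the authors combine the terms.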