---
license: apache-2.0
base_model: state-spaces/mamba2-2.7b
tags:
- mamba
- mamba3
- ssm
- subsuminator
---
# Mamba3-2.7B (Alpha)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20374967.svg)](https://doi.org/10.5281/zenodo.20374967)
This model is a structurally transmuted version of **Mamba2-2.7B**, migrated to the **Mamba3** architecture using the Subsuminator framework.
🔬 **Alpha: base weights transmuted, instruction tuning not yet applied.**
Cross-entropy (CE) ratio vs. the Mamba2-2.7B baseline: confirmed ≤1.05x on an A10G GPU (measured ratio: 1.0016x).
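
As a rough illustration of what a CE ratio check involves, the sketch below computes mean token-level cross-entropy for two sets of logits and divides one by the other. It assumes CE means cross-entropy in nats averaged over tokens. The helper names and the toy logits are hypothetical and are not taken from `check_ce.py`, whose actual procedure may differ.

```python
import math


def token_cross_entropy(logits, target):
    """Cross-entropy (nats) for one token: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def mean_cross_entropy(all_logits, targets):
    """Average the per-token cross-entropy over a sequence."""
    losses = [token_cross_entropy(l, t) for l, t in zip(all_logits, targets)]
    return sum(losses) / len(losses)


def ce_ratio(candidate_ce, baseline_ce):
    """Ratio > 1 means the candidate is worse than the baseline."""
    return candidate_ce / baseline_ce


# Toy data (hypothetical): two tokens, vocabulary of three.
targets = [0, 2]
baseline_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
candidate_logits = [[1.99, 0.0, 0.0], [0.0, 0.0, 1.99]]

baseline_ce = mean_cross_entropy(baseline_logits, targets)
candidate_ce = mean_cross_entropy(candidate_logits, targets)
ratio = ce_ratio(candidate_ce, baseline_ce)

print(f"CE ratio: {ratio:.4f}  within 1.05x: {ratio <= 1.05}")
```

In a real evaluation both models would be scored on the same held-out token stream, so that the ratio reflects only the change in weights and architecture.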
## Model Highlights
- **Architecture:** Mamba3 (SISO Trapezoidal + RoPE)
- **Scale:** 2.7 Billion Parameters
- **Training Phase:** Alpha (Pre-SFT Checkpoint)
- **Precision:** BFloat16
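
For sizing purposes, the scale and precision above imply a lower bound on weight memory. The sketch below assumes exactly 2.7 billion parameters (the nominal figure; the true count may differ slightly) and 2 bytes per BFloat16 value. It covers weights only, not activations or recurrent state.

```python
def weight_memory_bytes(num_params, bytes_per_param=2):
    """Memory needed to hold the weights alone (BFloat16 = 2 bytes each)."""
    return num_params * bytes_per_param


num_params = 2_700_000_000  # nominal count, assumed for illustration
total = weight_memory_bytes(num_params)

print(f"{total / 1e9:.1f} GB ({total / 2**30:.2f} GiB)")
```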
## Structural Changes Applied
- Temporal Convolution Folding
- SiLU Linearization via scalar `alpha`
- B/C RMSNorm expectation initialization
- Trap gate and RoPE zero-init
- Data-dependent A structural extraction
This model requires fine-tuning before it can be reliably used for text generation.
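
To see why a zero initialization can leave the starting behavior unchanged, consider the rotary (RoPE) component in isolation. The sketch below assumes that zero-init means the rotation angles start at zero, in which case every pairwise rotation is the identity. `rope_rotate` is a hypothetical helper written for this illustration; it is not part of `mamba-ssm` or the adapter, and the real Mamba3 parameterization may differ.

```python
import math


def rope_rotate(vec, angles):
    """Rotate consecutive (even, odd) pairs of `vec` by the matching angle."""
    if len(vec) != 2 * len(angles):
        raise ValueError("need one angle per pair of features")
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


features = [1.0, 2.0, 3.0, 4.0]

# Zero angles: each pair is rotated by 0 radians, so the input passes through.
unchanged = rope_rotate(features, [0.0, 0.0])

# Non-zero angle on the first pair only: (1, 2) becomes roughly (-2, 1).
rotated = rope_rotate(features, [math.pi / 2, 0.0])

print(unchanged)
print([round(v, 6) for v in rotated])
```

Under this assumption the rotation contributes nothing until fine-tuning moves the angles away from zero.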
## Adapter Code
The nine-point weight mapping used to produce this checkpoint is fully open-sourced:
**[Rta-Forge/heists-galore](https://github.com/Rta-Forge/heists-galore)**, under `mamba2-to-mamba3/`
Includes the weight adapter (`mamba3_adapter.py`), the empirical activation measurement rig (`empirical_fit.py`), and the CE verification harness (`check_ce.py`). No internal dependencies: it runs against any standard `mamba-ssm` installation.
## Methodology Paper
**[FORGEry: A Multi-Model Adversarial Research Methodology for Independent AI Researchers](https://zenodo.org/records/20374967)** (Zenodo preprint, CC BY 4.0).
DOI: [10.5281/zenodo.20374967](https://doi.org/10.5281/zenodo.20374967)