| --- |
| license: apache-2.0 |
| base_model: state-spaces/mamba2-2.7b |
| tags: |
| - mamba |
| - mamba3 |
| - ssm |
| - subsuminator |
| --- |
| |
| # Mamba3-2.7B (Alpha) |
|
|
| [DOI: 10.5281/zenodo.20374967](https://doi.org/10.5281/zenodo.20374967) |
|
|
| This model is a structurally transmuted version of **Mamba2-2.7B**, migrated to the **Mamba3** architecture using the Subsuminator framework. |
|
|
| 🔬 **Alpha: base weights transmuted, instruction tuning not yet applied.** |
| Cross-entropy (CE) ratio vs. the Mamba2-2.7B baseline: confirmed ≤1.05x on an A10G (actual ratio: 1.0016x). |
|
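The CE ratio quoted above can be read as the transmuted model's mean next-token cross-entropy divided by the baseline's, both measured on the same text. The sketch below illustrates that reading in plain Python. It is a minimal example under that assumption, not the contents of `check_ce.py`; the function names are hypothetical, and only the 1.05 threshold comes from this card.

```python
import math


def mean_cross_entropy(logits, targets):
    """Mean next-token cross-entropy (in nats) over a sequence.

    logits:  list of per-position score lists, one score per vocabulary entry
    targets: list with the correct token index at each position
    """
    if not targets or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and equal in length")
    total = 0.0
    for scores, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the vocabulary.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        # Negative log-probability of the correct token.
        total += log_z - scores[target]
    return total / len(targets)


def ce_ratio(candidate_ce, baseline_ce):
    """Ratio of candidate CE to baseline CE (assumed definition of 'CE ratio')."""
    if baseline_ce <= 0.0:
        raise ValueError("baseline cross-entropy must be positive")
    return candidate_ce / baseline_ce


def within_tolerance(candidate_ce, baseline_ce, max_ratio=1.05):
    """True when the candidate is at most max_ratio times the baseline."""
    return ce_ratio(candidate_ce, baseline_ce) <= max_ratio
```

Under this definition a ratio of 1.0 means both models predict the evaluation tokens equally well, and a value slightly above 1.0 indicates a small regression introduced by the conversion.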
|
| ## Model Highlights |
| - **Architecture:** Mamba3 (SISO Trapezoidal + RoPE) |
| - **Scale:** 2.7 Billion Parameters |
| - **Training Phase:** Alpha (Pre-SFT Checkpoint) |
| - **Precision:** BFloat16 |
|
|
| ## Structural Changes Applied |
| - Temporal Convolution Folding |
| - SiLU Linearization via scalar `alpha` |
| - B/C RMSNorm expectation initialization |
| - Trap gate and RoPE zero-init |
| - Data-dependent A structural extraction |
|
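As an illustration of the "SiLU Linearization via scalar `alpha`" item, one plausible reading is that SiLU is approximated by `alpha * x`, with `alpha` fitted to measured activations. The sketch below shows a least-squares fit under that assumption. The function names are hypothetical, and this is not the procedure implemented in `mamba3_adapter.py` or `empirical_fit.py`.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def fit_linear_alpha(samples):
    """Least-squares scalar alpha minimising sum((silu(x) - alpha * x) ** 2).

    Setting the derivative with respect to alpha to zero gives
    alpha = sum(x * silu(x)) / sum(x * x).
    """
    numerator = sum(x * silu(x) for x in samples)
    denominator = sum(x * x for x in samples)
    if denominator == 0.0:
        raise ValueError("need at least one non-zero sample")
    return numerator / denominator
```

Because `silu(x) = x * sigmoid(x)`, the fitted `alpha` is a weighted average of sigmoid values and so always lies between 0 and 1. It comes out near 0.5 for activations distributed symmetrically around zero and approaches 1 when activations are mostly large and positive.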
|
| This model requires fine-tuning before it can be reliably used for text generation. |
|
|
| ## Adapter Code |
|
|
| The nine-point weight mapping used to produce this checkpoint is fully open-sourced: |
|
|
| **[Rta-Forge/heists-galore](https://github.com/Rta-Forge/heists-galore)**, under `mamba2-to-mamba3/` |
|
|
| The directory includes the weight adapter (`mamba3_adapter.py`), the empirical activation measurement rig (`empirical_fit.py`), and the CE verification harness (`check_ce.py`). It has no internal dependencies and runs against any standard `mamba-ssm` installation. |
|
|
| ## Methodology Paper |
|
|
| **[FORGEry: A Multi-Model Adversarial Research Methodology for Independent AI Researchers](https://zenodo.org/records/20374967)** (Zenodo preprint, CC BY 4.0) |
|
|
| DOI: [10.5281/zenodo.20374967](https://doi.org/10.5281/zenodo.20374967) |
|
|