license: apache-2.0
base_model: state-spaces/mamba2-2.7b
tags:
  - mamba
  - mamba3
  - ssm
  - subsuminator

Mamba3-2.7B (Alpha)

This model is a structurally transmuted version of Mamba2-2.7B, migrated to the Mamba3 architecture using the Subsuminator framework.

🔬 Alpha: base weights transmuted, instruction tuning not yet applied. Cross-entropy (CE) ratio vs. the Mamba2-2.7B baseline: confirmed ≤1.05x on an A10G (measured ratio: 1.0016x).
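
For readers unfamiliar with this kind of check, here is a minimal sketch of how a CE ratio gate might be computed. It assumes the ratio is the converted model's mean cross-entropy divided by the baseline's on the same tokens. The function names and the toy probabilities are hypothetical and are not taken from check_ce.py.

```python
import math

def mean_cross_entropy(token_log_probs):
    """Mean negative log-likelihood (nats per token) from per-token log-probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return -sum(token_log_probs) / len(token_log_probs)

def ce_ratio(candidate_log_probs, baseline_log_probs):
    """Candidate mean CE divided by baseline mean CE (assumed definition)."""
    return mean_cross_entropy(candidate_log_probs) / mean_cross_entropy(baseline_log_probs)

def passes_ce_gate(ratio, threshold=1.05):
    """True when the candidate is within the allowed CE ratio."""
    return ratio <= threshold

# Toy per-token probabilities of the correct next token (illustrative only).
baseline = [math.log(p) for p in (0.50, 0.25, 0.10, 0.40)]
candidate = [math.log(p) for p in (0.49, 0.25, 0.10, 0.41)]

ratio = ce_ratio(candidate, baseline)
print(f"CE ratio: {ratio:.4f}, passes: {passes_ce_gate(ratio)}")
```

A ratio of 1.0 means the converted model matches the baseline exactly on the evaluated tokens, so the reported 1.0016x would pass a 1.05x gate with room to spare.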

Model Highlights

Architecture: Mamba3 (SISO Trapezoidal + RoPE)
Scale: 2.7 Billion Parameters
Training Phase: Alpha (Pre-SFT Checkpoint)
Precision: BFloat16
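
As a rough guide to hardware requirements, the scale and precision above imply a weights-only memory footprint that can be worked out directly. This sketch uses the nominal 2.7 billion figure (the exact parameter count may differ) and ignores activations, recurrent state, and framework overhead. The helper name is hypothetical.

```python
PARAMS = 2.7e9        # nominal parameter count from the list above
BYTES_PER_BF16 = 2    # bfloat16 is a 16-bit format

def weight_bytes(n_params, bytes_per_param):
    """Bytes needed to hold the weights alone."""
    return n_params * bytes_per_param

total = weight_bytes(PARAMS, BYTES_PER_BF16)
print(f"{total / 1e9:.1f} GB ({total / 2**30:.2f} GiB)")
```

This comes to about 5.4 GB of weights, which fits on a 24 GB A10G with headroom for inference.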

Structural Changes Applied

Temporal Convolution Folding
SiLU Linearization via scalar alpha
B/C RMSNorm expectation initialization
Trap gate and RoPE zero-init
Data-dependent A structural extraction
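
To illustrate why zero-initialization is a natural choice for a newly introduced component, the sketch below applies a rotary (RoPE-style) rotation to consecutive pairs of values. It assumes "zero-init" means the rotation angles start at zero, which makes the rotation an exact no-op. This is an illustrative reading, not the code in mamba3_adapter.py, and `rotate_pairs` is a hypothetical helper.

```python
import math

def rotate_pairs(vec, angles):
    """Rotate each consecutive pair (x0, x1), (x2, x3), ... by its own angle."""
    if len(vec) != 2 * len(angles):
        raise ValueError("need one angle per pair of elements")
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out

state = [0.3, -1.2, 2.0, 0.5]

# With all angles at zero, the rotation leaves the input unchanged.
assert rotate_pairs(state, [0.0, 0.0]) == state

# A nonzero angle changes the values but preserves each pair's norm.
rotated = rotate_pairs(state, [math.pi / 2, 0.1])
print(rotated)
```

Under this reading, the new component contributes nothing at initialization, which is consistent with a CE ratio close to 1.0 before any fine-tuning.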

This model requires fine-tuning before it can be reliably used for text generation.

Adapter Code

The nine-point weight mapping used to produce this checkpoint is fully open-sourced:

Rta-Forge/heists-galore — mamba2-to-mamba3/

Includes the weight adapter (mamba3_adapter.py), the empirical activation measurement rig (empirical_fit.py), and the CE verification harness (check_ce.py). No internal dependencies — runs against any standard mamba-ssm installation.

Methodology paper

FORGEry: A Multi-Model Adversarial Research Methodology for Independent AI Researchers — Zenodo preprint, CC BY 4.0.

DOI: 10.5281/zenodo.20374967