metadata
license: apache-2.0
base_model: state-spaces/mamba2-2.7b
tags:
  - mamba
  - mamba3
  - ssm
  - subsuminator

Mamba3-2.7B (Alpha)

This model is a structurally transmuted version of Mamba2-2.7B, migrated to the Mamba3 architecture using the Subsuminator framework.

🔬 Alpha: base weights transmuted; instruction tuning not yet applied. Cross-entropy (CE) ratio vs. the Mamba2-2.7B baseline: confirmed ≤1.05x on an A10G (measured ratio: 1.0016x).
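
For readers unfamiliar with the metric, a CE ratio compares the mean next-token cross-entropy of the migrated model with that of the baseline on the same tokens; a value of 1.0 means no change in loss. The following is a minimal, dependency-free sketch of that calculation. It is not the repository's check_ce.py, and the function names and toy logits are hypothetical.

```python
import math

def cross_entropy(logits, targets):
    """Mean next-token cross-entropy (in nats) over a sequence.

    logits  : list of per-position score lists, one score per vocabulary entry
    targets : list of correct token ids, one per position
    """
    total = 0.0
    for scores, target in zip(logits, targets):
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
    return total / len(targets)

def ce_ratio(candidate_logits, baseline_logits, targets):
    """Ratio of candidate CE to baseline CE on the same tokens."""
    return cross_entropy(candidate_logits, targets) / cross_entropy(baseline_logits, targets)

# Toy data (hypothetical): 3 positions, vocabulary of 4 tokens.
targets = [2, 0, 3]
baseline = [[0.1, 0.2, 2.0, 0.0], [1.5, 0.3, 0.1, 0.2], [0.0, 0.1, 0.2, 1.8]]
# The candidate differs only in a slightly lower score for the first target.
candidate = [[0.1, 0.2, 1.9, 0.0], [1.5, 0.3, 0.1, 0.2], [0.0, 0.1, 0.2, 1.8]]

ratio = ce_ratio(candidate, baseline, targets)
print(f"CE ratio: {ratio:.4f} (within 1.05x: {ratio <= 1.05})")
```

In practice the logits would come from running both checkpoints over the same evaluation text; the toy lists above only stand in for those outputs.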

Model Highlights

  • Architecture: Mamba3 (SISO Trapezoidal + RoPE)
  • Scale: 2.7 Billion Parameters
  • Training Phase: Alpha (Pre-SFT Checkpoint)
  • Precision: BFloat16

Structural Changes Applied

  • Temporal Convolution Folding
  • SiLU Linearization via scalar alpha
  • B/C RMSNorm expectation initialization
  • Trap gate and RoPE zero-init
  • Data-dependent A structural extraction
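
Zero-initializing a new component is a common way to add it without changing the function the network computes at the start of fine-tuning. As an illustration for the RoPE item, and assuming the zero-initialized quantity is the rotation angle (this card does not specify the parameterization), the sketch below shows that a rotary rotation with all angles at zero returns its input unchanged. The helper name `rope_rotate` is hypothetical and not taken from the adapter code.

```python
import math

def rope_rotate(vec, angles):
    """Rotate consecutive (even, odd) pairs of `vec` by the matching angle.

    vec    : list of floats with even length
    angles : one rotation angle (radians) per pair
    """
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        # Standard 2-D rotation applied to each pair.
        out.extend([x * c - y * s, x * s + y * c])
    return out

def norm(v):
    return math.sqrt(sum(a * a for a in v))

state = [0.5, -1.25, 2.0, 0.75]

# Zero angles: cos(0) = 1 and sin(0) = 0, so every pair passes through unchanged.
identity_out = rope_rotate(state, [0.0, 0.0])

# Non-zero angles change the vector but preserve its length.
rotated = rope_rotate(state, [0.3, -1.1])

print(identity_out == state, rotated != state)
```

Under that assumption, the rotary path starts as an identity map, so any deviation from the Mamba2 behavior is introduced only as the angles are learned.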

This model requires fine-tuning before it can be reliably used for text generation.

Adapter Code

The nine-point weight mapping used to produce this checkpoint is fully open-sourced:

Rta-Forge/heists-galoremamba2-to-mamba3/

Includes the weight adapter (mamba3_adapter.py), the empirical activation measurement rig (empirical_fit.py), and the CE verification harness (check_ce.py). No internal dependencies — runs against any standard mamba-ssm installation.
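
One plausible reading of an empirical activation fit paired with "SiLU linearization via scalar alpha" is a least-squares fit of silu(x) ≈ alpha · x over measured pre-activations. The sketch below shows that fit in isolation. It is an assumption about the approach, not the contents of empirical_fit.py; `fit_alpha` is a hypothetical name, and the Gaussian samples stand in for real activations.

```python
import math
import random

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def fit_alpha(samples):
    """Least-squares scalar alpha minimizing sum((silu(x) - alpha * x)^2).

    Setting the derivative with respect to alpha to zero gives
    alpha = sum(x * silu(x)) / sum(x * x).
    """
    num = sum(x * silu(x) for x in samples)
    den = sum(x * x for x in samples)
    return num / den

def squared_error(samples, alpha):
    return sum((silu(x) - alpha * x) ** 2 for x in samples)

random.seed(0)
# Hypothetical stand-in for measured pre-activations: unit Gaussian samples.
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

alpha = fit_alpha(samples)
print(f"fitted alpha: {alpha:.4f}")
```

For inputs distributed symmetrically about zero, the fitted value lands near 0.5, because sigmoid(x) - 0.5 is an odd function and its contribution averages out. Real activations are generally not symmetric, which is presumably why an empirical measurement is used.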

Methodology paper

FORGEry: A Multi-Model Adversarial Research Methodology for Independent AI Researchers — Zenodo preprint, CC BY 4.0.

DOI: 10.5281/zenodo.20374967