RtaForge
/

Mamba3-2.7B

	---
	license: apache-2.0
	base_model: state-spaces/mamba2-2.7b
	tags:
	- mamba
	- mamba3
	- ssm
	- subsuminator
	---

	# Mamba3-2.7B (Alpha)

	[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20374967.svg)](https://doi.org/10.5281/zenodo.20374967)

	This model is a structurally transmuted version of Mamba2-2.7B, migrated to the Mamba3 architecture using the Subsuminator framework.

	🔬 Alpha: base weights transmuted; instruction tuning not yet applied.
	Cross-entropy (CE) ratio vs. the Mamba2-2.7B baseline, measured on an A10G: 1.0016x (acceptance threshold: ≤1.05x).
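
As a rough illustration of what a CE ratio expresses, the sketch below divides the mean cross-entropy of a candidate model by that of a baseline on the same tokens. It assumes the ratio is candidate over baseline, which this card does not spell out. The function names are hypothetical and are not taken from `check_ce.py`, and the numbers are toy values, not measurements from this checkpoint.

```python
import math


def mean_cross_entropy(token_log_probs):
    """Mean negative log-likelihood (nats per token) from per-token log-probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_log_probs) / len(token_log_probs)


def ce_ratio(candidate_log_probs, baseline_log_probs):
    """Candidate CE divided by baseline CE on the same tokens; 1.0 means parity."""
    return mean_cross_entropy(candidate_log_probs) / mean_cross_entropy(baseline_log_probs)


# Toy probabilities assigned to the correct token (illustrative only).
baseline = [math.log(p) for p in (0.50, 0.25, 0.10)]
candidate = [math.log(p) for p in (0.49, 0.25, 0.10)]

ratio = ce_ratio(candidate, baseline)
passes = ratio <= 1.05  # same threshold form as quoted in this card
```

A ratio slightly above 1.0 means the candidate is marginally worse than the baseline on that text. A real check would use log-probabilities from both models over a held-out corpus.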

	## Model Highlights
	- Architecture: Mamba3 (SISO Trapezoidal + RoPE)
	- Scale: 2.7 Billion Parameters
	- Training Phase: Alpha (Pre-SFT Checkpoint)
	- Precision: BFloat16

	## Structural Changes Applied
	- Temporal Convolution Folding
	- SiLU Linearization via scalar `alpha`
	- B/C RMSNorm expectation initialization
	- Trap gate and RoPE zero-init
	- Data-dependent A structural extraction
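
One general property behind zero-initializing a rotary component can be sketched in a few lines: rotating each feature pair by an angle of zero is the identity map, so a rotation whose angles start at zero does not change the vectors it is applied to. How this checkpoint parameterizes its rotation angles is not stated here, so the helper below (`rotate_pairs`, a hypothetical name) is only a generic pairwise rotation, not the adapter's code.

```python
import math


def rotate_pairs(vec, angles):
    """Rotate each consecutive pair (vec[2i], vec[2i+1]) by angles[i] radians."""
    if len(vec) != 2 * len(angles):
        raise ValueError("vec must hold exactly two entries per angle")
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend((x * c - y * s, x * s + y * c))
    return out


v = [0.3, -1.2, 2.0, 0.5]

# Zero angles: every pair is left unchanged, so the output equals the input.
zero_init = rotate_pairs(v, [0.0, 0.0])

# A non-zero angle on the first pair changes it; the second pair is untouched.
rotated = rotate_pairs(v, [math.pi / 2, 0.0])
```

Under this assumption, the rotation contributes nothing at initialization and only takes effect once fine-tuning moves the angles away from zero.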

	This model requires fine-tuning before it can be reliably used for text generation.

	## Adapter Code

	The nine-point weight mapping used to produce this checkpoint is fully open-sourced:

	[Rta-Forge/heists-galore](https://github.com/Rta-Forge/heists-galore) — `mamba2-to-mamba3/`

	The directory includes the weight adapter (`mamba3_adapter.py`), the empirical activation measurement rig (`empirical_fit.py`), and the CE verification harness (`check_ce.py`). It has no internal dependencies and runs against any standard `mamba-ssm` installation.
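
The repository scripts are not reproduced here. As a purely hypothetical illustration of what fitting a scalar `alpha` for a SiLU linearization could look like, the sketch below picks the `alpha` that minimizes the squared error between `silu(x)` and `alpha * x` over a set of sampled activations. The least-squares criterion, the Gaussian samples, and the function names are all assumptions, not a description of `empirical_fit.py`.

```python
import math
import random


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def fit_alpha(samples):
    """Closed-form least-squares alpha minimizing sum((silu(x) - alpha * x) ** 2)."""
    num = sum(x * silu(x) for x in samples)
    den = sum(x * x for x in samples)
    if den == 0.0:
        raise ValueError("samples must contain at least one non-zero value")
    return num / den


# Stand-in activations (assumption): real pre-activation statistics would be
# measured from the model on representative text.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

alpha = fit_alpha(samples)
```

For inputs distributed symmetrically around zero, this fit tends toward 0.5, because `sigmoid(x) - 0.5` is an odd function. Measured activations are generally not symmetric, which is why an empirical measurement would be needed in practice.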

	## Methodology Paper

	[FORGEry: A Multi-Model Adversarial Research Methodology for Independent AI Researchers](https://zenodo.org/records/20374967) — Zenodo preprint, CC BY 4.0.

	DOI: [10.5281/zenodo.20374967](https://doi.org/10.5281/zenodo.20374967)