# Slow-Mode Workspace (SMW)
A research scaffold for a falsifiable architectural hypothesis in Transformer language models. No trained weights exist yet; this card describes the architecture and is published so the community can critique, fork, and contribute training runs.
## Hypothesis
Catastrophic forgetting, length-extrapolation failure, and uniform-compute miscalibration in Transformers may share a root cause: the absence of a conserved slow-mode subspace across the residual flow.
We propose a specific architectural instantiation (the Slow-Mode Workspace, SMW) and pre-register three falsifiable predictions. See the README for full prior-art positioning.
## Architecture
A decoder-only Transformer with one structural addition: a shared Koopman-spectral projector S inserted every k blocks, satisfying:
- Stiefel orthogonality: `S^T S ≈ I`
- Scale-equivariance: `S(λx) ≈ λ^Δ S(x)`
- Contractive propagator: `ρ(K) < 1`, where `K = S^T A S`
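In the repo these are folded into the single `model.regularizer(x)` term shown under "How to use" below. As a rough illustration only, the three constraints can be written as soft penalties; the function names, the linearized propagator `A`, and the callable projector `S_fn` in this sketch are assumptions, not the repo's API:

```python
import torch


def orthogonality_penalty(S: torch.Tensor) -> torch.Tensor:
    # Stiefel orthogonality: push S^T S toward the identity.
    eye = torch.eye(S.shape[1], device=S.device, dtype=S.dtype)
    return ((S.T @ S - eye) ** 2).sum()


def contraction_penalty(S: torch.Tensor, A: torch.Tensor, n_iter: int = 20) -> torch.Tensor:
    # Contractive propagator: penalize spectral radius of K = S^T A S above 1.
    # Power iteration gives a differentiable (if rough) estimate of rho(K);
    # it can oscillate when the dominant eigenvalue is complex.
    K = S.T @ A @ S
    v = torch.randn(K.shape[0], device=K.device, dtype=K.dtype)
    for _ in range(n_iter):
        v = K @ v
        v = v / (v.norm() + 1e-8)
    rho = (K @ v).norm()
    return torch.relu(rho - 1.0) ** 2


def equivariance_penalty(S_fn, x: torch.Tensor, delta: float = 1.0) -> torch.Tensor:
    # Scale-equivariance: S(lam * x) should match lam^delta * S(x)
    # for a randomly sampled positive scale lam.
    lam = torch.exp(0.5 * torch.randn((), device=x.device))
    return ((S_fn(lam * x) - lam ** delta * S_fn(x)) ** 2).mean()
```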
The same S serves as:
- Global-Workspace bottleneck across modules
- Slow-mode subspace for episodic-cortical consolidation
- Depth-wise renormalization-group projector
A small per-token metacognitive controller M outputs a continuous soft mask over S's eigenbands.
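One plausible shape for M, sketched below as a two-layer MLP that maps each token's hidden state to a sigmoid gate per eigenband; the class name and sizes are assumptions, not the repo's implementation:

```python
import torch
import torch.nn as nn


class EigenbandController(nn.Module):
    # Hypothetical sketch of M: a per-token soft mask over the d_w eigenbands of S.
    def __init__(self, d_model: int = 768, d_w: int = 64, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.GELU(),
            nn.Linear(hidden, d_w),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model) -> continuous mask in (0, 1): (batch, seq, d_w)
        return torch.sigmoid(self.net(h))
```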
## Intended use
- Research only. Not for production deployment.
- Fine-tuning experiments by the community on small-scale (≤1.3B param) Transformers.
- Replication of the 5+1 ablation factorial described in the pre-registration.
## Limitations
- No trained weights are provided in this version.
- The implementation is unoptimized for speed (no FlashAttention integration, no FSDP support yet).
- The pre-registered predictions (P1, P2, P3) have not been tested.
## How to use (architecture only)
```python
from smw import SMWModel, SMWConfig, CONDITIONS

# Start from the pre-registered full-SMW condition (C5) and size the model.
cfg = CONDITIONS["C5_smw"]
cfg.vocab_size = 32000
cfg.d_model = 768
cfg.n_layers = 12
cfg.d_w = 64          # dimensionality of the shared workspace projector S
cfg.block_size = 1024

model = SMWModel(cfg)
# Train as usual; model.regularizer(x) returns the SMW constraint loss term.
```
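A minimal training-step sketch showing where the regularizer enters the loss; the forward signature, the `tokens` batch, and the `lambda_smw` weight are assumptions, not the repo's API:

```python
import torch.nn.functional as F

# tokens: (batch, block_size) LongTensor of input ids from your data pipeline.
# Assumes model(tokens) returns next-token logits of shape
# (batch, block_size, vocab_size); check the repo for the real signature.
lambda_smw = 0.1  # illustrative weight for the constraint term

logits = model(tokens)
lm_loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, cfg.vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss = lm_loss + lambda_smw * model.regularizer(tokens)
loss.backward()
```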
## Ablation conditions

| ID | Description |
|---|---|
| `C0_baseline` | Vanilla decoder-only |
| `C1_gw_only` | Workspace bottleneck only |
| `C2_episodic_only` | Episodic ledger only |
| `C3_rg_only` | Scale-equivariance constraint only |
| `C4_all_independent` | All three components, separate operators |
| `C5_smw` | All three components, one shared operator (SMW) |
The critical comparison is C5 vs. C4: the same three components, with one shared operator versus separate operators.
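To replicate the factorial, the same recipe can be run once per condition. A sketch, assuming each `CONDITIONS` entry is a ready config and `train()` stands in for your own training loop:

```python
results = {}
for cond_id, cond_cfg in CONDITIONS.items():
    cond_cfg.vocab_size = 32000  # size every condition identically
    cond_cfg.block_size = 1024
    model = SMWModel(cond_cfg)
    results[cond_id] = train(model)  # train() is your own loop, not a repo helper

# Pre-registered contrast: shared operator (C5) vs. separate operators (C4).
print(results["C5_smw"], results["C4_all_independent"])
```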
## Pre-registered predictions

| Pred | Description | Diagnostic in code |
|---|---|---|
| P1 | Slow-band mass ≥ 0.3 after training | `model.slow_band_mass()` |
| P2 | Replay manifold participation ratio in 20–60 dims; ablating S abolishes workspace AND replay jointly | (probe scripts forthcoming) |
| P3 | Mask entropy ↑ monotonically with item difficulty, no saturation | `model.last_mask_entropy()` |
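After a training run, P1 and P3 map directly onto the listed diagnostics. A coarse check, assuming an `eval_batches` iterable ordered from easy to hard items (P2 must wait for the probe scripts):

```python
# P1: slow-band mass should reach the pre-registered 0.3 threshold.
p1_pass = model.slow_band_mass() >= 0.3

# P3: mask entropy should rise strictly with item difficulty (no saturation).
entropies = []
for batch in eval_batches:  # assumed ordered easy -> hard
    model(batch)
    entropies.append(model.last_mask_entropy())
p3_pass = all(a < b for a, b in zip(entropies, entropies[1:]))
```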
## Citation

```bibtex
@misc{shkhina2026smw,
  title  = {{Slow-Mode Workspace}: A Koopman-Spectral Instantiation of Latent-Memory Transformer Architectures},
  author = {{Shkhina AI Labs}},
  year   = {2026},
  url    = {https://huggingface.co/shkhina-ai-labs/slow-mode-workspace}
}
```
## Links

- GitHub source: https://github.com/shkhina-ai-labs/slow-mode-workspace
- Pre-registration + debate transcript: in the repo under `docs/`
