πŸŒ€ Adamba: Adaptive Mamba

Adaptive Mamba: Elastic compute with dynamic Matryoshka scaling

Project repository: unixsysdev/adamba

Available Checkpoints

| Variant | Parameters | Dim | Features | Status | Download |
|---|---|---|---|---|---|
| phase1_6b_base | 6.4B | 2048 | mamba_integration | βœ… | Download |
| phase2_6b_matryoshka | 6.4B | 2048 | matryoshka, early_exit | ⏳ | β€” |
| phase3_9b_matryoshka | 9.3B | 2560 | matryoshka, early_exit | ⏳ | β€” |
| phase3_20b_matryoshka | 20B | 4096 | matryoshka, early_exit | ⏳ | β€” |
| sft_20b | 20B | 4096 | matryoshka, early_exit, sft | ⏳ | β€” |
| rl_20b | 20B | 4096 | matryoshka, early_exit, rl_agent | ⏳ | β€” |

Architecture Overview

Adamba combines three efficiency techniques:

| Technique | Implementation | Purpose |
|---|---|---|
| Matryoshka (MRL) | Width: 128 β†’ 4096 per layer | Elastic compute |
| Early Exit | ConfidenceGate per layer | Skip when confident |
| Static SSM | Mamba at full dim | Stable memory backbone |

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PROMPT β†’ LayerDimPredictor β†’ [dim per layer]   β”‚
β”‚                                                 β”‚
β”‚  Attention + MLP: Dynamic (Matryoshka sliced)   β”‚
β”‚  Mamba:           Static (full dim)             β”‚
β”‚                                                 β”‚
β”‚  Gate > 0.95 β†’ EXIT EARLY                       β”‚
β”‚  Gate < 0.50 β†’ EXPAND remaining layers          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
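The two dynamic mechanisms above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: `sliced_linear`, `gate_confidence`, and all constants here are hypothetical stand-ins for the Matryoshka width slicing and the per-layer ConfidenceGate thresholds (exit above 0.95, expand below 0.50) shown in the diagram.

```python
import numpy as np

FULL_DIM = 8            # stands in for the real hidden dim (up to 4096)
EXIT_THRESHOLD = 0.95   # gate > 0.95 -> exit early
EXPAND_THRESHOLD = 0.50 # gate < 0.50 -> expand remaining layers

def sliced_linear(W, b, x, d):
    """Run a linear layer at reduced width d by slicing the full weights.

    Matryoshka training makes the leading d dimensions a usable
    sub-network, so W[:d, :d] acts as a valid smaller layer."""
    return W[:d, :d] @ x[:d] + b[:d]

def gate_confidence(logits):
    """Max softmax probability, used here as the exit signal."""
    e = np.exp(logits - logits.max())
    return float((e / e.sum()).max())

rng = np.random.default_rng(0)
W = rng.normal(size=(FULL_DIM, FULL_DIM))
b = np.zeros(FULL_DIM)
x = rng.normal(size=FULL_DIM)

# Same weights, two widths: the half-width path reuses a slice of W.
full = sliced_linear(W, b, x, FULL_DIM)
half = sliced_linear(W, b, x, FULL_DIM // 2)

conf = gate_confidence(full)
if conf > EXIT_THRESHOLD:
    decision = "exit early"
elif conf < EXPAND_THRESHOLD:
    decision = "expand remaining layers"
else:
    decision = "continue at current width"
```

The point of the slicing trick is that no separate small model is stored: every narrower width is a prefix of the full weight matrices.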

Training Pipeline

```
nanochat-d32 (1.9B)
    ↓ Surgery (add 32 Mamba layers)
Phase 1: 6.4B  (dim=2048)  ← Mamba integration
    ↓ Enable Matryoshka
Phase 2: 6.4B  (dim=2048)  ← Full training
    ↓ Progressive expand
Phase 3: 9.3B β†’ 20B (dim=4096)
    ↓ Fine-tuning
SFT: Instruction tuning
RL:  Agent capabilities
```
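The surgery step at the top of the pipeline can be pictured as interleaving new blocks into an existing stack. The sketch below is purely illustrative (the block names and list representation are assumptions, not the project's code); it only shows the shape of the operation: 32 attention blocks plus 32 inserted Mamba blocks yield the 64-block model.

```python
# Hypothetical sketch of "surgery": interleave 32 new Mamba blocks
# into the 32 attention blocks inherited from nanochat-d32.
attn_blocks = [f"attn_{i}" for i in range(32)]
mamba_blocks = [f"mamba_{i}" for i in range(32)]  # freshly initialized

# Alternate attention and Mamba, preserving the original block order.
interleaved = [blk for pair in zip(attn_blocks, mamba_blocks) for blk in pair]
# -> ["attn_0", "mamba_0", "attn_1", "mamba_1", ...], 64 blocks total
```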

Model Details

  • Base: karpathy/nanochat-d32
  • Architecture: 64 blocks (32 Attention + 32 Mamba interleaved)
  • Vocabulary: 65,536 tokens
  • Matryoshka Dims: [128, 256, 512, 1024, 2048, 4096]

Usage

```python
# Coming soon - inference code
# See: https://github.com/unixsysdev/adamba
```

License

Apache 2.0
