Mamba Hypernetwork Personalization v2

A Mamba-based hypernetwork trained with GRPO to inject persona-conditioned, LoRA-style deltas into the attention layers of a target LLM.

Architecture

  • Hypernetwork: Mamba SSM encoder + LoRA-style delta heads
  • Target LLM: deltas injected via forward hooks on q_proj / v_proj (8 layers); see the sketch below
  • Training: GRPO with a combined reward (RM + CR + PL + DIV; weights listed under Reward Weights)
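A minimal sketch of the injection path, assuming a Hugging Face transformers target model with LLaMA-style module names (`model.model.layers[i].self_attn.q_proj` etc.). The hook mechanics are standard PyTorch; the `deltas` mapping and the choice of the first 8 layers are illustrative assumptions, since the card does not specify which layers receive hooks or how the hypernetwork's outputs are packaged.

```python
import torch

def make_delta_hook(A: torch.Tensor, B: torch.Tensor, scale: float):
    """Forward hook that adds a LoRA-style low-rank update
    scale * (x @ A @ B) to the projection's output."""
    def hook(module, inputs, output):
        x = inputs[0]                      # input to the projection
        return output + scale * (x @ A @ B)
    return hook

def inject_deltas(model, deltas, scale=0.003, num_layers=8):
    """Register hooks on q_proj / v_proj of the first `num_layers` blocks.
    `deltas` maps (layer_idx, proj_name) -> (A, B) low-rank factors
    emitted by the hypernetwork for the current persona."""
    handles = []
    for i, block in enumerate(model.model.layers[:num_layers]):
        for name in ("q_proj", "v_proj"):
            proj = getattr(block.self_attn, name)
            A, B = deltas[(i, name)]
            handles.append(proj.register_forward_hook(make_delta_hook(A, B, scale)))
    return handles  # calling h.remove() on each handle detaches the persona

```

The default `scale=0.003` mirrors DELTA_SCALE from the training config below.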

Training Config

  • LR: 1e-5 (cosine schedule)
  • LAMBDA_GRPO: 0.2
  • LAMBDA_KL: 0.08
  • DELTA_SCALE: 0.003
  • Epochs: 5 | Steps: 350
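As a rough illustration of how LAMBDA_GRPO and LAMBDA_KL enter the objective, here is a sketch assuming a standard group-normalized GRPO loss with a KL penalty toward the frozen base model. The exact estimator (and any ratio clipping, omitted here) is not documented in this card.

```python
import torch

LAMBDA_GRPO = 0.2
LAMBDA_KL = 0.08

def grpo_step_loss(logp_new, logp_old, logp_ref, rewards):
    """logp_* are per-sample sequence log-probs for one sampled group;
    advantages are rewards normalized within the group (the 'GR' in GRPO)."""
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    ratio = torch.exp(logp_new - logp_old)   # importance ratio vs. sampling policy
    policy = -(ratio * adv).mean()           # maximize advantage-weighted ratio
    kl = (logp_new - logp_ref).mean()        # crude KL-to-reference estimate
    return LAMBDA_GRPO * policy + LAMBDA_KL * kl
```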

Reward Weights

  Metric   Weight   Description
  RM       +0.55    Persona grounding
  CR       +0.25    Context relevance
  PL       -0.30    Persona leakage penalty
  DIV      +0.10    Response diversity
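Combining the weights into the scalar reward is straightforward; a sketch, with per-metric scores passed in as plain floats (the scoring functions themselves are not part of this card):

```python
def combined_reward(rm: float, cr: float, pl: float, div: float) -> float:
    """Weighted sum of the four metrics; PL carries a negative weight,
    so higher persona leakage lowers the reward."""
    return 0.55 * rm + 0.25 * cr - 0.30 * pl + 0.10 * div

# e.g. a well-grounded, low-leakage response:
# combined_reward(0.9, 0.8, 0.05, 0.4) -> 0.72
```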

Checkpoint Info

  • Saved at: epoch 5, step 350
  • Date: 2026-05-13

Files

  • mamba_weights_only.pt - model weights only (for inference)
  • ckpt_e5_s350.pt - full checkpoint (for resuming training)
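A sketch of how the two files would typically be loaded; the key names inside the full checkpoint ("model", "optimizer", "step") are assumptions, not documented contents.

```python
import torch

def load_for_inference(hypernet: torch.nn.Module,
                       path: str = "mamba_weights_only.pt"):
    """Weights-only file: a bare state_dict, no optimizer state."""
    hypernet.load_state_dict(torch.load(path, map_location="cpu"))
    return hypernet.eval()

def load_for_resume(hypernet, optimizer,
                    path: str = "ckpt_e5_s350.pt") -> int:
    """Full checkpoint: assumed to hold model + optimizer state and the
    global step; returns the step to resume from."""
    ckpt = torch.load(path, map_location="cpu")
    hypernet.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt.get("step", 0)
```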