---
language:
- en
- vi
tags:
- mamba
- hypernetwork
- persona
- grpo
- personalization
license: mit
---

# Mamba Hypernetwork Personalization v2

A Mamba-based hypernetwork, trained with GRPO, that injects persona-conditioned deltas into the attention layers of an LLM.

## Architecture

- **Hypernetwork:** Mamba SSM encoder + delta heads (LoRA-style)
- **Target LLM:** Deltas injected via forward hooks on q_proj / v_proj (8 layers)
- **Training:** GRPO with combined reward (RM + CR + PL + DIV)

## Training Config

- LR: 1e-5 (cosine schedule)
- LAMBDA_GRPO: 0.2
- LAMBDA_KL: 0.08
- DELTA_SCALE: 0.003
- Epochs: 5 | Steps: 350

## Reward Weights

| Metric | Weight | Description |
|--------|--------|-------------|
| RM | +0.55 | Persona grounding |
| CR | +0.25 | Context relevance |
| PL | -0.30 | Persona leakage penalty |
| DIV | +0.10 | Response diversity |

## Checkpoint Info

- Saved at: epoch 5, step 400
- Date: 2026-05-13

## Files

- `mamba_weights_only.pt`: model weights only (for inference)
- `ckpt_e5_s350.pt`: full checkpoint (for resuming training)
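
## Example: Combined Reward

The reward weights listed above could be combined as in the sketch below. This is an illustration under stated assumptions, not the training code: the weighted-sum form, the helper names `combined_reward` and `group_advantages`, and the assumption that each metric score is a plain float are all hypothetical. The group-relative normalization follows the usual GRPO recipe (subtract the group mean, divide by the group standard deviation); using the population standard deviation and the `eps` value are assumptions as well.

```python
from typing import List, Mapping, Sequence

# Weights copied from the "Reward Weights" table.
# Combining them as a weighted sum is an assumption.
REWARD_WEIGHTS = {"RM": 0.55, "CR": 0.25, "PL": -0.30, "DIV": 0.10}


def combined_reward(scores: Mapping[str, float]) -> float:
    """Hypothetical helper: weighted sum of the per-metric scores."""
    missing = set(REWARD_WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing metric scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in REWARD_WEIGHTS.items())


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: GRPO-style advantages within one sampled group.

    Each reward is centered on the group mean and scaled by the group
    (population) standard deviation; eps avoids division by zero when
    every response in the group receives the same reward.
    """
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = variance ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up scores for three sampled responses to the same prompt.
    group = [
        {"RM": 0.8, "CR": 0.6, "PL": 0.2, "DIV": 0.5},
        {"RM": 0.4, "CR": 0.7, "PL": 0.6, "DIV": 0.3},
        {"RM": 0.9, "CR": 0.5, "PL": 0.0, "DIV": 0.4},
    ]
    rewards = [combined_reward(s) for s in group]
    print(rewards)
    print(group_advantages(rewards))
```

Because PL carries a negative weight, a response with higher persona leakage receives a lower combined reward, and therefore a lower advantage relative to the other responses in its group.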