---
language:
- en
- vi
tags:
- mamba
- hypernetwork
- persona
- grpo
- personalization
license: mit
---

# Mamba Hypernetwork Personalization v2

A Mamba-based hypernetwork, trained with GRPO, that injects persona-conditioned deltas into the attention layers of a target LLM.

## Architecture
- **Hypernetwork:** Mamba SSM encoder + delta heads (LoRA-style)
- **Target LLM:** deltas injected via forward hooks on `q_proj` / `v_proj` in 8 layers
- **Training:** GRPO with a weighted combination of RM, CR, PL, and DIV rewards (see Reward Weights)
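
The injection path can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the released code: the Mamba encoder is replaced by a random persona vector, and `ToyAttention`, `DeltaHead`, `make_delta_hook`, `attach_persona`, and `RANK` are hypothetical names. It assumes each delta head emits low-rank factors A and B and that the hook adds `DELTA_SCALE * x @ (B @ A)^T` to the projection output.

```python
import torch
import torch.nn as nn

DELTA_SCALE = 0.003  # from the Training Config in this card
RANK = 4  # hypothetical LoRA rank; the card does not state one


class ToyAttention(nn.Module):
    """Hypothetical stand-in for one attention block of the target LLM."""

    def __init__(self, d_model: int) -> None:
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)


class DeltaHead(nn.Module):
    """Hypothetical head mapping a persona embedding to low-rank factors A and B."""

    def __init__(self, d_persona: int, d_in: int, d_out: int, rank: int = RANK) -> None:
        super().__init__()
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.to_a = nn.Linear(d_persona, rank * d_in)
        self.to_b = nn.Linear(d_persona, d_out * rank)

    def forward(self, persona: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        a = self.to_a(persona).view(self.rank, self.d_in)   # (rank, d_in)
        b = self.to_b(persona).view(self.d_out, self.rank)  # (d_out, rank)
        return a, b


def make_delta_hook(a: torch.Tensor, b: torch.Tensor, scale: float = DELTA_SCALE):
    """Build a forward hook that adds scale * x @ (B @ A)^T to a projection's output."""

    def hook(module: nn.Module, inputs: tuple, output: torch.Tensor) -> torch.Tensor:
        x = inputs[0]
        delta = (x @ a.T) @ b.T  # low-rank path: (..., d_in) -> (..., rank) -> (..., d_out)
        return output + scale * delta  # a non-None return value replaces the output

    return hook


def attach_persona(layers, persona, heads):
    """Register q_proj/v_proj hooks on every layer; return handles for later removal."""
    handles = []
    for layer, (q_head, v_head) in zip(layers, heads):
        for proj, head in ((layer.q_proj, q_head), (layer.v_proj, v_head)):
            a, b = head(persona)
            handles.append(proj.register_forward_hook(make_delta_hook(a, b)))
    return handles


# Usage with toy sizes; 8 layers mirrors the card, other dimensions are arbitrary.
torch.manual_seed(0)
d_model, d_persona, n_layers = 16, 8, 8
layers = [ToyAttention(d_model) for _ in range(n_layers)]
heads = [
    (DeltaHead(d_persona, d_model, d_model), DeltaHead(d_persona, d_model, d_model))
    for _ in range(n_layers)
]
persona = torch.randn(d_persona)  # stand-in for the Mamba encoder's persona embedding
x = torch.randn(2, 5, d_model)    # (batch, seq, d_model)

with torch.no_grad():
    base = layers[0].q_proj(x)
    handles = attach_persona(layers, persona, heads)
    steered = layers[0].q_proj(x)
    for h in handles:
        h.remove()
    restored = layers[0].q_proj(x)
```

Because the hook returns a modified copy of the output rather than editing `weight`, the base LLM stays untouched and removing the handles restores its original behavior.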

## Training Config
- LR: 1e-5 (cosine schedule)
- `LAMBDA_GRPO`: 0.2
- `LAMBDA_KL`: 0.08
- `DELTA_SCALE`: 0.003
- Epochs: 5 | Steps: 350
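
A minimal sketch of this config and the cosine learning-rate schedule in plain Python. It assumes "Steps: 350" is the total number of optimizer steps, no warmup, and decay to a floor of 0; `TrainConfig` and `cosine_lr` are hypothetical names. How `LAMBDA_GRPO` and `LAMBDA_KL` enter the loss is not stated in this card, so they are only recorded here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    lr: float = 1e-5
    lambda_grpo: float = 0.2
    lambda_kl: float = 0.08
    delta_scale: float = 0.003
    epochs: int = 5
    total_steps: int = 350  # assumption: "Steps: 350" counts all optimizer steps


def cosine_lr(step: int, cfg: TrainConfig, min_lr: float = 0.0) -> float:
    """Cosine decay from cfg.lr to min_lr over cfg.total_steps (no warmup assumed)."""
    # Clamp so steps past the end stay at min_lr instead of rising again.
    progress = min(max(step, 0), cfg.total_steps) / cfg.total_steps
    return min_lr + 0.5 * (cfg.lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With `min_lr=0.0` this matches the closed form used by PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=350`.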

## Reward Weights
| Metric | Weight | Description |
|--------|--------|-------------|
| RM     | +0.55  | Persona grounding |
| CR     | +0.25  | Context relevance |
| PL     | -0.30  | Persona leakage penalty |
| DIV    | +0.10  | Response diversity |
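
A sketch of how these weights could combine into one scalar reward and feed a GRPO-style group-relative advantage. It assumes a linear weighted sum over per-metric scores and standardization of rewards within each group of sampled responses; `combined_reward`, `group_advantages`, and the example scores are hypothetical. This sketch uses the population standard deviation.

```python
from statistics import fmean, pstdev

# Weights copied from the table above; the negative PL weight means a
# higher persona-leakage score lowers the combined reward.
REWARD_WEIGHTS = {"RM": 0.55, "CR": 0.25, "PL": -0.30, "DIV": 0.10}


def combined_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-metric scores (assumed linear combination)."""
    return sum(w * scores[name] for name, w in REWARD_WEIGHTS.items())


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical scores for three responses sampled for the same prompt.
group = [
    {"RM": 0.9, "CR": 0.7, "PL": 0.1, "DIV": 0.5},
    {"RM": 0.4, "CR": 0.8, "PL": 0.6, "DIV": 0.3},
    {"RM": 0.7, "CR": 0.6, "PL": 0.2, "DIV": 0.9},
]
rewards = [combined_reward(s) for s in group]
advantages = group_advantages(rewards)
```

Standardizing within the group means each response is scored relative to its siblings, so no separate value network is needed to estimate a baseline.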

## Checkpoint Info
- Saved at: epoch 5, step 350
- Date: 2026-05-13

## Files
- `mamba_weights_only.pt`: model weights only (for inference)
- `ckpt_e5_s350.pt`: full checkpoint (for resuming training)
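
A hedged sketch of loading each file with `torch.load`. It assumes `mamba_weights_only.pt` holds a plain `state_dict` and that the full checkpoint is a dict with the hypothetical keys `"model"`, `"optimizer"`, `"epoch"`, and `"step"`; inspect `ckpt.keys()` on the real file before relying on them. The round trip below uses a toy `nn.Linear` in place of the hypernetwork and writes its own temporary files.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_for_inference(model: nn.Module, path: str = "mamba_weights_only.pt") -> nn.Module:
    """Load a plain state_dict (assumed format) and switch the model to eval mode."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    return model.eval()


def load_for_resume(model: nn.Module, optimizer: torch.optim.Optimizer,
                    path: str = "ckpt_e5_s350.pt") -> tuple[int, int]:
    """Restore model and optimizer; the key names here are hypothetical."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["epoch"], ckpt["step"]


# Round-trip demo with a toy stand-in for the hypernetwork.
torch.manual_seed(0)
src = nn.Linear(4, 4)
opt = torch.optim.AdamW(src.parameters(), lr=1e-5)
tmp = tempfile.mkdtemp()
weights_path = os.path.join(tmp, "mamba_weights_only.pt")
ckpt_path = os.path.join(tmp, "ckpt_e5_s350.pt")
torch.save(src.state_dict(), weights_path)
torch.save({"model": src.state_dict(), "optimizer": opt.state_dict(),
            "epoch": 5, "step": 350}, ckpt_path)

infer_model = load_for_inference(nn.Linear(4, 4), weights_path)
resume_model = nn.Linear(4, 4)
resume_opt = torch.optim.AdamW(resume_model.parameters(), lr=1e-5)
epoch, step = load_for_resume(resume_model, resume_opt, ckpt_path)
```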