---
language:
- en
- vi
tags:
- mamba
- hypernetwork
- persona
- grpo
- personalization
license: mit
---
# Mamba Hypernetwork Personalization v2
Mamba-based hypernetwork trained with GRPO to inject persona-conditioned deltas into LLM attention layers.
## Architecture
- **Hypernetwork:** Mamba SSM encoder + delta heads (LoRA-style)
- **Target LLM:** Deltas are injected via forward hooks on `q_proj` / `v_proj` in 8 layers
- **Training:** GRPO with combined reward (RM + CR + PL + DIV)
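The hook-based injection described above can be illustrated with a minimal PyTorch sketch. It is not the released implementation: the function name `make_delta_hook`, the factor names `A` and `B`, the tensor sizes, and the exact low-rank form are assumptions, and random tensors stand in for the output of the hypernetwork's delta heads. Only `DELTA_SCALE` is taken from this card.

```python
import torch
import torch.nn as nn

DELTA_SCALE = 0.003  # value listed under Training Config


def make_delta_hook(A, B, scale=DELTA_SCALE):
    """Build a forward hook that adds scale * (x @ A^T) @ B^T to a layer's output.

    A has shape (rank, in_features) and B has shape (out_features, rank),
    i.e. a LoRA-style low-rank update. Names and shapes are illustrative.
    """
    def hook(module, inputs, output):
        x = inputs[0]
        # Returning a value from a forward hook replaces the module output.
        return output + scale * (x @ A.T) @ B.T
    return hook


torch.manual_seed(0)
hidden, rank = 16, 4

# Stand-in for a single q_proj / v_proj projection of the target LLM.
q_proj = nn.Linear(hidden, hidden, bias=False)

# Assumption: in the real system these factors are produced per persona
# by the hypernetwork; here they are random placeholders.
A = torch.randn(rank, hidden)
B = torch.randn(hidden, rank)

x = torch.randn(2, 5, hidden)  # (batch, sequence, hidden)
base = q_proj(x)

handle = q_proj.register_forward_hook(make_delta_hook(A, B))
patched = q_proj(x)  # output with the persona delta applied

handle.remove()
restored = q_proj(x)  # base weights were never modified
```

Because the delta is applied to the layer output through a hook, the target LLM's weights stay untouched, and removing the hook handle restores the base behaviour.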
## Training Config
- LR: 1e-5 (cosine schedule)
- LAMBDA_GRPO: 0.2
- LAMBDA_KL: 0.08
- DELTA_SCALE: 0.003
- Epochs: 5 | Steps: 350
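For reference, a cosine schedule over the listed step count can be sketched as follows. The card does not state whether warmup or a non-zero minimum learning rate was used, so this sketch assumes no warmup and decay to zero; `cosine_lr` and `min_lr` are illustrative names.

```python
import math

BASE_LR = 1e-5     # LR from the config above
TOTAL_STEPS = 350  # Steps from the config above


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, min_lr=0.0):
    """Cosine decay from base_lr at step 0 to min_lr at total_steps."""
    # Clamp the step so values outside the schedule stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


lr_start = cosine_lr(0)
lr_mid = cosine_lr(175)
lr_end = cosine_lr(350)
```

Under these assumptions the learning rate is halved at the midpoint of training and reaches the minimum at the final step.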
## Reward Weights
| Metric | Weight | Description |
|--------|--------|-------------|
| RM | +0.55 | Persona grounding |
| CR | +0.25 | Context relevance |
| PL | -0.30 | Persona leakage penalty |
| DIV | +0.10 | Response diversity |
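As a rough illustration of how these weights could combine, the sketch below computes a weighted sum of the four metrics and then the group-relative advantages commonly used in GRPO. The linear combination, the score range of roughly [0, 1], and the normalization are assumptions; the card does not specify the exact formula. `combined_reward` and `group_advantages` are hypothetical helper names, and the example scores are made up.

```python
import statistics

# Weights from the table above; PL is negative because it is a penalty.
REWARD_WEIGHTS = {"RM": 0.55, "CR": 0.25, "PL": -0.30, "DIV": 0.10}


def combined_reward(scores):
    """Weighted sum of per-metric scores (assumed linear combination)."""
    return sum(REWARD_WEIGHTS[name] * scores[name] for name in REWARD_WEIGHTS)


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative scores for three sampled responses to the same prompt.
group = [
    {"RM": 0.8, "CR": 0.6, "PL": 0.2, "DIV": 0.5},
    {"RM": 0.4, "CR": 0.7, "PL": 0.6, "DIV": 0.3},
    {"RM": 0.9, "CR": 0.5, "PL": 0.1, "DIV": 0.4},
]
rewards = [combined_reward(s) for s in group]
advantages = group_advantages(rewards)
```

Responses with above-average combined reward in their group receive positive advantages, and high persona leakage pulls a response's reward down.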
## Checkpoint Info
- Saved at: epoch 5, step 350
- Date: 2026-05-13
## Files
- `mamba_weights_only.pt`: hypernetwork weights only (for inference)
- `ckpt_e5_s350.pt`: full checkpoint (for resuming training)