metadata
language:
  - en
  - vi
tags:
  - mamba
  - hypernetwork
  - persona
  - grpo
  - personalization
license: mit

Mamba Hypernetwork Personalization v2

A Mamba-based hypernetwork, trained with GRPO, that injects persona-conditioned deltas into the attention layers of a target LLM.

Architecture

  • Hypernetwork: Mamba SSM encoder + delta heads (LoRA-style)
  • Target LLM: persona deltas are injected via forward hooks on q_proj / v_proj (8 layers)
  • Training: GRPO with combined reward (RM + CR + PL + DIV)
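
As a rough illustration of what a LoRA-style delta on a projection looks like, the sketch below adds a scaled low-rank term to the output of a frozen projection matrix. It is not the code used for this model: plain Python lists stand in for tensors, the helper names and toy sizes are hypothetical, and applying DELTA_SCALE as a multiplier on the delta is an assumption. In a real setup the same arithmetic would presumably run inside a forward hook on q_proj / v_proj.

```python
# Sketch only: plain Python lists stand in for tensors.
# Helper names, the rank, and the toy sizes are hypothetical.

DELTA_SCALE = 0.003  # value listed under Training Config


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def project_with_delta(weight, lora_a, lora_b, hidden, scale=DELTA_SCALE):
    """Return W @ h + scale * B @ (A @ h).

    weight: frozen projection matrix (d_out x d_in), e.g. q_proj
    lora_a: down-projection emitted by the hypernetwork (rank x d_in)
    lora_b: up-projection emitted by the hypernetwork (d_out x rank)
    Assumption: the delta is scaled by DELTA_SCALE before being added.
    """
    base = matvec(weight, hidden)
    delta = matvec(lora_b, matvec(lora_a, hidden))
    return [b + scale * d for b, d in zip(base, delta)]


# Toy example: d_in = d_out = 2, rank = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[10.0], [-10.0]]
h = [2.0, 3.0]
out = project_with_delta(W, A, B, h)
```

With a scale this small, the persona delta only nudges the base projection, which is consistent with keeping the target LLM's behavior close to the original.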

Training Config

  • LR: 1e-5 (cosine schedule)
  • LAMBDA_GRPO: 0.2
  • LAMBDA_KL: 0.08
  • DELTA_SCALE: 0.003
  • Epochs: 5 | Steps: 350
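
For reference, a cosine schedule with the peak LR and step count listed above can be sketched as follows. The card does not say whether warmup or a non-zero floor was used, so this version assumes neither; `cosine_lr` and `min_lr` are hypothetical names.

```python
import math

PEAK_LR = 1e-5      # LR from the training config
TOTAL_STEPS = 350   # Steps from the training config


def cosine_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps.

    Assumptions: no warmup phase, and the floor min_lr defaults to 0.
    Steps outside [0, total_steps] are clamped to that range.
    """
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, midpoint, and end of training.
schedule = [cosine_lr(s) for s in (0, 175, 350)]
```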

Reward Weights

| Metric | Weight | Description |
|--------|--------|-------------|
| RM | +0.55 | Persona grounding |
| CR | +0.25 | Context relevance |
| PL | -0.30 | Persona leakage penalty |
| DIV | +0.10 | Response diversity |
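
One plausible reading of these weights is a weighted sum of per-response scores, followed by the group-relative normalization that GRPO typically applies to rewards within a sampled group. The sketch below assumes each component score lies in [0, 1] and that PL is a non-negative leakage score. The function names and example scores are hypothetical, and the card does not confirm that training used exactly this form.

```python
from statistics import mean, pstdev

# Weights from the table above; PL carries a negative weight (penalty).
REWARD_WEIGHTS = {"RM": 0.55, "CR": 0.25, "PL": -0.30, "DIV": 0.10}


def combined_reward(scores):
    """Weighted sum of component scores (assumed to lie in [0, 1])."""
    return sum(REWARD_WEIGHTS[name] * scores[name] for name in REWARD_WEIGHTS)


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two hypothetical responses to the same prompt; the second leaks persona text.
group = [
    {"RM": 0.9, "CR": 0.8, "PL": 0.0, "DIV": 0.5},
    {"RM": 0.9, "CR": 0.8, "PL": 1.0, "DIV": 0.5},
]
rewards = [combined_reward(s) for s in group]
advantages = group_advantages(rewards)
```

Under these assumptions, the leaking response receives a reward lower by exactly the PL weight (0.30) and ends up with a negative advantage relative to the clean one.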

Checkpoint Info

  • Saved at: epoch 5, step 350
  • Date: 2026-05-13

Files

  • mamba_weights_only.pt — model weights only (for inference)
  • ckpt_e5_s350.pt — full checkpoint (for resuming training)