phammminhhieu/mamba-hypernetwork-personalization_v2


language:
  - en
  - vi
tags:
  - mamba
  - hypernetwork
  - persona
  - grpo
  - personalization
license: mit

Mamba Hypernetwork Personalization v2

Mamba-based hypernetwork trained with GRPO to inject persona-conditioned deltas into LLM attention layers.

Architecture

Hypernetwork: Mamba SSM encoder + delta heads (LoRA-style)
Target LLM: deltas are injected via forward hooks on q_proj / v_proj (8 layers)
Training: GRPO with combined reward (RM + CR + PL + DIV)
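
The injection mechanism can be illustrated with a minimal PyTorch sketch. It is not the code used to train this model: the `LowRankDelta` class, the `attach_deltas` helper, the rank, and the toy `nn.Linear` standing in for `q_proj` are all hypothetical, and in the real system the `A` / `B` factors would come from the Mamba hypernetwork's delta heads rather than from random tensors.

```python
import torch
import torch.nn as nn


class LowRankDelta:
    """Hypothetical hook object: adds one low-rank update to a layer's output."""

    def __init__(self, a: torch.Tensor, b: torch.Tensor, scale: float = 0.003):
        self.a = a          # shape (rank, in_features)
        self.b = b          # shape (out_features, rank)
        self.scale = scale  # DELTA_SCALE from the training config

    def __call__(self, module, inputs, output):
        x = inputs[0]
        # LoRA-style correction scale * (x A^T) B^T, added to the frozen output.
        return output + self.scale * (x @ self.a.T) @ self.b.T


def attach_deltas(layers, deltas):
    """Register one forward hook per projection; returns removable handles."""
    return [layer.register_forward_hook(d) for layer, d in zip(layers, deltas)]


# Toy stand-in for a single q_proj. The card describes hooks on q_proj and
# v_proj across 8 layers; one layer is enough to show the mechanism.
torch.manual_seed(0)
q_proj = nn.Linear(16, 16, bias=False)
delta = LowRankDelta(torch.randn(4, 16), torch.randn(16, 4))

x = torch.randn(2, 5, 16)           # (batch, sequence, hidden)
base = q_proj(x)                    # output without any persona delta
handles = attach_deltas([q_proj], [delta])
patched = q_proj(x)                 # output with the delta injected
for h in handles:
    h.remove()                      # removing the hooks restores the base model
restored = q_proj(x)
```

Because the hook only changes the layer's output, the target LLM's own weights stay untouched, and removing the handles returns the model to its original behaviour.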

Training Config

LR: 1e-5 (cosine schedule)
LAMBDA_GRPO: 0.2
LAMBDA_KL: 0.08
DELTA_SCALE: 0.003
Epochs: 5 | Steps: 350
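
For reference, a cosine schedule over the configured 350 steps could look like the sketch below. The absence of warmup and the decay to a minimum of 0 are assumptions; the card states only the peak learning rate and the schedule type.

```python
import math

BASE_LR = 1e-5      # LR from the config above
TOTAL_STEPS = 350   # Steps from the config above


def cosine_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr (assumed: no warmup, min_lr = 0)."""
    # Clamp so that out-of-range steps map to the schedule's endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate starts at 1e-5, passes through half that value at step 175, and reaches the minimum at step 350.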

Reward Weights

| Metric | Weight | Description |
|--------|--------|-------------|
| RM | +0.55 | Persona grounding |
| CR | +0.25 | Context relevance |
| PL | -0.30 | Persona leakage penalty |
| DIV | +0.10 | Response diversity |
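
One plausible reading of these weights is a plain weighted sum, sketched below. The linear combination, the function names, and the example scores are assumptions for illustration. The group normalization in `group_advantages` follows the usual GRPO recipe of standardizing rewards within a group of sampled responses, and the choice of population standard deviation is likewise an assumption.

```python
from statistics import fmean, pstdev

# Weights copied from the table above; PL is negative because it is a penalty.
REWARD_WEIGHTS = {"RM": 0.55, "CR": 0.25, "PL": -0.30, "DIV": 0.10}


def combined_reward(scores: dict) -> float:
    """Weighted sum of the four component scores (assumed linear combination)."""
    missing = REWARD_WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing reward components: {sorted(missing)}")
    return sum(w * scores[name] for name, w in REWARD_WEIGHTS.items())


def group_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Standardize rewards within one group of sampled responses."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical component scores for a single response.
example = combined_reward({"RM": 0.8, "CR": 0.6, "PL": 0.1, "DIV": 0.5})
```

With the hypothetical scores above, the terms are 0.44 + 0.15 - 0.03 + 0.05, so a response that leaks more persona text (higher PL) ends up with a lower total.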

Checkpoint Info

Saved at: epoch 5, step 350
Date: 2026-05-13

Files

mamba_weights_only.pt: model weights only (for inference)
ckpt_e5_s350.pt: full checkpoint (for resuming training)
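
A hedged sketch of loading either file on CPU follows. The internal layout of the files is not documented here, so the `model_state_dict` key used to unwrap a full checkpoint is a guess, and `load_hypernetwork_state` is a hypothetical helper rather than part of this repository.

```python
import torch


def load_hypernetwork_state(path: str) -> dict:
    """Load a state dict on CPU from a weights-only file or a full checkpoint."""
    # weights_only=True restricts unpickling to tensors and plain containers.
    obj = torch.load(path, map_location="cpu", weights_only=True)
    # Assumption: a full checkpoint nests the weights under "model_state_dict".
    if isinstance(obj, dict) and "model_state_dict" in obj:
        return obj["model_state_dict"]
    return obj
```

The returned dictionary would then be passed to `load_state_dict` on a hypernetwork module constructed with the same architecture used in training; that module class is not defined in this card.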