phammminhhieu/mamba-hypernetwork-personalization_v2


	---
	language:
	- en
	- vi
	tags:
	- mamba
	- hypernetwork
	- persona
	- grpo
	- personalization
	license: mit
	---

	# Mamba Hypernetwork Personalization v2

	A Mamba-based hypernetwork, trained with GRPO, that injects persona-conditioned deltas into the attention layers of a target LLM.

	## Architecture
	- Hypernetwork: Mamba SSM encoder with LoRA-style delta heads
	- Target LLM: deltas are injected via forward hooks on `q_proj` / `v_proj` (8 layers)
	- Training: GRPO with a combined reward (RM + CR + PL + DIV, weighted as listed under Reward Weights)
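
For orientation, a LoRA-style delta on a projection is commonly written as below. This is a sketch under assumptions, not the confirmed formulation of this model: the symbols $W_\ell$, $A_\ell$, $B_\ell$, $p$, and $s$ are illustrative, and it is assumed that `DELTA_SCALE` plays the role of $s$.

$$
\tilde{y}_\ell = W_\ell\, x + s \, B_\ell(p)\, A_\ell(p)\, x
$$

Here $W_\ell$ stands for the frozen `q_proj` or `v_proj` weight in hooked layer $\ell$, $x$ is the layer input, and $A_\ell(p)$, $B_\ell(p)$ are low-rank factors that the delta heads would emit from the persona encoding $p$. Under this reading, a small $s$ keeps the hooked output close to the base model's output.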

	## Training Config
	- LR: 1e-5 (cosine schedule)
	- LAMBDA_GRPO: 0.2
	- LAMBDA_KL: 0.08
	- DELTA_SCALE: 0.003
	- Epochs: 5 | Steps: 350
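
The learning-rate entry above can be illustrated with a minimal cosine schedule. This is a sketch, not the training script: it assumes no warmup, a decay to zero, and that the 350 steps are the full schedule length. The names `cosine_lr`, `BASE_LR`, and `TOTAL_STEPS` are hypothetical.

```python
import math

# Values taken from the Training Config list; how they are wired together
# here is an assumption made for illustration.
BASE_LR = 1e-5
TOTAL_STEPS = 350


def cosine_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, min_lr: float = 0.0) -> float:
    """Return the cosine-annealed learning rate at `step`.

    Assumes no warmup phase; `step` is clamped to [0, total_steps].
    """
    step = min(max(step, 0), total_steps)
    progress = step / total_steps
    # Half-cosine decay from base_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 175, 350):
        print(s, cosine_lr(s))
```

With these assumptions the rate starts at `1e-5`, passes through half that value at step 175, and reaches zero at step 350.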

	## Reward Weights
	| Metric | Weight | Description |
	|--------|--------|-------------|
	| RM | +0.55 | Persona grounding |
	| CR | +0.25 | Context relevance |
	| PL | -0.30 | Persona leakage penalty |
	| DIV | +0.10 | Response diversity |
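
To make the table concrete, here is a minimal sketch of how such weights could be combined and turned into GRPO-style group-relative advantages. It assumes the combined reward is a plain weighted sum of per-component scores and that advantages are normalized within each group of sampled responses, as is common for GRPO. The function names and the example scores are hypothetical.

```python
from statistics import mean, pstdev

# Weights copied from the Reward Weights table.
REWARD_WEIGHTS = {"RM": 0.55, "CR": 0.25, "PL": -0.30, "DIV": 0.10}


def combined_reward(scores: dict) -> float:
    """Weighted sum of component scores (assumed form of the combined reward)."""
    return sum(weight * scores[name] for name, weight in REWARD_WEIGHTS.items())


def group_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical scores for three responses sampled for the same prompt.
    group = [
        {"RM": 0.9, "CR": 0.8, "PL": 0.0, "DIV": 0.5},
        {"RM": 0.9, "CR": 0.8, "PL": 1.0, "DIV": 0.5},
        {"RM": 0.2, "CR": 0.4, "PL": 0.0, "DIV": 0.7},
    ]
    rewards = [combined_reward(s) for s in group]
    print(rewards)
    print(group_advantages(rewards))
```

Because PL carries a negative weight, a response that leaks the persona scores lower than an otherwise identical response that does not.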

	## Checkpoint Info
	- Saved at: epoch 5, step 350
	- Date: 2026-05-13

	## Files
	- `mamba_weights_only.pt`: model weights only (for inference)
	- `ckpt_e5_s350.pt`: full checkpoint (for resuming training)