| # 🧠 Mamba Hypernetwork for LLM Personalization |
|
|
| The Mamba Hypernetwork generates LoRA deltas for an LLM at inference time, enabling instant personalization without retraining. |
|
|
| ## Architecture |
| - **Mamba**: model dimension 1024, state size 16, expansion factor 4 (~12M params) |
| - **LoRA**: rank 16, injected into `q_proj` and `v_proj` of the first 8 layers |
| - **LLM**: Llama-3.2-3B-Instruct |
|
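To give a sense of scale for the LoRA settings above, the sketch below counts how many LoRA entries they imply. It assumes the standard LoRA factorization, where a rank-`r` update to a `d_out × d_in` weight is stored as two factors with `r · (d_in + d_out)` entries in total. It also assumes the projection shapes from the public Llama-3.2-3B config (hidden size 3072, `q_proj` output 3072, `v_proj` output 1024). `PROJ_SHAPES` and `lora_param_count` are illustrative names, not part of this repo, and the hypernetwork may parameterize its output differently.

```python
# Assumed (in_features, out_features) per projection for Llama-3.2-3B.
# These values come from the public model config, not from this repo.
PROJ_SHAPES = {
    "q_proj": (3072, 3072),
    "v_proj": (3072, 1024),
}


def lora_param_count(rank, num_layers, proj_shapes=PROJ_SHAPES):
    """Count entries in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in proj_shapes.values())
    return per_layer * num_layers


# Rank 16 on q_proj + v_proj of the first 8 layers, as listed above.
total = lora_param_count(rank=16, num_layers=8)
print(f"LoRA delta entries per persona: {total:,}")
```

Under these assumptions the count is 1,310,720 entries per persona, which is what a hypernetwork would need to produce if it emitted both LoRA factors in full.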
|
| ## Training |
| - **Method**: GRPO (Group Relative Policy Optimization) |
| - **Reward Model**: `phammminhhieu/persona-reward-model` |
| - **Dataset**: Personachat TrueCased |
|
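The core idea of GRPO is that each sampled response is scored relative to the other responses in its own group, rather than against a learned value baseline. The sketch below shows only that normalization step, using the standard library. The reward values are hypothetical stand-ins for reward-model scores, and the use of the population standard deviation is an assumption (some implementations use the sample standard deviation). It is not this repo's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see the note above
    # eps keeps the division finite when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four responses sampled for one persona prompt.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses scoring above the group mean get a positive advantage and are reinforced, while those below the mean are penalized. A group with identical rewards yields all-zero advantages and so contributes no gradient signal.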
|
| ## Usage |
|
|
| ```python |
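| import json |
| |
| import torch |
| from mamba_ssm import Mamba |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| |
| # NOTE: MambaHypernetwork is this repo's model class; import it from the repo's |
| # model code before running (its module path is not given here). |
| |
| # Load the Mamba hypernetwork |
| with open("config.json") as f: |
|     config = json.load(f) |
| mamba = MambaHypernetwork(config) |
| mamba.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu")) |
| mamba.eval() |
| |
| # Load the LLM |
| llm = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct") |
| tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct") |
| |
| # Tokenize the persona and the dialogue history |
| # (assumes the hypernetwork consumes Llama tokenizer IDs; the history text is illustrative) |
| persona = "I love dogs. I am a photographer." |
| history = "Hi! What do you do for a living?" |
| persona_enc = tokenizer(persona, return_tensors="pt") |
| history_enc = tokenizer(history, return_tensors="pt") |
| persona_ids, persona_mask = persona_enc["input_ids"], persona_enc["attention_mask"] |
| history_ids, history_mask = history_enc["input_ids"], history_enc["attention_mask"] |
| |
| # Generate LoRA deltas from the persona |
| with torch.no_grad(): |
|     deltas = mamba(persona_ids, persona_mask, history_ids, history_mask) |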
| |
| |