base_model:
- openai-community/gpt2
---
# MemoryDecoder-GPT2-Small

## Model Description

Memory Decoder is a pretrained, plug-and-play memory component designed for efficient domain adaptation of large language models. This checkpoint contains the GPT2-small Memory Decoder trained on WikiText-103, as described in our NeurIPS 2025 paper.

- **Paper:** [Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models](https://www.arxiv.org/abs/2508.09874)
- **GitHub:** [https://github.com/LUMIA-Group/MemoryDecoder](https://github.com/LUMIA-Group/MemoryDecoder/tree/main)
- **Conference:** NeurIPS 2025 (Poster)
- **Model Size:** 124M parameters
- **Base Architecture:** GPT2-small transformer decoder

## Overview

Memory Decoder bridges the gap between non-parametric retrieval methods and parametric fine-tuning approaches. By pre-training a compact transformer decoder to internalize retrieval patterns, it provides:

- **Plug-and-Play Integration:** Works with any GPT2 model variant without modifying the original parameters (sketched below)
- **Efficient Inference:** No retrieval overhead; only an additional parallel forward pass
- **Domain Expertise:** Captures long-tail knowledge like kNN-LM, but with parametric efficiency
- **Preserved Capabilities:** The original model remains unchanged
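
Conceptually, the Memory Decoder's next-token distribution is interpolated with the base model's at inference time. The snippet below is a minimal, illustrative sketch of that mixing, using the `lmbda` and `knn_temp` arguments that appear in the Quick Start; the released `MemoryDecoder` class performs the actual integration, and the exact placement of the temperature here is an assumption.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def interpolated_next_token_probs(base_lm, mem_decoder, input_ids,
                                  lmbda=0.55, knn_temp=1.0):
    """Illustrative sketch of the output-distribution interpolation."""
    # Next-token logits from the frozen base LM and the Memory Decoder
    base_logits = base_lm(input_ids).logits[:, -1, :]
    mem_logits = mem_decoder(input_ids).logits[:, -1, :]
    # Convert to probabilities (temperature on the memory logits is an assumption)
    p_base = F.softmax(base_logits, dim=-1)
    p_mem = F.softmax(mem_logits / knn_temp, dim=-1)
    # Weighted mixture of the two distributions
    return lmbda * p_mem + (1.0 - lmbda) * p_base
```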

## Quick Start

### Step 1: Import Libraries and Initialize Models

```python
from memDec import MemoryDecoder
import transformers
from transformers import AutoModelForCausalLM
from loguru import logger

# Define paths to your models
base_lm_path = "gpt2-xl"  # or any GPT2 variant
knn_generator_path = "Clover-Hill/MemoryDecoder-gpt2-small"

# Load tokenizer and models (the Memory Decoder is built on GPT2,
# so the base model's tokenizer serves both)
tokenizer = transformers.AutoTokenizer.from_pretrained(base_lm_path)
base_lm = AutoModelForCausalLM.from_pretrained(base_lm_path)
knn_generator = AutoModelForCausalLM.from_pretrained(knn_generator_path)
```

### Step 2: Prepare Models and Create Joint Model

```python
# Set both models to evaluation mode
base_lm.eval()
knn_generator.eval()

# Create the joint Memory Decoder model;
# lmbda weights the Memory Decoder's distribution in the mixture and
# knn_temp is the temperature applied to its logits (see the Overview sketch)
joint = MemoryDecoder(base_lm, knn_generator, lmbda=0.55, knn_temp=1.0).to("cuda")
```

### Step 3: Generate Text and Compare Results

```python
# Prepare input prompt (text drawn from the WikiText-103 Valkyria Chronicles III article)
prompt = "As with previous Valkyira Chronicles games , Valkyria Chronicles III is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate with Memory Decoder
out_ids = joint.generate(**inputs, max_new_tokens=20, do_sample=False)
logger.info(f"Memory Decoder output: {tokenizer.decode(out_ids[0], skip_special_tokens=True)}")

# Generate with base model for comparison
out_ids = base_lm.generate(**inputs, max_new_tokens=20, do_sample=False)
logger.info(f"Base Model output: {tokenizer.decode(out_ids[0], skip_special_tokens=True)}")
```

**📊 Generation Results Comparison:**

| Model | Generated Continuation |
|-------|------------------------|
| **Base Model** | *"...is a turn-based strategy game. The player takes control of a squad of Valkyria soldiers..."* |
| **+Memory Decoder** | *"...is a **role-playing** video game developed by Sega and published by Sega for the PlayStation 2."* |

> [!NOTE]
> Memory Decoder correctly identifies Valkyria Chronicles III as a **role-playing game** (factually accurate), while the base model incorrectly predicts it as a strategy game.
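
To quantify the effect beyond a single greedy generation, you can compare the log-probability each model assigns to a factually correct continuation. The helper below is a hypothetical sketch using only standard Hugging Face calls; it works directly with `base_lm` and `knn_generator`, and applies to `joint` only if the joint model exposes the usual causal-LM forward (an assumption).

```python
import torch

@torch.no_grad()
def continuation_logprob(model, tokenizer, prompt, continuation, device="cuda"):
    """Total log-probability the model assigns to `continuation` given `prompt`.

    Illustrative helper (not part of the released API); assumes the
    prompt/continuation boundary tokenizes cleanly with GPT2's BPE.
    """
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids.to(device)
    n_prompt = tokenizer(prompt, return_tensors="pt").input_ids.size(1)
    logits = model(full_ids).logits
    # The token at position i+1 is predicted from the logits at position i
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    per_token = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the continuation tokens
    return per_token[:, n_prompt - 1:].sum().item()
```

Higher values on domain-specific continuations indicate that the memory has internalized the corresponding facts.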

## Performance on WikiText-103

| Model Configuration | Perplexity | Improvement |
|:-------------------|:----------:|:-----------:|
| GPT2-small (baseline) | 24.89 | - |
| GPT2-small + MemoryDecoder | **13.36** | -11.53 |
| GPT2-medium (baseline) | 18.29 | - |
| GPT2-medium + MemoryDecoder | **12.25** | -6.04 |
| GPT2-large (baseline) | 15.80 | - |
| GPT2-large + MemoryDecoder | **11.53** | -4.27 |
| GPT2-xl (baseline) | 14.39 | - |
| GPT2-xl + MemoryDecoder | **10.93** | -3.46 |
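
The numbers above are from the paper. A rough way to run a perplexity check of your own is the standard sliding-window evaluation sketched below; details such as the dataset variant (`wikitext-103-raw-v1` here), context length, and stride are assumptions, so results will not necessarily match the table. Scoring `joint` this way additionally assumes it accepts `labels` and returns `.loss` like a standard causal LM.

```python
import torch
from datasets import load_dataset

@torch.no_grad()
def wikitext_perplexity(model, tokenizer, device="cuda", max_length=1024, stride=512):
    """Sliding-window perplexity on the WikiText-103 test split (sketch only)."""
    test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
    ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids.to(device)
    seq_len = ids.size(1)
    nll_sum, n_tokens, prev_end = 0.0, 0, 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        target_len = end - prev_end            # only score tokens not seen before
        chunk = ids[:, begin:end]
        labels = chunk.clone()
        labels[:, :-target_len] = -100         # mask the overlapping context
        loss = model(chunk, labels=labels).loss
        n_scored = (labels != -100).sum().item() - 1   # account for the internal label shift
        nll_sum += loss.item() * n_scored
        n_tokens += n_scored
        prev_end = end
        if end == seq_len:
            break
    return float(torch.exp(torch.tensor(nll_sum / n_tokens)))
```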

## Key Features

- **Universal Compatibility:** Works with all GPT2 model sizes (small, medium, large, xl)
- **Parameter Efficient:** Only 124M additional parameters enhance models of up to 1.5B parameters
- **Domain Adaptation:** Trained to capture WikiText-103 domain knowledge
- **Inference Speed:** Minimal overhead compared to retrieval-based methods

## Training Details

- **Training Data:** WikiText-103
- **Training Objective:** Hybrid KL divergence and language modeling loss (see the schematic below)
- **Supervision Signal:** kNN distributions from GPT2-xl
- **Hyperparameters:**
  - Learning rate: 1e-3
  - Beta (loss balance): 0.5
  - Training epochs: 10
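
The schematic below illustrates the hybrid objective named above: a KL term that pushes the Memory Decoder toward precomputed kNN next-token distributions, plus the usual language-modeling cross-entropy, balanced by `beta = 0.5`. This is a sketch of the idea only; the exact weighting and the construction of the kNN targets follow the released training code.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, knn_target_probs, labels, beta=0.5):
    """Schematic hybrid objective (sketch; the exact form may differ from the paper).

    logits:            Memory Decoder outputs, shape (batch, seq, vocab)
    knn_target_probs:  precomputed kNN next-token distributions, same shape
    labels:            ground-truth next tokens, shape (batch, seq), already shifted
    """
    log_p = F.log_softmax(logits, dim=-1)
    # Per-position KL(kNN distribution || model distribution), averaged over tokens
    kl_term = F.kl_div(log_p, knn_target_probs, reduction="none").sum(-1).mean()
    # Standard language-modeling cross-entropy
    lm_term = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
    return beta * kl_term + (1.0 - beta) * lm_term
```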

## Citation

```bibtex
@article{cao2025memory,
  title={Memory decoder: A pretrained, plug-and-play memory for large language models},
  author={Cao, Jiaqi and Wang, Jiarui and Wei, Rubin and Guo, Qipeng and Chen, Kai and Zhou, Bowen and Lin, Zhouhan},
  journal={arXiv preprint arXiv:2508.09874},
  year={2025}
}
```

## Contact

For questions and support: maximus.cao@outlook.com