metadata
license: apache-2.0
language:
- pyt
base_model:
- Qwen/Qwen2.5-0.5B
Model Description
This Memory Decoder model is trained on the Finance domain and can be adapted to enhance any model in the Llama3, Llama3.1, and Llama3.2 families.
The Memory Decoder itself is initialized from a Qwen model (Qwen2.5-0.5B), with its embedding layer adapted to fit the Llama tokenizer so that it shares a vocabulary with the Llama models it augments. This enables efficient knowledge transfer across model families.
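The reason a shared tokenizer matters can be illustrated with a small sketch. It assumes the Memory Decoder is combined with the base model by linearly interpolating their next-token distributions, in the style of kNN-LM; the paper and the GitHub repository are the reference for the exact formulation. The function names, the toy 4-token vocabulary, and the weight `lam` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate(base_logits, mem_logits, lam=0.5):
    """Mix two next-token distributions over the SAME vocabulary.

    lam is a hypothetical interpolation weight (assumption):
    lam = 0 uses only the base model, lam = 1 only the memory model.
    """
    if len(base_logits) != len(mem_logits):
        # Both models must score the same token ids, which is why the
        # memory model's embedding layer has to match the Llama tokenizer.
        raise ValueError("vocabulary sizes differ")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    p_base = softmax(base_logits)
    p_mem = softmax(mem_logits)
    return [(1.0 - lam) * b + lam * m for b, m in zip(p_base, p_mem)]


# Toy logits over a 4-token vocabulary (made-up numbers).
base = [2.0, 1.0, 0.1, -1.0]
mem = [0.0, 3.0, 0.0, 0.0]
mixed = interpolate(base, mem, lam=0.5)
print(mixed)
```

Because the mixture is a convex combination of two probability distributions, the result is itself a valid distribution and the base model's parameters are never modified.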
Paper: Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
GitHub: https://github.com/LUMIA-Group/MemoryDecoder
Training & Evaluation Data
Finance Domain Dataset: yahoo_finance_stockmarket_news
Test Split: MemoryDecoder-domain-data
Performance Results
Llama3 Family
| Model | Base Model | Base + MemDec |
|---|---|---|
| Llama3-8B | 8.63 | 4.32 |
| Llama3-70B | 6.87 | 4.01 |
Llama3.1 Family
| Model | Base Model | Base + MemDec |
|---|---|---|
| Llama3.1-8B | 8.46 | 4.30 |
| Llama3.1-70B | 6.68 | 3.97 |
Llama3.2 Family
| Model | Base Model | Base + MemDec |
|---|---|---|
| Llama3.2-1B | 11.85 | 4.85 |
| Llama3.2-3B | 9.70 | 4.45 |
Perplexity scores on the Finance domain test set. Lower is better.
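For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token. The sketch below shows that standard definition and how a relative reduction can be read off the tables; the helper names are hypothetical, the log-probabilities are toy values, and this is not the evaluation script used to produce the numbers above.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each target token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def relative_reduction(base_ppl, new_ppl):
    """Fraction by which perplexity dropped relative to the base model."""
    return (base_ppl - new_ppl) / base_ppl


# Toy case: a model that assigns probability 0.25 to every target token
# has a perplexity of about 4 (it is as uncertain as a 4-way uniform guess).
toy_log_probs = [math.log(0.25)] * 3
print(perplexity(toy_log_probs))

# Llama3-8B row from the table above: 8.63 -> 4.32.
print(f"{relative_reduction(8.63, 4.32):.1%}")  # 49.9%
```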
Citation
@article{cao2025memory,
title={{Memory Decoder}: A Pretrained, Plug-and-Play Memory for Large Language Models},
author={Cao, Jiaqi and Wang, Jiarui and Wei, Rubin and Guo, Qipeng and Chen, Kai and Zhou, Bowen and Lin, Zhouhan},
journal={arXiv preprint arXiv:2508.09874},
year={2025}
}
Contact
For questions and support: maximus.cao@outlook.com