Model Description

This Memory Decoder model is trained on the Biomedical domain and can be adapted to enhance any model in the Llama3, Llama3.1, and Llama3.2 families.

The Memory Decoder itself is initialized from a Qwen model, with its embedding layer adapted to the Llama tokenizer. This enables efficient cross-model-family knowledge transfer.
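At inference time, a Memory Decoder is plug-and-play: it runs alongside the frozen base model and the two next-token distributions are interpolated. The sketch below is illustrative only, not the official API from the MemoryDecoder repository; the function name `interpolate_next_token_logits` and the weight `lam=0.3` are assumptions for demonstration.

```python
import torch
import torch.nn.functional as F

def interpolate_next_token_logits(base_logits, memdec_logits, lam=0.3):
    """Blend base-LM and Memory Decoder next-token distributions.

    `lam` is a hypothetical interpolation weight; the actual value and
    interface are defined in the MemoryDecoder repository.
    """
    base_probs = F.softmax(base_logits, dim=-1)
    mem_probs = F.softmax(memdec_logits, dim=-1)
    # Interpolate in probability space, then return log-probabilities
    mixed = (1.0 - lam) * base_probs + lam * mem_probs
    return torch.log(mixed)

# Toy demo: random logits over a 10-token vocabulary
base = torch.randn(1, 10)
mem = torch.randn(1, 10)
mixed_logits = interpolate_next_token_logits(base, mem)
```

Because both inputs enter only through their softmax distributions, any Llama-family base model sharing the tokenizer can be paired with the same Memory Decoder.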

Paper: Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models

GitHub: https://github.com/LUMIA-Group/MemoryDecoder

Training & Evaluation Data

Biomedical Domain Dataset: mimic_iii_diagnosis_anonymous

Test Split: MemoryDecoder-domain-data

Performance Results

Llama3 Family

| Model | Base Model | Base + MemDec |
|---|---|---|
| Llama3-8B | 7.95 | 3.92 |
| Llama3-70B | 5.92 | 3.74 |

Llama3.1 Family

| Model | Base Model | Base + MemDec |
|---|---|---|
| Llama3.1-8B | 7.82 | 3.91 |
| Llama3.1-70B | 5.85 | 3.73 |

Llama3.2 Family

| Model | Base Model | Base + MemDec |
|---|---|---|
| Llama3.2-1B | 12.81 | 4.06 |
| Llama3.2-3B | 9.83 | 3.99 |

Perplexity scores on the Biomedical domain test set. Lower is better.
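Perplexity is the exponential of the average per-token negative log-likelihood, so the scores above can be read as the model's effective branching factor on the test set. A minimal sketch (the helper name `perplexity` is ours, not from the repository):

```python
import math

def perplexity(token_nlls):
    """Compute perplexity from per-token negative log-likelihoods.

    perplexity = exp(mean NLL per token); lower means the model is
    less surprised by the test text.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning probability 1/4 to every token has perplexity 4
example = perplexity([math.log(4)] * 5)
```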

Citation

@article{cao2025memory,
  title={Memory decoder: A pretrained, plug-and-play memory for large language models},
  author={Cao, Jiaqi and Wang, Jiarui and Wei, Rubin and Guo, Qipeng and Chen, Kai and Zhou, Bowen and Lin, Zhouhan},
  journal={arXiv preprint arXiv:2508.09874},
  year={2025}
}

Contact

For questions and support: maximus.cao@outlook.com

Base Model

Clover-Hill/MemoryDecoder-Llama-biomed is fine-tuned from Qwen/Qwen2.5-0.5B.