arxiv:2412.01014

Detecting Memorization in Large Language Models

Published on Dec 2, 2024

Authors:

Abstract

Analytical method detects memorization in large language models by examining neuron activations and training classification probes with near-perfect accuracy.

AI-generated summary

Large language models (LLMs) have achieved impressive results in natural language processing but are prone to memorizing portions of their training data, which can compromise evaluation metrics, raise privacy concerns, and limit generalization. Traditional methods for detecting memorization rely on output probabilities or loss functions, often lacking precision due to confounding factors like common language patterns. In this paper, we introduce an analytical method that precisely detects memorization by examining neuron activations within the LLM. By identifying specific activation patterns that differentiate between memorized and not memorized tokens, we train classification probes that achieve near-perfect accuracy. The approach can also be applied to other mechanisms, such as repetition, as demonstrated in this study, highlighting its versatility. Intervening on these activations allows us to suppress memorization without degrading overall performance, enhancing evaluation integrity by ensuring metrics reflect genuine generalization. Additionally, our method supports large-scale labeling of tokens and sequences, crucial for next-generation AI models, improving training efficiency and results. Our findings contribute to model interpretability and offer practical tools for analyzing and controlling internal mechanisms in LLMs.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2412.01014 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2412.01014 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2412.01014 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.