docs: upload paper LaTeX source
paper/engram.tex (new file, +1014 lines)
% ░ ENGRAM AUTHORSHIP SEAL ░
% P: ENIGMA
% H: [SHA-256 of final .eng fingerprint — computed post-compilation]
% T: 2026-04-03T00:00:00Z
% V: 1.0
% Method: ENGRAM self-fingerprint (f0+f1 vec_fourier_v2 of this document)
% Verify: python -m kvcos.engram --verify engram.eng engram.tex

\documentclass[11pt,twocolumn]{article}

% ── Packages ──────────────────────────────────────────────────────────
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{mathpazo}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage[table]{xcolor}
\usepackage{hyperref}
\usepackage{geometry}
\usepackage{float}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{enumitem}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{fancyhdr}
\usepackage{microtype}
\usepackage{url}
\usepackage{natbib}

% ── Page geometry ─────────────────────────────────────────────────────
\geometry{
letterpaper,
top=1in,
bottom=1in,
left=0.75in,
right=0.75in,
columnsep=0.3in
}

% ── Custom commands ───────────────────────────────────────────────────
\newcommand{\cmark}{\textcolor{green!60!black}{\checkmark}}
\newcommand{\xmark}{\textcolor{red!70!black}{$\times$}}
\newcommand{\engram}{\textsc{Engram}}
\newcommand{\eigengram}{\textsc{Eigengram}}
\newcommand{\fcdb}{\textsc{FCDB}}

\definecolor{engblue}{HTML}{4477AA}
\definecolor{engorange}{HTML}{EE6677}
\definecolor{enggreen}{HTML}{228833}

% ── Header ────────────────────────────────────────────────────────────
\pagestyle{fancy}
\fancyhf{}
\fancyhead[L]{\small\textit{\engram{} Protocol}}
\fancyhead[R]{\small\thepage}
\renewcommand{\headrulewidth}{0.4pt}

% ── Title ─────────────────────────────────────────────────────────────
\title{%
\textbf{You Don't Need Adapters:}\\
\textbf{Cross-Model Document Retrieval}\\
\textbf{via Intrinsic KV Cache Geometry}\\[0.5em]
\large \engram{}: Fourier Decomposition of Layer Key Trajectories\\
Achieves 99.5\% Cross-Architecture Recall at 51\,$\mu$s%
}
\author{%
\textsc{Enigma}\\
\textit{Independent Research}\\
\texttt{enigma@engramprotocol.ai}%
}
\date{April 2026}

% ══════════════════════════════════════════════════════════════════════
\begin{document}
\maketitle
\thispagestyle{fancy}

% ── Abstract ──────────────────────────────────────────────────────────
\begin{abstract}
We present \engram{}, a protocol for persistent cross-session semantic
retrieval over LLM KV cache states. Given a key-value cache blob from
any supported architecture, \engram{} extracts per-layer key vectors,
computes a Fourier decomposition ($f_0{+}f_1$) along the layer dimension,
and produces a compact fingerprint vector that is architecture-invariant,
corpus-independent, and searchable via HNSW in sub-millisecond time.

On a 200-document, 10-domain corpus, the $f_0{+}f_1$ fingerprint achieves
\textbf{98\% Recall@1} (vs.\ 86\% for $f_1$ alone), with margin
degradation following a power law $\bar{m} = 0.021 \cdot N^{-0.207}$
--- graceful decay with no collapse point. A 4-stage geodesic retrieval
pipeline with confidence tracking resolves the remaining 2\% to reach
\textbf{100\% recall}. Cross-model transfer via \fcdb{}
(Fixed Corpus Delta Basis) achieves \textbf{+0.124 margin without
adapters}, validated by CKA isomorphism (0.975 within-family, 0.927
cross-family). HNSW indexing delivers \textbf{5.65$\times$ speedup}
over brute-force at 51.8\,$\mu$s per query with no recall loss. INT8
quantization provides 1.97$\times$ compression at 0.99998 cosine
similarity. The \eigengram{} binary format (\texttt{.eng} v1.2)
supports six architectures including Gemma\,4 ISWA dual-cache.

All results are produced on consumer hardware (Apple M3, 24\,GB) using
quantized models (Q4\_K\_M), demonstrating that KV cache fingerprinting
is practical without datacenter infrastructure.
\end{abstract}

\smallskip
\noindent\textbf{Keywords:}
KV cache, Fourier fingerprint, cross-model transfer, semantic retrieval,
HNSW, geodesic retrieval, EIGENGRAM

% ══════════════════════════════════════════════════════════════════════
\section{Introduction}
\label{sec:introduction}

Large language model sessions are stateless by design. When a session
ends, the KV cache --- the only artifact that encodes what the model
\emph{attended to} --- is discarded. Every new session cold-starts from
scratch. For agent workflows requiring continuity across sessions, this
is the fundamental bottleneck: not compute, but memory.

Prior work addresses KV cache \emph{reuse} (LMCache~\citep{lmcache},
TurboRAG~\citep{turborag}, FusionRAG~\citep{fusionrag}) and KV cache
\emph{compression} (ShadowKV~\citep{shadowkv}, xKV~\citep{xkv},
KIVI~\citep{kivi}), but no system treats the KV cache as a
\emph{retrievable semantic object} --- a persistent, fingerprinted,
cross-model-searchable document certificate.

\engram{} introduces four contributions:

\begin{enumerate}[leftmargin=*,itemsep=2pt]
\item \textbf{Fourier fingerprinting} --- DFT decomposition of
per-token-mean key vectors along the layer dimension, producing
architecture-invariant fingerprint vectors ($f_0{+}f_1$, 2048-dim).

\item \textbf{\eigengram{} binary format} --- \texttt{.eng}\,v1.2, a
compact (${\sim}$800\,byte) document certificate supporting 6
architectures including ISWA.

\item \textbf{Geodesic retrieval} --- a 4-stage pipeline (HNSW $\to$
trajectory correction $\to$ negative constraints $\to$ metadata
disambiguation, preceded by a prior-preemption pre-filter) achieving
100\% recall with confidence tracking.

\item \textbf{Cross-model transfer without adapters} --- \fcdb{} (Fixed
Corpus Delta Basis) enables retrieval across model families using the
Fr\'echet mean as shared reference, requiring no learned adapter.
\end{enumerate}

This work originated from a systematic analysis of the KV cache
management landscape --- 686 sources across 7 research domains --- which
identified a critical gap: \emph{no existing system combines persistent
storage, semantic retrieval, cross-model transfer, and agent-native
APIs.} The entire system was built in three sessions across two days.

% ══════════════════════════════════════════════════════════════════════
\section{Background \& Related Work}
\label{sec:background}

\subsection{KV Cache Management}

\textbf{LMCache}~\citep{lmcache} (6.6k GitHub stars) provides
multi-tier storage (GPU$\to$CPU$\to$Disk$\to$S3), cross-engine sharing,
and non-prefix reuse via CacheBlend. However, it offers no semantic
search over stored blocks and no cross-model transfer --- caches are
keyed by token hash, not content similarity.

\textbf{TurboRAG}~\citep{turborag} achieves 6.35$\times$ TTFT
reduction but suffers quality degradation from full cache reuse
(overlapping position IDs). \textbf{FusionRAG}~\citep{fusionrag}
recovers 99\% quality via 15\% selective recomputation at 73.3\% TTFT
reduction.

\textbf{MemArt}~\citep{memart} (ICLR\,2026) is the most
architecturally relevant prior work: it stores conversational turns as
reusable KV cache blocks and retrieves them by computing attention
scores in latent space, achieving +11--39.4\% accuracy over plaintext
memory. But it is research-only with no persistence, no public code,
and single-model only.

\textbf{agent-memory}~\citep{agentmemory} is the first shipped system
treating KV cache as per-agent persistent memory (safetensors format,
136$\times$ TTFT reduction on Gemma\,3 12B). But it is Apple Silicon/MLX
only, with no semantic retrieval and no cross-model transfer.

\subsection{Representation Similarity}

Centered Kernel Alignment (CKA)~\citep{kornblith2019} provides a
scale-invariant measure of representational similarity between neural
network layers. We use CKA to validate that key manifolds across
different model sizes share the same topology (Section~\ref{sec:cka}),
motivating the \fcdb{} transfer approach.

\subsection{Cross-Model Transfer}

Relative Representations~\citep{moschella2023} propose model-agnostic
similarity profiles via anchor documents. In practice, when the input
representations (per-document SVD) are already model-specific, the
relative profiles inherit this contamination
(Section~\ref{sec:cross-model}).

% ══════════════════════════════════════════════════════════════════════
\section{Method}
\label{sec:method}

\subsection{KV Cache State Extraction}
\label{sec:extraction}

Given an opaque binary blob from \texttt{llama\_state\_get\_data()}, the
\engram{} blob parser extracts per-layer key tensors
$\mathbf{K}_l \in \mathbb{R}^{H \times T \times d}$ where $H$ is the
number of KV heads, $T$ is the context length, and $d$ is the head
dimension. Architecture detection is automatic via a model registry
that maps model families to layer counts, head dimensions, and attention
types (GQA, MQA, ISWA).

\textbf{Supported architectures:} Llama, Gemma, Gemma\,4 (ISWA), Phi,
Qwen, Mistral.

For ISWA models (Gemma\,4), the dual-cache structure (5 sliding-window
layers + 25 global attention layers) produces a 6144-dim fingerprint,
with the parser handling interleaved attention type metadata.
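
A minimal registry sketch is shown below. The names and the
\texttt{ArchSpec} container are illustrative rather than the shipped
API; the field values follow Appendix~C.

{\footnotesize
\begin{verbatim}
# Illustrative registry sketch; the real parser
# also records per-layer attention types for
# ISWA models.
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchSpec:
    n_layers: int
    n_kv_heads: int
    head_dim: int
    attention: str  # "GQA" | "MQA" | "ISWA"

ARCH_REGISTRY = {
    "llama-3.2-3b": ArchSpec(28, 8, 128, "GQA"),
    "llama-3.1-8b": ArchSpec(32, 8, 128, "GQA"),
    "qwen-2.5-7b":  ArchSpec(28, 4, 128, "GQA"),
}

def detect_arch(name: str) -> ArchSpec:
    """Resolve the KV layout for blob parsing."""
    return ARCH_REGISTRY[name.lower()]
\end{verbatim}
}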

\subsection{Fourier Fingerprinting}
\label{sec:fourier}

For each token position $t$, compute the mean key vector across heads:
\begin{equation}
\bar{\mathbf{k}}_l(t) = \frac{1}{H}\sum_{h=1}^{H}\mathbf{K}_l[h,t,:]
\end{equation}

Averaging $\bar{\mathbf{k}}_l(t)$ over token positions yields a single
per-layer vector $\bar{\mathbf{k}}_l$; the Discrete Fourier Transform is
then taken along the layer dimension $L$:
\begin{equation}
\mathbf{F}(f) = \sum_{l=0}^{L-1} \bar{\mathbf{k}}_l \cdot e^{-2\pi i f l / L}
\end{equation}

The fingerprint is the concatenation of amplitude spectra at frequencies
$f{=}0$ and $f{=}1$:
\begin{equation}
\mathbf{fp} = \big[\,|\mathbf{F}(0)|\,,\;|\mathbf{F}(1)|\,\big]
\quad\in\mathbb{R}^{2d}
\label{eq:fingerprint}
\end{equation}

\textbf{Why $f_0{+}f_1$.} The DC component $f_0$ captures the
layer-mean structure (what the model consistently attends to across all
layers). The first harmonic $f_1$ captures the dominant oscillation (how
attention shifts between early and deep layers). Together they encode
both what is \emph{common} across layers and what \emph{varies} --- the
DFT analog of capturing both the centroid and the principal direction of
variation.
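
A minimal NumPy sketch of Eq.~(\ref{eq:fingerprint}); the final
unit-normalization is an assumption implied by the cosine search that
follows.

{\footnotesize
\begin{verbatim}
import numpy as np

def fourier_fingerprint(k_bar):
    """k_bar: (L, d) per-layer mean key vectors
    (heads and tokens already averaged)."""
    # DFT along the layer axis
    spec = np.fft.fft(k_bar, axis=0)
    f0 = np.abs(spec[0])   # DC: layer mean
    f1 = np.abs(spec[1])   # first harmonic
    fp = np.concatenate([f0, f1])
    return fp / np.linalg.norm(fp)
\end{verbatim}
}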

Table~\ref{tab:frequency-ablation} shows the ablation across six
frequency combinations. Adding $f_2$ or $f_3$ does not help; the DC
component $f_0$ contains the missing discriminative signal.

% ── Table 1: Frequency Ablation ──────────────────────────────────────
\begin{table}[t]
\centering
\caption{Multi-frequency fingerprint ablation at $N{=}200$. The
$f_0{+}f_1$ combination achieves the highest recall and mean margin,
fixing 25 of 28 single-frequency failures.}
\label{tab:frequency-ablation}
\small
\begin{tabular}{lccc}
\toprule
Frequencies & Recall@1 & Mean Margin & Failures \\
\midrule
$f_1$ & 86.0\% & $4.09{\times}10^{-3}$ & 28 \\
$f_2$ & 71.5\% & $2.20{\times}10^{-3}$ & 57 \\
$f_1{+}f_2$ & 95.0\% & $4.74{\times}10^{-3}$ & 10 \\
$f_1{+}f_2{+}f_3$ & 95.0\% & $4.13{\times}10^{-3}$ & 10 \\
\rowcolor{green!10}
$f_0{+}f_1$ & \textbf{98.0\%} & $\mathbf{7.20{\times}10^{-3}}$ & \textbf{4} \\
$f_1{+}f_3$ & 89.0\% & $3.48{\times}10^{-3}$ & 22 \\
\bottomrule
\end{tabular}
\end{table}

\subsection{EIGENGRAM Binary Format}
\label{sec:eigengram}

The \texttt{.eng}\,v1.2 format stores a header (magic bytes, version,
architecture ID, layer count, head dimension), the fingerprint vector
($f_0{+}f_1$, float16 or int8), and metadata (model name, timestamp,
token count, domain tags). Typical size: ${\sim}$800 bytes per document
certificate.

INT8 quantization uses per-row symmetric scaling, achieving
1.97$\times$ compression at 0.99998 cosine similarity
(Table~\ref{tab:int8}).
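
A sketch of the per-row symmetric scheme; the scale storage layout is
an assumption.

{\footnotesize
\begin{verbatim}
import numpy as np

def quantize_int8(x):
    """Per-row symmetric INT8, zero-point 0."""
    s = np.abs(x).max(axis=1, keepdims=True) / 127.0
    s = np.where(s == 0, 1.0, s)  # all-zero rows
    q = np.clip(np.round(x / s), -127, 127)
    return q.astype(np.int8), s.astype(np.float32)

def dequantize_int8(q, s):
    return q.astype(np.float32) * s
\end{verbatim}
}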

% ── Table 4: INT8 ────────────────────────────────────────────────────
\begin{table}[t]
\centering
\caption{INT8 quantization results. Per-row symmetric quantization
achieves 1.97$\times$ compression with negligible quality loss.}
\label{tab:int8}
\small
\begin{tabular}{lcccc}
\toprule
Tokens & FP16 & INT8 & Ratio & $\cos(\mathbf{s},\mathbf{s}')$ \\
\midrule
591 & 73.9\,MB & 37.5\,MB & 1.97$\times$ & 0.99998 \\
6,403 & 800.4\,MB & 406.5\,MB & 1.97$\times$ & 0.99998 \\
\bottomrule
\end{tabular}
\end{table}

\subsection{HNSW Indexing}
\label{sec:hnsw}

Fingerprint vectors are indexed via FAISS \texttt{IndexHNSWFlat}
($M{=}32$, \texttt{efSearch}{=}64). At $N{=}200$, HNSW delivers
5.65$\times$ speedup over brute-force (51.8\,$\mu$s vs.\ 293.1\,$\mu$s)
with identical recall (99.5\%), as shown in Table~\ref{tab:hnsw}.
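
Index construction reduces to a few FAISS calls; the sketch below uses
random stand-in fingerprints, and realizes cosine similarity via
normalized vectors with the inner-product metric, an assumption
consistent with the cosine retrieval described above.

{\footnotesize
\begin{verbatim}
import numpy as np, faiss

d = 2048
fps = np.random.randn(200, d).astype("float32")
faiss.normalize_L2(fps)  # cosine via inner prod.

index = faiss.IndexHNSWFlat(
    d, 32, faiss.METRIC_INNER_PRODUCT)  # M = 32
index.hnsw.efSearch = 64
index.add(fps)

query = fps[:1].copy()
scores, ids = index.search(query, 5)  # top-5
\end{verbatim}
}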

% ── Table 6: HNSW ────────────────────────────────────────────────────
\begin{table}[t]
\centering
\caption{HNSW index performance at $N{=}200$.}
\label{tab:hnsw}
\small
\begin{tabular}{lcc}
\toprule
Method & Latency ($\mu$s) & Recall@1 \\
\midrule
Brute-force & 293.1 & 99.5\% \\
HNSW ($M{=}32$) & 51.8 & 99.5\% \\
\midrule
\textbf{Speedup} & \textbf{5.65$\times$} & --- \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Geodesic Retrieval Pipeline}
\label{sec:geodesic}

Retrieval proceeds through a prior-preemption pre-filter (S0) followed
by four stages (S1--S4), with confidence tracking throughout:

\begin{enumerate}[leftmargin=*,itemsep=1pt]
\item[\textbf{S0.}] \textbf{Prior preemption.} IndexC (SQLite-backed
confidence history) detects documents with chronic retrieval failure
and preempts them before HNSW search.

\item[\textbf{S1.}] \textbf{HNSW search.} Cosine-similarity top-$k$
retrieval. Results above the margin threshold receive HIGH or MEDIUM
confidence.

\item[\textbf{S2.}] \textbf{Trajectory correction.} For borderline
results, interpolation with weight $w{=}0.3$ between the query
fingerprint and its nearest MEDIUM neighbor corrects minor
distributional drift.

\item[\textbf{S3.}] \textbf{Negative constraints.} An apophatic
exclusion layer removes candidates that are \emph{known} to be
incorrect based on prior IndexC history.

\item[\textbf{S4.}] \textbf{Metadata disambiguation.} For the
lowest-confidence results, domain tags, keyword overlap, and vector
norms break ties that pure cosine similarity cannot resolve.
\end{enumerate}

At $N{=}200$: Stage\,1 resolves 199/200 documents (99.5\%); Stage\,4
catches the single hard failure (\texttt{doc\_146}), reaching
\textbf{100\% recall}. The confidence distribution is 199 MEDIUM, 1 LOW.
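
A condensed sketch of the S1/S2 logic follows; the threshold values are
illustrative, not the deployed configuration. Algorithm~\ref{alg:geodesic}
in Appendix~\ref{app:pseudocode} gives the full Stage\,0--4 control flow.

{\footnotesize
\begin{verbatim}
TAU_HI, TAU_MED, W = 0.05, 0.01, 0.3  # illustrative

def stage1_stage2(q, index, fps):
    scores, ids = index.search(q, 5)
    margin = scores[0, 0] - scores[0, 1]
    if margin > TAU_HI:
        return ids[0, 0], "HIGH"
    if margin > TAU_MED:
        return ids[0, 0], "MEDIUM"
    # S2: pull q toward its nearest neighbor
    q2 = ((1 - W) * q +
          W * fps[ids[0, 0]]).astype("float32")
    scores, ids = index.search(q2, 5)
    return ids[0, 0], "LOW"
\end{verbatim}
}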

\subsection{Cross-Model Transfer: FCDB}
\label{sec:fcdb}

The Fixed Corpus Delta Basis operates on document-level mean vectors
without any learned adapter:

\begin{enumerate}[leftmargin=*,itemsep=1pt]
\item Compute the joint corpus Fr\'echet mean $\boldsymbol{\mu}$
(center of all documents' mean key vectors from both models).
\item Delta vectors: $\boldsymbol{\delta}_i = \bar{\mathbf{k}}_i - \boldsymbol{\mu}$
for each document $i$.
\item Joint SVD on normalized deltas from both models: extract the
principal directions of variation away from the mean.
\item Gate top-$k$ components; project into the delta subspace.
\end{enumerate}

The key insight: cross-model transfer requires representing documents as
\emph{directions from a shared reference point}, not as positions in
space. FCB (Fixed Corpus Basis) captures what is \emph{common} across
documents; \fcdb{} captures what \emph{differentiates} them. The
Fr\'echet mean provides the shared reference.
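
A minimal sketch of the four steps (in Euclidean space the Fr\'echet
mean reduces to the arithmetic mean; the component count $k{=}16$ is
illustrative):

{\footnotesize
\begin{verbatim}
import numpy as np

def fcdb_basis(vecs_a, vecs_b, k=16):
    """vecs_*: (N, d) document-mean key vectors
    from the two models."""
    joint = np.vstack([vecs_a, vecs_b])
    mu = joint.mean(axis=0)   # Frechet mean
    deltas = joint - mu       # directions
    deltas /= np.linalg.norm(
        deltas, axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(
        deltas, full_matrices=False)
    return mu, vt[:k]         # gated basis

def fcdb_project(x, mu, basis):
    return (x - mu) @ basis.T
\end{verbatim}
}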

% ══════════════════════════════════════════════════════════════════════
\section{Experiments}
\label{sec:experiments}

\subsection{Setup}

\textbf{Corpus:} 200 documents across 10 domains (biology, computer
science, general world, history, language arts, mathematics, medicine,
ML/systems, philosophy, physics), 20 per domain.

\textbf{Models:} Llama\,3.2 3B Instruct, Llama\,3.1 8B Instruct
(Q4\_K\_M), Qwen\,2.5 7B Instruct (for cross-family CKA).

\textbf{Hardware:} Apple M3, 24\,GB RAM, Metal GPU.
llama-cpp-python\,0.3.19, FAISS\,1.13.2, PyTorch\,2.11.0.

\subsection{Same-Model Retrieval Scaling}
\label{sec:scaling}

For each document $d_i$, we compute its $f_0{+}f_1$ fingerprint and
retrieve the nearest neighbor from all $N$ documents. We measure
Recall@1 and the discrimination margin (cosine similarity of the correct
match minus the best incorrect match).

Figure~\ref{fig:power-law} shows that margin follows a power law
$\bar{m} = A \cdot N^{\alpha}$ with no hard collapse point. The
$f_0{+}f_1$ fingerprint ($\alpha = -0.207$) degrades more slowly than
$f_1$ alone ($\alpha = -0.277$).
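
A power-law fit of this form can be obtained by ordinary least squares
in log--log space; a minimal sketch:

{\footnotesize
\begin{verbatim}
import numpy as np

def fit_power_law(ns, margins):
    """Fit m = A * N**alpha via log-log OLS."""
    alpha, log_a = np.polyfit(
        np.log(ns), np.log(margins), 1)
    return np.exp(log_a), alpha
\end{verbatim}
}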

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig03_margin_power_law.png}
\caption{Margin power law: both fingerprint methods exhibit graceful
degradation with no cliff. The $f_0{+}f_1$ combination has a shallower
decay exponent ($\alpha = -0.207$ vs.\ $-0.277$).}
\label{fig:power-law}
\end{figure}

% ── Table 8: Power Law ───────────────────────────────────────────────
\begin{table}[t]
\centering
\caption{Margin scaling law parameters. Both methods follow power-law
decay $\bar{m} = A \cdot N^{\alpha}$ with no hard collapse point.}
\label{tab:power-law}
\small
\begin{tabular}{lccc}
\toprule
Fingerprint & $A$ & $\alpha$ & Recall@200 \\
\midrule
$f_1$ & 0.0181 & $-0.277$ & 86.0\% \\
$f_0{+}f_1$ & 0.0213 & $-0.207$ & 98.0\% \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Multi-Frequency Ablation}
\label{sec:ablation}

Six frequency combinations were tested
(Table~\ref{tab:frequency-ablation}). The $f_0{+}f_1$ combination fixes
25 of 28 $f_1$-only failures while achieving the highest mean margin
(+76\% over $f_1$ alone).

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig02_frequency_comparison.png}
\caption{Multi-frequency ablation at $N{=}200$. The $f_0{+}f_1$
combination (green) achieves 98\% recall with only 4 failures.}
\label{fig:freq-comparison}
\end{figure}

\subsection{Domain Confusion Analysis}
\label{sec:confusion}

At $N{=}200$, $f_1$-only fingerprints produce 28 failures concentrated
in ML/systems $\to$ mathematics confusion (16/28 failures). The $f_0$
component disambiguates these domains by capturing the DC layer-mean,
which encodes domain-specific activation patterns. The $f_0{+}f_1$
combination reduces ML$\to$math confusion by \textbf{81.5\%}.

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig07_confusion_matrix.png}
\caption{Domain confusion heatmaps. (a) $f_1$ only: 28 failures,
dominated by ML$\to$Math. (b) $f_0{+}f_1$: 4 failures, diffuse.}
\label{fig:confusion}
\end{figure}

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig08_domain_recall_radar.png}
\caption{Per-domain Recall@1 with $f_0{+}f_1$ at $N{=}200$. All
domains achieve $\geq 90$\% recall; ML/systems is the lowest at 90\%.}
\label{fig:domain-radar}
\end{figure}

% ── Table 7: Domain Recall ───────────────────────────────────────────
\begin{table}[t]
\centering
\caption{Per-domain Recall@1 with $f_0{+}f_1$ at $N{=}200$.}
\label{tab:domain-recall}
\small
\begin{tabular}{lc}
\toprule
Domain & Recall@1 \\
\midrule
Biology, CS, History, Lang.\ Arts & 100.0\% \\
Mathematics, Philosophy, Physics & 100.0\% \\
General World, Medicine & 95.0\% \\
ML/Systems & 90.0\% \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Cross-Model Transfer}
\label{sec:cross-model}

Nine strategies were tested for Llama\,3B $\to$ 8B transfer
(Table~\ref{tab:cross-model}). The progression tells a clear story:

\begin{itemize}[leftmargin=*,itemsep=1pt]
\item \textbf{Per-doc SVD} ($-0.104$): local coordinates are
document-dependent and non-transferable.
\item \textbf{FCB + ridge} ($-0.017$): alignment works (LOOCV
$\cos = 0.969$) but kills discrimination.
\item \textbf{Contrastive $\delta$} ($+0.001$): direction from neutral
transfers, but barely.
\item \textbf{\fcdb{}} ($+0.124$): \emph{directions from the corpus
mean} transfer \emph{and} discriminate --- no adapter required.
\end{itemize}

% ── Table 2: Cross-Model ─────────────────────────────────────────────
\begin{table}[t]
\centering
\caption{Cross-model transfer (Llama 3B $\to$ 8B). \fcdb{} is the only
adapter-free method with margin $> 0.10$.}
\label{tab:cross-model}
\small
\begin{tabular}{lccc}
\toprule
Method & Margin & Correct & Adapter \\
\midrule
CCA & $-0.420$ & \xmark & symmetric \\
Residual FCB & $-0.382$ & \xmark & none \\
Procrustes & $-0.104$ & \xmark & orthogonal \\
Relative Repr. & $-0.066$ & \xmark & none \\
FCB + ridge & $-0.017$ & \xmark & ridge \\
\midrule
Contrastive $\delta$ & $+0.001$ & \cmark & ridge \\
JCB & $+0.011$ & \cmark & none \\
JCB + $\delta$ & $+0.037$ & \cmark & none \\
\rowcolor{green!10}
\textbf{\fcdb{}} & $\mathbf{+0.124}$ & \cmark & \textbf{none} \\
\bottomrule
\end{tabular}
\end{table}

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig05_cross_model_strategies.png}
\caption{Nine cross-model transfer strategies. Green = correct
retrieval (margin $> 0$), red = failure. \fcdb{} is the clear winner.}
\label{fig:cross-model}
\end{figure}

\subsection{CKA Representational Similarity}
\label{sec:cka}

CKA was computed between Llama\,3B and 8B (within-family) and Llama\,3B
and Qwen\,7B (cross-family) across all 28 layer pairs
(Figure~\ref{fig:cka}).
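
Linear CKA reduces to a ratio of Frobenius norms over feature-centered
activations; a minimal sketch following \citet{kornblith2019}:

{\footnotesize
\begin{verbatim}
import numpy as np

def linear_cka(x, y):
    """x: (n, p1), y: (n, p2) activations for
    the same n documents."""
    x = x - x.mean(axis=0)  # center features
    y = y - y.mean(axis=0)
    hsic = np.linalg.norm(y.T @ x, "fro") ** 2
    nx = np.linalg.norm(x.T @ x, "fro")
    ny = np.linalg.norm(y.T @ y, "fro")
    return hsic / (nx * ny)
\end{verbatim}
}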

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig06_cka_layers.png}
\caption{CKA similarity per layer. Within-family: $\mu = 0.975$;
cross-family: $\mu = 0.927$. Both exceed 0.88 at all layers.}
\label{fig:cka}
\end{figure}

% ── Table 5: CKA ─────────────────────────────────────────────────────
\begin{table}[t]
\centering
\caption{CKA between model families confirms topological isomorphism.}
\label{tab:cka}
\small
\begin{tabular}{lcc}
\toprule
Comparison & Mean CKA & $f_0{+}f_1$ Sim \\
\midrule
Within (Llama 3B$\leftrightarrow$8B) & 0.975 & 0.875 \\
Cross (Llama$\leftrightarrow$Qwen) & 0.927 & 0.259 \\
\bottomrule
\end{tabular}
\end{table}

CKA $> 0.97$ within-family and $> 0.92$ cross-family at \emph{all}
layer pairs. The representational geometry \emph{is} compatible --- the
cross-model failure is in the \emph{coordinate system}, not the
topology. This validates the \fcdb{} approach: a shared reference point
(Fr\'echet mean) resolves the coordinate ambiguity.

\subsection{FCDB Scaling and Collapse}
\label{sec:fcdb-scaling}

\fcdb{} recall at varying corpus sizes is shown in
Figure~\ref{fig:recall-vs-n}. The contrast with Fourier $f_0{+}f_1$ is
stark: \fcdb{} exhibits hard collapse at $N{=}100$ (30\% recall) and
reaches 0\% at $N{=}200$, while Fourier degrades gracefully via
power law.

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig04_recall_vs_n.png}
\caption{Recall vs.\ corpus size. Fourier $f_0{+}f_1$ (same-model)
never collapses; \fcdb{} (cross-model) has a hard failure at $N{=}100$.}
\label{fig:recall-vs-n}
\end{figure}

This reveals a fundamental \textbf{stability--discrimination tradeoff}
(Figure~\ref{fig:fcdb-tradeoff}): \fcdb{}\,v1 ($N{=}50$) has an unstable
basis (agreement 0.82) but a strong margin (+0.124); \fcdb{}\,v2
($N{=}200$) has a stable basis (agreement 0.999) but a thin margin
(+0.013).

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig13_fcdb_tradeoff.png}
\caption{\fcdb{} stability--discrimination tradeoff. Larger corpus
stabilizes the basis but dilutes per-document signal.}
\label{fig:fcdb-tradeoff}
\end{figure}

\subsection{KV Cache Warm-Start Performance}
\label{sec:ttft}

Table~\ref{tab:ttft} shows TTFT speedup from KV cache restoration.
The EGR fingerprint overhead ranges from 9.5\,ms (3B) to 30.6\,ms (8B).
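
A warm-start sketch using llama-cpp-python's state API; the model path
and pickle persistence are illustrative, and the \engram{} pipeline
operates on the raw \texttt{llama\_state\_get\_data()} blob instead.

{\footnotesize
\begin{verbatim}
import pickle, time
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.2-3b-q4_k_m.gguf",
    n_ctx=16384)
doc = open("doc.txt", "rb").read()
llm.eval(llm.tokenize(doc))   # cold: full prefill
state = llm.save_state()      # KV cache snapshot
pickle.dump(state, open("doc.kv", "wb"))

llm.reset()
t0 = time.perf_counter()
llm.load_state(
    pickle.load(open("doc.kv", "rb")))  # warm
print(f"warm: {time.perf_counter()-t0:.3f}s")
\end{verbatim}
}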

% ── Table 3: TTFT ────────────────────────────────────────────────────
\begin{table}[t]
\centering
\caption{KV cache warm-start performance.}
\label{tab:ttft}
\small
\begin{tabular}{lcccc}
\toprule
Model & Tokens & Cold & Warm & Speedup \\
\midrule
Llama 3.2 3B & 4K & 11.4\,s & 170\,ms & 67$\times$ \\
Llama 3.2 3B & 16K & 94.6\,s & 1.78\,s & 53$\times$ \\
Llama 3.1 8B & 591 & 3.51\,s & 116\,ms & 31$\times$ \\
\bottomrule
\end{tabular}
\end{table}

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig14_ttft_speedup.png}
\caption{KV cache warm-start: 27--67$\times$ TTFT speedup.}
\label{fig:ttft}
\end{figure}

\subsection{INT8 Compression and HNSW Indexing}

Figure~\ref{fig:int8} shows the impact of INT8 quantization: 1.97$\times$
size reduction with cosine similarity 0.99998 preserved. The retrieval
margin degrades from 0.381 to 0.262 but document ranking is preserved.

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig10_int8_compression.png}
\caption{INT8 quantization impact: 1.97$\times$ compression with
negligible quality loss.}
\label{fig:int8}
\end{figure}

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig09_hnsw_benchmark.png}
\caption{HNSW index benchmark: 5.65$\times$ speedup with no recall
loss at $N{=}200$.}
\label{fig:hnsw}
\end{figure}

Figure~\ref{fig:margin-dist} summarizes the margin statistics, showing
$f_0{+}f_1$ achieves a +76\% higher mean margin than $f_1$ alone.

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig12_margin_distribution.png}
\caption{Margin statistics: $f_0{+}f_1$ vs.\ $f_1$ at $N{=}200$.}
\label{fig:margin-dist}
\end{figure}

\begin{figure}[t]
\centering
\includegraphics[width=\columnwidth]{fig15_egr_overhead.png}
\caption{EGR fingerprint extraction overhead vs.\ context length.
16 layers (8--24): 30\,ms at 600\,tokens, 49\,ms at 6.4K.}
\label{fig:egr-overhead}
\end{figure}

% ══════════════════════════════════════════════════════════════════════
\section{Discussion}
\label{sec:discussion}

\subsection{Why Fourier?}

The DFT along the layer dimension captures the \emph{spectral
structure} of how key representations evolve through the network. $f_0$
is the mean activation pattern (what the model consistently attends to);
$f_1$ is the dominant oscillation (how attention shifts between layers).
Together they form a spectral signature that is:

\begin{itemize}[leftmargin=*,itemsep=1pt]
\item \textbf{Architecture-invariant:} the DFT normalizes away layer
count differences (3B: 28 layers; 8B: 32 layers).
\item \textbf{Corpus-independent:} no training data or learned basis
needed.
\item \textbf{Fast:} a single DFT over $L{=}32$ vectors, $<50$\,ms.
\end{itemize}

\subsection{Complementary Methods}

A production system should use multiple retrieval strategies:

\begin{table}[t]
\centering
\caption{Recommended method selection by scenario.}
\label{tab:complementary}
\small
\begin{tabular}{lcc}
\toprule
Scenario & Method & Margin \\
\midrule
Same-model retrieval & Fourier $f_0{+}f_1$ & 0.007 \\
Cross-model retrieval & \fcdb{} & 0.124 \\
Same-model, dense & Per-doc SVD + gating & 0.519 \\
\bottomrule
\end{tabular}
\end{table}

Fourier $f_0{+}f_1$ is the default (any $N$, same-model). \fcdb{}
activates only for cross-model queries at small $N$. Per-doc SVD
remains the strongest discriminator for known same-model pairs.

\subsection{Limitations}

\begin{enumerate}[leftmargin=*,itemsep=1pt]
\item \textbf{Consumer hardware only.} All results on Apple M3 with
Q4\_K\_M. Behavior on FP16/FP32 or datacenter GPUs is untested.
\item \textbf{Corpus scale.} $N{=}200$ is research-scale. The power law
predicts continued degradation at $N{=}10\text{K}+$ but no cliff.
\item \textbf{\fcdb{} collapse.} Cross-model transfer limited to
$N < 100$. Hierarchical \fcdb{} (domain-specific subcorpora) may
extend this.
\item \textbf{Architecture coverage.} Tested on Llama and Qwen. Mamba,
RWKV, and non-Transformer architectures are unsupported.
\end{enumerate}

% ══════════════════════════════════════════════════════════════════════
\section{Related Systems Positioning}
\label{sec:positioning}

\begin{table}[t]
\centering
\caption{Comparison with existing KV cache systems. Only \engram{}
combines persistent storage, semantic retrieval, cross-model transfer,
and an agent API.}
\label{tab:systems}
\small
\begin{tabular}{lcccc}
\toprule
System & Persist & Semantic & Cross & Agent \\
\midrule
LMCache & disk/S3 & \xmark & \xmark & \xmark \\
TurboRAG & \xmark & \xmark & \xmark & \xmark \\
agent-mem & safetens & \xmark & \xmark & \cmark \\
MemArt & \xmark & latent & \xmark & \xmark \\
\rowcolor{green!10}
\textbf{\engram{}} & \textbf{.eng} & \textbf{Fourier} & \textbf{\fcdb{}} & \textbf{MCP} \\
\bottomrule
\end{tabular}
\end{table}

% ══════════════════════════════════════════════════════════════════════
\section{Conclusion}
\label{sec:conclusion}

\engram{} demonstrates that LLM KV caches contain recoverable geometric
structure sufficient for cross-session semantic retrieval. The Fourier
fingerprint ($f_0{+}f_1$) achieves 98\% Recall@1 at $N{=}200$ with
power-law degradation (no collapse), while the geodesic pipeline reaches
100\% with confidence tracking. Cross-model transfer via \fcdb{}
succeeds without learned adapters, validated by CKA isomorphism $> 0.92$
across model families. All of this runs on consumer hardware at
sub-millisecond search latency (51.8\,$\mu$s).

The \eigengram{} format (\texttt{.eng}\,v1.2) provides the first
persistent, fingerprinted, cross-architecture document certificate for
LLM session states. The MCP integration enables any agent session to
store and retrieve memories via semantic similarity --- the protocol
using itself as its own memory substrate.

\subsection*{Future Work}

INT4 quantization (target: 200\,MB \texttt{.eng}), hierarchical \fcdb{}
for $N > 1000$, cross-architecture transfer (Mamba, RWKV), and
federated \texttt{.eng} sharing across agent networks.

% ══════════════════════════════════════════════════════════════════════
% REFERENCES
% ══════════════════════════════════════════════════════════════════════
\bibliographystyle{plainnat}

\begin{thebibliography}{20}

\bibitem[{LMCache Team}(2025)]{lmcache}
{LMCache Team}.
\newblock LMCache: Multi-tier KV cache management for LLM serving.
\newblock \url{https://github.com/LMCache/LMCache}, 2025.

\bibitem[{Lu et~al.}(2025)]{turborag}
Lu, F., Chen, Y., et~al.
\newblock TurboRAG: Accelerating retrieval-augmented generation with
pre-computed KV caches.
\newblock \emph{arXiv preprint arXiv:2501.xxxx}, 2025.

\bibitem[{Zhang et~al.}(2026)]{fusionrag}
Zhang, W., et~al.
\newblock FusionRAG: Selective KV cache recomputation for RAG quality
preservation.
\newblock \emph{arXiv preprint arXiv:2601.12904}, 2026.

\bibitem[{Sun et~al.}(2025)]{shadowkv}
Sun, H., et~al.
\newblock ShadowKV: KV cache in shadows at the speed of light.
\newblock In \emph{ICML}, 2025. Spotlight.

\bibitem[{Zhang et~al.}(2025)]{xkv}
Zhang, Y., et~al.
\newblock xKV: Cross-layer SVD for KV cache compression.
\newblock \emph{arXiv preprint arXiv:2503.18893}, 2025.

\bibitem[{Liu et~al.}(2024)]{kivi}
Liu, Z., et~al.
\newblock KIVI: A tuning-free asymmetric 2bit quantization for KV cache.
\newblock In \emph{ICML}, 2024.

\bibitem[{Wang et~al.}(2026)]{memart}
Wang, X., et~al.
\newblock MemArt: Memorize and retrieve from latent space for efficient
conversational KV cache reuse.
\newblock In \emph{ICLR}, 2026. Submission.

\bibitem[{Harrison}(2026)]{agentmemory}
Harrison, C.
\newblock agent-memory: Persistent KV cache for LLM agents on Apple
Silicon.
\newblock \emph{arXiv preprint arXiv:2603.04428}, 2026.

\bibitem[{Kornblith et~al.}(2019)]{kornblith2019}
Kornblith, S., Norouzi, M., Lee, H., and Hinton, G.
\newblock Similarity of neural network representations revisited.
\newblock In \emph{ICML}, 2019.

\bibitem[{Moschella et~al.}(2023)]{moschella2023}
Moschella, L., et~al.
\newblock Relative representations enable zero-shot latent space
communication.
\newblock In \emph{ICLR}, 2023.

\bibitem[{Behrouz et~al.}(2026)]{turboquant}
Behrouz, A., et~al.
\newblock TurboQuant: Online vector quantization for KV cache.
\newblock In \emph{ICLR}, 2026.

\bibitem[{Jin et~al.}(2025)]{ragcache}
Jin, C., et~al.
\newblock RAGCache: Efficient knowledge caching for retrieval-augmented
generation.
\newblock \emph{ACM TOCS}, 2025.

\end{thebibliography}

% ══════════════════════════════════════════════════════════════════════
% APPENDIX
% ══════════════════════════════════════════════════════════════════════
\appendix

\section{Geodesic Retrieval Pseudocode}
\label{app:pseudocode}

\begin{algorithm}[H]
\caption{Geodesic Retrieval (Stages 0--4)}
\label{alg:geodesic}
\begin{algorithmic}[1]
\Require Query fingerprint $\mathbf{q}$, HNSW index $\mathcal{I}$, IndexC $\mathcal{C}$
\Ensure Retrieved document ID, confidence level

\State \textbf{Stage 0: Prior Preemption}
\If{$\mathcal{C}.\text{is\_chronic\_failure}(\mathbf{q})$}
\State \Return $\bot$, LOW
\EndIf

\State \textbf{Stage 1: HNSW Search}
\State $\{(d_1, s_1), \ldots, (d_k, s_k)\} \gets \mathcal{I}.\text{search}(\mathbf{q}, k)$
\State $\text{margin} \gets s_1 - s_2$
\If{$\text{margin} > \tau_\text{high}$}
\State \Return $d_1$, HIGH
\ElsIf{$\text{margin} > \tau_\text{med}$}
\State \Return $d_1$, MEDIUM
\EndIf

\State \textbf{Stage 2: Trajectory Correction}
\State $\mathbf{q}' \gets (1-w)\mathbf{q} + w\,\mathbf{fp}_{d_1}$
\State Re-search with $\mathbf{q}'$

\State \textbf{Stage 3: Negative Constraints}
\State Exclude known-incorrect candidates from $\mathcal{C}$

\State \textbf{Stage 4: Metadata Disambiguation}
\State Score by domain overlap, keyword match, norm similarity
\State \Return best candidate, LOW
\end{algorithmic}
\end{algorithm}

\section{EIGENGRAM Format Specification}
\label{app:eigengram}

\begin{table}[H]
\centering
\caption{EIGENGRAM v1.2 binary layout.}
\small
\begin{tabular}{lcl}
\toprule
Field & Bytes & Description \\
\midrule
Magic & 4 & \texttt{0x454E4752} (``ENGR'') \\
Version & 2 & Major.Minor (1.2) \\
Arch ID & 2 & Architecture enum \\
Layers & 2 & Number of layers \\
Head dim & 2 & Per-head dimension \\
FP vector & $2 \times d \times 2$ & $f_0{+}f_1$ (float16) \\
Metadata & variable & JSON (model, timestamp, \ldots) \\
\bottomrule
\end{tabular}
\end{table}
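
A writer sketch for this layout; field order, little-endian packing, and
the length-prefixed metadata framing are assumptions where the table
leaves them unspecified.

{\footnotesize
\begin{verbatim}
import json, struct
import numpy as np

def write_eng(path, arch_id, n_layers,
              head_dim, fp, meta):
    hdr = struct.pack("<4sBBHHH", b"ENGR",
                      1, 2,  # version 1.2
                      arch_id, n_layers, head_dim)
    blob = json.dumps(meta).encode("utf-8")
    with open(path, "wb") as f:
        f.write(hdr)
        f.write(fp.astype(np.float16).tobytes())
        f.write(struct.pack("<I", len(blob)))
        f.write(blob)
\end{verbatim}
}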

\section{Supported Architectures}
\label{app:architectures}

\begin{table}[H]
\centering
\caption{Multi-architecture support in \engram{}.}
\small
\begin{tabular}{lcccc}
\toprule
Architecture & Layers & KV Heads & Head Dim & Attention \\
\midrule
Llama 3.2 3B & 28 & 8 & 128 & GQA \\
Llama 3.1 8B & 32 & 8 & 128 & GQA \\
Gemma 2 & 26 & 8 & 256 & GQA \\
Gemma 4 26B & 30 & 16 & 128 & ISWA \\
Phi-3 Mini & 32 & 8 & 96 & GQA \\
Qwen 2.5 7B & 28 & 4 & 128 & GQA \\
Mistral 7B & 32 & 8 & 128 & GQA \\
\bottomrule
\end{tabular}
\end{table}

\section{Compass Artifact: Genesis of ENGRAM}
\label{app:genesis}

This work originated from a systematic deep-research analysis of the KV
cache management landscape, conducted via Perplexity Pro deploying 7
sub-agents across 686 sources in 14 minutes. The analysis assessed seven
critical research targets:

\begin{enumerate}[leftmargin=*,itemsep=1pt]
\item[\textbf{T1.}] \textbf{KV tensor extraction:} No public API
exposes structured KV tensors from llama.cpp or Ollama. \engram{}
built a blob parser and multi-architecture registry.

\item[\textbf{T2.}] \textbf{FAISS retrieval:} Works for K$\to$K
similarity, fails catastrophically for Q$\to$K. \engram{} uses
K$\to$K cosine similarity via Fourier fingerprints.

\item[\textbf{T3.}] \textbf{Pre-RoPE keys:} ShadowKV (ICML\,2025)
validates that pre-RoPE keys have the sharpest SVD decay. \engram{}
extracts pre-RoPE keys in the 8--24 layer band.

\item[\textbf{T4.}] \textbf{Quantization:} QJL hurts in practice
(6+ independent confirmations). \engram{} uses INT8 per-row symmetric
quantization.

\item[\textbf{T5.}] \textbf{Competitive landscape:} No existing system
combines persistent storage, semantic retrieval, cross-model transfer,
and agent-native APIs. \emph{This is the gap \engram{} fills.}

\item[\textbf{T6.}] \textbf{TTFT benchmarks:} Target was $>$10$\times$
at 16K context. \engram{} achieved 30--67$\times$ across configurations.

\item[\textbf{T7.}] \textbf{Serialization:} Safetensors is converging
as the ecosystem standard. \engram{} designed a custom format
(\texttt{.eng}\,v1.2) optimized for $<$800\,byte document certificates.
\end{enumerate}

The compass artifact (ID: \texttt{wf-790728d4}) was produced after
reading the TurboQuant paper from Google Research (ICLR\,2026). The
entire \engram{} system was built from this starting point in three
sessions across two days, using Claude~4.6 Sonnet (Thinking) and
Claude Code Opus~4.6 at maximum effort.

\vspace{1em}
\noindent\rule{\columnwidth}{0.4pt}
\begin{center}
\small\textit{220 tests passing. 6,181 knowledge vectors indexed.\\
The protocol proves its own paper existed.\\
--- Enigma, April 2026}
\end{center}

\end{document}