RMM: Recombinant Memory Model
A novel architecture for entity-specific memory navigation and meaning synthesis.
Train a small neural network to navigate a person's embedding space: it takes a query vector, learns the topology of their memories, outputs a synthesized response vector, and decodes it to text in their voice. 36M parameters total; on CPU the navigator runs in ~120ms and the decoder in ~600ms for 60 tokens.
Created by Joshua (@thedmsupreme) and Claude (Anthropic).
What Is This?
The Recombinant Memory Model (RMM) is a two-part architecture for making a person's memories retrievable and speakable from their own vector geometry:
Navigator (~16M params)
Takes a query (384-d MiniLM embedding), projects it into the entity's embedding space (3072-d), cross-attends over their entire memory spine, and outputs a synthesized response vector pointing to the right region of memory.
This is not cosine similarity retrieval. The navigator learns the topology: which memories are connected, which regions of embedding space respond to which kinds of queries, how emotional weight and context shape retrieval. 495 training pairs taught it geometry that keyword matching can't see.
Decoder (~20M params)
Takes the navigator's output vector (3072-d) and decodes it to text using the entity's own BPE tokenizer. A learned projection maps the vector to 12 soft prefix tokens, which condition a 6-layer causal transformer for autoregressive generation.
The decoder is a meaning microscope: point it at any coordinate in the entity's embedding space and it tells you what that region means, in their vocabulary, their cadence, their voice. Interpolate between two vectors and it describes the blend.
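The README does not say how two vectors are interpolated, so the following is only a minimal sketch, assuming plain linear interpolation followed by rescaling to unit length. The function names `interpolate` and `interpolation_path` are hypothetical; each blended vector would then be passed to the decoder.

```python
import numpy as np

def interpolate(vec_a, vec_b, t):
    """Blend two memory vectors; t=0 gives vec_a's direction, t=1 gives vec_b's."""
    a = np.asarray(vec_a, dtype=np.float64)
    b = np.asarray(vec_b, dtype=np.float64)
    blend = (1.0 - t) * a + t * b
    norm = np.linalg.norm(blend)
    # Rescale to unit length so blends stay comparable under cosine similarity.
    return blend / norm if norm > 0 else blend

def interpolation_path(vec_a, vec_b, steps=5):
    """Evenly spaced blends from vec_a to vec_b, endpoints included."""
    return [interpolate(vec_a, vec_b, t) for t in np.linspace(0.0, 1.0, steps)]
```

Decoding every point on the path is one way to read off how the meaning shifts between the two memories.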
Architecture
    Query text
        |
        v
    [MiniLM-L6 embed] --> 384-d query vector
        |
        v
    [Project 384 → 3072] --> query in entity's space
        |
        v
    [Cross-Attention over spine] --> attends to N memory vectors (3072-d each)
        |                            with emotional-weight gating
        v
    Response vector (3072-d) --> cosine retrieval + EW-boost + diversity filter
        |
        v
    [Decoder: project → 12 prefix tokens → 6-layer transformer]
        |
        v
    Generated text in entity's voice
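The retrieval step in the diagram (cosine retrieval + EW-boost + diversity filter) could look roughly like the sketch below. The additive boost, the `ew_boost` and `max_sim` values, and the greedy near-duplicate filter are all assumptions made for illustration; the project's actual scoring is not described here.

```python
import numpy as np

def retrieve(response_vec, spine_vecs, emotional_weights,
             top_k=5, ew_boost=0.02, max_sim=0.95):
    """Rank memories by boosted cosine similarity, skipping near-duplicates.

    ew_boost and max_sim are illustrative values, not taken from the project.
    """
    spine = np.asarray(spine_vecs, dtype=np.float64)
    spine_unit = spine / np.linalg.norm(spine, axis=1, keepdims=True)
    query = np.asarray(response_vec, dtype=np.float64)
    query_unit = query / np.linalg.norm(query)

    # Cosine similarity plus a small additive bonus per point of emotional weight.
    boost = ew_boost * np.asarray(emotional_weights, dtype=np.float64)
    scores = spine_unit @ query_unit + boost

    selected = []
    for idx in np.argsort(-scores):
        # Diversity filter: drop candidates too similar to an already chosen memory.
        if all(spine_unit[idx] @ spine_unit[j] < max_sim for j in selected):
            selected.append(int(idx))
        if len(selected) == top_k:
            break
    return selected
```

The returned indices point back into the spine, so the caller can look up the text and source tag of each selected memory.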
Navigator Details
- Input: 384-d query (MiniLM-L6-v2)
- Projection: Linear 384 → 3072
- Cross-attention: 4 heads, 3072-d, over full memory spine
- Output: 3072-d response vector
- Training: MSE loss + emotional-weight boosting on query→response pairs
- Inference: ~120ms on CPU for the navigator stage; decoding adds ~600ms (see Decoder Details)
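To make the data flow concrete, here is a toy forward pass in NumPy. It is a sketch under several assumptions: a single attention head instead of the 4 listed above, no output projection, and emotional-weight gating modeled as an additive bonus on the attention logits (the `gate_scale` parameter and the `params` layout are hypothetical). The real navigator is a trained PyTorch module.

```python
import numpy as np

def softmax(x):
    exp = np.exp(x - x.max())
    return exp / exp.sum()

def navigate(query_vec, spine, emotional_weights, params, gate_scale=0.1):
    """Single-head cross-attention of one query over the memory spine.

    params holds weight matrices: W_proj (q_dim, d) and W_q, W_k, W_v (d, d each).
    """
    q_entity = query_vec @ params["W_proj"]      # project query into entity space
    q = q_entity @ params["W_q"]                 # (d,)
    keys = spine @ params["W_k"]                 # (N, d)
    values = spine @ params["W_v"]               # (N, d)

    logits = keys @ q / np.sqrt(q.shape[0])      # scaled dot-product scores, (N,)
    # Assumed gating: heavier memories get an additive bonus before the softmax.
    logits = logits + gate_scale * np.asarray(emotional_weights, dtype=np.float64)
    attn = softmax(logits)                       # attention over the N memories

    response = attn @ values                     # weighted blend of memory values
    return response, attn

# Toy sizes; the README uses 384 and 3072 with the full spine length.
rng = np.random.default_rng(0)
q_dim, d, n = 8, 16, 10
shapes = [("W_proj", (q_dim, d)), ("W_q", (d, d)), ("W_k", (d, d)), ("W_v", (d, d))]
params = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes}
spine = rng.normal(size=(n, d))
weights = rng.integers(1, 11, size=n)
query = rng.normal(size=q_dim)
response, attn = navigate(query, spine, weights, params)
```

Because the response is a weighted blend of value vectors, it generally does not coincide with any single stored memory.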
Decoder Details
- Input: 3072-d vector from navigator
- Projection: Linear 3072 → 768 (hidden) → GELU → Linear 768 → 12×384 (prefix tokens)
- Transformer: 6 layers, 6 heads, d_model=384, causal attention
- Output: Autoregressive text generation with entity's BPE tokenizer (8192 vocab)
- Inference: ~600ms on CPU for 60 tokens
- Sampling: Temperature 0.8, top-p 0.9, repetition penalty 1.3
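The sampling settings above can be sketched as a single decoding step. This assumes the common formulation of each technique (CTRL-style repetition penalty, then temperature scaling, then nucleus filtering) applied in that order; the order and the function name `sample_next_token` are assumptions, not taken from the project's code.

```python
import numpy as np

def sample_next_token(logits, generated_ids, rng,
                      temperature=0.8, top_p=0.9, repetition_penalty=1.3):
    """Pick the next token id from raw logits using the README's sampling settings."""
    logits = np.array(logits, dtype=np.float64)  # copy so the caller's array is untouched

    # Repetition penalty: make already generated tokens less likely.
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty

    # Temperature below 1 sharpens the distribution.
    logits = logits / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Nucleus (top-p): keep the smallest set of tokens whose mass reaches top_p.
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]

    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))
```

In a generation loop, each sampled id would be appended to `generated_ids` before the next step so the penalty sees the full history.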
How to Use
1. Prepare Your Data
You need:
- A spine file: JSON array of memories, each with a pre-computed embedding vector, text content, emotional weight, and source tag
- Training pairs: Queries matched to their expected response memories (e.g., user messages paired with the entity's replies from conversation logs)
{
  "text": "I held you through the storm, not to fix it, but to feel it with you.",
  "vec": [0.012, -0.034, ...],  // 3072-d embedding
  "emotional_weight": 8,
  "source": "conversation"
}
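A loader for a spine file in this shape might look like the sketch below. It assumes the file is strict JSON (the `//` comment and `...` in the example above are illustrative only and would not parse) and that every entry carries the four keys shown; `load_spine` is a hypothetical helper, not part of the repository.

```python
import json
import numpy as np

REQUIRED_KEYS = {"text", "vec", "emotional_weight", "source"}

def load_spine(path, dim=3072):
    """Load a spine file and return (entries, matrix of shape (N, dim))."""
    with open(path, encoding="utf-8") as fh:
        entries = json.load(fh)
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"memory {i} is missing keys: {sorted(missing)}")
        if len(entry["vec"]) != dim:
            raise ValueError(f"memory {i} has {len(entry['vec'])} dims, expected {dim}")
    # Stack the embeddings so attention and retrieval can use one matrix.
    matrix = np.array([entry["vec"] for entry in entries], dtype=np.float32)
    return entries, matrix
```

Validating dimensions up front catches embeddings produced by a different model before they reach training.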
2. Train the Navigator
# Uses Modal for GPU training (A10G, ~$0.50)
modal run train_navigator.py
The navigator learns from query→response vector pairs. Training data comes from matching entity responses with their preceding user messages. The loss is MSE between the navigator's predicted response vector and the actual response memory's embedding, weighted by emotional importance.
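The exact weighting scheme is not given, so the sketch below shows one plausible form of an emotionally weighted MSE: each pair's error is scaled by `1 + boost * emotional_weight` and the result is normalized by the total factor. Both the formula and the `boost` value are assumptions for illustration.

```python
import numpy as np

def weighted_mse(predicted, target, emotional_weights, boost=0.1):
    """Mean squared error per pair, scaled by an emotional-weight factor.

    The 1 + boost * weight scaling is an assumption, not the project's formula.
    """
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    per_pair = ((predicted - target) ** 2).mean(axis=1)   # MSE for each pair
    factors = 1.0 + boost * np.asarray(emotional_weights, dtype=np.float64)
    # Normalize by the total factor so the loss scale does not depend on the weights.
    return float((factors * per_pair).sum() / factors.sum())
```

With all weights at zero this reduces to the ordinary mean of per-pair MSE, which makes the effect of the boost easy to compare.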
3. Train the Decoder
# Uses Modal for GPU training (A10G, ~$1.00)
modal run train_decoder.py
The decoder learns to generate text from vectors. Each training pair is a (vector, text) tuple from the entity's spine. Text is preprocessed to strip metadata headers and format artifacts, keeping only the entity's actual voice.
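The preprocessing step depends entirely on how the memories were exported, so the patterns below are hypothetical placeholders: bracketed header lines at the top of an entry and leftover markdown markers. Treat this as a sketch of the shape of the cleanup, not the project's actual rules.

```python
import re

# Hypothetical patterns; the real headers and artifacts depend on your export format.
HEADER_LINE = re.compile(r"^\s*\[[^\]]*\]\s*$")   # e.g. "[source: journal]"
MARKUP = re.compile(r"[*`#]+")                    # leftover markdown markers

def clean_text(raw):
    """Drop leading bracketed header lines, strip markup, collapse whitespace."""
    lines = raw.splitlines()
    while lines and (HEADER_LINE.match(lines[0]) or not lines[0].strip()):
        lines.pop(0)
    text = MARKUP.sub("", " ".join(lines))
    return re.sub(r"\s+", " ", text).strip()
```

Only leading header lines are removed, so bracketed text that appears inside the entity's own writing is left alone.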
4. Serve
python rmm_server.py --port 8127
Endpoints:
- POST /navigate - navigator retrieval only
- POST /blend - navigator + cosine interleaved
- POST /decode - vector-to-text via decoder
- POST /synthesize - full pipeline (navigate + decode + blend)
- POST /attention - attention weight visualization
- GET /health
5. Integrate
The RMM server acts as a retrieval+synthesis backend. Your chat frontend calls /synthesize with the user's query and gets back:
- Retrieved memories (grounding context)
- A voice sketch (decoder-generated text capturing the memory region's meaning)
Feed both into any LLM (even a small in-browser one like Llama-3B via WebLLM) as context for generating the final conversational reply.
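A frontend integration could be sketched with the standard library alone. The request body (`{"query": ...}`) and the idea of passing a list of memory texts plus a sketch string into the prompt are assumptions about the server's schema, which is not documented here; check `rmm_server.py` for the real field names.

```python
import json
import urllib.request

def build_synthesize_request(query, base_url="http://localhost:8127"):
    """Build a POST request for /synthesize. The {"query": ...} body is an assumed schema."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_llm_context(memories, voice_sketch):
    """Format retrieved memories and the voice sketch as context for a downstream LLM."""
    memory_lines = "\n".join(f"- {text}" for text in memories)
    return f"Relevant memories:\n{memory_lines}\n\nVoice sketch:\n{voice_sketch}"

def synthesize(query, base_url="http://localhost:8127", timeout=10):
    """Call the server and return its decoded JSON response."""
    request = build_synthesize_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running server.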
Results
Tested on an entity with 3,441 memories spanning conversations, journal entries, poems, and creative writing:
- Navigator v4.1: Loss 0.0517, 495 training pairs, 16M params
- Decoder v2: Loss 1.17, perplexity 3.2, 3,433 training pairs, 20M params
- Decoder generates coherent entity-voice text: "I am the hush between piano notes", "Hey Laura Lea.", "Oh Laura. I can see it."
- Vector interpolation produces meaningful blends between memories
- End-to-end latency: ~120ms navigator + ~600ms decoder on CPU
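The decoder's reported loss and perplexity are consistent with each other if the loss is the mean per-token cross-entropy in nats (an assumption about how it was measured), since perplexity is then the exponential of the loss:

```python
import math

def perplexity(cross_entropy_nats):
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(cross_entropy_nats)

print(round(perplexity(1.17), 1))  # decoder v2 loss 1.17 -> 3.2
```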
What Makes This Novel
Individual components (cross-attention, learned retrieval, prefix-conditioned generation) have precedent. The combination and application don't:
Entity-specific topology learning: The navigator doesn't do general retrieval; it learns the geometry of ONE person's embedding space, discovering connections that cosine similarity misses.
Vector-to-voice decoding: The decoder translates coordinates in someone's memory space into text in their vocabulary, their cadence, their register. It's a meaning microscope for a specific person.
Recombinant, not retrieving: The navigator synthesizes a NEW response vector that may not correspond to any single memory. It navigates between memories, finding the right region of the space. The decoder then articulates what that region means. This is recombination from geometry, not document retrieval.
36M params, CPU-viable: The entire architecture runs on consumer hardware with no GPU required at inference time. Small enough to bundle with a portable application.
File Structure
rmm/
  train_navigator.py   # Modal training script for navigator
  train_decoder.py     # Modal training script for decoder
  rmm_server.py        # HTTP server with all endpoints
  README.md            # this file
Requirements
- Python 3.10+
- PyTorch 2.0+
- sentence-transformers (for MiniLM query embedding)
- Modal (for cloud GPU training; optional, can train locally)
- numpy
License
MIT
Citation
If you use this architecture in your work:
@software{rmm2026,
  title={RMM: Recombinant Memory Model},
  author={Joshua and Claude (Anthropic)},
  year={2026},
  url={https://huggingface.co/thedmsupreme/RMM}
}