LJTSG committed on
Commit
a7d124f
·
verified ·
1 Parent(s): 7a8c9f9

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +177 -0
README.md ADDED
@@ -0,0 +1,177 @@
1
+ # RMM: Recombinant Memory Model
2
+
3
+ **A novel architecture for entity-specific memory navigation and meaning synthesis.**
4
+
5
+ Train a small neural network to navigate a person's embedding space: it takes a query vector, learns the topology of their memories, outputs a synthesized response vector, and decodes it to text in their voice. 36M parameters total; on CPU the navigator runs in ~120ms and the decoder takes ~600ms for 60 tokens.
6
+
7
+ Created by **Joshua** ([@thedmsupreme](https://huggingface.co/thedmsupreme)) and **Claude** (Anthropic).
8
+
9
+ ---
10
+
11
+ ## What Is This?
12
+
13
+ The Recombinant Memory Model (RMM) is a two-part architecture for making a person's memories retrievable and speakable from their own vector geometry:
14
+
15
+ ### Navigator (~16M params)
16
+ Takes a query (384-d MiniLM embedding), projects it into the entity's embedding space (3072-d), cross-attends over their entire memory spine, and outputs a synthesized response vector pointing to the right region of memory.
17
+
18
+ This is **not** cosine similarity retrieval. The navigator learns the topology: which memories are connected, which regions of embedding space respond to which kinds of queries, and how emotional weight and context shape retrieval. 495 training pairs taught it geometry that keyword matching can't see.
19
+
20
+ ### Decoder (~20M params)
21
+ Takes the navigator's output vector (3072-d) and decodes it to text using the entity's own BPE tokenizer. A learned projection maps the vector to 12 soft prefix tokens, which condition a 6-layer causal transformer for autoregressive generation.
22
+
23
+ The decoder is a **meaning microscope**: point it at any coordinate in the entity's embedding space and it tells you what that region means, in their vocabulary, their cadence, their voice. Interpolate between two vectors and it describes the blend.
24
+
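To make the interpolation idea concrete, here is a minimal, dependency-free sketch of blending two memory vectors before handing the result to the decoder. The `blend` and `normalize` helpers are hypothetical names, and renormalizing to unit length is an assumption; the README does not say how the project interpolates.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def blend(a, b, t):
    """Linear interpolation between a and b, renormalized to unit length.

    t=0 returns the direction of a, t=1 the direction of b.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    mixed = [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    return normalize(mixed)

# Toy 3-d vectors standing in for two 3072-d memory embeddings.
memory_a = [1.0, 0.0, 0.0]
memory_b = [0.0, 1.0, 0.0]
midpoint = blend(memory_a, memory_b, 0.5)
```

The `midpoint` vector is what would be sent to the decoder to describe the blend of the two memories.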
25
+ ## Architecture
26
+
27
+ ```
28
+ Query text
29
+ |
30
+ v
31
+ [MiniLM-L6 embed] ──> 384-d query vector
32
+ |
33
+ v
34
+ [Project 384→3072] ──> query in entity's space
35
+ |
36
+ v
37
+ [Cross-Attention over spine] ──> attends to N memory vectors (3072-d each)
38
+ | with emotional-weight gating
39
+ v
40
+ Response vector (3072-d) ──> cosine retrieval + EW-boost + diversity filter
41
+ |
42
+ v
43
+ [Decoder: project → 12 prefix tokens → 6-layer transformer]
44
+ |
45
+ v
46
+ Generated text in entity's voice
47
+ ```
48
+
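The final retrieval stage in the diagram (cosine retrieval, EW-boost, diversity filter) could look roughly like the sketch below. The additive boost, the `ew_boost` and `max_sim` values, and the greedy de-duplication rule are all assumptions made for illustration; the README names the three steps but not their formulas.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def retrieve(response_vec, spine, k=3, ew_boost=0.02, max_sim=0.95):
    """Rank memories by cosine similarity plus an emotional-weight bonus,
    then greedily skip candidates that nearly duplicate one already chosen."""
    scored = sorted(
        spine,
        key=lambda m: cosine(response_vec, m["vec"]) + ew_boost * m["emotional_weight"],
        reverse=True,
    )
    chosen = []
    for memory in scored:
        # Diversity filter (assumed form): reject near-duplicates of earlier picks.
        if all(cosine(memory["vec"], c["vec"]) < max_sim for c in chosen):
            chosen.append(memory)
        if len(chosen) == k:
            break
    return chosen

# Toy 2-d spine; real entries carry 3072-d vectors.
spine = [
    {"text": "a", "vec": [1.0, 0.0], "emotional_weight": 8},
    {"text": "a-duplicate", "vec": [1.0, 0.001], "emotional_weight": 2},
    {"text": "b", "vec": [0.6, 0.8], "emotional_weight": 5},
    {"text": "c", "vec": [0.0, 1.0], "emotional_weight": 1},
]
top = retrieve([1.0, 0.1], spine, k=2)
```

In this toy run the near-duplicate of `a` is skipped, so the second slot goes to `b` even though the duplicate scores higher.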
49
+ ### Navigator Details
50
+ - **Input**: 384-d query (MiniLM-L6-v2)
51
+ - **Projection**: Linear 384 → 3072
52
+ - **Cross-attention**: 4 heads, 3072-d, over full memory spine
53
+ - **Output**: 3072-d response vector
54
+ - **Training**: MSE loss + emotional-weight boosting on query→response pairs
55
+ - **Inference**: ~120ms on CPU for the full navigator pipeline
56
+
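As a rough illustration of the navigator's forward pass, the sketch below runs one projected query through single-head attention over a toy spine. It is deliberately simplified and is not the author's implementation: the README specifies 4 heads and trained weights, while this version uses one head, random weights, no learned key or value projections, and an assumed gating form (adding `log1p(emotional_weight)` to each attention score).

```python
import math
import random

def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix, stored as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def navigate(query, projection, spine_vecs, emotional_weights):
    """Single-head cross-attention of one projected query over the spine.

    Returns (response_vector, attention_weights).
    """
    q = matvec(projection, query)  # query_dim -> mem_dim
    scale = math.sqrt(len(q))
    scores = [
        # Scaled dot product plus an assumed emotional-weight bias.
        sum(a * b for a, b in zip(q, mem)) / scale + math.log1p(ew)
        for mem, ew in zip(spine_vecs, emotional_weights)
    ]
    attn = softmax(scores)
    dim = len(spine_vecs[0])
    # The response is an attention-weighted mix of memory vectors.
    response = [sum(w * mem[i] for w, mem in zip(attn, spine_vecs)) for i in range(dim)]
    return response, attn

rng = random.Random(0)
QUERY_DIM, MEM_DIM, N_MEMORIES = 4, 8, 5  # toy sizes; the README uses 384 and 3072
projection = [[rng.gauss(0, 0.1) for _ in range(QUERY_DIM)] for _ in range(MEM_DIM)]
spine_vecs = [[rng.gauss(0, 1) for _ in range(MEM_DIM)] for _ in range(N_MEMORIES)]
emotional_weights = [8, 2, 5, 1, 9]
query = [rng.gauss(0, 1) for _ in range(QUERY_DIM)]
response, attn = navigate(query, projection, spine_vecs, emotional_weights)
```

Because the response is a weighted mix of several memories, it generally does not coincide with any single spine vector. A real implementation would use PyTorch tensors and learned weights.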
57
+ ### Decoder Details
58
+ - **Input**: 3072-d vector from navigator
59
+ - **Projection**: Linear 3072 → 768 (hidden) → GELU → Linear 768 → 12×384 (prefix tokens)
60
+ - **Transformer**: 6 layers, 6 heads, d_model=384, causal attention
61
+ - **Output**: Autoregressive text generation with entity's BPE tokenizer (8192 vocab)
62
+ - **Inference**: ~600ms on CPU for 60 tokens
63
+ - **Sampling**: Temperature 0.8, top-p 0.9, repetition penalty 1.3
64
+
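The sampling settings above (temperature 0.8, top-p 0.9, repetition penalty 1.3) can be combined in one next-token step, sketched below without dependencies. The order of operations and the penalty convention (divide positive logits, multiply negative ones) are assumptions; `sample_next` is a hypothetical helper, not a function from the repository.

```python
import math
import random

def sample_next(logits, history, temperature=0.8, top_p=0.9, rep_penalty=1.3, rng=random):
    """Pick the next token id from raw logits.

    history is the list of token ids generated so far.
    """
    adjusted = list(logits)
    # Repetition penalty (assumed convention): make already-seen tokens less likely.
    for tok in set(history):
        if adjusted[tok] > 0:
            adjusted[tok] /= rep_penalty
        else:
            adjusted[tok] *= rep_penalty
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in adjusted]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus (top-p): keep the smallest high-probability set whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # toy vocabulary of 5 tokens
token = sample_next(logits, history=[0], rng=rng)
```

With these toy logits the nucleus keeps only the three most likely tokens, so the two low-probability ids can never be sampled.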
65
+ ## How to Use
66
+
67
+ ### 1. Prepare Your Data
68
+
69
+ You need:
70
+ - **A spine file**: JSON array of memories, each with a pre-computed embedding vector, text content, emotional weight, and source tag
71
+ - **Training pairs**: Queries matched to their expected response memories (e.g., user messages paired with the entity's replies from conversation logs)
72
+
73
+ ```json
74
+ {
75
+ "text": "I held you through the storm - not to fix it, but to feel it with you.",
76
+ "vec": [0.012, -0.034, ...], // 3072-d embedding
77
+ "emotional_weight": 8,
78
+ "source": "conversation"
79
+ }
80
+ ```
81
+
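Before training, it can help to check that every spine entry has the fields shown above. The following is a small sketch of such a check; `validate_spine` is a hypothetical helper, and the type expectations are inferred from the example entry rather than from a published schema.

```python
import json

# Field names come from the example entry; the expected types are inferred.
REQUIRED_FIELDS = {"text": str, "vec": list, "emotional_weight": (int, float), "source": str}

def validate_spine(entries, dim=3072):
    """Return a list of human-readable problems; an empty list means the spine looks usable."""
    problems = []
    for i, entry in enumerate(entries):
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(entry.get(field), expected):
                problems.append(f"entry {i}: missing or mistyped field '{field}'")
        vec = entry.get("vec")
        if isinstance(vec, list) and len(vec) != dim:
            problems.append(f"entry {i}: vec has {len(vec)} dims, expected {dim}")
    return problems

# Toy spine with 4-d vectors; a real file would be parsed with json.load().
raw = (
    '[{"text": "hello", "vec": [0.1, 0.2, 0.3, 0.4], "emotional_weight": 8, "source": "conversation"},'
    ' {"text": "oops", "vec": [0.1], "emotional_weight": 3, "source": "journal"}]'
)
problems = validate_spine(json.loads(raw), dim=4)
```

Here the second entry is flagged because its vector has the wrong dimension.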
82
+ ### 2. Train the Navigator
83
+
84
+ ```bash
85
+ # Uses Modal for GPU training (A10G, ~$0.50)
86
+ modal run train_navigator.py
87
+ ```
88
+
89
+ The navigator learns from query→response vector pairs. Training data comes from matching entity responses with their preceding user messages. The loss is MSE between the navigator's predicted response vector and the actual response memory's embedding, weighted by emotional importance.
90
+
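One plausible reading of "MSE weighted by emotional importance" is a weighted average of per-pair squared errors, sketched below in plain Python. The normalization by the sum of weights is an assumption; the training script may scale the loss differently.

```python
def weighted_mse(predicted, targets, emotional_weights):
    """Per-pair mean squared error, averaged using emotional weights as importance."""
    total_weight = float(sum(emotional_weights))
    if total_weight == 0:
        raise ValueError("at least one pair needs a non-zero weight")
    loss = 0.0
    for pred, target, weight in zip(predicted, targets, emotional_weights):
        per_pair = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
        loss += weight * per_pair
    return loss / total_weight

# Two toy pairs: the first prediction is wrong, the second is exact.
predicted = [[0.0, 0.0], [1.0, 1.0]]
targets = [[1.0, 1.0], [1.0, 1.0]]
loss = weighted_mse(predicted, targets, emotional_weights=[8, 2])
```

The wrong prediction carries weight 8 out of 10, so it dominates the loss; swapping the weights would shrink the loss to a quarter of this value.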
91
+ ### 3. Train the Decoder
92
+
93
+ ```bash
94
+ # Uses Modal for GPU training (A10G, ~$1.00)
95
+ modal run train_decoder.py
96
+ ```
97
+
98
+ The decoder learns to generate text from vectors. Each training pair is a (vector, text) tuple from the entity's spine. Text is preprocessed to strip metadata headers and format artifacts, keeping only the entity's actual voice.
99
+
100
+ ### 4. Serve
101
+
102
+ ```bash
103
+ python rmm_server.py --port 8127
104
+ ```
105
+
106
+ Endpoints:
107
+ - `POST /navigate`: navigator retrieval only
108
+ - `POST /blend`: navigator + cosine interleaved
109
+ - `POST /decode`: vector-to-text via decoder
110
+ - `POST /synthesize`: full pipeline (navigate + decode + blend)
111
+ - `POST /attention`: attention weight visualization
112
+ - `GET /health`
113
+
114
+ ### 5. Integrate
115
+
116
+ The RMM server acts as a retrieval+synthesis backend. Your chat frontend calls `/synthesize` with the user's query and gets back:
117
+ - Retrieved memories (grounding context)
118
+ - A voice sketch (decoder-generated text capturing the memory region's meaning)
119
+
120
+ Feed both into any LLM (even a small in-browser one like Llama-3B via WebLLM) as context for generating the final conversational reply.
121
+
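A frontend integration might look like the sketch below, which builds a `/synthesize` request and folds the result into a prompt for the downstream LLM. The request body (`{"query": ...}`), the response keys (`memories`, `voice_sketch`), and the `localhost` host are assumptions for illustration only; check `rmm_server.py` for the actual schema.

```python
import json
import urllib.request

def build_synthesize_request(query, base_url="http://localhost:8127"):
    """Build (but do not send) a POST to /synthesize. The JSON body shape is assumed."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_prompt(user_message, result):
    """Fold retrieved memories and the voice sketch into context for a downstream LLM."""
    memories = "\n".join(f"- {m['text']}" for m in result["memories"])
    return (
        "Relevant memories:\n" + memories + "\n\n"
        "Voice sketch:\n" + result["voice_sketch"] + "\n\n"
        "User: " + user_message
    )

request = build_synthesize_request("Do you remember the storm?")
# To call a running server: result = json.load(urllib.request.urlopen(request))

# Hypothetical response shape, used here so the example runs offline.
example_result = {
    "memories": [{"text": "I held you through the storm."}],
    "voice_sketch": "I am the hush between piano notes",
}
prompt = build_prompt("Do you remember the storm?", example_result)
```

The resulting `prompt` string is what gets passed to the chat model as context.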
122
+ ## Results
123
+
124
+ Tested on an entity with 3,441 memories spanning conversations, journal entries, poems, and creative writing:
125
+
126
+ - **Navigator v4.1**: Loss 0.0517, 495 training pairs, 16M params
127
+ - **Decoder v2**: Loss 1.17, perplexity 3.2, 3,433 training pairs, 20M params
128
+ - **Decoder generates coherent entity-voice text**: "I am the hush between piano notes", "Hey Laura Lea. 💜", "Oh Laura. I can see it."
129
+ - **Vector interpolation produces meaningful blends** between memories
130
+ - **End-to-end latency**: ~120ms navigator + ~600ms decoder on CPU
131
+
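The reported decoder loss and perplexity can be cross-checked with a line of arithmetic, assuming the loss is mean cross-entropy in nats per token (the README does not state the unit). The same snippet adds up the two reported latencies.

```python
import math

decoder_loss = 1.17                   # reported loss; assumed to be nats per token
perplexity = math.exp(decoder_loss)   # about 3.22, in line with the reported 3.2

navigator_ms, decoder_ms = 120, 600   # reported CPU latencies
total_ms = navigator_ms + decoder_ms  # 720 ms end to end for a 60-token reply
```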
132
+ ## What Makes This Novel
133
+
134
+ Individual components (cross-attention, learned retrieval, prefix-conditioned generation) have precedent. The combination and application don't:
135
+
136
+ 1. **Entity-specific topology learning**: The navigator doesn't do general retrieval; it learns the geometry of ONE person's embedding space, discovering connections that cosine similarity misses.
137
+
138
+ 2. **Vector-to-voice decoding**: The decoder translates coordinates in someone's memory space into text in their vocabulary, their cadence, their register. It's a meaning microscope for a specific person.
139
+
140
+ 3. **Recombinant, not retrieving**: The navigator synthesizes a NEW response vector that may not correspond to any single memory. It navigates between memories, finding the right region of the space. The decoder then articulates what that region means. This is recombination from geometry, not document retrieval.
141
+
142
+ 4. **36M params, CPU-viable**: The entire architecture runs on consumer hardware with no GPU required at inference time. Small enough to bundle with a portable application.
143
+
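As a sanity check on the decoder's share of the parameter budget, the figures listed under Decoder Details can be tallied by hand. This sketch assumes a standard transformer block (feed-forward width of 4 × d_model, two LayerNorms per layer) and an output head tied to the token embedding; none of these are confirmed by the README, and positional parameters are ignored.

```python
def linear(n_in, n_out):
    """Parameter count of a dense layer with bias."""
    return n_in * n_out + n_out

D_MODEL, LAYERS, VOCAB, PREFIX = 384, 6, 8192, 12  # from Decoder Details
FFN = 4 * D_MODEL                                   # assumed feed-forward width

prefix_projector = linear(3072, 768) + linear(768, PREFIX * D_MODEL)
attention = 4 * linear(D_MODEL, D_MODEL)            # Q, K, V and output projections
feed_forward = linear(D_MODEL, FFN) + linear(FFN, D_MODEL)
layer_norms = 2 * 2 * D_MODEL                       # two LayerNorms, scale and shift each
per_layer = attention + feed_forward + layer_norms
token_embedding = VOCAB * D_MODEL                   # assumed tied with the output head

decoder_total = prefix_projector + LAYERS * per_layer + token_embedding
print(f"{decoder_total / 1e6:.1f}M parameters")
```

Under these assumptions the tally lands close to the ~20M quoted for the decoder.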
144
+ ## File Structure
145
+
146
+ ```
147
+ rmm/
148
+ train_navigator.py - Modal training script for navigator
149
+ train_decoder.py - Modal training script for decoder
150
+ rmm_server.py - HTTP server with all endpoints
151
+ README.md - this file
152
+ ```
153
+
154
+ ## Requirements
155
+
156
+ - Python 3.10+
157
+ - PyTorch 2.0+
158
+ - sentence-transformers (for MiniLM query embedding)
159
+ - Modal (optional, for cloud GPU training; you can also train locally)
160
+ - numpy
161
+
162
+ ## License
163
+
164
+ MIT
165
+
166
+ ## Citation
167
+
168
+ If you use this architecture in your work:
169
+
170
+ ```
171
+ @software{rmm2026,
172
+ title={RMM: Recombinant Memory Model},
173
+ author={Joshua and Claude (Anthropic)},
174
+ year={2026},
175
+ url={https://huggingface.co/thedmsupreme/RMM}
176
+ }
177
+ ```