memory-augmented-generation

Pavantej committed on Dec 21, 2025

Commit

fbca19f

1 Parent(s): b75d8ec

Upload folder using huggingface_hub

Files changed (6)

ESSAY.md +0 -256
README.md +0 -74
app.py +214 -232
memory_store.py +0 -86
miras_memory.py +0 -113
projections.py +0 -83

ESSAY.md CHANGED

@@ -1,258 +1,3 @@
-<<<<<<< HEAD
-# When Models Learn While Thinking
----
-## 01 / The Frozen Calculator Problem
-Every conversation you've had with ChatGPT, Claude, or any large language model follows the same pattern: the model thinks, predicts, and forgets. The weights that determine its behavior were set months ago, frozen in place after training. When you correct it, when you teach it something new, when you have a breakthrough conversation—none of that changes the model itself.
-This isn't a bug. It's the architecture.
-The model can *simulate* learning through in-context adaptation. It can act like it remembers. But the parameters that define its cognition remain untouched. When the context window ends, so does the illusion.
-This demo breaks that pattern.
----
-## 02 / What This Actually Does
-This is a minimal reimplementation of two recent papers: **Titans** (test-time training) and **MIRAS** (associative memory with attentional bias). Together, they demonstrate something most production LLMs don't do: **learning during inference**.
-The architecture is simple:
-- A frozen language model (distilgpt2) generates text
-- Hidden states from that model are projected into a memory space
-- An associative memory module predicts what it should remember
-- The prediction error drives gradient descent
-- The memory weights update
-- The updated state persists to disk
-Every message you send changes the model's internal representations. Not through prompt engineering. Not through retrieval. Through actual gradient-based optimization—at inference time.
----
-## 03 / The Text Doesn't Matter
-If you interact with this demo, you'll notice the text responses are... not good. Random. Sometimes incoherent. This is intentional.
-The text generator (distilgpt2) is frozen. We're not training it. The responses reflect what a small, untuned model produces when asked to continue arbitrary text. That's not the point.
-**The point is the numbers below each response.**
-Watch the loss. When you send the same message multiple times, the loss decreases. The memory is learning to predict the hidden state patterns associated with that input. When you send something completely different, the loss spikes—the memory is surprised.
-This is test-time learning. The model is changing itself while you use it.
----
-## 04 / What the Stats Mean
-Each response shows four metrics:
-**Loss**: How surprised the memory is. Lower means the pattern is familiar. Higher means it's novel. This is the prediction error that drives learning.
-**Retention**: A multiplier on the learning rate. When loss is high (surprising input), retention is high (2.0x). The memory learns more aggressively from surprising events. This is the retention gate—a simple mechanism inspired by how human memory prioritizes novelty.
-**Updates**: The total number of times the memory has been updated. This persists across sessions. Refresh the page, send another message, and the count continues. The memory doesn't reset.
-**Avg Loss**: The running average of all losses. Over time, as the memory learns recurring patterns, this should trend downward.
-These aren't vanity metrics. They're the observable signature of gradient descent happening during inference.
----
-## 05 / The Two Papers
-**Titans** (2025) introduces test-time training for language models. The core idea: instead of freezing weights after pre-training, allow a subset of parameters to update during inference. This creates a feedback loop—think, predict, update, think differently next time—that doesn't exist in standard LLMs.
-**MIRAS** (2025) reframes sequence models as associative memories that solve an online optimization problem. It argues that most existing architectures, including Transformers and modern linear recurrent models, rely on either dot-product similarity or an L2 regression objective as their attentional bias. By making the objective and the retention (forgetting) term explicit and tunable, you can change how the memory behaves. Different objectives produce different memory behavior.
-This demo combines both: Titans' test-time learning with MIRAS's associative memory framework.
----
-## 06 / What's Missing
-This is a minimal reimplementation. Several components from the papers are not included:
-**From Titans**:
-- Multi-layer test-time updates (we only update the memory module)
-- Task-specific memory partitioning (we use a single shared memory)
-- Adaptive learning rate schedules (we use a simple retention gate)
-**From MIRAS**:
-- Alternative loss functions (we use L2)
-- Multi-head memory (we use a single memory matrix)
-- Attention-based retrieval (we use direct key-value mapping)
-The goal was to demonstrate the core mechanism—learning during inference—not to replicate every detail. The full papers contain significantly more sophistication.
----
-## 07 / The Difference from Standard LLMs
-Standard LLMs (ChatGPT, Claude, GPT-4) do this:
-```
-Input → Frozen Weights → Output → Forget
-```
-This demo does this:
-```
-Input → Frozen LM → Hidden States → Memory (Learning) → Output → Save
-```
-The frozen LM provides the text generation. The memory provides the learning. They're decoupled.
-This matters because:
-- **Weights update during use** (not just during training)
-- **Memory persists across sessions** (not just within a context window)
-- **Learning is explicit** (not simulated through in-context adaptation)
-- **The system becomes different** after each interaction
-In-context learning is pattern matching. This is optimization.
----
-## 08 / What Problem This Solves
-The current paradigm for "adaptive" LLMs involves:
-- Vector databases for retrieval
-- Fine-tuning on user data (expensive, slow)
-- Prompt engineering (fragile, context-limited)
-- RAG systems (fetch, don't learn)
-None of these change the model itself. They work around the frozen weights.
-Test-time learning makes adaptation a first-class primitive. The model doesn't retrieve your preferences—it encodes them in its parameters. It doesn't simulate learning—it performs learning.
-This opens up:
-- **Personalization without fine-tuning** (the model adapts to you as you use it)
-- **Continual learning** (the model improves from every interaction)
-- **Transparent memory** (you can inspect what it learned)
-- **Efficient adaptation** (gradient descent is cheaper than retraining)
----
-## 09 / What This Means for AI's Future
-The industry is converging on a model: train once, deploy frozen, scale through retrieval. This works. But it's not the only path.
-Test-time learning suggests a different trajectory: models that are **living systems**, not static calculators. Systems that don't just respond to you—they change because of you.
-This has implications:
-- **Privacy**: Your data updates your local model, not a shared cloud model
-- **Efficiency**: Learning happens incrementally, not in massive retraining runs
-- **Alignment**: The model adapts to your values through interaction, not through RLHF on aggregate data
-- **Transparency**: You can see what the model learned, reset it, or fork it
-The tradeoff is complexity. A model that changes during use is harder to reason about, harder to debug, harder to guarantee. But the benefits—true personalization, continual improvement, user-specific adaptation—may be worth it.
----
-## 10 / The Retention Gate
-One detail worth highlighting: the retention gate.
-When the memory encounters a high-loss input (surprising, novel), it increases the learning rate. When it encounters a low-loss input (familiar, repeated), it decreases the learning rate.
-This is a simple heuristic, but it mirrors how human memory works. We remember surprising events more vividly than routine ones. The retention gate makes the memory selective—it learns more from what it doesn't already know.
-In this demo, retention is always 2.0x because the memory is fresh. Everything is surprising. After hundreds of interactions, you'd see retention vary—0.5x for familiar patterns, 2.0x for novel ones. The memory would become selective.
----
-## 11 / Why the Memory is Shared
-This demo uses a single, shared memory across all users. This is intentional.
-It demonstrates that the memory is not user-specific. It's a collective brain. Every user's input updates the same weight matrix. This makes the learning observable—you can see the loss decrease as the memory encounters repeated patterns from different users.
-In a production system, you'd likely use per-user memory. For a demo, shared memory makes the learning more visible. No raw user text is stored, but every message does change the shared weight matrix, so one user's inputs affect the losses that later users see.
----
-## 12 / The Bandwidth Constraint
-One reason LLMs feel static is that they operate at the wrong bandwidth. The only way to change their behavior is to retrain them—a process that costs millions and takes weeks. Users can't influence the model in real time.
-Test-time learning changes the bandwidth. The model updates with every message. The feedback loop tightens from months to milliseconds.
-This doesn't mean the model becomes smarter. It means the model becomes *responsive*. It adapts to the distribution of inputs it actually sees, not the distribution it was trained on.
----
-## 13 / What You're Actually Watching
-When you interact with this demo, you're not chatting with a model. You're watching a memory module learn to compress hidden state patterns into a 256-dimensional space.
-The text generation is a side effect. The real process is:
-- Extract hidden states from the frozen LM
-- Project them into memory space
-- Predict what the memory should encode
-- Compute the error
-- Update the weights
-- Save the new state
-This happens for every message. The memory is always learning. The loss is always updating. The system is always changing.
-That's the difference. Standard LLMs are frozen calculators. This is a living system.
----
-## 14 / The Horizon
-This demo is a proof of concept. It's not production-ready. It's not optimized. It's not aligned. But it demonstrates a principle: **models can learn while they think**.
-The implications ripple outward:
-- What if your AI assistant remembered how you corrected it?
-- What if your code completion tool learned your style over time?
-- What if your search engine adapted to your information needs?
-- What if alignment happened through interaction, not through pre-training?
-These aren't hypotheticals. They're design choices. The architecture exists. The papers are published. The code is open.
-The question is whether we build systems that simulate learning or systems that perform learning.
-This demo chooses the latter.
----
-## 15 / A Note on Hype
-Test-time learning is not a silver bullet. It introduces complexity, instability, and new failure modes. A model that changes during use is harder to trust, harder to audit, harder to guarantee.
-But it's also more adaptive, more personal, more aligned with how humans actually learn.
-The industry will likely converge on hybrid systems: frozen base models with test-time learning in specific modules. The best of both worlds—stability where you need it, adaptation where you want it.
-This demo is a step in that direction. Not the destination. Just a clearer mental model of what's possible.
----
-## 16 / The Core Metaphor
-Standard LLMs are like libraries. They contain vast knowledge, but they don't change when you visit them. You can check out books (retrieve information), but the library itself remains static.
-Test-time learning is like a brain. It changes with every experience. The connections strengthen or weaken based on what you encounter. The system becomes different because of you.
-Both are useful. But they're not the same thing.
-This demo is a brain, not a library.
----
-**Papers**:
-- Titans: Learning to Memorize at Test Time ([arxiv.org/abs/2501.00663](https://arxiv.org/abs/2501.00663))
-- MIRAS: It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization ([arxiv.org/abs/2504.13173](https://arxiv.org/abs/2504.13173))
-**Code**: Open source, minimal, educational.
-**Memory**: Shared, persistent, observable.
-**Learning**: Real, not simulated.
----
-*This is not a chatbot. This is a demonstration of what happens when models learn while thinking.*
-=======
 # When Models Learn While Thinking
 ---
@@ -506,4 +251,3 @@ This demo is a brain, not a library.
 ---
 *This is not a chatbot. This is a demonstration of what happens when models learn while thinking.*
->>>>>>> origin/main

 # When Models Learn While Thinking
 ---
 ---
 *This is not a chatbot. This is a demonstration of what happens when models learn while thinking.*
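
The loop the essay describes (predict a value from a key, measure the L2 error, take a gradient step scaled by a retention multiplier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from `miras_memory.py`, which this diff does not show. The linear memory matrix `M`, the `retention_gate` threshold of 1.0, the base learning rate of 0.1, and the toy 4-dimensional vectors are all hypothetical; only the 2.0x/0.5x multipliers and the L2 loss come from the essay.

```python
# Sketch of a test-time memory update with an assumed linear memory (not the repo's code).

def predict(M, key):
    """Memory prediction: v_hat = M @ key."""
    return [sum(m_ij * k_j for m_ij, k_j in zip(row, key)) for row in M]

def l2_loss(pred, target):
    """Mean squared prediction error (the 'surprise')."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def retention_gate(loss, threshold=1.0):
    """Hypothetical gate: 2.0x for surprising inputs, 0.5x for familiar ones."""
    return 2.0 if loss > threshold else 0.5

def update(M, key, value, base_lr=0.1):
    """One gradient step on the L2 loss; returns the loss measured before the step."""
    pred = predict(M, key)
    loss = l2_loss(pred, value)
    lr = base_lr * retention_gate(loss)
    d = len(value)
    for i in range(d):
        err = pred[i] - value[i]
        for j in range(len(key)):
            # dL/dM[i][j] = (2 / d) * err_i * key_j
            M[i][j] -= lr * (2.0 / d) * err * key[j]
    return loss

dim = 4
M = [[0.0] * dim for _ in range(dim)]  # fresh memory, all zeros
key = [1.0, 0.5, -0.5, 0.25]           # stands in for a projected hidden state
value = [2.0, -1.0, 0.5, 3.0]          # stands in for the target value vector

# Repeat the same input five times, then send something different.
losses = [update(M, key, value) for _ in range(5)]
novel_loss = update(M, [-1.0, 2.0, 1.0, 0.0], [-3.0, 4.0, 1.0, -2.0])
print([round(x, 3) for x in losses], round(novel_loss, 3))
```

In this toy setting the repeated input produces a falling loss and the unrelated input produces a higher one, which is the qualitative behavior the essay asks readers to watch for.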

README.md CHANGED

@@ -1,76 +1,3 @@
-<<<<<<< HEAD
----
-title: Titans Miras Demo
-emoji: 🔬
-colorFrom: blue
-colorTo: purple
-sdk: gradio
-sdk_version: 4.36.1
-app_file: app.py
-pinned: false
----
-# Titans + MIRAS: A Brain That Changes Itself While Thinking
-A minimal but faithful reimplementation of **Titans** (test-time learning) and **MIRAS** (associative memory framework) using open-source models on Hugging Face.
-## What is this?
-This demo showcases a neural architecture that can **learn and update its memory while generating responses** - a brain that literally changes itself while thinking!
-### Key Features
-- 🔄 **Test-time learning**: Memory updates during inference (not just training)
-- 🎯 **Retention gate**: Surprising/novel inputs are more memorable (inspired by human memory)
-- 💾 **Persistent memory**: State is saved across sessions
-- 🤖 **Fully OSS**: Uses distilgpt2 and runs entirely on Hugging Face
-## Architecture
-```
-User Input
-    ↓
-[Base LM: distilgpt2] → Hidden States (768-dim)
-    ↓
-[Key/Value Projections] → Memory Space (256-dim)
-    ↓
-[MIRAS Memory Module] ← Test-time Gradient Updates
-    ↓
-[Text Generation] → Response + Memory Stats
-```
-### Components
-1. **Base Language Model**: distilgpt2 (frozen, no training)
-2. **Projection Layers**: Map hidden states to memory space
-3. **MIRAS Memory**: Associative memory with learnable key→value mapping
-4. **Retention Gate**: Adjusts learning rate based on surprise (loss magnitude)
-5. **Memory Store**: Persists memory state to disk
-## How It Works
-1. Input text is processed through distilgpt2
-2. Last hidden state is projected to key/value pairs
-3. Memory predicts value from key
-4. Loss (prediction error) indicates surprise
-5. Higher surprise → higher retention → faster learning
-6. Memory updated via gradient descent (1e-3 base LR)
-7. Response generated and memory saved
-## References
-- **Titans**: [Learning to Memorize at Test Time](https://arxiv.org/abs/2501.00663)
-- **MIRAS**: [It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization](https://arxiv.org/abs/2504.13173)
-## Running Locally
-```bash
-pip install -r requirements.txt
-python app.py
-```
-Built with ❤️ exploring the future of adaptive AI systems.
-=======
 ---
 title: Titans Miras Demo
 emoji: 🔬
@@ -142,4 +69,3 @@ python app.py
 ```
 Built with ❤️ exploring the future of adaptive AI systems.
->>>>>>> origin/main

 ---
 title: Titans Miras Demo
 emoji: 🔬
 ```
 Built with ❤️ exploring the future of adaptive AI systems.
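
The README states that memory state is saved to disk and that the update counter survives restarts. A rough sketch of that behavior, using only the standard library, is shown here. `TinyMemoryStore`, `record_update`, and the JSON file layout are hypothetical names invented for illustration; the real `MemoryStore` in `memory_store.py` is not visible in this diff and most likely serializes tensors rather than JSON.

```python
import json
import os
import tempfile

class TinyMemoryStore:
    """Hypothetical stand-in for memory_store.MemoryStore: keeps state as JSON on disk."""

    def __init__(self, path):
        self.path = path

    def load(self):
        """Return the saved state, or a fresh state if nothing has been saved yet."""
        if not os.path.exists(self.path):
            return {"weights": [[0.0, 0.0], [0.0, 0.0]], "updates": 0, "loss_sum": 0.0}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def save(self, state):
        """Write to a temporary file, then swap it in, so a crash cannot leave a half-written file."""
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp_path, self.path)

def record_update(state, loss):
    """Bump the update counter and return the running average loss."""
    state["updates"] += 1
    state["loss_sum"] += loss
    return state["loss_sum"] / state["updates"]

workdir = tempfile.mkdtemp()
store = TinyMemoryStore(os.path.join(workdir, "memory_state.json"))

# First "session": two updates, then persist.
state = store.load()
record_update(state, 7.5)
record_update(state, 6.0)
store.save(state)

# Simulated restart: a new store object reads the same file and keeps counting.
restored = TinyMemoryStore(store.path).load()
avg = record_update(restored, 4.5)
print(restored["updates"], avg)
```

Keeping the counter and the loss sum in the same file as the weights is one simple way to make "Updates" and "Avg Loss" continue after a page refresh or process restart.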

app.py CHANGED

@@ -12,11 +12,7 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
 import gradio as gr
 from miras_memory import MIRASMemory
-<<<<<<< HEAD
-from projections import KeyProjection, ValueProjection, OutputProjection
-=======
 from projections import KeyProjection, ValueProjection
->>>>>>> origin/main
 from memory_store import MemoryStore
 print("=" * 50)
@@ -30,10 +26,6 @@ HIDDEN_DIM = 768  # distilgpt2 hidden dimension
 MEMORY_DIM = 256  # Memory space dimension
 LEARNING_RATE = 1e-3  # Base learning rate for test-time updates
 MAX_NEW_TOKENS = 50  # Max tokens to generate
-<<<<<<< HEAD
-MEMORY_ALPHA = 0.1  # Memory influence strength on generation
-=======
->>>>>>> origin/main
 # ========== Initialize Components ==========
 print("🧠 Initializing Titans + MIRAS brain...")
@@ -47,10 +39,6 @@ model.eval()  # Frozen - no training
 # Create projection layers
 key_proj = KeyProjection(HIDDEN_DIM, MEMORY_DIM)
 value_proj = ValueProjection(HIDDEN_DIM, MEMORY_DIM)
-<<<<<<< HEAD
-output_proj = OutputProjection(MEMORY_DIM, HIDDEN_DIM)  # Map memory back to hidden space
-=======
->>>>>>> origin/main
 # Create memory module
 memory = MIRASMemory(memory_dim=MEMORY_DIM, init_scale=0.01)
@@ -112,44 +100,6 @@ def chat(message, history):
             # Update stats
             memory.update_stats(loss)
-<<<<<<< HEAD
-    # === Step 3: Memory-augmented generation (THE KEY CHANGE!) ===
-    # Instead of model.generate(), we do token-by-token generation
-    # where memory augments the hidden state before prediction.
-    generated_ids = inputs['input_ids'].clone()
-    with torch.no_grad():
-        for _ in range(MAX_NEW_TOKENS):
-            # Get hidden states from model
-            outputs = model(generated_ids, output_hidden_states=True)
-            h_last = outputs.hidden_states[-1][:, -1, :]  # (1, hidden_dim)
-            # Query memory with projected key
-            k_gen = key_proj(h_last)
-            memory_out = memory.query(k_gen)  # (1, memory_dim)
-            # Augment hidden state with memory output
-            # h' = h + alpha * output_proj(memory(k))
-            h_augmented = h_last + MEMORY_ALPHA * output_proj(memory_out)
-            # Compute logits with augmented hidden state
-            logits = model.lm_head(h_augmented)  # (1, vocab_size)
-            # Sample next token (temperature sampling)
-            logits = logits / 0.8  # temperature
-            probs = torch.softmax(logits, dim=-1)
-            next_token = torch.multinomial(probs, num_samples=1)
-            # Stop if EOS
-            if next_token.item() == tokenizer.eos_token_id:
-                break
-            # Append to sequence
-            generated_ids = torch.cat([generated_ids, next_token], dim=1)
-    response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
-=======
     # === Step 3: Generate response ===
     with torch.no_grad():
         output_ids = model.generate(
@@ -162,7 +112,6 @@ def chat(message, history):
         )
     response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
->>>>>>> origin/main
     # Remove the input prompt from response
     if response.startswith(message):
@@ -192,202 +141,235 @@ def chat(message, history):
 # ========== Gradio Interface ==========
 print("🚀 Launching Gradio interface...")
-with gr.Blocks(theme="soft", title="The Brain That Learns While Thinking") as demo:
-    gr.Markdown("""
-    # 🧠 The Brain That Learns While Thinking
-    ### A Living System That Updates Its Neural Weights During Inference
-    **What This Does**: Demonstrates test-time learning - the model's memory weights update via gradient descent with every message you send.
-    **The Novel Thing**: Standard LLMs (ChatGPT, Claude) freeze their weights after training. This system performs **real gradient descent while you chat**.
-    **Quick Test**: Send "hello world" 5 times and watch the **Loss** decrease below each response. That's learning happening in real-time!
-    """)
-    chatbot = gr.Chatbot(
-        label="Chat & Watch the Memory Learn",
-        height=500,
-    )
-    msg = gr.Textbox(
-        label="Your Message",
-        placeholder="Try: hello world (send it 5 times and watch loss decrease!)",
-        lines=2,
-    )
-    with gr.Row():
-        submit = gr.Button("Send", variant="primary")
-        clear = gr.Button("Clear")
-    gr.Examples(
-        examples=[
-            "hello world",
-            "hello world",
-            "Supercalifragilisticexpialidocious quantum entanglement",
-            "my name is Pavan",
-        ],
-        inputs=msg,
-        label="Try these (especially repeat 'hello world'!)",
-    )
-    # Built with section
-    gr.Markdown("""
     ---
-    **Built with**: [Titans](https://arxiv.org/abs/2501.00663) (test-time training) + [MIRAS](https://arxiv.org/abs/2504.13173) (associative memory)
-    **📖 Deep Dive**: [Read the full essay](https://huggingface.co/spaces/Pavantej/titans-miras-demo/blob/main/ESSAY.md)
-    """)
-    # Detailed information in accordions
-    with gr.Accordion("📊 What the Stats Mean", open=False):
-        gr.Markdown("""
-        **Loss** (e.g., 7.48 → 6.61 → 5.23)
-        - Prediction error - how surprised the memory is
-        - **Lower = memory is familiar** with this pattern
-        - **Decreasing loss = learning is happening!**
-        **Retention** (e.g., 2.00x)
-        - Learning rate multiplier based on surprise
-        - 2.0x = very surprising, learns aggressively
-        - 0.5x = familiar, learns slowly
-        **Updates** (e.g., 1 → 2 → 3...)
-        - Total memory updates
-        - **Persists across page refreshes!**
-        - Proof that memory is permanent
-        **Avg Loss** (e.g., 7.26)
-        - Running average showing long-term learning progress
-        """)
-    with gr.Accordion("🧪 Interactive Experiments", open=False):
-        gr.Markdown("""
-        ### Experiment 1: Watch Loss Decrease
-        1. Send "hello world" 5 times
-        2. Watch loss: 7.5 → 6.0 → 5.0 → 4.0
-        3. **This proves learning!**
-        ### Experiment 2: Trigger Surprise
-        1. After experiment 1, send something completely different
-        2. Watch loss spike back up (4.0 → 9.0+)
-        3. **Memory detects novelty!**
-        ### Experiment 3: Test Persistence
-        1. Note the "Updates" count
-        2. Refresh this entire page
-        3. Send any message
-        4. Updates should continue, not reset!
-        5. **Memory survives refresh!**
-        """)
-<<<<<<< HEAD
-    with gr.Accordion("✨ Memory-Augmented Generation", open=False):
-        gr.Markdown("""
-        **Memory now influences text generation!**
-        - At each token generation step, we query the memory
-        - Memory output is added to the hidden state: `h' = h + α × memory(k)`
-        - This augmented state determines the next token
-        **What to expect:**
-        - Repeated inputs → more consistent outputs
-        - As memory learns patterns, it biases generation toward them
-        - Novel inputs → more random outputs (memory has no prior)
-=======
-    with gr.Accordion("⚠️ Important: Ignore the Text", open=False):
-        gr.Markdown("""
-        **The text responses are random** - this is expected!
-        - We're NOT training the text generator (distilgpt2 is frozen)
-        - **Focus on the numbers below each response**
-        - The magic is in the decreasing loss, not the text
-        **Why?** We're demonstrating **memory learning**, not text generation.
->>>>>>> origin/main
-        """)
-    with gr.Accordion("🔬 How It Works", open=False):
-        gr.Markdown("""
-        ```
-        Your Message
-            ↓
-<<<<<<< HEAD
-        [distilgpt2: FROZEN]
-            ↓
-        Hidden States (768-dim)
-            ↓
-        [Key Projection] → Memory (256-dim)
-            ↓
-        [MIRAS Memory: LEARNING!]
-            ↓
-        [Output Projection] → Augmentation (768-dim)
-            ↓
-        h' = h + α × memory_output  ← THE KEY!
-            ↓
-        LM Head → Next Token
-        ```
-        **Key**: Memory modifies hidden states before prediction!
-=======
-        [distilgpt2: FROZEN] ← Not learning
-            ↓
-        Hidden States (768-dim)
-            ↓
-        [Projections] → Memory (256-dim)
-            ↓
-        [MIRAS Memory: LEARNING!]
-            ↓
-        Loss = Prediction error
-            ↓
-        Gradient Descent → Weights change
-            ↓
-        Saved to disk → Persists
-        ```
-        **Key**: We're training the **memory**, not the text generator!
->>>>>>> origin/main
-        """)
-    with gr.Accordion("💡 Why This Matters", open=False):
-        gr.Markdown("""
-        **Standard LLMs** (ChatGPT, Claude):
-        - Weights frozen after training
-        - "Learning" is just pattern matching
-        - Forget when context ends
-        **This Demo** (Titans + MIRAS):
-        - Weights update during inference
-        - Real gradient descent
-        - Memory persists across sessions
-        That decreasing loss? **That's gradient descent during inference.**
-        That's what ChatGPT doesn't do.
-        """)
-    # Footer
-    gr.Markdown("""
     ---
-    ✨ **Vibecoded by [Pavan Tej](https://github.com/thepavantejz)**
-    """)
-    # Event handlers
-    def user_submit(user_message, history):
-        return "", history + [[user_message, None]]
-    def bot_response(history):
-        user_message = history[-1][0]
-        bot_message = chat(user_message, history[:-1])
-        history[-1][1] = bot_message
-        return history
-    msg.submit(user_submit, [msg, chatbot], [msg, chatbot], queue=False).then(
-        bot_response, chatbot, chatbot
-    )
-    submit.click(user_submit, [msg, chatbot], [msg, chatbot], queue=False).then(
-        bot_response, chatbot, chatbot
-    )
-    clear.click(lambda: None, None, chatbot, queue=False)
 demo.launch()
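
The HEAD side removed by this commit augmented the last hidden state with memory output before computing logits (`h' = h + alpha * output_proj(memory(k))`), then sampled with temperature 0.8. The arithmetic of that one step can be sketched without PyTorch. The value `alpha = 0.1` and the temperature come from the removed lines; the tiny matrices `W_out` and `W_head`, the toy dimensions, and the helper names are assumptions made for illustration only.

```python
import math
import random

def matvec(W, x):
    """Matrix-vector product for nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(logits, temperature=0.8):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def augmented_probs(h, memory_out, W_out, W_head, alpha=0.1):
    """Compute h' = h + alpha * (W_out @ memory_out), then next-token probabilities."""
    delta = matvec(W_out, memory_out)
    h_aug = [hi + alpha * di for hi, di in zip(h, delta)]
    return softmax(matvec(W_head, h_aug))

# Toy sizes (assumed): hidden_dim=3, memory_dim=2, vocab_size=4.
h = [0.2, -0.1, 0.4]
memory_out = [1.0, -1.0]
W_out = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]  # plays the role of output_proj
W_head = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]  # plays the role of lm_head

base = augmented_probs(h, memory_out, W_out, W_head, alpha=0.0)     # memory switched off
steered = augmented_probs(h, memory_out, W_out, W_head, alpha=0.1)  # memory nudges the logits

# Sample one token id from the steered distribution.
rng = random.Random(0)
next_token = rng.choices(range(len(steered)), weights=steered, k=1)[0]
print(base, steered, next_token)
```

With `alpha = 0.0` the distribution is that of the frozen model alone; any nonzero `alpha` shifts probability mass according to what the memory returns, which is how the removed code let learned memory influence the generated text.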

 import gradio as gr
 from miras_memory import MIRASMemory
 from projections import KeyProjection, ValueProjection
 from memory_store import MemoryStore
 print("=" * 50)
 MEMORY_DIM = 256  # Memory space dimension
 LEARNING_RATE = 1e-3  # Base learning rate for test-time updates
 MAX_NEW_TOKENS = 50  # Max tokens to generate
 # ========== Initialize Components ==========
 print("🧠 Initializing Titans + MIRAS brain...")
 # Create projection layers
 key_proj = KeyProjection(HIDDEN_DIM, MEMORY_DIM)
 value_proj = ValueProjection(HIDDEN_DIM, MEMORY_DIM)
 # Create memory module
 memory = MIRASMemory(memory_dim=MEMORY_DIM, init_scale=0.01)
             # Update stats
             memory.update_stats(loss)
     # === Step 3: Generate response ===
     with torch.no_grad():
         output_ids = model.generate(
         )
     response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
     # Remove the input prompt from response
     if response.startswith(message):
 # ========== Gradio Interface ==========
 print("🚀 Launching Gradio interface...")
+demo = gr.ChatInterface(
+    fn=chat,
+    title="🧠 The Brain That Learns While Thinking",
+    description="""
+    # A Living System That Updates Its Weights During Inference
+    **The Novel Thing**: Standard LLMs freeze their weights after training. This system performs gradient descent *while you chat*.
+    ---
+    ## 🚀 The Revolutionary Difference
+    **Standard LLMs (ChatGPT, Claude, etc.)**: Think → Predict → **Forget**
+    **Titans + MIRAS**: Think → Predict → **Update** → **Remember** → Think Differently
+    ---
+    ### 💡 What Makes This Different?
+    | Feature | ChatGPT/Claude/GPT-4 | This Demo (Titans+MIRAS) |
+    |---------|---------------------|--------------------------|
+    | **Weights during chat** | 🔒 Frozen forever | ✅ Update with every message |
+    | **Learning** | ❌ Simulated (in-context only) | ✅ Real (gradient descent) |
+    | **Memory** | 📝 Token context only | 🧠 Neural parameters |
+    | **Persistence** | ❌ Forgets when context ends | ✅ Saves to disk |
+    | **Adaptation** | 🎭 Acts like it learned | 🔬 Actually learns |
     ---
+    ### 🎯 What You're Witnessing
+    **This is NOT a better chatbot** - it's a **learning demonstrator**.
+    1. **The text responses are random** - that's expected! We're using a small, frozen model (distilgpt2)
+    2. **The MAGIC is in the numbers below** - watch the "Loss" decrease when you repeat inputs!
+    3. **Every message physically changes the brain** - the memory weights update via gradient descent
+    4. **Refresh the page** - the update count continues (memory persists!)
     ---
+    ### 🧪 How It Works (The Technical Truth)
+    ```
+    Your Message
+        ↓
+    [distilgpt2: FROZEN] ← Not learning, just generating
+        ↓
+    Hidden States (768-dim)
+        ↓
+    [Projections] → Memory Space (256-dim)
+        ↓
+    [MIRAS Memory: LEARNING!] ← This is what updates!
+        ↓
+    Loss = How surprised the memory is
+        ↓
+    Gradient Descent → Memory weights change
+        ↓
+    Saved to disk → Persists forever
+    ```
+    **Key Insight**: We're training the **memory**, not the text generator!
+    ---
+    ### 🔬 The Science: Why This Matters
+    **Standard LLMs**:
+    - Weights frozen after training (costs millions)
+    - "Learning" is just pattern matching in context
+    - Forget everything when context ends
+    - Same model for everyone
+    **Titans + MIRAS**:
+    - Weights update during inference (one small gradient step per message)
+    - Real optimization via gradient descent
+    - Memory persists across sessions
+    - Adapts to the inputs it sees (a single memory shared by all users in this demo)
+    **This is test-time learning** - the future of adaptive AI.
+    ---
+    ### 📊 What the Stats Mean
+    - **Loss**: How surprised the memory is (lower = more familiar)
+    - **Retention**: Learning rate multiplier (2.0x = very surprising, 0.5x = familiar)
+    - **Updates**: Total number of memory updates (persists across sessions!)
+    - **Avg Loss**: Overall learning progress
+    ---
+    ### 🎮 Try This Experiment
+    1. **Send "hello world" 5 times** → Watch loss decrease!
+    2. **Send something completely different** → Loss spikes!
+    3. **Refresh the page and send another message** → Update count continues!
+    **That decreasing loss is proof the neural weights are changing!**
+    ---
+    ### 🌟 The Bottom Line
+    **ChatGPT**: A frozen calculator that *simulates* adaptation
+    **This Demo**: A living system that *performs* adaptation
+    You're not chatting with a model.
+    **You're watching a brain rewire itself in real-time.** 🧠⚡
+    ---
+    ### 🧪 How to Test This (Interactive Experiments)
+    **Don't just chat—run experiments to see the learning happen!**
+    #### Experiment 1: Watch Loss Decrease (Proof of Learning)
+    ```
+    1. Send "hello world"
+    2. Send "hello world" again
+    3. Send "hello world" again
+    4. Send "hello world" again
+    5. Send "hello world" again
+    ```
+    **What to watch**: Loss should decrease each time (7.5 → 6.0 → 5.0 → 4.0)
+    **Why it matters**: This proves the memory is learning the pattern!
+    #### Experiment 2: Trigger Surprise (Spike the Loss)
+    ```
+    1. Send "hello world" 5 times (loss decreases)
+    2. Then send: "Supercalifragilisticexpialidocious quantum entanglement"
+    ```
+    **What to watch**: Loss should spike back up (4.0 → 9.0+)
+    **Why it matters**: The memory detects novelty—it knows this is different!
+    #### Experiment 3: Test Persistence (Memory Survives)
+    ```
+    1. Note the "Updates" count (e.g., 15)
+    2. Refresh this page completely
+    3. Send any message
+    4. Check if Updates = 16 (not reset to 1!)
+    ```
+    **What to watch**: Update count should continue, not reset
+    **Why it matters**: Memory persists to disk—it's not just in RAM!
+    ---
+    ### 📊 What Each Stat Means (Decoder Ring)
+    **Loss** (e.g., 7.48 → 6.61 → 5.23)
+    - **What it is**: Prediction error (how surprised the memory is)
+    - **Lower = Better**: Memory is familiar with this pattern
+    - **Higher = Novel**: Memory hasn't seen this before
+    - **Why it matters**: Decreasing loss = learning is happening!
+    **Retention** (e.g., 2.00x)
+    - **What it is**: Learning rate multiplier based on surprise
+    - **2.0x = Very surprising**: Memory learns aggressively
+    - **0.5x = Very familiar**: Memory learns slowly (you won't see this yet)
+    - **Why it matters**: The brain learns more from surprising events (like humans!)
+    **Updates** (e.g., 1 → 2 → 3 → 4...)
+    - **What it is**: Total number of memory updates
+    - **Persists across sessions**: Survives page refreshes
+    - **Doesn't reset on refresh**: Keeps counting from the saved state
+    - **Why it matters**: Proof that memory is persistent, not ephemeral!
+    **Avg Loss** (e.g., 7.26)
+    - **What it is**: Average of all losses so far (total loss ÷ update count)
+    - **Trends downward**: As memory learns recurring patterns
+    - **Reflects overall learning**: Lower = the memory fits what it has seen better
+    - **Why it matters**: Shows long-term learning progress!
+    ---
+    ### ⚠️ What to Ignore (Important!)
+    **The text responses are random and bad** - this is expected!
+    - We're NOT training the text generator (distilgpt2 is frozen)
+    - The responses don't matter—they're a side effect
+    - **Focus on the numbers below**, not the text above
+    - The magic is in the decreasing loss, not the generated text
+    **Why?** Because we're demonstrating **memory learning**, not text generation.
+    Standard LLMs train the text generator. This trains the memory. Different goals.
+    ---
+    ### 🎯 What Success Looks Like
+    ✅ **You're seeing it work if**:
+    - Loss decreases when you repeat inputs
+    - Loss spikes when you send something new
+    - Update count increments with each message
+    - Update count persists after page refresh
+    - Retention is 2.0x (everything is surprising to a fresh memory)
+    ❌ **You're NOT seeing it work if**:
+    - Loss stays constant (not learning)
+    - Updates reset to 1 after refresh (not persisting)
+    - No stats appear below responses
+    ---
+    ### 🔬 Why This Matters (The Big Picture)
+    **Standard LLMs**: Frozen weights → No learning during use
+    **This Demo**: Live weights → Learning with every message
+    That decreasing loss you see? **That's gradient descent happening during inference.**
+    That's the revolution. That's what ChatGPT doesn't do.
+    You're not just using a model. **You're watching it change.**
+    ---
+    *Built with Titans (test-time training) + MIRAS (associative memory)*
+    *Papers: [Titans](https://arxiv.org/abs/2501.00663) | [MIRAS](https://arxiv.org/abs/2504.13173)*
+    **📖 [Read the full essay: "When Models Learn While Thinking"](https://huggingface.co/spaces/Pavantej/titans-miras-demo/blob/main/ESSAY.md)**
+    """,
+    examples=[
+        "hello world",
+        "hello world",  # Repeat to show learning!
+        "Tell me about test-time learning",
+        "What is 2+2?",
+        "my name is [your name]",
+    ],
+    cache_examples=False,
+    theme="soft",
+)
 demo.launch()
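
The experiments described in the interface text can be sketched without running the Space. The following is a minimal, dependency-free illustration and not the code in `app.py`: it assumes a linear memory that takes one gradient step per message on a mean-squared-error loss, with the step scaled by a retention factor of `loss / 0.1` clamped to `[0.5, 2.0]`. The names `step`, `BASE_LR`, and the toy 4-dimensional keys and values are hypothetical.

```python
# Toy sketch of test-time memory updates (assumed mechanics, not the Space's code).
DIM = 4
BASE_LR = 0.05  # hypothetical base learning rate


def predict(W, key):
    """Linear memory read: pred_j = sum_i key_i * W[i][j]."""
    return [sum(key[i] * W[i][j] for i in range(DIM)) for j in range(DIM)]


def mse(pred, value):
    """Mean squared error between prediction and target value."""
    return sum((p - v) ** 2 for p, v in zip(pred, value)) / DIM


def retention(loss):
    """Surprise-based multiplier: loss / 0.1 clamped to [0.5, 2.0]."""
    return min(max(loss / 0.1, 0.5), 2.0)


def step(W, key, value):
    """One gradient-descent update on W; returns the loss before the update."""
    pred = predict(W, key)
    loss = mse(pred, value)
    lr = BASE_LR * retention(loss)
    for i in range(DIM):
        for j in range(DIM):
            grad = 2.0 * key[i] * (pred[j] - value[j]) / DIM
            W[i][j] -= lr * grad
    return loss


W = [[0.0] * DIM for _ in range(DIM)]
familiar = ([1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0])
novel = ([0.0, 1.0, 0.0, 0.0], [4.0, 3.0, 2.0, 1.0])

repeat_losses = [step(W, *familiar) for _ in range(5)]  # Experiment 1
novel_loss = step(W, *novel)                            # Experiment 2

print([round(x, 3) for x in repeat_losses], round(novel_loss, 3))
```

The keys here are orthogonal, so the novel input's loss is untouched by the earlier updates. Projected hidden states from a real language model overlap, so the spike in the live demo may be smaller than in this toy case.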

memory_store.py CHANGED

@@ -1,88 +1,3 @@
-<<<<<<< HEAD
-"""
-Memory Persistence
-Handles saving and loading memory state to/from disk so the brain
-remembers across sessions.
-"""
-import torch
-import json
-import os
-from pathlib import Path
-from datetime import datetime
-class MemoryStore:
-    """Manages persistent storage of memory state."""
-    def __init__(self, save_dir="memory"):
-        self.save_dir = Path(save_dir)
-        self.save_dir.mkdir(exist_ok=True)
-        self.memory_path = self.save_dir / "memory.pt"
-        self.metadata_path = self.save_dir / "metadata.json"
-    def save(self, memory_module):
-        """
-        Save memory state to disk.
-        Args:
-            memory_module: MIRASMemory instance
-        """
-        # Save memory weights
-        torch.save({
-            'W': memory_module.W.data,
-            'update_count': memory_module.update_count,
-            'total_loss': memory_module.total_loss,
-        }, self.memory_path)
-        # Save metadata
-        metadata = {
-            'last_updated': datetime.now().isoformat(),
-            'memory_dim': memory_module.memory_dim,
-            'updates': memory_module.update_count.item(),
-            'avg_loss': (memory_module.total_loss / max(memory_module.update_count, 1)).item(),
-        }
-        with open(self.metadata_path, 'w') as f:
-            json.dump(metadata, f, indent=2)
-        print(f"💾 Memory saved: {memory_module.update_count.item()} updates")
-    def load(self, memory_module):
-        """
-        Load memory state from disk.
-        Args:
-            memory_module: MIRASMemory instance to load into
-        Returns:
-            bool: True if loaded successfully, False otherwise
-        """
-        if not self.memory_path.exists():
-            print("🆕 No saved memory found. Starting fresh!")
-            return False
-        try:
-            checkpoint = torch.load(self.memory_path)
-            memory_module.W.data = checkpoint['W']
-            memory_module.update_count = checkpoint['update_count']
-            memory_module.total_loss = checkpoint['total_loss']
-            print(f"✅ Memory loaded: {memory_module.update_count.item()} updates")
-            return True
-        except Exception as e:
-            print(f"⚠️ Error loading memory: {e}. Starting fresh!")
-            return False
-    def get_metadata(self):
-        """Get metadata about saved memory."""
-        if not self.metadata_path.exists():
-            return None
-        with open(self.metadata_path, 'r') as f:
-            return json.load(f)
-=======
 """
 Memory Persistence
@@ -166,4 +81,3 @@ class MemoryStore:
         with open(self.metadata_path, 'r') as f:
             return json.load(f)
->>>>>>> origin/main
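
The persistence check from Experiment 3 (the update count continuing after a refresh) can be illustrated in isolation. This is a hedged sketch that uses only JSON and the standard library; `CounterStore` is a hypothetical stand-in for `MemoryStore` that stores just the counter, whereas the module above also saves the weight matrix with `torch.save`.

```python
import json
import tempfile
from pathlib import Path


class CounterStore:
    """Hypothetical stand-in for MemoryStore: persists only an update counter."""

    def __init__(self, save_dir):
        self.path = Path(save_dir) / "metadata.json"

    def load(self):
        """Return the saved update count, or 0 when nothing has been saved."""
        if not self.path.exists():
            return 0
        with open(self.path, "r") as f:
            return json.load(f)["updates"]

    def save(self, updates):
        """Write the current update count to disk."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with open(self.path, "w") as f:
            json.dump({"updates": updates}, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    store = CounterStore(tmp)
    count = store.load()  # nothing saved yet, so this starts at 0
    for _ in range(15):
        count += 1
        store.save(count)

    # Simulate a page refresh: a new object reads the same directory.
    reloaded = CounterStore(tmp).load()
    after_next_message = reloaded + 1
```

Because the state lives on disk rather than in the object, a fresh instance resumes from the saved count instead of starting over.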

miras_memory.py CHANGED

@@ -1,115 +1,3 @@
-<<<<<<< HEAD
-"""
-MIRAS-inspired Associative Memory Module
-Implements an associative memory that learns key-value mappings
-through attentional bias objective during test time.
-"""
-import torch
-import torch.nn as nn
-class MIRASMemory(nn.Module):
-    """
-    Associative memory module inspired by MIRAS framework.
-    The memory learns to map keys to values using a simple linear projection
-    and updates itself during test time via gradient descent.
-    Args:
-        memory_dim: Dimensionality of memory keys/values
-        init_scale: Scale for random weight initialization
-    """
-    def __init__(self, memory_dim=256, init_scale=0.01):
-        super().__init__()
-        self.memory_dim = memory_dim
-        # Memory matrix: maps keys to values
-        # W: (memory_dim, memory_dim)
-        self.W = nn.Parameter(
-            torch.randn(memory_dim, memory_dim) * init_scale
-        )
-        # Track number of updates for retention gate
-        self.register_buffer('update_count', torch.tensor(0))
-        self.register_buffer('total_loss', torch.tensor(0.0))
-    def forward(self, key):
-        """
-        Query memory with a key.
-        Args:
-            key: (batch_size, memory_dim) tensor
-        Returns:
-            predicted_value: (batch_size, memory_dim) tensor
-        """
-        # Simple linear mapping: pred_v = k @ W
-        predicted_value = key @ self.W
-        return predicted_value
-    def query(self, key):
-        """
-        Query memory without computing gradients (for generation).
-        Args:
-            key: (batch_size, memory_dim) tensor
-        Returns:
-            memory_output: (batch_size, memory_dim) tensor
-        """
-        with torch.no_grad():
-            return self.forward(key)
-    def compute_loss(self, key, value):
-        """
-        Compute attentional bias loss between predicted and true value.
-        Args:
-            key: (batch_size, memory_dim)
-            value: (batch_size, memory_dim)
-        Returns:
-            loss: scalar tensor
-        """
-        pred = self.forward(key)
-        loss = ((pred - value) ** 2).mean()
-        return loss
-    def retention_gate(self, loss):
-        """
-        Simple retention gate: higher loss = more surprising = more memorable.
-        Returns a scaling factor for the learning rate based on surprise.
-        High loss (surprising) gets higher weight.
-        Args:
-            loss: scalar tensor
-        Returns:
-            retention_factor: scalar in range [0.5, 2.0]
-        """
-        # Normalize loss to a retention factor
-        # If loss is high (surprising), learn more aggressively
-        retention_factor = torch.clamp(loss / 0.1, 0.5, 2.0)
-        return retention_factor.item()
-    def update_stats(self, loss):
-        """Track memory statistics."""
-        self.update_count += 1
-        self.total_loss += loss.item()
-    def get_stats(self):
-        """Get memory statistics."""
-        avg_loss = self.total_loss / max(self.update_count, 1)
-        return {
-            'updates': self.update_count.item(),
-            'avg_loss': avg_loss.item(),
-            'memory_size': self.W.numel()
-        }
-=======
 """
 MIRAS-inspired Associative Memory Module
@@ -207,4 +95,3 @@ class MIRASMemory(nn.Module):
             'avg_loss': avg_loss.item(),
             'memory_size': self.W.numel()
         }
->>>>>>> origin/main
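
To make the Retention and Avg Loss statistics concrete, here is a small plain-float sketch of the two calculations shown in `retention_gate` and `get_stats`. It assumes the same constants as the code above (a scale of 0.1 and a clamp range of 0.5 to 2.0); `retention_factor` and `RunningStats` are hypothetical names, and tensors are replaced by ordinary floats.

```python
def retention_factor(loss, scale=0.1, low=0.5, high=2.0):
    """Plain-float version of clamp(loss / scale, low, high)."""
    return min(max(loss / scale, low), high)


class RunningStats:
    """Hypothetical helper tracking the update count and cumulative mean loss."""

    def __init__(self):
        self.updates = 0
        self.total_loss = 0.0

    def record(self, loss):
        """Count one update and add its loss to the running total."""
        self.updates += 1
        self.total_loss += loss

    @property
    def avg_loss(self):
        # max(..., 1) avoids division by zero before the first update
        return self.total_loss / max(self.updates, 1)


stats = RunningStats()
for loss in (7.48, 6.61, 5.23):  # example losses from the interface text
    stats.record(loss)

# Retention for a large loss, the upper threshold, the midpoint, and small losses
factors = [retention_factor(x) for x in (7.48, 0.2, 0.1, 0.05, 0.01)]
```

Under these constants, any loss of 0.2 or more saturates at 2.0, which is why the interface reports 2.00x for losses in the 5 to 7 range.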

projections.py CHANGED

@@ -1,85 +1,3 @@
-<<<<<<< HEAD
-"""
-Key and Value Projection Layers
-Maps hidden states from the base language model into memory-compatible
-representations for the MIRAS memory module.
-"""
-import torch.nn as nn
-class KeyProjection(nn.Module):
-    """
-    Projects hidden states to memory keys.
-    Args:
-        hidden_dim: Dimension of LM hidden states (e.g., 768 for distilgpt2)
-        memory_dim: Dimension of memory keys (e.g., 256)
-    """
-    def __init__(self, hidden_dim, memory_dim):
-        super().__init__()
-        self.projection = nn.Linear(hidden_dim, memory_dim, bias=False)
-    def forward(self, hidden_state):
-        """
-        Args:
-            hidden_state: (batch_size, hidden_dim)
-        Returns:
-            key: (batch_size, memory_dim)
-        """
-        return self.projection(hidden_state)
-class ValueProjection(nn.Module):
-    """
-    Projects hidden states to memory values.
-    Args:
-        hidden_dim: Dimension of LM hidden states (e.g., 768 for distilgpt2)
-        memory_dim: Dimension of memory values (e.g., 256)
-    """
-    def __init__(self, hidden_dim, memory_dim):
-        super().__init__()
-        self.projection = nn.Linear(hidden_dim, memory_dim, bias=False)
-    def forward(self, hidden_state):
-        """
-        Args:
-            hidden_state: (batch_size, hidden_dim)
-        Returns:
-            value: (batch_size, memory_dim)
-        """
-        return self.projection(hidden_state)
-class OutputProjection(nn.Module):
-    """
-    Projects memory output back to hidden dimension for augmentation.
-    This is the key bridge that allows memory to influence generation:
-    h' = h + alpha * output_proj(memory(k))
-    Args:
-        memory_dim: Dimension of memory output (e.g., 256)
-        hidden_dim: Dimension of LM hidden states (e.g., 768)
-    """
-    def __init__(self, memory_dim, hidden_dim):
-        super().__init__()
-        self.projection = nn.Linear(memory_dim, hidden_dim, bias=False)
-    def forward(self, memory_output):
-        """
-        Args:
-            memory_output: (batch_size, memory_dim)
-        Returns:
-            hidden_augmentation: (batch_size, hidden_dim)
-        """
-        return self.projection(memory_output)
-=======
 """
 Key and Value Projection Layers
@@ -134,4 +52,3 @@ class ValueProjection(nn.Module):
             value: (batch_size, memory_dim)
         """
         return self.projection(hidden_state)
->>>>>>> origin/main