On-Device Memory Improvements for LLM Chat

Drop-in improvements for on-device chat apps using small language models with semantic memory (e.g., Llama-3.2-3B + EmbeddingGemma-300M + SQLite vector store).

📦 Files

| File | Purpose |
| --- | --- |
| src/types.ts | Type definitions & configuration |
| src/deduplication.ts | Improvement 1: Mem0-style ADD/UPDATE/NOOP deduplication |
| src/memoryDecay.ts | Improvement 2: Heat-based decay & eviction (MemoryOS) |
| src/typedMemory.ts | Improvement 3: ENGRAM-style episodic/semantic/procedural router |
| src/assistantMemory.ts | Improvement 4: Store AI replies as memories |
| src/smartFilter.ts | Improvement 5: Only store declarative/factual content |
| src/dynamicRetrieval.ts | Improvement 6: Adaptive top-k, threshold, type filtering |
| src/schema.ts | SQLite schema migration |
| src/index.ts | ImprovedMemoryService class (combines all improvements) |
| src/integrationGuide.ts | Exact code changes for each file in your app |
| src/beforeAfterWalkthrough.ts | Side-by-side comparison of old vs new behavior |
| tests/demo.py | Validated Python demo showing all improvements |
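
To make Improvement 1 concrete, here is a minimal sketch of an ADD/UPDATE/NOOP decision. It is not the code in src/deduplication.ts: the `StoredMemory` and `DedupDecision` types, the `decideDedup` function, the use of cosine similarity, and the rule "identical text means NOOP, otherwise UPDATE" are all assumptions made for illustration. Only the three action names and the 0.85 threshold come from this README.

```typescript
// Hypothetical types; the real definitions in src/types.ts may differ.
type DedupAction = "ADD" | "UPDATE" | "NOOP";

interface StoredMemory {
  id: number;
  text: string;
  embedding: number[];
}

interface DedupDecision {
  action: DedupAction;
  targetId?: number; // id of the matched memory, set for UPDATE and NOOP
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Assumed policy: below the threshold the candidate is new (ADD); at or above it,
// identical text is skipped (NOOP) and different text replaces the match (UPDATE).
function decideDedup(
  candidateText: string,
  candidateEmbedding: number[],
  existing: StoredMemory[],
  threshold: number = 0.85
): DedupDecision {
  let best: StoredMemory | undefined;
  let bestScore = -Infinity;
  for (const memory of existing) {
    const score = cosineSimilarity(candidateEmbedding, memory.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = memory;
    }
  }
  if (best === undefined || bestScore < threshold) {
    return { action: "ADD" };
  }
  const sameText =
    best.text.trim().toLowerCase() === candidateText.trim().toLowerCase();
  return { action: sameText ? "NOOP" : "UPDATE", targetId: best.id };
}
```

A caller would run `decideDedup` before every insert and then write, overwrite, or skip depending on the returned action. A linear scan is used here for clarity; a vector index lookup could take its place.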

🚀 Quick Start

import { ImprovedMemoryService, DEFAULT_MEMORY_CONFIG } from './src';

const memoryService = new ImprovedMemoryService(db, embedFn, DEFAULT_MEMORY_CONFIG);

// Retrieve memories (replaces your current searchMemories)
const { context, memoryIds } = await memoryService.retrieve(userMessage, embedding);

// Store user message (with smart filtering + dedup)
await memoryService.storeUserMessage(userMessage, embedding);

// Store assistant reply (new!)
await memoryService.storeAssistantReply(assistantReply);
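
The "smart filtering" step mentioned in the comments above (Improvement 5) could look roughly like the sketch below. This is a hypothetical heuristic, not the logic in src/smartFilter.ts: the function name `isWorthStoring`, the four-word minimum, and the greeting pattern are all assumptions chosen to illustrate the idea of storing only declarative content.

```typescript
// Hypothetical filter: returns true only for messages that look like
// declarative statements worth remembering.
function isWorthStoring(message: string): boolean {
  const text = message.trim();
  const wordCount = text.split(/\s+/).filter((w) => w.length > 0).length;

  // Very short messages ("ok", "thanks") rarely carry a storable fact.
  if (wordCount < 4) return false;

  // Questions ask for information rather than state it.
  if (text.endsWith("?")) return false;

  // Greetings and acknowledgements (assumed pattern, English only).
  if (/^(hi|hello|hey|thanks|thank you|ok|okay)\b/i.test(text)) return false;

  return true;
}
```

A filter this simple will misclassify some messages, so it is best read as a placeholder for whatever classifier the app actually uses.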

📊 Before vs After

| Aspect | Before | After |
| --- | --- | --- |
| Memories stored | 200+ (noisy) | ~80 (clean, deduplicated) |
| Context slots | 5 (3 wasted on duplicates) | 4-12 (all unique, relevant) |
| Token budget | Fixed | Dynamic (100-500 based on query) |
| Prompt structure | Flat list | Typed sections |
| Assistant recall | None | Tracks recommendations/promises |
| Stale memory handling | Never cleaned | Heat-based eviction |
| Duplicate handling | None | 0.85 threshold auto-dedup |
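
The dynamic context slots and token budget in the table can be illustrated with a small sketch. The ranges (4-12 slots, 100-500 tokens) come from the table; everything else is assumed, including the `planRetrieval` name, the `RetrievalPlan` shape, and the word-count heuristic, which stands in for whatever signal src/dynamicRetrieval.ts actually uses.

```typescript
// Hypothetical result shape for a retrieval plan.
interface RetrievalPlan {
  topK: number; // number of memories to retrieve (4-12)
  tokenBudget: number; // tokens reserved for memory context (100-500)
}

// Restrict a value to the inclusive range [lo, hi].
function clamp(value: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, value));
}

// Assumed heuristic: longer queries get more slots and a larger budget.
function planRetrieval(query: string): RetrievalPlan {
  const words = query
    .trim()
    .split(/\s+/)
    .filter((w) => w.length > 0).length;
  return {
    topK: clamp(4 + Math.floor(words / 5), 4, 12),
    tokenBudget: clamp(100 + words * 20, 100, 500),
  };
}
```

Under this heuristic a one-word query gets the minimum of 4 slots, while a long multi-sentence query saturates at 12 slots and 500 tokens.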

📚 Research Basis

  • ENGRAM (arXiv 2511.12960): Typed memory stores with router
  • Mem0 (arXiv 2504.19413): ADD/UPDATE/DELETE operations with deduplication
  • MemoryOS (arXiv 2506.06326): Heat-based decay and hierarchical storage
  • MemLoRA (arXiv 2512.04763): Distilled memory adapters for small models
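
As a rough illustration of the heat-based decay and eviction idea credited to MemoryOS above, the sketch below scores each memory by access count with an exponential half-life and evicts the coldest entries once a capacity is exceeded. The formula, the `MemoryRecord` fields, the function names, and the one-week default half-life are assumptions for illustration. They are not taken from the MemoryOS paper or from src/memoryDecay.ts.

```typescript
// Hypothetical record shape; the real schema in src/schema.ts may differ.
interface MemoryRecord {
  id: number;
  accessCount: number;
  lastAccessedMs: number; // epoch milliseconds of the last retrieval
}

const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Assumed heat score: (1 + accesses) halved for every half-life since last access.
function heat(memory: MemoryRecord, nowMs: number, halfLifeMs: number): number {
  const ageMs = Math.max(0, nowMs - memory.lastAccessedMs);
  return (1 + memory.accessCount) * Math.pow(0.5, ageMs / halfLifeMs);
}

// Returns the ids to evict, coldest first, so that at most `capacity` memories remain.
function selectEvictions(
  memories: MemoryRecord[],
  capacity: number,
  nowMs: number,
  halfLifeMs: number = ONE_WEEK_MS
): number[] {
  if (memories.length <= capacity) return [];
  const coldestFirst = [...memories].sort(
    (a, b) => heat(a, nowMs, halfLifeMs) - heat(b, nowMs, halfLifeMs)
  );
  return coldestFirst.slice(0, memories.length - capacity).map((m) => m.id);
}
```

Passing `nowMs` explicitly keeps the function deterministic and easy to test. In an app it would typically be `Date.now()`.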

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "loudiman/memory-improvements"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
