---
tags:
- ml-intern
---

# On-Device Memory Improvements for LLM Chat

Drop-in improvements for on-device chat apps that pair a small language model with semantic memory (e.g., Llama-3.2-3B + EmbeddingGemma-300M + an SQLite vector store).

## 📦 Files

| File | Purpose |
|------|---------|
| `src/types.ts` | Type definitions and configuration |
| `src/deduplication.ts` | **Improvement 1:** Mem0-style ADD/UPDATE/NOOP deduplication |
| `src/memoryDecay.ts` | **Improvement 2:** Heat-based decay and eviction (MemoryOS) |
| `src/typedMemory.ts` | **Improvement 3:** ENGRAM-style episodic/semantic/procedural router |
| `src/assistantMemory.ts` | **Improvement 4:** Store assistant replies as memories |
| `src/smartFilter.ts` | **Improvement 5:** Store only declarative/factual content |
| `src/dynamicRetrieval.ts` | **Improvement 6:** Adaptive top-k, similarity threshold, and type filtering |
| `src/schema.ts` | SQLite schema migration |
| `src/index.ts` | `ImprovedMemoryService` class (combines all improvements) |
| `src/integrationGuide.ts` | Exact code changes for each file in your app |
| `src/beforeAfterWalkthrough.ts` | Side-by-side comparison of old vs. new behavior |
| `tests/demo.py` | Validated Python demo showing all improvements |

## 🚀 Quick Start

```typescript
import { ImprovedMemoryService, DEFAULT_MEMORY_CONFIG } from './src';

// `db` (your SQLite database handle), `embedFn`, `userMessage`, `embedding`,
// and `assistantReply` come from your app; run this inside an async function.
const memoryService = new ImprovedMemoryService(db, embedFn, DEFAULT_MEMORY_CONFIG);

// Retrieve memories (replaces your current searchMemories)
const { context, memoryIds } = await memoryService.retrieve(userMessage, embedding);

// Store the user message (with smart filtering + dedup)
await memoryService.storeUserMessage(userMessage, embedding);

// Store the assistant reply (new!)
await memoryService.storeAssistantReply(assistantReply);
```

## 📊 Before vs After

| Aspect | Before | After |
|--------|--------|-------|
| Memories stored | 200+ (noisy) | ~80 (clean, deduplicated) |
| Context slots | 5 (3 wasted on duplicates) | 4-12 (all unique and relevant) |
| Token budget | Fixed | Dynamic (100-500 tokens, depending on the query) |
| Prompt structure | Flat list | Typed sections |
| Assistant recall | None | Tracks recommendations and promises |
| Stale memory handling | Never cleaned | Heat-based eviction |
| Duplicate handling | None | Automatic dedup at a 0.85 similarity threshold |

## 📚 Research Basis

- **ENGRAM** (arXiv 2511.12960): typed memory stores with a router
- **Mem0** (arXiv 2504.19413): ADD/UPDATE/DELETE operations with deduplication
- **MemoryOS** (arXiv 2506.06326): heat-based decay and hierarchical storage
- **MemLoRA** (arXiv 2512.04763): distilled memory adapters for small models

## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "loudiman/memory-improvements"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
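The two core mechanisms from the Files table, deduplication (Improvement 1) and heat-based eviction (Improvement 2), can be illustrated in a few lines of TypeScript. This is a minimal sketch under stated assumptions, not the code in `src/deduplication.ts` or `src/memoryDecay.ts`: the names `decideDedup`, `heat`, `selectEvictions`, and `StoredMemory`, the 0.97 near-exact cutoff, the one-week half-life, and the heat weights are all hypothetical. Only the 0.85 duplicate threshold comes from the Before vs After table. The heat score is loosely inspired by MemoryOS, reduced here to retrieval count plus exponential recency decay.

```typescript
// Hypothetical sketch: names, thresholds, and weights are illustrative
// assumptions, not the API exported by this repository.

type DedupOp = "ADD" | "UPDATE" | "NOOP";

interface StoredMemory {
  id: number;
  text: string;
  embedding: number[];
  accessCount: number;  // how often the memory has been retrieved
  lastAccessMs: number; // epoch milliseconds of the last retrieval
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Improvement 1 (sketch): below DUP_THRESHOLD the candidate is new (ADD);
// at or above it, the closest match is refreshed (UPDATE), unless the
// candidate is a near-exact repeat (NOOP). NEAR_EXACT is hypothetical.
const DUP_THRESHOLD = 0.85;
const NEAR_EXACT = 0.97;

function decideDedup(
  embedding: number[],
  existing: StoredMemory[],
): { op: DedupOp; targetId?: number; similarity: number } {
  let best: StoredMemory | undefined;
  let bestSim = -1;
  for (const m of existing) {
    const s = cosine(embedding, m.embedding);
    if (s > bestSim) { bestSim = s; best = m; }
  }
  if (best === undefined || bestSim < DUP_THRESHOLD) {
    return { op: "ADD", similarity: bestSim };
  }
  if (bestSim >= NEAR_EXACT) {
    return { op: "NOOP", targetId: best.id, similarity: bestSim };
  }
  return { op: "UPDATE", targetId: best.id, similarity: bestSim };
}

// Improvement 2 (sketch): heat = frequency term + recency term, where
// recency halves every HALF_LIFE_MS since the last retrieval (hypothetical).
const HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000; // one week

function heat(m: StoredMemory, nowMs: number): number {
  const ageMs = Math.max(0, nowMs - m.lastAccessMs);
  const recency = Math.pow(0.5, ageMs / HALF_LIFE_MS); // 1.0 when just accessed
  return 0.5 * Math.log1p(m.accessCount) + 1.0 * recency;
}

// Keep the `capacity` hottest memories; return the ids of the coldest to evict.
function selectEvictions(memories: StoredMemory[], capacity: number, nowMs: number): number[] {
  if (memories.length <= capacity) return [];
  return [...memories]
    .sort((a, b) => heat(a, nowMs) - heat(b, nowMs)) // coldest first
    .slice(0, memories.length - capacity)
    .map((m) => m.id);
}

// Example usage with toy 3-dimensional embeddings.
const demoNow = Date.now();
const demoMemories: StoredMemory[] = [
  { id: 1, text: "User lives in Lyon", embedding: [1, 0, 0], accessCount: 5, lastAccessMs: demoNow },
  { id: 2, text: "User likes jazz", embedding: [0, 1, 0], accessCount: 0, lastAccessMs: demoNow - 30 * 24 * 3600 * 1000 },
];
console.log(decideDedup([0.9, 0.3, 0], demoMemories).op); // → UPDATE
console.log(selectEvictions(demoMemories, 1, demoNow));   // → [ 2 ]
```

Splitting UPDATE from NOOP keeps a paraphrased fact from spawning a second row while still letting the stored text be refreshed; the log-scaled access count stops a few very frequently retrieved memories from dominating the recency term.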