	---
	tags:
	- ml-intern
	---
	# On-Device Memory Improvements for LLM Chat

	Drop-in improvements for on-device chat apps using small language models with semantic memory (e.g., Llama-3.2-3B + EmbeddingGemma-300M + SQLite vector store).

	## 📦 Files

	| File | Purpose |
	|------|---------|
	| `src/types.ts` | Type definitions & configuration |
	| `src/deduplication.ts` | Improvement 1: Mem0-style ADD/UPDATE/NOOP deduplication |
	| `src/memoryDecay.ts` | Improvement 2: Heat-based decay & eviction (MemoryOS) |
	| `src/typedMemory.ts` | Improvement 3: ENGRAM-style episodic/semantic/procedural router |
	| `src/assistantMemory.ts` | Improvement 4: Store AI replies as memories |
	| `src/smartFilter.ts` | Improvement 5: Only store declarative/factual content |
	| `src/dynamicRetrieval.ts` | Improvement 6: Adaptive top-k, threshold, type filtering |
	| `src/schema.ts` | SQLite schema migration |
	| `src/index.ts` | `ImprovedMemoryService` class (combines all improvements) |
	| `src/integrationGuide.ts` | Exact code changes for each file in your app |
	| `src/beforeAfterWalkthrough.ts` | Side-by-side comparison of old vs new behavior |
	| `tests/demo.py` | Validated Python demo showing all improvements |
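
To make Improvement 1 more concrete, here is a minimal sketch of an ADD/UPDATE/NOOP decision driven by embedding similarity. It is not the code in `src/deduplication.ts`: the names `StoredMemory`, `cosineSimilarity`, and `decideDedupAction` are hypothetical, the 0.85 default is taken from the Before vs After table, and the rule "near-duplicate with identical normalized text is NOOP, otherwise UPDATE" is an assumption made for illustration.

```typescript
// Hypothetical types and names for illustration only.
type DedupAction = "ADD" | "UPDATE" | "NOOP";

interface StoredMemory {
  id: number;
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Case- and whitespace-insensitive form used to detect exact repeats.
const normalizeText = (s: string): string =>
  s.trim().toLowerCase().replace(/\s+/g, " ");

function decideDedupAction(
  candidateText: string,
  candidateEmbedding: number[],
  existing: StoredMemory[],
  threshold = 0.85, // assumed to be a cosine-similarity cutoff
): { action: DedupAction; targetId?: number } {
  // Find the most similar stored memory.
  let best: StoredMemory | undefined;
  let bestScore = -Infinity;
  for (const memory of existing) {
    const score = cosineSimilarity(candidateEmbedding, memory.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = memory;
    }
  }
  // Nothing close enough: store as a new memory.
  if (best === undefined || bestScore < threshold) {
    return { action: "ADD" };
  }
  // Close match with the same wording: nothing to do.
  if (normalizeText(best.text) === normalizeText(candidateText)) {
    return { action: "NOOP", targetId: best.id };
  }
  // Close match with different wording: refresh the stored memory.
  return { action: "UPDATE", targetId: best.id };
}
```

Mem0 itself delegates this decision to an LLM; the purely similarity-based rule above is a simplification that avoids an extra model call on device.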

	## 🚀 Quick Start

	```typescript
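	import { ImprovedMemoryService, DEFAULT_MEMORY_CONFIG } from './src';

	// `db` (your SQLite handle) and `embedFn` (your embedding function) come from your app.
	const memoryService = new ImprovedMemoryService(db, embedFn, DEFAULT_MEMORY_CONFIG);

	// Embed the user message once and reuse the vector below.
	// Assumption: `embedFn(text)` resolves to the embedding vector.
	const embedding = await embedFn(userMessage);

	// Retrieve memories (replaces your current searchMemories)
	const { context, memoryIds } = await memoryService.retrieve(userMessage, embedding);

	// Store user message (with smart filtering + dedup)
	await memoryService.storeUserMessage(userMessage, embedding);

	// Store assistant reply (new!)
	await memoryService.storeAssistantReply(assistantReply);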
	```
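
The three calls above are meant to wrap a single chat turn. The sketch below shows one plausible ordering: retrieve, generate, then store both sides of the exchange. `MemoryServiceLike` only mirrors the three methods shown in the Quick Start (its return types are assumptions), and `EmbedFn`, `GenerateFn`, and `handleTurn` are hypothetical names, not part of this repository.

```typescript
// Minimal shape of the three Quick Start calls; return types are assumed.
interface MemoryServiceLike {
  retrieve(
    message: string,
    embedding: number[],
  ): Promise<{ context: string; memoryIds: number[] }>;
  storeUserMessage(message: string, embedding: number[]): Promise<unknown>;
  storeAssistantReply(reply: string): Promise<unknown>;
}

type EmbedFn = (text: string) => Promise<number[]>; // assumed signature
type GenerateFn = (prompt: string) => Promise<string>; // hypothetical LLM call

async function handleTurn(
  memory: MemoryServiceLike,
  embed: EmbedFn,
  generate: GenerateFn,
  userMessage: string,
): Promise<string> {
  // Embed once; the same vector serves retrieval and storage.
  const embedding = await embed(userMessage);

  // Fetch relevant memories before generating.
  const { context } = await memory.retrieve(userMessage, embedding);

  // Prepend the memory context only when there is one.
  const prompt = context
    ? `${context}\n\nUser: ${userMessage}`
    : `User: ${userMessage}`;
  const reply = await generate(prompt);

  // Store after generation so the current message cannot be retrieved as its own context.
  await memory.storeUserMessage(userMessage, embedding);
  await memory.storeAssistantReply(reply);
  return reply;
}
```

Storing after generation is a design choice in this sketch, not a documented requirement of `ImprovedMemoryService`.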

	## 📊 Before vs After

	| Aspect | Before | After |
	|--------|--------|-------|
	| Memories stored | 200+ (noisy) | ~80 (clean, deduplicated) |
	| Context slots | 5 (3 wasted on duplicates) | 4-12 (all unique, relevant) |
	| Token budget | Fixed | Dynamic (100-500 based on query) |
	| Prompt structure | Flat list | Typed sections |
	| Assistant recall | None | Tracks recommendations/promises |
	| Stale memory handling | Never cleaned | Heat-based eviction |
	| Duplicate handling | None | 0.85 threshold auto-dedup |
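
As an illustration of the "Context slots" and "Token budget" rows, the sketch below scales both values with query length and clamps them to the ranges in the table (4-12 slots, 100-500 tokens). The word-count heuristic and the 40-word saturation point are assumptions, and `planRetrieval` is a hypothetical name; `src/dynamicRetrieval.ts` may use different signals.

```typescript
interface RetrievalPlan {
  topK: number; // number of memories to place in the prompt
  tokenBudget: number; // tokens reserved for memory context
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Ranges taken from the table above.
const MIN_TOP_K = 4;
const MAX_TOP_K = 12;
const MIN_TOKENS = 100;
const MAX_TOKENS = 500;

// Assumed heuristic: longer queries get more context, saturating at 40 words.
const SATURATION_WORDS = 40;

function planRetrieval(query: string): RetrievalPlan {
  const words = query.trim().split(/\s+/).filter(Boolean).length;
  const complexity = clamp(words / SATURATION_WORDS, 0, 1);
  return {
    topK: Math.round(MIN_TOP_K + complexity * (MAX_TOP_K - MIN_TOP_K)),
    tokenBudget: Math.round(MIN_TOKENS + complexity * (MAX_TOKENS - MIN_TOKENS)),
  };
}
```

A short greeting stays at the lower end of both ranges, while a query of 40 words or more receives the full 12 slots and 500 tokens.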

	## 📚 Research Basis

	- ENGRAM (arXiv 2511.12960): Typed memory stores with router
	- Mem0 (arXiv 2504.19413): ADD/UPDATE/DELETE operations with deduplication
	- MemoryOS (arXiv 2506.06326): Heat-based decay and hierarchical storage
	- MemLoRA (arXiv 2512.04763): Distilled memory adapters for small models
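
For readers unfamiliar with heat-based decay, the sketch below shows the general idea: each memory gets a score that rises with use and falls with time since last access, and the coldest memories are evicted once the store exceeds a capacity. The formula, the 7-day time constant, and the names `HeatRecord`, `heat`, and `selectEvictions` are simplifying assumptions, not the MemoryOS formulation or the code in `src/memoryDecay.ts`.

```typescript
interface HeatRecord {
  id: number;
  accessCount: number; // times this memory was retrieved
  lastAccessMs: number; // epoch milliseconds of the last retrieval
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000; // assumed decay time constant

// Assumed score: (1 + accessCount) * exp(-age / tau).
// A never-accessed memory starts at 1 and cools exponentially.
function heat(rec: HeatRecord, nowMs: number, tauMs: number = WEEK_MS): number {
  const ageMs = Math.max(0, nowMs - rec.lastAccessMs);
  return (1 + rec.accessCount) * Math.exp(-ageMs / tauMs);
}

// Returns the ids of the coldest records that must go to fit within `capacity`.
function selectEvictions(
  records: HeatRecord[],
  capacity: number,
  nowMs: number,
): number[] {
  if (records.length <= capacity) return [];
  return [...records] // copy so the caller's array is not reordered
    .sort((a, b) => heat(a, nowMs) - heat(b, nowMs)) // coldest first
    .slice(0, records.length - capacity)
    .map((r) => r.id);
}
```

In a SQLite-backed store, the returned ids would typically feed a single `DELETE ... WHERE id IN (...)` statement run during a periodic cleanup.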

	<!-- ml-intern-provenance -->
	## Generated by ML Intern

	This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

	- Try ML Intern: https://smolagents-ml-intern.hf.space
	- Source code: https://github.com/huggingface/ml-intern

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "loudiman/memory-improvements"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id)
	```

	For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.