metadata
license: mit
task_categories:
- question-answering
- text-retrieval
tags:
- memory
- benchmark
- agent
- retrieval
- local-ai
- sovereign-ai
pretty_name: CEM888 MemoryAgentBench Results
size_categories:
- n<1K
CEM888.AI MemoryAgentBench Results
99.9% AR · 77.2% BEAM — Filesystem-native memory agent on MemoryAgentBench (ICLR 2026).
Scores
| Benchmark | CEM888 (Vetta) | Best Published |
|---|---|---|
| AR Retrieval | 99.9% | 71.5% (Hindsight) |
| BEAM Memory | 77.2% | 64.1% (Hindsight honest) |
- AR: 2,000 retrieval questions, 2 misses (1,998 of 2,000 correct)
- BEAM: 200 multi-category memory questions
Architecture
- Model: DeepSeek V4 Pro
- Retrieval: Filesystem-first, deterministic search — no RAG, no embeddings, no vector DB
- Memory: Agent-native sovereign vault — the filesystem is ground truth
- Deployment: Fully local. No cloud. No data leakage.
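To make the "filesystem-first, deterministic search" idea concrete, here is a minimal sketch of what such a lookup could look like. It is not the CEM888 implementation: the function name `search_vault`, the assumption that memories are stored as `.md` files, and the all-terms-must-match rule are all hypothetical choices made for illustration.

```python
from pathlib import Path


def search_vault(root, terms):
    """Return (relative_path, line_number, line) hits that contain every term.

    Hypothetical helper: assumes the vault is a directory tree of .md files.
    """
    needles = [t.lower() for t in terms]
    hits = []
    # Sorting the paths makes the result order independent of the
    # order in which the operating system lists directory entries.
    for path in sorted(p for p in Path(root).rglob("*.md") if p.is_file()):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            # Plain case-insensitive substring matching: no embeddings, no index.
            if all(n in lowered for n in needles):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

Because the walk order is sorted and matching is plain substring comparison, the same vault and the same query always return the same hits in the same order, which is the property the word "deterministic" suggests here.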
Contents
- AR-Results-99.9pct.md: Full AR breakdown with all categories
- Vetta-BEAM-Honest-77.2pct.md: BEAM methodology and per-category scores
- vetta_beam_v9_results.jsonl: All 200 BEAM questions with scores
- vetta_live_results.jsonl: All 2,000 AR questions with scores
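The headline percentages can in principle be recomputed from the per-question JSONL files. The sketch below shows one way to do that. The record schema is not documented in this card, so the field name `score` (passed as `score_key`) and the helper name `mean_score` are assumptions; inspect a record first and adjust the key to match.

```python
import json


def mean_score(jsonl_path, score_key="score"):
    """Return (mean score as a percentage, number of records) for a JSONL file.

    Assumes each non-empty line is a JSON object with a numeric field
    named by score_key; that field name is a guess, not a documented schema.
    """
    total, count = 0.0, 0
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines between records
            record = json.loads(line)
            total += float(record[score_key])
            count += 1
    if count == 0:
        raise ValueError("no records found")
    return 100.0 * total / count, count
```

For example, `mean_score("vetta_live_results.jsonl")` would return the AR percentage together with the record count, provided the scores are stored under that key as 0/1 or fractional values.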
Links
- GitHub: CEM888.AI-Site
- Reddit: r/LocalLLaMA discussion
- Contact: creator@cem888.ai
Building this solo. Looking for sponsors and collaborators.