---
license: mit
task_categories:
  - question-answering
  - text-retrieval
tags:
  - memory
  - benchmark
  - agent
  - retrieval
  - local-ai
  - sovereign-ai
pretty_name: CEM888 MemoryAgentBench Results
size_categories:
  - n<1K
---

CEM888.AI MemoryAgentBench Results

99.9% AR · 77.2% BEAM: a filesystem-native memory agent evaluated on MemoryAgentBench (ICLR 2026).

Scores

| Benchmark | CEM888 (Vetta) | Best published |
|---|---|---|
| AR (Accurate Retrieval) | 99.9% | 71.5% (Hindsight) |
| BEAM (memory) | 77.2% | 64.1% (Hindsight honest) |
  • AR: 2,000 retrieval questions, 2 missed (1,998 / 2,000 = 99.9%)
  • BEAM: 200 memory questions spanning multiple categories

Architecture

  • Model: DeepSeek V4 Pro
  • Retrieval: filesystem-first, deterministic search; no RAG, no embeddings, no vector DB
  • Memory: agent-native sovereign vault; the filesystem is the ground truth
  • Deployment: Fully local. No cloud. No data leakage.
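
To make the filesystem-first idea concrete, here is a minimal sketch of what deterministic, embedding-free memory could look like. It is not CEM888's code: the vault layout (one Markdown file per memory), the helper names `remember` and `recall`, and the term-count ranking are hypothetical choices made for illustration.

```python
import re
import tempfile
from pathlib import Path

# HYPOTHETICAL vault layout: one UTF-8 Markdown file per memory under a root
# directory. Names and ranking are illustrative, not CEM888's implementation.


def remember(vault: Path, key: str, text: str) -> Path:
    """Persist a memory as a plain file; the file on disk is the ground truth."""
    vault.mkdir(parents=True, exist_ok=True)
    path = vault / f"{key}.md"
    path.write_text(text, encoding="utf-8")
    return path


def recall(vault: Path, query: str, k: int = 3) -> list[tuple[int, str]]:
    """Rank files by how often query terms occur; no embeddings, no index."""
    terms = set(re.findall(r"\w+", query.lower()))
    hits = []
    for path in sorted(vault.rglob("*.md")):  # sorted walk: stable order
        words = re.findall(r"\w+", path.read_text(encoding="utf-8").lower())
        score = sum(1 for w in words if w in terms)
        if score > 0:
            hits.append((score, path.relative_to(vault).as_posix()))
    # Highest score first; ties broken by relative path, so the ranking
    # never depends on directory listing order.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:k]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        vault = Path(d)
        remember(vault, "trip", "Flight to Lisbon on May 3rd. Lisbon hotel booked.")
        remember(vault, "diet", "User is vegetarian.")
        print(recall(vault, "When is the Lisbon flight?"))
        # prints [(3, 'trip.md'), (1, 'diet.md')]
```

Because the walk is sorted and ties are broken by path, the same vault and query always yield the same ranking. The second hit in the usage example (matched only on "is") shows why raw term counts are just a starting point; a real system would likely need stopword handling and better ranking.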

Contents

  • AR-Results-99.9pct.md — Full AR breakdown with all categories
  • Vetta-BEAM-Honest-77.2pct.md — BEAM methodology and per-category scores
  • vetta_beam_v9_results.jsonl — All 200 BEAM questions with scores
  • vetta_live_results.jsonl — All 2,000 AR questions with scores
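
A short sketch for recomputing headline numbers from the JSONL files. The schema is an assumption, since this README does not document the field names: each line is taken to be a JSON object with a numeric `score` in [0, 1] and an optional `category` string. `summarize_lines` and `summarize` are hypothetical helpers.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Iterable

# ASSUMED schema (hypothetical, not documented in this README):
# one JSON object per line with a numeric "score" in [0, 1]
# and an optional "category" string.


def summarize_lines(lines: Iterable[str]) -> tuple[float, dict[str, float]]:
    """Return (overall %, {category: %}) as mean score x 100."""
    by_cat: dict[str, list[float]] = defaultdict(list)
    scores: list[float] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        row = json.loads(line)
        score = float(row["score"])
        scores.append(score)
        by_cat[row.get("category", "all")].append(score)
    if not scores:
        raise ValueError("no scored rows found")
    overall = 100 * sum(scores) / len(scores)
    per_cat = {c: 100 * sum(v) / len(v) for c, v in sorted(by_cat.items())}
    return overall, per_cat


def summarize(path: Path) -> tuple[float, dict[str, float]]:
    """Read a JSONL results file and summarize it."""
    with path.open(encoding="utf-8") as f:
        return summarize_lines(f)


if __name__ == "__main__":
    # File name from the Contents list; only runs if the file is present.
    p = Path("vetta_live_results.jsonl")
    if p.exists():
        overall, per_cat = summarize(p)
        print(f"overall: {overall:.1f}%")
        for cat, pct in per_cat.items():
            print(f"  {cat}: {pct:.1f}%")
```

Under that assumption, 1,998 rows scored 1 and 2 rows scored 0 give 100 × 1,998 / 2,000 = 99.9%, matching the AR figure. Fractional (partial-credit) scores are averaged the same way.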

Links


Building this solo. Looking for sponsors and collaborators.