
We ran Feather DB v0.8.0 on LongMemEval (ICLR 2025) – 500 questions across real multi-session conversations, up to 115K tokens each.
**Score: 0.693** · GPT-4o full-context baseline: 0.640
Full 500-question run with Gemini-Flash: **$2.40**
Per-axis breakdown:
- Info-extraction: **0.942**
- Knowledge-update: **0.714**
- Multi-session: **0.606**
- Temporal: **0.477** – the hard one; Phase 9 addresses this
Architecture: Hybrid BM25+dense · adaptive temporal decay · embedded (no server) · p50 = 0.19ms · MIT
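For readers curious what "hybrid BM25+dense with adaptive temporal decay" means in practice, here is a minimal illustrative sketch. This is *not* the Feather DB API – the function names, the blending weight `alpha`, and the half-life constant are all assumptions for illustration; it just shows how a lexical score, a dense-similarity score, and an age-based decay can combine into one ranking signal.

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    # Minimal BM25 over a toy corpus (list of token lists).
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        tf = doc_terms.count(t)
        score += idf * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

def cosine(u, v):
    # Dense similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(lexical, dense, age_days, alpha=0.5, half_life_days=30.0):
    # Blend lexical and dense scores, then apply exponential
    # temporal decay: a memory's score halves every half_life_days.
    decay = 0.5 ** (age_days / half_life_days)
    return (alpha * lexical + (1 - alpha) * dense) * decay
```

With these defaults, a fresh memory keeps its full blended score (`hybrid_score(1.0, 1.0, 0)` is `1.0`), while a 30-day-old one is halved to `0.5` – "adaptive" decay would additionally tune `half_life_days` per memory type, which is omitted here.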
Raw results + audit JSONs: Hawky-ai/longmemeval-results
`pip install feather-db`