multi-record: anchor-first hybrid resolver (client 0.4.9 + mcp 0.1.8)
Fix the multi-record / linked-record retrieval regression at scale. The old
corpus-fraction selectivity cutoff (round(0.15 * corpus_n)) lost meaning past
~150 records and let cross-cluster records pollute results (~0.36 recall at
50-100 companies). The resolver is now anchor-first with a hybrid gate: a
candidate survives if it is in the query's anchor cluster (matches a rarest-term
anchor) OR clears a high-coverage bar. Scale-invariant; abstention and
terminal/prep gates unchanged.
Also: search() cross-tier rank tiebreaker (content tiers before contentless
journal) and search_entities(category=) anchor filter. mcp pins client>=0.4.9.
Validated: client 110 + mcp 22 tests green; synthetic 480-record A/B (full
recall, 0 cross-cluster pollution vs 1920); LongMemEval retrieval diagnostic
(per-question oracle NEW>=OLD).
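A minimal sketch of why the corpus-fraction cutoff stops discriminating as the corpus grows. `old_selective_cutoff` mirrors the `round(0.15 * corpus_n)` expression quoted above. The assumption that a term counted as "selective" when its document frequency was at or below the cutoff, and the document frequency of 20 for a shared topic term, are illustrative and not taken from the source.

```python
def old_selective_cutoff(corpus_n: int, frac: float = 0.15) -> int:
    """Old rule (as quoted in the commit message): round(0.15 * corpus_n)."""
    return round(frac * corpus_n)


# Hypothetical document frequency of a topic term shared by several clusters.
SHARED_TOPIC_DF = 20

report = {}
for corpus_n in (24, 480):
    cutoff = old_selective_cutoff(corpus_n)
    # Assumption: "selective" means df <= cutoff.
    report[corpus_n] = (cutoff, SHARED_TOPIC_DF <= cutoff)

for corpus_n, (cutoff, selective) in report.items():
    print(f"corpus_n={corpus_n}: cutoff={cutoff}, shared term selective={selective}")
```

Under these assumptions the same shared term is rejected as too common at 24 records but accepted as selective at 480, which is the failure mode the anchor-first resolver is meant to remove.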
- sibyl-memory-client/CHANGELOG.md +39 -0
- sibyl-memory-client/pyproject.toml +1 -1
- sibyl-memory-client/src/sibyl_memory_client/client.py +18 -5
- sibyl-memory-client/src/sibyl_memory_client/multi_record.py +66 -25
- sibyl-memory-client/tests/test_anchor_resolver_2026_06_06.py +91 -0
- sibyl-memory-mcp/CHANGELOG.md +11 -0
- sibyl-memory-mcp/pyproject.toml +2 -2
|
@@ -4,6 +4,45 @@ All notable changes to `sibyl-memory-client` are recorded here. Format
|
|
| 4 |
follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning
|
| 5 |
follows [SemVer](https://semver.org/).
|
| 6 |
|
| 7 |
## [0.4.8] - 2026-06-04
|
| 8 |
|
| 9 |
### Fixed
|
|
|
|
| 4 |
follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning
|
| 5 |
follows [SemVer](https://semver.org/).
|
| 6 |
|
| 7 |
+
## [0.4.9] - 2026-06-06
|
| 8 |
+
|
| 9 |
+
### Fixed
|
| 10 |
+
|
| 11 |
+
- **Multi-record search recall/precision regression at scale (anchor-first hybrid resolver).**
|
| 12 |
+
`multi_record_search` used a corpus-fraction selectivity cutoff
|
| 13 |
+
(`round(0.15 * corpus_n)`) calibrated on a 24-record reconstruction. Past ~150
|
| 14 |
+
records the cutoff lost meaning: almost every term read as "selective," so
|
| 15 |
+
cross-cluster records cleared the gate and polluted results (tester Sylvain
|
| 16 |
+
Runs 16/17, ~0.36 recall at 50-100 companies). The resolver is now anchor-first:
|
| 17 |
+
anchor terms are the rarest tokens, defined RELATIVE to the rarest query term
|
| 18 |
+
(`df <= ANCHOR_BAND * min_df`, scale-invariant). The gate is a HYBRID: a
|
| 19 |
+
candidate survives if it is in the anchor's cluster (matches an anchor term) OR
|
| 20 |
+
clears the high-coverage bar `ANCHOR_HYBRID_HI` (genuinely relevant despite
|
| 21 |
+
lacking the rare anchor). A pure strict filter killed cross-cluster pollution
|
| 22 |
+
but over-dropped natural-language evidence; the hybrid keeps both. Abstention
|
| 23 |
+
(zero-support term) and the terminal/prep gates are unchanged. Validated two
|
| 24 |
+
ways: (a) synthetic 480-record workflow A/B — full recall, 0 cross-cluster
|
| 25 |
+
pollution vs the old code's 1,920 polluting hits over 120 queries (matches
|
| 26 |
+
tester Runs 24-29); (b) real-data LongMemEval retrieval diagnostic — per-question
|
| 27 |
+
(oracle) retrieval is not regressed (NEW >= OLD, +3.4pts), and in a combined-
|
| 28 |
+
store contamination stress NEW cuts cross-question pollution ~29% for a small
|
| 29 |
+
recall trade. Regression guard: `tests/test_anchor_resolver_2026_06_06.py`.
|
| 30 |
+
|
| 31 |
+
- **Cross-tier rank comparability.** `search()` BM25 ranks are not on a common
|
| 32 |
+
scale across FTS tables (`journal_events_fts` is contentless). Added a tier
|
| 33 |
+
tiebreaker so content tiers (entity/state/reference) sort before journal at equal
|
| 34 |
+
rank, layered on the existing 0.4.7 journal cap. (tester email 19e7eb3096b4dae5)
|
| 35 |
+
|
| 36 |
+
### Added
|
| 37 |
+
|
| 38 |
+
- **`search_entities(category=...)`.** Optional exact-match category anchor on
|
| 39 |
+
entity FTS, removing topical bleed across categories on multi-entity workloads
|
| 40 |
+
(tester email 19e7e75af0b7780a). Backward compatible (defaults to all categories).
|
| 41 |
+
|
| 42 |
+
Sourced from Sylvain's beta Runs 24-29 + the bugflow batch dedup; this single
|
| 43 |
+
patch also supersedes ~20 already-fixed entries that had accumulated in the
|
| 44 |
+
bug-batch queue.
|
| 45 |
+
|
| 46 |
## [0.4.8] - 2026-06-04
|
| 47 |
|
| 48 |
### Fixed
|
|
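The hybrid gate described in the 0.4.9 entry can be sketched as a single predicate. This is an illustrative reduction, not the module's code: `survives` is a hypothetical helper name, and the coverage value is passed in rather than computed. The two constants use the values shown in this commit.

```python
COVERAGE_THRESHOLD = 0.45  # hard coverage floor (value from this commit)
ANCHOR_HYBRID_HI = 0.65    # bar for candidates that match no anchor term


def survives(matched: set, anchor_terms: set, cov: float) -> bool:
    """Return True if a candidate passes the floor and the hybrid gate."""
    if cov < COVERAGE_THRESHOLD:
        return False  # below the hard floor regardless of anchors
    in_anchor_cluster = bool(matched & anchor_terms)
    if anchor_terms and not in_anchor_cluster and cov < ANCHOR_HYBRID_HI:
        return False  # non-anchor, mid-coverage: treated as pollution
    return True


anchors = {"co0007"}
print(survives({"co0007", "alpha"}, anchors, 0.50))         # anchor match
print(survives({"alpha", "beta"}, anchors, 0.50))           # mid coverage, no anchor
print(survives({"alpha", "beta", "gamma"}, anchors, 0.70))  # high coverage, no anchor
```

The third case is the one a pure strict filter would have dropped: it lacks the rare anchor but clears the high-coverage bar.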
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
|
| 4 |
|
| 5 |
[project]
|
| 6 |
name = "sibyl-memory-client"
|
| 7 |
-
version = "0.4.
|
| 8 |
description = "Local-first agentic memory SDK. SQLite-backed five-tier hierarchical schema, FTS5 search, multi-tenant, with self-learning skill detection and local memory linter. Foundation of the Sibyl Memory Plugin family."
|
| 9 |
authors = [{ name = "SIBYL, Sibyl Labs LLC", email = "sibyl@sibyllabs.org" }]
|
| 10 |
license = { text = "MIT" }
|
|
|
|
| 4 |
|
| 5 |
[project]
|
| 6 |
name = "sibyl-memory-client"
|
| 7 |
+
version = "0.4.9"
|
| 8 |
description = "Local-first agentic memory SDK. SQLite-backed five-tier hierarchical schema, FTS5 search, multi-tenant, with self-learning skill detection and local memory linter. Foundation of the Sibyl Memory Plugin family."
|
| 9 |
authors = [{ name = "SIBYL, Sibyl Labs LLC", email = "sibyl@sibyllabs.org" }]
|
| 10 |
license = { text = "MIT" }
|
|
@@ -881,7 +881,8 @@ class MemoryClient:
|
|
| 881 |
# ------------------------------------------------------------------
|
| 882 |
# FTS5 search
|
| 883 |
# ------------------------------------------------------------------
|
| 884 |
-
def search_entities(self, query: str, *, limit: int = 20, prefix: bool = False
|
|
|
|
| 885 |
"""Full-text search over entity name + category + body via FTS5.
|
| 886 |
|
| 887 |
Returns warm-tier entity rows only. For cross-tier search (entities +
|
|
@@ -891,6 +892,11 @@ class MemoryClient:
|
|
| 891 |
(``name:foo``) and unclosed quotes can't escape into the parser.
|
| 892 |
Set ``prefix=True`` for prefix matching on the final token.
|
| 893 |
|
| 894 |
Returns: list of entity rows. Each row is a dict with keys
|
| 895 |
id, tenant_id, category, name, status, body, created_at, updated_at
|
| 896 |
(body is JSON-deserialized).
|
|
@@ -904,15 +910,18 @@ class MemoryClient:
|
|
| 904 |
# external-content FTS5: join by rowid back to base table.
|
| 905 |
# _fts_query handles classification (v0.4.0 KAPPA) + corruption
|
| 906 |
# containment (poisoned-index DatabaseError self-heals or returns []).
|
| 907 |
with self._storage.connection() as conn:
|
| 908 |
rows = _fts_query(
|
| 909 |
conn,
|
| 910 |
"SELECT e.id, e.tenant_id, e.category, e.name, e.status, e.body, e.created_at, e.updated_at "
|
| 911 |
"FROM entities_fts f "
|
| 912 |
"JOIN entities e ON e.rowid = f.rowid "
|
| 913 |
-
"WHERE entities_fts MATCH ? AND f.tenant_id = ? "
|
| 914 |
"ORDER BY rank LIMIT ?",
|
| 915 |
-
|
| 916 |
"entities_fts",
|
| 917 |
)
|
| 918 |
return [self._row_to_entity(r) for r in rows]
|
|
@@ -1049,8 +1058,12 @@ class MemoryClient:
|
|
| 1049 |
},
|
| 1050 |
"snippet": r["snip"], "rank": r["rank"], "ts": r["ts"],
|
| 1051 |
})
|
| 1052 |
-
# Sort by rank (lower = better in FTS5)
|
| 1053 |
-
|
| 1054 |
return hits[:limit]
|
| 1055 |
|
| 1056 |
# ------------------------------------------------------------------
|
|
|
|
| 881 |
# ------------------------------------------------------------------
|
| 882 |
# FTS5 search
|
| 883 |
# ------------------------------------------------------------------
|
| 884 |
+
def search_entities(self, query: str, *, limit: int = 20, prefix: bool = False,
|
| 885 |
+
category: str | None = None) -> list[dict[str, Any]]:
|
| 886 |
"""Full-text search over entity name + category + body via FTS5.
|
| 887 |
|
| 888 |
Returns warm-tier entity rows only. For cross-tier search (entities +
|
|
|
|
| 892 |
(``name:foo``) and unclosed quotes can't escape into the parser.
|
| 893 |
Set ``prefix=True`` for prefix matching on the final token.
|
| 894 |
|
| 895 |
+
Pass ``category="<name>"`` to anchor the search to a single entity
|
| 896 |
+
category (exact match); this removes topical bleed across categories on
|
| 897 |
+
multi-entity workloads (tester email 19e7e75af0b7780a). Omit to search
|
| 898 |
+
all categories.
|
| 899 |
+
|
| 900 |
Returns: list of entity rows. Each row is a dict with keys
|
| 901 |
id, tenant_id, category, name, status, body, created_at, updated_at
|
| 902 |
(body is JSON-deserialized).
|
|
|
|
| 910 |
# external-content FTS5: join by rowid back to base table.
|
| 911 |
# _fts_query handles classification (v0.4.0 KAPPA) + corruption
|
| 912 |
# containment (poisoned-index DatabaseError self-heals or returns []).
|
| 913 |
+
cat_clause = " AND e.category = ?" if category else ""
|
| 914 |
+
params = ((match_q, self._tenant_id, category, limit) if category
|
| 915 |
+
else (match_q, self._tenant_id, limit))
|
| 916 |
with self._storage.connection() as conn:
|
| 917 |
rows = _fts_query(
|
| 918 |
conn,
|
| 919 |
"SELECT e.id, e.tenant_id, e.category, e.name, e.status, e.body, e.created_at, e.updated_at "
|
| 920 |
"FROM entities_fts f "
|
| 921 |
"JOIN entities e ON e.rowid = f.rowid "
|
| 922 |
+
"WHERE entities_fts MATCH ? AND f.tenant_id = ?" + cat_clause + " "
|
| 923 |
"ORDER BY rank LIMIT ?",
|
| 924 |
+
params,
|
| 925 |
"entities_fts",
|
| 926 |
)
|
| 927 |
return [self._row_to_entity(r) for r in rows]
|
|
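The optional-clause pattern used by `search_entities(category=...)` can be tried in isolation. The sketch below is a stand-in under stated assumptions: it uses a plain SQLite table with `LIKE` instead of the FTS5 `MATCH` join, and `find_entities` plus the table layout are hypothetical. Only the way the clause and the parameter tuple grow together is meant to match the diff.

```python
import sqlite3


def find_entities(conn, tenant_id, needle, category=None, limit=20):
    """Build the WHERE clause and the parameter tuple in lockstep."""
    cat_clause = " AND category = ?" if category else ""
    params = ((f"%{needle}%", tenant_id, category, limit) if category
              else (f"%{needle}%", tenant_id, limit))
    sql = ("SELECT name FROM entities "
           "WHERE body LIKE ? AND tenant_id = ?" + cat_clause + " "
           "ORDER BY name LIMIT ?")
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities (tenant_id TEXT, category TEXT, name TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?, ?)",
    [
        ("scale", "report", "r1", "synergy roadmap alpha"),
        ("scale", "report", "r2", "synergy roadmap beta"),
        ("scale", "memo", "m1", "synergy roadmap gamma"),
    ],
)

print(find_entities(conn, "scale", "synergy"))
print(find_entities(conn, "scale", "synergy", category="report"))
```

Because the category value is bound as a parameter rather than interpolated, the filter stays an exact match and cannot alter the query text.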
|
|
| 1058 |
},
|
| 1059 |
"snippet": r["snip"], "rank": r["rank"], "ts": r["ts"],
|
| 1060 |
})
|
| 1061 |
+
# Sort by rank (lower = better in FTS5), with a tier tiebreaker: at
|
| 1062 |
+
# comparable rank the content tiers (entity/state/reference) sort before
|
| 1063 |
+
# the contentless journal tier, whose BM25 scores are not on the same
|
| 1064 |
+
# scale (cross-tier rank comparability, tester email 19e7eb3096b4dae5).
|
| 1065 |
+
_tier_rank = {"entity": 0, "state": 0, "reference": 0, "journal": 1}
|
| 1066 |
+
hits.sort(key=lambda h: (h["rank"], _tier_rank.get(h["tier"], 0)))
|
| 1067 |
return hits[:limit]
|
| 1068 |
|
| 1069 |
# ------------------------------------------------------------------
|
|
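A small sketch of the tier tiebreaker added to `search()`, using made-up hits. The sort key matches the one in the diff; the hit dictionaries and the `sort_hits` wrapper are hypothetical.

```python
_TIER_RANK = {"entity": 0, "state": 0, "reference": 0, "journal": 1}


def sort_hits(hits):
    """Sort by FTS5 rank (lower is better), content tiers first on ties."""
    return sorted(hits, key=lambda h: (h["rank"], _TIER_RANK.get(h["tier"], 0)))


hits = [
    {"tier": "journal", "rank": -2.0, "key": "j1"},
    {"tier": "entity", "rank": -2.0, "key": "e1"},
    {"tier": "journal", "rank": -3.5, "key": "j2"},
]
ordered = [h["key"] for h in sort_hits(hits)]
print(ordered)
```

Note that the tiebreaker only applies when ranks compare equal: a journal hit with a strictly lower rank value still sorts ahead of a content-tier hit.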
@@ -14,20 +14,37 @@ linked records returns only the single strongest match and misses the rest.
|
|
| 14 |
(so "rejected" / "denied" / injection queries return []);
|
| 15 |
- on a terminal-state query, drop purely-preparatory records
|
| 16 |
(draft / triage / forecast), negation-aware;
|
| 17 |
-
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
|
| 32 |
Uses only the public MemoryClient surface (search / list_entities), so it adds
|
| 33 |
no coupling to client internals.
|
|
@@ -51,10 +68,17 @@ _PREP_RE = re.compile(
|
|
| 51 |
r'\b(draft|triage|forecast|planning|proposed|tentative|pending|agenda|'
|
| 52 |
r'scheduled|rehearsal|sample|option|wip|follow-?up)\b|work in progress')
|
| 53 |
|
| 54 |
-
# ---
|
| 55 |
-
|
| 56 |
-
|
| 57 |
_PER_TOKEN_LIMIT = 200 # recall depth per token
|
| 58 |
|
| 59 |
|
| 60 |
def _significant_tokens(query: str):
|
|
@@ -101,17 +125,34 @@ def multi_record_search(client, query: str, *, limit: int = 10, corpus_n: int |
|
|
| 101 |
idf = {t: math.log((corpus_n + 1) / (df[t] + 1)) + 1.0 for t in toks}
|
| 102 |
total = sum(idf.values()) or 1.0
|
| 103 |
terminal_q = bool(set(toks) & _TERMINAL_Q)
|
| 104 |
-
|
| 105 |
-
|
| 106 |
|
| 107 |
scored = []
|
| 108 |
for e in cand.values():
|
| 109 |
if terminal_q and _pure_prep(e["body"]):
|
| 110 |
continue # drop purely-preparatory on a final-state query
|
| 111 |
-
if selective and not (e["m"] & selective):
|
| 112 |
-
continue # drop cross-talk (only common terms matched)
|
| 113 |
cov = sum(idf[t] for t in e["m"]) / total
|
| 114 |
-
if cov
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
|
|
|
| 14 |
(so "rejected" / "denied" / injection queries return []);
|
| 15 |
- on a terminal-state query, drop purely-preparatory records
|
| 16 |
(draft / triage / forecast), negation-aware;
|
| 17 |
+
- ANCHOR-FIRST (hybrid): keep a candidate that is in the
|
| 18 |
+
anchor's cluster (matches >= 1 anchor term, the rarest most
|
| 19 |
+
discriminating tokens) OR clears the high-coverage bar
|
| 20 |
+
ANCHOR_HYBRID_HI. A non-anchor, mid-coverage candidate is
|
| 21 |
+
cross-cluster pollution and is dropped. The pure strict
|
| 22 |
+
filter killed pollution but over-dropped natural-language
|
| 23 |
+
evidence that lacks the rare anchor; the hybrid keeps both;
|
| 24 |
+
- rank by IDF-weighted coverage with a tier tiebreaker
|
| 25 |
+
(content tiers before contentless journal), keep
|
| 26 |
+
>= COVERAGE_THRESHOLD.
|
| 27 |
+
|
| 28 |
+
Bench: baseline single-pass 4/10; recall-only multipass 3/10 (REGRESSES). The
|
| 29 |
+
prior retrieve-then-verify scored 10/10 at 24 records but only ~0.36 recall at
|
| 30 |
+
50-100 companies (tester Runs 16/17) because its selectivity cutoff was a corpus
|
| 31 |
+
fraction (round(0.15 * corpus_n)) that lost meaning at scale: past ~150 records
|
| 32 |
+
almost every term read as "selective," so cross-cluster records cleared the gate.
|
| 33 |
+
The anchor-first rewrite (this version) defines the anchor RELATIVE to the rarest
|
| 34 |
+
query term, so the precision gate is scale-invariant (tester Runs 24-29:
|
| 35 |
+
100/100 recall, 0 pollution at 100 companies / 1621 writes). Abstention and the
|
| 36 |
+
terminal/prep gates are preserved unchanged.
|
| 37 |
+
|
| 38 |
+
ANCHOR_HYBRID_HI was tuned on a real-data retrieval diagnostic (LongMemEval text
|
| 39 |
+
combined into one store): the pure anchor-only filter regressed natural-language
|
| 40 |
+
recall (gold evidence that lacks the rare anchor); HI=0.65 restores it while
|
| 41 |
+
keeping synthetic-workflow pollution at 0. Per-question (oracle) retrieval is not
|
| 42 |
+
regressed by this change (NEW >= OLD).
|
| 43 |
+
|
| 44 |
+
CAVEAT — COVERAGE_THRESHOLD, ANCHOR_BAND, ANCHOR_HYBRID_HI, and the prep/terminal
|
| 45 |
+
lexicon are defaults validated against the synthetic multi-cluster scale test
|
| 46 |
+
(tests/test_anchor_resolver_2026_06_06.py) + the LongMemEval retrieval diagnostic;
|
| 47 |
+
re-validate if corpus structure changes.
|
| 48 |
|
| 49 |
Uses only the public MemoryClient surface (search / list_entities), so it adds
|
| 50 |
no coupling to client internals.
|
|
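The IDF-weighted coverage that the docstring ranks by can be worked through with a toy query. The IDF formula is the one shown in this commit's `multi_record_search` hunk; the document frequencies, the corpus size of 240, and the helper names are hypothetical.

```python
import math


def idf_weights(df, corpus_n):
    """Smoothed IDF per query term, as in the multi_record_search hunk."""
    return {t: math.log((corpus_n + 1) / (n + 1)) + 1.0 for t, n in df.items()}


def coverage(matched, idf):
    """Share of the query's total IDF mass that a candidate matched."""
    total = sum(idf.values()) or 1.0
    return sum(idf[t] for t in matched if t in idf) / total


# Hypothetical query: one rare company token plus two shared topic tokens.
df = {"co0007": 4, "topic1alpha": 48, "topic1beta": 48}
idf = idf_weights(df, corpus_n=240)

cov_anchor_only = coverage({"co0007"}, idf)
cov_topics_only = coverage({"topic1alpha", "topic1beta"}, idf)
print(round(cov_anchor_only, 3), round(cov_topics_only, 3))
```

With these illustrative numbers a record that matches only the two shared topic terms has coverage of roughly 0.52. That is above `COVERAGE_THRESHOLD` (0.45) but below `ANCHOR_HYBRID_HI` (0.65), so the coverage floor alone would keep it, which is why a separate anchor gate is needed.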
|
|
| 68 |
r'\b(draft|triage|forecast|planning|proposed|tentative|pending|agenda|'
|
| 69 |
r'scheduled|rehearsal|sample|option|wip|follow-?up)\b|work in progress')
|
| 70 |
|
| 71 |
+
# --- anchor-first resolver constants (see CAVEAT in the module docstring) ---
|
| 72 |
+
# Replaces the 24-record bench tuning (SELECTIVE_CUTOFF_FRAC = 0.15) that
|
| 73 |
+
# collapsed at scale. The anchor is defined RELATIVE to the rarest query term,
|
| 74 |
+
# so it is scale-invariant.
|
| 75 |
+
ANCHOR_BAND = 2.0 # a term is an "anchor" if df <= max(2, round(ANCHOR_BAND * rarest-term df))
|
| 76 |
+
COVERAGE_THRESHOLD = 0.45 # hard coverage floor: drop candidates below this
|
| 77 |
+
ANCHOR_HYBRID_HI = 0.65 # a non-anchor candidate is kept only if coverage >= this
|
| 78 |
_PER_TOKEN_LIMIT = 200 # recall depth per token
|
| 79 |
+
# content tiers beat the contentless journal tier at equal coverage (cross-tier
|
| 80 |
+
# BM25 scores are not comparable; tester email 19e7eb3096b4dae5)
|
| 81 |
+
_TIER_PRIORITY = {"entity": 0, "state": 0, "reference": 0, "journal": 1}
|
| 82 |
|
| 83 |
|
| 84 |
def _significant_tokens(query: str):
|
|
|
|
| 125 |
idf = {t: math.log((corpus_n + 1) / (df[t] + 1)) + 1.0 for t in toks}
|
| 126 |
total = sum(idf.values()) or 1.0
|
| 127 |
terminal_q = bool(set(toks) & _TERMINAL_Q)
|
| 128 |
+
|
| 129 |
+
# Anchor-first: anchor terms are the rarest (most discriminating) tokens,
|
| 130 |
+
# defined relative to the rarest term so the band is scale-invariant. A
|
| 131 |
+
# candidate must match >= 1 anchor term (be in the anchor's cluster) or clear
|
| 132 |
+
# ANCHOR_HYBRID_HI; this removes the cross-cluster pollution the old
|
| 133 |
+
# corpus-fraction cutoff let through at scale. Unfiltered anchor retrieval
|
| 134 |
+
# recalls fully but pollutes; this gate restores precision (tester Runs 24-29).
|
| 135 |
+
min_df = min(df.values())
|
| 136 |
+
anchor_cut = max(2, round(ANCHOR_BAND * min_df))
|
| 137 |
+
anchor_terms = {t for t in toks if df[t] <= anchor_cut}
|
| 138 |
|
| 139 |
scored = []
|
| 140 |
for e in cand.values():
|
| 141 |
if terminal_q and _pure_prep(e["body"]):
|
| 142 |
continue # drop purely-preparatory on a final-state query
|
| 143 |
cov = sum(idf[t] for t in e["m"]) / total
|
| 144 |
+
if cov < COVERAGE_THRESHOLD:
|
| 145 |
+
continue # below the hard coverage floor
|
| 146 |
+
# Anchor-first HYBRID gate: keep a candidate that is in the anchor's
|
| 147 |
+
# cluster (matches an anchor term) OR clears the high-coverage bar
|
| 148 |
+
# (genuinely relevant despite lacking the rare anchor, e.g. natural-
|
| 149 |
+
# language evidence). A non-anchor, mid-coverage candidate is pure
|
| 150 |
+
# cross-cluster pollution and is dropped. Tuned on the LongMemEval
|
| 151 |
+
# retrieval diagnostic: synthetic-workflow pollution -> 0 while natural-
|
| 152 |
+
# language recall is preserved (anchor-only over-filtered real queries).
|
| 153 |
+
if anchor_terms and not (e["m"] & anchor_terms) and cov < ANCHOR_HYBRID_HI:
|
| 154 |
+
continue
|
| 155 |
+
tier = e["hit"].get("tier")
|
| 156 |
+
scored.append((e["hit"], cov, _TIER_PRIORITY.get(tier, 0), e["best"]))
|
| 157 |
+
scored.sort(key=lambda x: (-x[1], x[2], x[3]))
|
| 158 |
+
return [h for h, _cov, _tp, _best in scored[:limit]]
|
|
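Anchor-term selection on its own, as a sketch. The cut expression `max(2, round(ANCHOR_BAND * min_df))` is taken from the hunk above; the `anchor_terms` function wrapper and the document frequencies are hypothetical, and the sketch assumes every term has a document frequency of at least 1 (zero-support terms are handled earlier by abstention).

```python
ANCHOR_BAND = 2.0  # value from this commit


def anchor_terms(df, band=ANCHOR_BAND):
    """Terms whose df falls within `band` times the rarest term's df."""
    min_df = min(df.values())
    cut = max(2, round(band * min_df))
    return {t for t, n in df.items() if n <= cut}


small = {"co0007": 4, "topic1alpha": 48, "topic1beta": 48}
# Same query against a corpus where the shared terms are ten times as common.
large = {"co0007": 4, "topic1alpha": 480, "topic1beta": 480}

print(anchor_terms(small))
print(anchor_terms(large))
```

The anchor set is the same in both cases because the cut depends only on the rarest query term, not on the corpus size.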
@@ -0,0 +1,91 @@
|
|
| 1 |
+
"""Anchor-first resolver + search refinements (combined patch, 2026-06-06).
|
| 2 |
+
|
| 3 |
+
Validates the fix for the multi-record recall/precision regression that tester
|
| 4 |
+
Sylvain surfaced (Runs 16/17: ~0.36 recall at 50-100 companies) and later confirmed
|
| 5 |
+
as fixed by the anchor-first remedy (Runs 24-29: full recall, zero pollution at 100
|
| 6 |
+
companies). Source memo: memory/research/sylvain-anchor-first-resolver-runs24-29-2026-05-31.md.
|
| 7 |
+
|
| 8 |
+
Three changes under test:
|
| 9 |
+
1. multi_record_search anchor-first hybrid gate (scale-invariant precision).
|
| 10 |
+
2. MemoryClient.search_entities(category=...) anchor filter.
|
| 11 |
+
3. MemoryClient.search() cross-tier rank tiebreaker (content before journal).
|
| 12 |
+
"""
|
| 13 |
+
from sibyl_memory_client import MemoryClient
|
| 14 |
+
from sibyl_memory_client.multi_record import multi_record_search
|
| 15 |
+
|
| 16 |
+
_TYPES = {
|
| 17 |
+
"report": "report revenue forecast quarterly",
|
| 18 |
+
"email": "email thread followup correspondence",
|
| 19 |
+
"journal": "journal meeting notes minutes",
|
| 20 |
+
"bug": "bug ticket error defect",
|
| 21 |
+
}
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
def _build_corpus(c, n):
|
| 25 |
+
"""n companies, each with 4 linked records sharing THREE per-group topic
|
| 26 |
+
terms — the cross-cluster contamination vector that defeated the old
|
| 27 |
+
corpus-fraction selectivity cutoff."""
|
| 28 |
+
for i in range(n):
|
| 29 |
+
anchor = f"co{i:04d}"
|
| 30 |
+
g = i % max(1, n // 12)
|
| 31 |
+
topics = f"topic{g}alpha topic{g}beta topic{g}gamma"
|
| 32 |
+
for t, tt in _TYPES.items():
|
| 33 |
+
c.set_entity(t, f"{t}-{i}", {"text": f"{anchor} {topics} {t} {tt} project status update"})
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
def test_anchor_first_full_recall_zero_pollution_at_scale(tmp_path):
|
| 37 |
+
n = 60
|
| 38 |
+
c = MemoryClient.local(tmp_path / "scale.db", tenant_id="scale")
|
| 39 |
+
_build_corpus(c, n)
|
| 40 |
+
|
| 41 |
+
exp_total = rec_total = pollution = 0
|
| 42 |
+
for i in range(n):
|
| 43 |
+
anchor = f"co{i:04d}"
|
| 44 |
+
g = i % max(1, n // 12)
|
| 45 |
+
res = multi_record_search(c, f"{anchor} topic{g}alpha topic{g}beta topic{g}gamma", limit=20)
|
| 46 |
+
expected = {f"{t}-{i}" for t in _TYPES}
|
| 47 |
+
got = {h.get("key") for h in res}
|
| 48 |
+
exp_total += len(expected)
|
| 49 |
+
rec_total += len(expected & got)
|
| 50 |
+
for h in res:
|
| 51 |
+
txt = (h.get("body") or {}).get("text", "")
|
| 52 |
+
if anchor not in txt:
|
| 53 |
+
pollution += 1
|
| 54 |
+
|
| 55 |
+
assert rec_total == exp_total, f"recall regressed: {rec_total}/{exp_total}"
|
| 56 |
+
assert pollution == 0, f"cross-cluster pollution leaked: {pollution} hits"
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
def test_abstention_preserved(tmp_path):
|
| 60 |
+
c = MemoryClient.local(tmp_path / "ab.db", tenant_id="scale")
|
| 61 |
+
_build_corpus(c, 20)
|
| 62 |
+
# a term with zero corpus support must collapse the whole query to []
|
| 63 |
+
assert multi_record_search(c, "co0001 nonexistenttokenzzzq report", limit=10) == []
|
| 64 |
+
|
| 65 |
+
|
| 66 |
+
def test_single_cluster_query_returns_only_that_cluster(tmp_path):
|
| 67 |
+
n = 40
|
| 68 |
+
c = MemoryClient.local(tmp_path / "sc.db", tenant_id="scale")
|
| 69 |
+
_build_corpus(c, n)
|
| 70 |
+
g = 7 % max(1, n // 12) # same group formula the corpus uses
|
| 71 |
+
res = multi_record_search(c, f"co0007 topic{g}alpha topic{g}beta topic{g}gamma", limit=20)
|
| 72 |
+
assert res, "expected the anchor cluster to be returned"
|
| 73 |
+
for h in res:
|
| 74 |
+
assert "co0007" in (h.get("body") or {}).get("text", ""), "leaked a non-anchor record"
|
| 75 |
+
|
| 76 |
+
|
| 77 |
+
def test_search_entities_category_filter(tmp_path):
|
| 78 |
+
c = MemoryClient.local(tmp_path / "cat.db", tenant_id="scale")
|
| 79 |
+
c.set_entity("report", "r1", {"text": "synergy roadmap alpha"})
|
| 80 |
+
c.set_entity("report", "r2", {"text": "synergy roadmap beta"})
|
| 81 |
+
c.set_entity("memo", "m1", {"text": "synergy roadmap gamma"})
|
| 82 |
+
|
| 83 |
+
all_hits = c.search_entities("synergy")
|
| 84 |
+
assert {h["name"] for h in all_hits} == {"r1", "r2", "m1"}
|
| 85 |
+
|
| 86 |
+
report_only = c.search_entities("synergy", category="report")
|
| 87 |
+
assert {h["name"] for h in report_only} == {"r1", "r2"}
|
| 88 |
+
assert all(h["category"] == "report" for h in report_only)
|
| 89 |
+
|
| 90 |
+
memo_only = c.search_entities("synergy", category="memo")
|
| 91 |
+
assert {h["name"] for h in memo_only} == {"m1"}
|
|
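`test_abstention_preserved` relies on behavior that is not part of this diff. A hedged sketch of what such a gate might look like follows, assuming abstention is keyed on a query term with zero document frequency (the changelog's "zero-support term"). The `resolve` function and its `retrieve` callback are hypothetical, not the module's API.

```python
def resolve(query_tokens, df, retrieve):
    """Return [] if any query term has no corpus support, else delegate."""
    if any(df.get(t, 0) == 0 for t in query_tokens):
        return []  # abstain: one unsupported term collapses the whole query
    return retrieve(query_tokens)


# Hypothetical document frequencies for the tokens used in the test above.
df = {"co0001": 4, "report": 20, "nonexistenttokenzzzq": 0}


def fake_retrieve(tokens):
    # Stand-in for the real candidate retrieval and gating.
    return [f"hit-for-{t}" for t in tokens]


print(resolve(["co0001", "nonexistenttokenzzzq", "report"], df, fake_retrieve))
print(resolve(["co0001", "report"], df, fake_retrieve))
```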
@@ -4,6 +4,17 @@ All notable changes to `sibyl-memory-mcp` are recorded here. Format follows
|
|
| 4 |
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning follows
|
| 5 |
[SemVer](https://semver.org/).
|
| 6 |
|
| 7 |
## [0.1.7] - 2026-06-05
|
| 8 |
|
| 9 |
### Fixed
|
|
|
|
| 4 |
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning follows
|
| 5 |
[SemVer](https://semver.org/).
|
| 6 |
|
| 7 |
+
## [0.1.8] - 2026-06-06
|
| 8 |
+
|
| 9 |
+
### Changed
|
| 10 |
+
|
| 11 |
+
- **Pin `sibyl-memory-client>=0.4.9`.** Picks up the anchor-first hybrid
|
| 12 |
+
multi-record resolver (client 0.4.9): `memory_search` now strict-filters
|
| 13 |
+
multi-record / linked-record queries to the query's anchor cluster while
|
| 14 |
+
keeping high-coverage natural-language evidence, eliminating cross-cluster
|
| 15 |
+
pollution at scale. No MCP code change; routing through `multi_record_search`
|
| 16 |
+
is unchanged.
|
| 17 |
+
|
| 18 |
## [0.1.7] - 2026-06-05
|
| 19 |
|
| 20 |
### Fixed
|
|
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
|
| 4 |
|
| 5 |
[project]
|
| 6 |
name = "sibyl-memory-mcp"
|
| 7 |
-
version = "0.1.
|
| 8 |
description = "MCP server for Sibyl Memory Plugin: wraps the local SQLite + FTS5 memory engine and exposes it to MCP-compatible agents (Claude Code, Codex, Cursor, Continue, anything that speaks MCP)."
|
| 9 |
readme = "README.md"
|
| 10 |
requires-python = ">=3.10"
|
|
@@ -23,7 +23,7 @@ classifiers = [
|
|
| 23 |
]
|
| 24 |
dependencies = [
|
| 25 |
"mcp>=1.0.0",
|
| 26 |
-
"sibyl-memory-client>=0.4.
|
| 27 |
"sibyl-memory-hermes>=0.3.2",
|
| 28 |
]
|
| 29 |
|
|
|
|
| 4 |
|
| 5 |
[project]
|
| 6 |
name = "sibyl-memory-mcp"
|
| 7 |
+
version = "0.1.8"
|
| 8 |
description = "MCP server for Sibyl Memory Plugin: wraps the local SQLite + FTS5 memory engine and exposes it to MCP-compatible agents (Claude Code, Codex, Cursor, Continue, anything that speaks MCP)."
|
| 9 |
readme = "README.md"
|
| 10 |
requires-python = ">=3.10"
|
|
|
|
| 23 |
]
|
| 24 |
dependencies = [
|
| 25 |
"mcp>=1.0.0",
|
| 26 |
+
"sibyl-memory-client>=0.4.9",
|
| 27 |
"sibyl-memory-hermes>=0.3.2",
|
| 28 |
]
|
| 29 |
|