Commit 2b83607 by sibyllabs · 1 parent: 6700c84

multi-record: anchor-first hybrid resolver (client 0.4.9 + mcp 0.1.8)

Fix the multi-record / linked-record retrieval regression at scale. The old
corpus-fraction selectivity cutoff (round(0.15 * corpus_n)) lost meaning past
~150 records and let cross-cluster records pollute results (~0.36 recall at
50-100 companies). The resolver is now anchor-first with a hybrid gate: a
candidate survives if it is in the query's anchor cluster (matches a rarest-term
anchor) OR clears a high-coverage bar. Scale-invariant; abstention and
terminal/prep gates unchanged.

Also: search() cross-tier rank tiebreaker (content tiers before contentless
journal) and search_entities(category=) anchor filter. mcp pins client>=0.4.9.

Validated: client 110 + mcp 22 tests green; synthetic 480-record A/B (full
recall, 0 cross-cluster polluting hits vs 1,920 with the old code, over 120
queries); LongMemEval retrieval diagnostic (per-question oracle NEW>=OLD).
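
The scale problem described above can be worked through numerically. The sketch below is illustrative only: the document frequencies are assumed (a company token in 4 records, each topic token in 48), and `old_selective` / `new_anchors` are hypothetical helper names rather than functions from the client. Only the two cutoff formulas are taken from the diff.

```python
SELECTIVE_CUTOFF_FRAC = 0.15  # old constant, from the removed code
ANCHOR_BAND = 2.0             # new constant, from this commit


def old_selective(df, corpus_n):
    """Old rule: a term is selective if df <= round(0.15 * corpus_n)."""
    cut = max(1, round(SELECTIVE_CUTOFF_FRAC * corpus_n))
    return {t for t, n in df.items() if n <= cut}


def new_anchors(df):
    """New rule: a term is an anchor if df <= ANCHOR_BAND * rarest-term df."""
    cut = max(2, round(ANCHOR_BAND * min(df.values())))
    return {t for t, n in df.items() if n <= cut}


# Assumed document frequencies: the company token appears in its own 4 records,
# each topic token in the 48 records of the companies sharing that topic.
df = {"co0007": 4, "topic7alpha": 48, "topic7beta": 48, "topic7gamma": 48}

small = old_selective(df, corpus_n=60)   # cutoff 9: only the company token passes
large = old_selective(df, corpus_n=480)  # cutoff 72: the topic tokens pass too
anchors = new_anchors(df)                # cutoff 8, independent of corpus size
```

Under these assumptions the old rule stops discriminating once the corpus grows, because the cutoff grows with `corpus_n` while the topic frequencies do not. The new cutoff depends only on the rarest query term.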

sibyl-memory-client/CHANGELOG.md CHANGED
@@ -4,6 +4,45 @@ All notable changes to `sibyl-memory-client` are recorded here. Format
4
  follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning
5
  follows [SemVer](https://semver.org/).
6
 
7
+ ## [0.4.9] - 2026-06-06
8
+
9
+ ### Fixed
10
+
11
+ - **Multi-record search recall/precision regression at scale (anchor-first hybrid resolver).**
12
+ `multi_record_search` used a corpus-fraction selectivity cutoff
13
+ (`round(0.15 * corpus_n)`) calibrated on a 24-record reconstruction. Past ~150
14
+ records the cutoff lost meaning: almost every term read as "selective," so
15
+ cross-cluster records cleared the gate and polluted results (tester Sylvain
16
+ Runs 16/17, ~0.36 recall at 50-100 companies). The resolver is now anchor-first:
17
+ anchor terms are the rarest tokens, defined RELATIVE to the rarest query term
18
+ (`df <= ANCHOR_BAND * min_df`, scale-invariant). The gate is a HYBRID: a
19
+ candidate survives if it is in the anchor's cluster (matches an anchor term) OR
20
+ clears the high-coverage bar `ANCHOR_HYBRID_HI` (genuinely relevant despite
21
+ lacking the rare anchor). A pure strict filter killed cross-cluster pollution
22
+ but over-dropped natural-language evidence; the hybrid keeps both. Abstention
23
+ (zero-support term) and the terminal/prep gates are unchanged. Validated two
24
+ ways: (a) synthetic 480-record workflow A/B — full recall, 0 cross-cluster
25
+ pollution vs the old code's 1,920 polluting hits over 120 queries (matches
26
+ tester Runs 24-29); (b) real-data LongMemEval retrieval diagnostic — per-question
27
+ (oracle) retrieval is not regressed (NEW >= OLD, +3.4pts), and in a combined-
28
+ store contamination stress NEW cuts cross-question pollution ~29% for a small
29
+ recall trade. Regression guard: `tests/test_anchor_resolver_2026_06_06.py`.
30
+
31
+ - **Cross-tier rank comparability.** `search()` BM25 ranks are not on a common
32
+ scale across FTS tables (`journal_events_fts` is contentless). Added a tier
33
+ tiebreaker so content tiers (entity/state/reference) sort before journal at equal
34
+ rank, layered on the existing 0.4.7 journal cap. (tester email 19e7eb3096b4dae5)
35
+
36
+ ### Added
37
+
38
+ - **`search_entities(category=...)`.** Optional exact-match category anchor on
39
+ entity FTS, removing topical bleed across categories on multi-entity workloads
40
+ (tester email 19e7e75af0b7780a). Backward compatible (defaults to all categories).
41
+
42
+ Sourced from Sylvain's beta Runs 24-29 + the bugflow batch dedup; this single
43
+ patch also supersedes ~20 already-fixed entries that had accumulated in the
44
+ bug-batch queue.
45
+
46
  ## [0.4.8] - 2026-06-04
47
 
48
  ### Fixed
sibyl-memory-client/pyproject.toml CHANGED
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
 
5
  [project]
6
  name = "sibyl-memory-client"
7
- version = "0.4.8"
8
  description = "Local-first agentic memory SDK. SQLite-backed five-tier hierarchical schema, FTS5 search, multi-tenant, with self-learning skill detection and local memory linter. Foundation of the Sibyl Memory Plugin family."
9
  authors = [{ name = "SIBYL, Sibyl Labs LLC", email = "sibyl@sibyllabs.org" }]
10
  license = { text = "MIT" }
 
4
 
5
  [project]
6
  name = "sibyl-memory-client"
7
+ version = "0.4.9"
8
  description = "Local-first agentic memory SDK. SQLite-backed five-tier hierarchical schema, FTS5 search, multi-tenant, with self-learning skill detection and local memory linter. Foundation of the Sibyl Memory Plugin family."
9
  authors = [{ name = "SIBYL, Sibyl Labs LLC", email = "sibyl@sibyllabs.org" }]
10
  license = { text = "MIT" }
sibyl-memory-client/src/sibyl_memory_client/client.py CHANGED
@@ -881,7 +881,8 @@ class MemoryClient:
881
  # ------------------------------------------------------------------
882
  # FTS5 search
883
  # ------------------------------------------------------------------
884
- def search_entities(self, query: str, *, limit: int = 20, prefix: bool = False) -> list[dict[str, Any]]:
 
885
  """Full-text search over entity name + category + body via FTS5.
886
 
887
  Returns warm-tier entity rows only. For cross-tier search (entities +
@@ -891,6 +892,11 @@ class MemoryClient:
891
  (``name:foo``) and unclosed quotes can't escape into the parser.
892
  Set ``prefix=True`` for prefix matching on the final token.
893
 
 
 
 
 
 
894
  Returns: list of entity rows. Each row is a dict with keys
895
  id, tenant_id, category, name, status, body, created_at, updated_at
896
  (body is JSON-deserialized).
@@ -904,15 +910,18 @@ class MemoryClient:
904
  # external-content FTS5: join by rowid back to base table.
905
  # _fts_query handles classification (v0.4.0 KAPPA) + corruption
906
  # containment (poisoned-index DatabaseError self-heals or returns []).
 
 
 
907
  with self._storage.connection() as conn:
908
  rows = _fts_query(
909
  conn,
910
  "SELECT e.id, e.tenant_id, e.category, e.name, e.status, e.body, e.created_at, e.updated_at "
911
  "FROM entities_fts f "
912
  "JOIN entities e ON e.rowid = f.rowid "
913
- "WHERE entities_fts MATCH ? AND f.tenant_id = ? "
914
  "ORDER BY rank LIMIT ?",
915
- (match_q, self._tenant_id, limit),
916
  "entities_fts",
917
  )
918
  return [self._row_to_entity(r) for r in rows]
@@ -1049,8 +1058,12 @@ class MemoryClient:
1049
  },
1050
  "snippet": r["snip"], "rank": r["rank"], "ts": r["ts"],
1051
  })
1052
- # Sort by rank (lower = better in FTS5) and apply global limit
1053
- hits.sort(key=lambda h: h["rank"])
 
 
 
 
1054
  return hits[:limit]
1055
 
1056
  # ------------------------------------------------------------------
 
881
  # ------------------------------------------------------------------
882
  # FTS5 search
883
  # ------------------------------------------------------------------
884
+ def search_entities(self, query: str, *, limit: int = 20, prefix: bool = False,
885
+ category: str | None = None) -> list[dict[str, Any]]:
886
  """Full-text search over entity name + category + body via FTS5.
887
 
888
  Returns warm-tier entity rows only. For cross-tier search (entities +
 
892
  (``name:foo``) and unclosed quotes can't escape into the parser.
893
  Set ``prefix=True`` for prefix matching on the final token.
894
 
895
+ Pass ``category="<name>"`` to anchor the search to a single entity
896
+ category (exact match); this removes topical bleed across categories on
897
+ multi-entity workloads (tester email 19e7e75af0b7780a). Omit to search
898
+ all categories.
899
+
900
  Returns: list of entity rows. Each row is a dict with keys
901
  id, tenant_id, category, name, status, body, created_at, updated_at
902
  (body is JSON-deserialized).
 
910
  # external-content FTS5: join by rowid back to base table.
911
  # _fts_query handles classification (v0.4.0 KAPPA) + corruption
912
  # containment (poisoned-index DatabaseError self-heals or returns []).
913
+ cat_clause = " AND e.category = ?" if category else ""
914
+ params = ((match_q, self._tenant_id, category, limit) if category
915
+ else (match_q, self._tenant_id, limit))
916
  with self._storage.connection() as conn:
917
  rows = _fts_query(
918
  conn,
919
  "SELECT e.id, e.tenant_id, e.category, e.name, e.status, e.body, e.created_at, e.updated_at "
920
  "FROM entities_fts f "
921
  "JOIN entities e ON e.rowid = f.rowid "
922
+ "WHERE entities_fts MATCH ? AND f.tenant_id = ?" + cat_clause + " "
923
  "ORDER BY rank LIMIT ?",
924
+ params,
925
  "entities_fts",
926
  )
927
  return [self._row_to_entity(r) for r in rows]
 
1058
  },
1059
  "snippet": r["snip"], "rank": r["rank"], "ts": r["ts"],
1060
  })
1061
+ # Sort by rank (lower = better in FTS5), with a tier tiebreaker: at
1062
+ # comparable rank the content tiers (entity/state/reference) sort before
1063
+ # the contentless journal tier, whose BM25 scores are not on the same
1064
+ # scale (cross-tier rank comparability, tester email 19e7eb3096b4dae5).
1065
+ _tier_rank = {"entity": 0, "state": 0, "reference": 0, "journal": 1}
1066
+ hits.sort(key=lambda h: (h["rank"], _tier_rank.get(h["tier"], 0)))
1067
  return hits[:limit]
1068
 
1069
  # ------------------------------------------------------------------
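
The tier tiebreaker added to `search()` can be seen in isolation with a few hand-made hits. This is a minimal sketch: the hit dicts and their rank values are invented for illustration, and only the sort key mirrors the diff.

```python
# Same mapping as the diff: content tiers 0, contentless journal tier 1.
_TIER_RANK = {"entity": 0, "state": 0, "reference": 0, "journal": 1}

# Invented hits; FTS5 rank is lower-is-better.
hits = [
    {"tier": "journal", "key": "j1", "rank": -2.0},
    {"tier": "entity", "key": "e1", "rank": -2.0},
    {"tier": "journal", "key": "j2", "rank": -3.5},
    {"tier": "state", "key": "s1", "rank": -1.0},
]

# Primary key: rank. Secondary key: tier priority, consulted only on equal rank.
ordered = sorted(hits, key=lambda h: (h["rank"], _TIER_RANK.get(h["tier"], 0)))
keys = [h["key"] for h in ordered]
```

Because the tier is the second element of the sort key, it only reorders hits whose ranks are exactly equal: `e1` moves ahead of `j1`, but `j2` still sorts first on its strictly lower rank. Whether exact ties are common enough for this to matter in practice depends on the data.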
sibyl-memory-client/src/sibyl_memory_client/multi_record.py CHANGED
@@ -14,20 +14,37 @@ linked records returns only the single strongest match and misses the rest.
14
  (so "rejected" / "denied" / injection queries return []);
15
  - on a terminal-state query, drop purely-preparatory records
16
  (draft / triage / forecast), negation-aware;
17
- - a candidate must match >= 1 rare/selective term, not just
18
- common ones (kills cross-talk from neighbouring clusters);
19
- - rank by IDF-weighted coverage, keep >= COVERAGE_THRESHOLD.
20
-
21
- Bench (reconstructed Run15 oracle, client 0.4.x): baseline single-pass 4/10;
22
- recall-only multipass 3/10 (REGRESSES: breaks abstention + pulls distractors);
23
- this 10/10. The verify gates are load-bearing: recall alone regresses.
24
-
25
- CAVEAT: the constants below (SELECTIVE_CUTOFF_FRAC, COVERAGE_THRESHOLD, the
26
- prep/terminal lexicon, the strict zero-support abstention) are tuned on a
27
- 24-record reconstruction, NOT generalized to production-scale corpora. Validate
28
- against real-scale data or gate behind a flag before relying on it at scale.
29
- Generalization candidates: corpus-relative IDF percentile, normalized coverage
30
- threshold, anchor-term / min-coverage abstention, learned state classification.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
31
 
32
  Uses only the public MemoryClient surface (search / list_entities), so it adds
33
  no coupling to client internals.
@@ -51,10 +68,17 @@ _PREP_RE = re.compile(
51
  r'\b(draft|triage|forecast|planning|proposed|tentative|pending|agenda|'
52
  r'scheduled|rehearsal|sample|option|wip|follow-?up)\b|work in progress')
53
 
54
- # --- bench-tuned constants (see CAVEAT in the module docstring) -------------
55
- SELECTIVE_CUTOFF_FRAC = 0.15 # term is "selective"/rare if df <= frac * corpus_size
56
- COVERAGE_THRESHOLD = 0.45 # keep candidates whose IDF-weighted coverage >= this
 
 
 
 
57
  _PER_TOKEN_LIMIT = 200 # recall depth per token
 
 
 
58
 
59
 
60
  def _significant_tokens(query: str):
@@ -101,17 +125,34 @@ def multi_record_search(client, query: str, *, limit: int = 10, corpus_n: int |
101
  idf = {t: math.log((corpus_n + 1) / (df[t] + 1)) + 1.0 for t in toks}
102
  total = sum(idf.values()) or 1.0
103
  terminal_q = bool(set(toks) & _TERMINAL_Q)
104
- sel_cut = max(1, round(SELECTIVE_CUTOFF_FRAC * corpus_n))
105
- selective = {t for t in toks if df[t] <= sel_cut}
 
 
 
 
 
 
 
 
106
 
107
  scored = []
108
  for e in cand.values():
109
  if terminal_q and _pure_prep(e["body"]):
110
  continue # drop purely-preparatory on a final-state query
111
- if selective and not (e["m"] & selective):
112
- continue # drop cross-talk (only common terms matched)
113
  cov = sum(idf[t] for t in e["m"]) / total
114
- if cov >= COVERAGE_THRESHOLD:
115
- scored.append((e["hit"], cov, e["best"]))
116
- scored.sort(key=lambda x: (-x[1], x[2]))
117
- return [h for h, _cov, _best in scored[:limit]]
 
 
 
 
 
 
 
 
 
 
 
 
14
  (so "rejected" / "denied" / injection queries return []);
15
  - on a terminal-state query, drop purely-preparatory records
16
  (draft / triage / forecast), negation-aware;
17
+ - ANCHOR-FIRST (hybrid): keep a candidate that is in the
18
+ anchor's cluster (matches >= 1 anchor term, the rarest most
19
+ discriminating tokens) OR clears the high-coverage bar
20
+ ANCHOR_HYBRID_HI. A non-anchor, mid-coverage candidate is
21
+ cross-cluster pollution and is dropped. The pure strict
22
+ filter killed pollution but over-dropped natural-language
23
+ evidence that lacks the rare anchor; the hybrid keeps both;
24
+ - rank by IDF-weighted coverage with a tier tiebreaker
25
+ (content tiers before contentless journal), keep
26
+ >= COVERAGE_THRESHOLD.
27
+
28
+ Bench: baseline single-pass 4/10; recall-only multipass 3/10 (REGRESSES). The
29
+ prior retrieve-then-verify scored 10/10 at 24 records but only ~0.36 recall at
30
+ 50-100 companies (tester Runs 16/17) because its selectivity cutoff was a corpus
31
+ fraction (round(0.15 * corpus_n)) that lost meaning at scale: past ~150 records
32
+ almost every term read as "selective," so cross-cluster records cleared the gate.
33
+ The anchor-first rewrite (this version) defines the anchor RELATIVE to the rarest
34
+ query term, so the precision gate is scale-invariant (tester Runs 24-29:
35
+ 100/100 recall, 0 pollution at 100 companies / 1621 writes). Abstention and the
36
+ terminal/prep gates are preserved unchanged.
37
+
38
+ ANCHOR_HYBRID_HI was tuned on a real-data retrieval diagnostic (LongMemEval text
39
+ combined into one store): the pure anchor-only filter regressed natural-language
40
+ recall (gold evidence that lacks the rare anchor); HI=0.65 restores it while
41
+ keeping synthetic-workflow pollution at 0. Per-question (oracle) retrieval is not
42
+ regressed by this change (NEW >= OLD).
43
+
44
+ CAVEAT — COVERAGE_THRESHOLD, ANCHOR_BAND, ANCHOR_HYBRID_HI, and the prep/terminal
45
+ lexicon are defaults validated against the synthetic multi-cluster scale test
46
+ (tests/test_anchor_resolver_2026_06_06.py) + the LongMemEval retrieval diagnostic;
47
+ re-validate if corpus structure changes.
48
 
49
  Uses only the public MemoryClient surface (search / list_entities), so it adds
50
  no coupling to client internals.
 
68
  r'\b(draft|triage|forecast|planning|proposed|tentative|pending|agenda|'
69
  r'scheduled|rehearsal|sample|option|wip|follow-?up)\b|work in progress')
70
 
71
+ # --- anchor-first resolver constants (see CAVEAT in the module docstring) ---
72
+ # Replaces the 24-record bench tuning (SELECTIVE_CUTOFF_FRAC = 0.15) that
73
+ # collapsed at scale. The anchor is defined RELATIVE to the rarest query term,
74
+ # so it is scale-invariant.
75
+ ANCHOR_BAND = 2.0 # a term is an "anchor" if df <= ANCHOR_BAND * rarest-term df
76
+ COVERAGE_THRESHOLD = 0.45 # hard coverage floor: drop candidates below this
77
+ ANCHOR_HYBRID_HI = 0.65 # a non-anchor candidate is kept only if coverage >= this
78
  _PER_TOKEN_LIMIT = 200 # recall depth per token
79
+ # content tiers beat the contentless journal tier at equal coverage (cross-tier
80
+ # BM25 scores are not comparable; tester email 19e7eb3096b4dae5)
81
+ _TIER_PRIORITY = {"entity": 0, "state": 0, "reference": 0, "journal": 1}
82
 
83
 
84
  def _significant_tokens(query: str):
 
125
  idf = {t: math.log((corpus_n + 1) / (df[t] + 1)) + 1.0 for t in toks}
126
  total = sum(idf.values()) or 1.0
127
  terminal_q = bool(set(toks) & _TERMINAL_Q)
128
+
129
+ # Anchor-first: anchor terms are the rarest (most discriminating) tokens,
130
+ # defined relative to the rarest term so the band is scale-invariant. The
131
+ # hybrid gate below keeps a candidate that matches >= 1 anchor term or clears
132
+ # ANCHOR_HYBRID_HI, which removes the cross-cluster pollution the old
133
+ # corpus-fraction cutoff let through at scale. Anchor-raw recalls fully but
134
+ # pollutes; the anchor gate is the load-bearing precision gate (tester Runs 24-29).
135
+ min_df = min(df.values())
136
+ anchor_cut = max(2, round(ANCHOR_BAND * min_df))
137
+ anchor_terms = {t for t in toks if df[t] <= anchor_cut}
138
 
139
  scored = []
140
  for e in cand.values():
141
  if terminal_q and _pure_prep(e["body"]):
142
  continue # drop purely-preparatory on a final-state query
 
 
143
  cov = sum(idf[t] for t in e["m"]) / total
144
+ if cov < COVERAGE_THRESHOLD:
145
+ continue # below the hard coverage floor
146
+ # Anchor-first HYBRID gate: keep a candidate that is in the anchor's
147
+ # cluster (matches an anchor term) OR clears the high-coverage bar
148
+ # (genuinely relevant despite lacking the rare anchor, e.g. natural-
149
+ # language evidence). A non-anchor, mid-coverage candidate is pure
150
+ # cross-cluster pollution and is dropped. Tuned on the LongMemEval
151
+ # retrieval diagnostic: synthetic-workflow pollution -> 0 while natural-
152
+ # language recall is preserved (anchor-only over-filtered real queries).
153
+ if anchor_terms and not (e["m"] & anchor_terms) and cov < ANCHOR_HYBRID_HI:
154
+ continue
155
+ tier = e["hit"].get("tier")
156
+ scored.append((e["hit"], cov, _TIER_PRIORITY.get(tier, 0), e["best"]))
157
+ scored.sort(key=lambda x: (-x[1], x[2], x[3]))
158
+ return [h for h, _cov, _tp, _best in scored[:limit]]
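
The coverage floor and hybrid gate in the hunk above can be exercised on toy inputs. The sketch below is a standalone approximation under stated assumptions: `hybrid_gate`, the document frequencies, and the candidate names are all hypothetical, while the constants and the IDF/coverage formulas are copied from the diff. It omits the abstention and terminal/prep gates.

```python
import math

ANCHOR_BAND = 2.0
COVERAGE_THRESHOLD = 0.45
ANCHOR_HYBRID_HI = 0.65


def hybrid_gate(df, corpus_n, candidates):
    """Return names of candidates that pass the floor and the hybrid gate.

    df: query term -> document frequency.
    candidates: name -> set of query terms that candidate matched.
    """
    idf = {t: math.log((corpus_n + 1) / (n + 1)) + 1.0 for t, n in df.items()}
    total = sum(idf.values()) or 1.0
    anchor_cut = max(2, round(ANCHOR_BAND * min(df.values())))
    anchors = {t for t, n in df.items() if n <= anchor_cut}
    kept = []
    for name, matched in candidates.items():
        cov = sum(idf[t] for t in matched) / total
        if cov < COVERAGE_THRESHOLD:
            continue  # below the hard coverage floor
        if anchors and not (matched & anchors) and cov < ANCHOR_HYBRID_HI:
            continue  # non-anchor, mid-coverage: treated as pollution
        kept.append(name)
    return kept


# Query 1: one rare company token plus three widely shared topic tokens.
df1 = {"co0007": 4, "alpha": 48, "beta": 48, "gamma": 48}
cands1 = {
    "own_record": {"co0007", "alpha", "beta", "gamma"},  # anchor match
    "neighbour": {"alpha", "beta", "gamma"},             # no anchor, coverage ~0.64
}

# Query 2: the non-anchor terms are fairly rare themselves, so a record that
# lacks the anchor can still clear the high-coverage bar (coverage ~0.71).
df2 = {"co0007": 4, "invoice": 10, "overdue": 12, "reminder": 15}
cands2 = {"prose_evidence": {"invoice", "overdue", "reminder"}}

kept1 = hybrid_gate(df1, 480, cands1)
kept2 = hybrid_gate(df2, 480, cands2)
```

With these assumed frequencies the neighbour in query 1 lands just under `ANCHOR_HYBRID_HI`, so the margin is narrow for this corpus shape. That fits the CAVEAT in the module docstring about re-validating when corpus structure changes.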
sibyl-memory-client/tests/test_anchor_resolver_2026_06_06.py ADDED
@@ -0,0 +1,91 @@
1
+ """Anchor-first resolver + search refinements (combined patch, 2026-06-06).
2
+
3
+ Validates the fix for the multi-record recall/precision regression that tester
4
+ Sylvain surfaced (Runs 16/17 ~0.36 recall at 50-100 companies) and validated the
5
+ anchor-first remedy for (Runs 24-29: full recall, zero pollution at 100
6
+ companies). Source memo: memory/research/sylvain-anchor-first-resolver-runs24-29-2026-05-31.md.
7
+
8
+ Three changes under test:
9
+ 1. multi_record_search anchor-first hybrid gate (scale-invariant precision).
10
+ 2. MemoryClient.search_entities(category=...) anchor filter.
11
+ 3. MemoryClient.search() cross-tier rank tiebreaker (content before journal).
12
+ """
13
+ from sibyl_memory_client import MemoryClient
14
+ from sibyl_memory_client.multi_record import multi_record_search
15
+
16
+ _TYPES = {
17
+ "report": "report revenue forecast quarterly",
18
+ "email": "email thread followup correspondence",
19
+ "journal": "journal meeting notes minutes",
20
+ "bug": "bug ticket error defect",
21
+ }
22
+
23
+
24
+ def _build_corpus(c, n):
25
+ """n companies, each with 4 linked records sharing THREE per-group topic
26
+ terms — the cross-cluster contamination vector that defeated the old
27
+ corpus-fraction selectivity cutoff."""
28
+ for i in range(n):
29
+ anchor = f"co{i:04d}"
30
+ g = i % max(1, n // 12)
31
+ topics = f"topic{g}alpha topic{g}beta topic{g}gamma"
32
+ for t, tt in _TYPES.items():
33
+ c.set_entity(t, f"{t}-{i}", {"text": f"{anchor} {topics} {t} {tt} project status update"})
34
+
35
+
36
+ def test_anchor_first_full_recall_zero_pollution_at_scale(tmp_path):
37
+ n = 60
38
+ c = MemoryClient.local(tmp_path / "scale.db", tenant_id="scale")
39
+ _build_corpus(c, n)
40
+
41
+ exp_total = rec_total = pollution = 0
42
+ for i in range(n):
43
+ anchor = f"co{i:04d}"
44
+ g = i % max(1, n // 12)
45
+ res = multi_record_search(c, f"{anchor} topic{g}alpha topic{g}beta topic{g}gamma", limit=20)
46
+ expected = {f"{t}-{i}" for t in _TYPES}
47
+ got = {h.get("key") for h in res}
48
+ exp_total += len(expected)
49
+ rec_total += len(expected & got)
50
+ for h in res:
51
+ txt = (h.get("body") or {}).get("text", "")
52
+ if anchor not in txt:
53
+ pollution += 1
54
+
55
+ assert rec_total == exp_total, f"recall regressed: {rec_total}/{exp_total}"
56
+ assert pollution == 0, f"cross-cluster pollution leaked: {pollution} hits"
57
+
58
+
59
+ def test_abstention_preserved(tmp_path):
60
+ c = MemoryClient.local(tmp_path / "ab.db", tenant_id="scale")
61
+ _build_corpus(c, 20)
62
+ # a term with zero corpus support must collapse the whole query to []
63
+ assert multi_record_search(c, "co0001 nonexistenttokenzzzq report", limit=10) == []
64
+
65
+
66
+ def test_single_cluster_query_returns_only_that_cluster(tmp_path):
67
+ n = 40
68
+ c = MemoryClient.local(tmp_path / "sc.db", tenant_id="scale")
69
+ _build_corpus(c, n)
70
+ g = 7 % max(1, n // 12) # same group formula the corpus uses
71
+ res = multi_record_search(c, f"co0007 topic{g}alpha topic{g}beta topic{g}gamma", limit=20)
72
+ assert res, "expected the anchor cluster to be returned"
73
+ for h in res:
74
+ assert "co0007" in (h.get("body") or {}).get("text", ""), "leaked a non-anchor record"
75
+
76
+
77
+ def test_search_entities_category_filter(tmp_path):
78
+ c = MemoryClient.local(tmp_path / "cat.db", tenant_id="scale")
79
+ c.set_entity("report", "r1", {"text": "synergy roadmap alpha"})
80
+ c.set_entity("report", "r2", {"text": "synergy roadmap beta"})
81
+ c.set_entity("memo", "m1", {"text": "synergy roadmap gamma"})
82
+
83
+ all_hits = c.search_entities("synergy")
84
+ assert {h["name"] for h in all_hits} == {"r1", "r2", "m1"}
85
+
86
+ report_only = c.search_entities("synergy", category="report")
87
+ assert {h["name"] for h in report_only} == {"r1", "r2"}
88
+ assert all(h["category"] == "report" for h in report_only)
89
+
90
+ memo_only = c.search_entities("synergy", category="memo")
91
+ assert {h["name"] for h in memo_only} == {"m1"}
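
The optional category clause exercised by the last test can be approximated without the client. This is a sketch, not the client's query: `search_rows` is a hypothetical helper, the table is a plain SQLite table rather than an FTS5 index, and `LIKE` stands in for `MATCH`. Only the conditional clause and parameter tuple follow the pattern in the `client.py` diff.

```python
import sqlite3


def search_rows(conn, needle, tenant_id, *, limit=20, category=None):
    """Hypothetical stand-in for the category-anchored query (LIKE, not FTS5)."""
    # The clause and the extra bound parameter are added together or not at all.
    cat_clause = " AND category = ?" if category else ""
    params = ((f"%{needle}%", tenant_id, category, limit) if category
              else (f"%{needle}%", tenant_id, limit))
    sql = ("SELECT name, category FROM entities "
           "WHERE body LIKE ? AND tenant_id = ?" + cat_clause + " "
           "ORDER BY name LIMIT ?")
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities (tenant_id TEXT, category TEXT, name TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?, ?)",
    [
        ("scale", "report", "r1", "synergy roadmap alpha"),
        ("scale", "report", "r2", "synergy roadmap beta"),
        ("scale", "memo", "m1", "synergy roadmap gamma"),
    ],
)

all_names = [name for name, _ in search_rows(conn, "synergy", "scale")]
report_names = [
    name for name, _ in search_rows(conn, "synergy", "scale", category="report")
]
```

Since the condition is a truthiness check, an empty string behaves like `None` and searches all categories. The category value is always passed as a bound parameter, never interpolated into the SQL text.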
sibyl-memory-mcp/CHANGELOG.md CHANGED
@@ -4,6 +4,17 @@ All notable changes to `sibyl-memory-mcp` are recorded here. Format follows
4
  [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versioning follows
5
  [SemVer](https://semver.org/).
6
 
7
+ ## [0.1.8] - 2026-06-06
8
+
9
+ ### Changed
10
+
11
+ - **Pin `sibyl-memory-client>=0.4.9`.** Picks up the anchor-first hybrid
12
+ multi-record resolver (client 0.4.9): `memory_search` now strict-filters
13
+ multi-record / linked-record queries to the query's anchor cluster while
14
+ keeping high-coverage natural-language evidence, eliminating cross-cluster
15
+ pollution at scale. No MCP code change; routing through `multi_record_search`
16
+ is unchanged.
17
+
18
  ## [0.1.7] - 2026-06-05
19
 
20
  ### Fixed
sibyl-memory-mcp/pyproject.toml CHANGED
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
 
5
  [project]
6
  name = "sibyl-memory-mcp"
7
- version = "0.1.7"
8
  description = "MCP server for Sibyl Memory Plugin: wraps the local SQLite + FTS5 memory engine and exposes it to MCP-compatible agents (Claude Code, Codex, Cursor, Continue, anything that speaks MCP)."
9
  readme = "README.md"
10
  requires-python = ">=3.10"
@@ -23,7 +23,7 @@ classifiers = [
23
  ]
24
  dependencies = [
25
  "mcp>=1.0.0",
26
- "sibyl-memory-client>=0.4.8",
27
  "sibyl-memory-hermes>=0.3.2",
28
  ]
29
 
 
4
 
5
  [project]
6
  name = "sibyl-memory-mcp"
7
+ version = "0.1.8"
8
  description = "MCP server for Sibyl Memory Plugin: wraps the local SQLite + FTS5 memory engine and exposes it to MCP-compatible agents (Claude Code, Codex, Cursor, Continue, anything that speaks MCP)."
9
  readme = "README.md"
10
  requires-python = ">=3.10"
 
23
  ]
24
  dependencies = [
25
  "mcp>=1.0.0",
26
+ "sibyl-memory-client>=0.4.9",
27
  "sibyl-memory-hermes>=0.3.2",
28
  ]
29