# Knowledge graph
A pre-built weighted graph of skills, agents, MCP servers, and
harnesses in the ctx ecosystem, shipped as `graph/wiki-graph.tar.gz`.
The on-disk JSON and `resolve_graph` Python API are harness-aware, including
plain-slug graph walks from `harness:<slug>` nodes. `ctx-monitor`
exposes skill/agent/MCP/harness wiki and graph views. Harness installation,
update, and uninstall are handled by `ctx-harness-install`; dashboard
load/unload POSTs deliberately reject harnesses and return the dry-run CLI
command to use instead. Quality scoring is exposed for sidecar-backed skills,
agents, and MCP servers.
## What's in it
These are the authoritative numbers from the shipped tarball. The curated-core
snapshot is **12,934 nodes** (1,470 curated skills + 467 agents + 10,790 MCP
servers + 207 harnesses). Harness pages under `entities/harnesses/` are ingested
by local rebuilds and by the separate harness recommendation path. The tarball
also carries **68,494 skill pages**. Of these, **67,024** skill bodies are
hydrated as installable `SKILL.md` files under `converted/`, and the **28,612**
entries over the configured line limit were converted to gated micro-skill
orchestrators. Full original bodies
are used during graph rebuilds for semantic similarity, but
`SKILL.md.original` backups, transient `.lock` files, and `.ctx/` queue state
are omitted from the shipped tarball.
| Metric | Count |
|---|---:|
| Total nodes | **79,958** |
| Curated core nodes | **12,934** (1,470 skills + 467 agents + 10,790 MCP servers + 207 harnesses) |
| Body-backed skill nodes | **67,024** hydrated installable skill entries |
| Total edges | **1,778,069** |
| Hydrated skill incident edges | **1,516,298** |
| Hydrated skill semantic incident edges | **911,922** |
| Communities | **52** (Louvain) |
| Edge sources (overlap-deduped) | semantic 1,088,763; tag 474,837; token 280,275 |
| Cross-type edges (skill <-> agent) | ~52,382 |
| Cross-type edges (skill <-> MCP) | ~30,295 |
| Cross-type edges (agent <-> MCP) | ~229 |
| Harness edges | **5,063** |
| Shipped skill index | **67,024** observed body-backed skill entries |
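
To cross-check these counts against an extracted `graph.json`, you can tally nodes by the type prefix in their IDs. This is a minimal sketch, assuming every node ID has the `<type>:<slug>` shape seen in `skill:fastapi-pro` and `harness:<slug>`; the exact prefixes used for agents and MCP servers are not documented on this page, so the hypothetical helper `count_nodes_by_type` simply groups by whatever precedes the first colon.

```python
from collections import Counter


def count_nodes_by_type(raw: dict) -> Counter:
    """Group node-link JSON nodes by the prefix before the first ':' in their id.

    Assumes ids look like '<type>:<slug>' (e.g. 'skill:fastapi-pro');
    ids without a colon are counted under 'unknown'.
    """
    counts: Counter = Counter()
    for node in raw.get("nodes", []):
        node_id = str(node.get("id", ""))
        prefix, sep, _ = node_id.partition(":")
        counts[prefix if sep else "unknown"] += 1
    return counts
```

Pass it the decoded `graph.json` (for example `json.loads(path.read_text())`). Curated and body-backed skills are both `skill` nodes, so they would presumably appear together under one `skill` key.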
## Install
Use `ctx-init --graph` to install the fast runtime graph. Source checkouts use
`graph/wiki-graph-runtime.tar.gz`; pip installs download the matching GitHub
release asset for the installed package version. This installs
`graphify-out/*`, the skill index used by recommendations, and
the harness pages used by `ctx-harness-install`:
```bash
ctx-init --graph
```
To expand every shipped skill/agent/MCP entity page, harness page,
skill page, concept page, converted micro-skill pipeline,
and Obsidian vault metadata, request the full wiki artifact explicitly:
```bash
ctx-init --graph --graph-install-mode full
```
Manual extraction is still supported for offline/source installs. Extract the
full tarball into your `~/.claude/skill-wiki/` when you want local markdown
wiki browsing:
```bash
mkdir -p ~/.claude/skill-wiki
tar xzf graph/wiki-graph.tar.gz -C ~/.claude/skill-wiki/
```
On Windows PowerShell, create the target and use the built-in `tar.exe`
without `--force-local`:
```powershell
New-Item -ItemType Directory -Force "$env:USERPROFILE\.claude\skill-wiki" | Out-Null
tar -xzf graph\wiki-graph.tar.gz -C "$env:USERPROFILE\.claude\skill-wiki"
```
The extracted tree also opens directly as an Obsidian vault (the
`.obsidian/` config ships inside the tarball), so you can use
Obsidian's native graph view if you prefer it to the web dashboard.
## How edges are built
Edges are built and explained by the `ctx-wiki-graphify` console script
(`ctx.core.wiki.wiki_graphify`). A pair must first have at least one base
signal:
1. **Semantic cosine** - when the embedding backend is available, entity
text is embedded and semantic neighbors above the configured build floor
contribute weighted edges.
2. **Explicit frontmatter tags** - each entity page's YAML `tags:`
list contributes edges between every pair of entities that share
a tag. Popular tags are capped at 500 nodes to avoid noise-floor
"everything connects to everything" mega-buckets like `typescript`
or `frontend`.
3. **Slug-token pseudo-tags** - each hyphenated slug contributes its
tokens as implicit tags. `fastapi-pro` contributes `fastapi`;
`python-patterns` contributes `python` and `patterns`. A stop-word
filter drops generic tokens such as `skill`, `agent`, `pro`, `expert`,
and `core` so they don't over-connect the graph.
4. **Source overlap** - pages with the same high-specificity source URL,
repository URL, homepage, detail URL, or package URL can connect even
when their tags differ. Dense source buckets are skipped.
5. **Direct wikilinks** - explicit entity links such as
`[[entities/agents/code-reviewer]]` create a direct graph edge.
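
The tag and slug-token signals can be pictured as bucket joins: every pair of entities sharing a tag, or a non-stop-word slug token, becomes a candidate pair. This is a minimal illustration, not the `wiki_graphify` implementation. `TAG_CAP`, `STOP_TOKENS`, `slug_tokens`, and `candidate_pairs` are hypothetical names, the stop list holds only the examples named above, and the sketch reads "capped" as "skip buckets larger than the cap", which may differ from the real builder.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical values: the 500-node cap and the example stop tokens from the text.
TAG_CAP = 500
STOP_TOKENS = {"skill", "agent", "pro", "expert", "core"}


def slug_tokens(slug: str) -> set[str]:
    """Split a hyphenated slug into pseudo-tags, dropping stop-word tokens."""
    return {tok for tok in slug.split("-") if tok and tok not in STOP_TOKENS}


def candidate_pairs(entities: dict[str, dict]) -> dict[tuple[str, str], dict]:
    """Return {(a, b): {"shared_tags": set, "shared_tokens": set}} for every
    entity pair that shares at least one explicit tag or slug token."""
    tag_buckets: dict[str, set[str]] = defaultdict(set)
    token_buckets: dict[str, set[str]] = defaultdict(set)
    for node_id, meta in entities.items():
        for tag in meta.get("tags", []):
            tag_buckets[tag].add(node_id)
        for tok in slug_tokens(meta["slug"]):
            token_buckets[tok].add(node_id)

    pairs: dict[tuple[str, str], dict] = defaultdict(
        lambda: {"shared_tags": set(), "shared_tokens": set()}
    )
    for tag, members in tag_buckets.items():
        if len(members) > TAG_CAP:  # skip "everything connects" mega-buckets
            continue
        for a, b in combinations(sorted(members), 2):
            pairs[(a, b)]["shared_tags"].add(tag)
    for tok, members in token_buckets.items():
        for a, b in combinations(sorted(members), 2):
            pairs[(a, b)]["shared_tokens"].add(tok)
    return dict(pairs)
```

For example, `slug_tokens("fastapi-pro")` yields `{"fastapi"}`, matching the behavior described for `fastapi-pro` above.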
Edge `weight` is the final blended strength. Semantic, tag, and token
weights form the base blend from `config.json`; source overlap and direct
links add configured boosts. Existing edges can also receive explainable
ranking boosts from Adamic-Adar shared-neighbor structure, type affinity,
usage telemetry, and quality scores. Those boost-only signals do not create
edges by themselves. The shipped default `graph.min_edge_weight` is `0.03`;
calibration against the 2026-05 shipped graph showed this is the highest
floor with zero edge loss, while `0.05` would remove roughly 29.7% of edges.
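
Read literally, the weighting rules above say that boosts can raise an existing edge but cannot create one, and that the final weight must clear `graph.min_edge_weight`. The following is a minimal sketch under that reading. The blend coefficients and boost sizes are placeholders, not the values in `config.json`, and `blend_edge_weight` is a hypothetical helper.

```python
from typing import Optional

# Placeholder coefficients; the real values live in config.json.
BASE_WEIGHTS = {"semantic": 0.6, "tag": 0.25, "token": 0.15}
SOURCE_BOOST = 0.05
DIRECT_LINK_BOOST = 0.10
MIN_EDGE_WEIGHT = 0.03  # shipped default for graph.min_edge_weight


def blend_edge_weight(
    semantic_sim: float,
    tag_score: float,
    token_score: float,
    shared_source: bool = False,
    direct_link: bool = False,
    ranking_boost: float = 0.0,
) -> Optional[float]:
    """Return the blended weight, or None if the pair should not become an edge.

    ranking_boost stands in for the boost-only signals (Adamic-Adar, type
    affinity, usage, quality); it is applied only when a base signal exists.
    """
    has_base_signal = (
        semantic_sim > 0 or tag_score > 0 or token_score > 0
        or shared_source or direct_link
    )
    if not has_base_signal:
        return None  # boost-only signals never create an edge
    weight = (
        BASE_WEIGHTS["semantic"] * semantic_sim
        + BASE_WEIGHTS["tag"] * tag_score
        + BASE_WEIGHTS["token"] * token_score
    )
    if shared_source:
        weight += SOURCE_BOOST
    if direct_link:
        weight += DIRECT_LINK_BOOST
    weight += ranking_boost
    return weight if weight >= MIN_EDGE_WEIGHT else None
```

With these placeholder coefficients, a pair with only `semantic_sim=0.04` blends to 0.024 and is dropped by the 0.03 floor, while a bare direct wikilink still survives on its boost alone.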
Edge metadata keeps the ingredients explainable: `semantic_sim`,
`shared_tags`, `shared_tokens`, `shared_sources`, `direct_link`,
`adamic_adar`, `type_affinity`, `usage_score`, `quality_score`,
`edge_reasons`, and `score_components`. Hydrated skill records use their
full source bodies during graph rebuilds, so long converted entries keep
full-body similarity even though the shipped installable `SKILL.md` files are
short gated loaders. The raw `SKILL.md.original` backups are build inputs, not
tarball members.
## Communities
After edges are built, `wiki_graphify` runs NetworkX's Louvain
community detection (`resolution=1.2`, `seed=42` for determinism).
The result is **52 communities** ranging from single-member isolated
specialists to several thousand members in broad clusters like
`Community + Official + AI`. Each community also gets an auto-generated
`concepts/<community>.md` wiki page summarizing its members and top
shared tags.
The legacy CNM ("greedy modularity") algorithm is still available
behind `CTX_GRAPH_COMMUNITY=cnm`. It is deterministic but O(n²) on
dense graphs and hangs on the live 13K-node dataset (a ~50-minute run was
killed on 2026-04-27 inside the priority-queue siftup). Louvain is
the default because it finishes in seconds and produces clusters of
equivalent quality for the recommendation use case.
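
Both algorithms are available in NetworkX, so the switch can be sketched in a few lines. This is a minimal sketch, assuming the `CTX_GRAPH_COMMUNITY` variable behaves as described above; `detect_communities` is hypothetical, and only `resolution=1.2` and `seed=42` come from this section. It needs a NetworkX release that ships `louvain_communities`.

```python
import os

import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    louvain_communities,
)


def detect_communities(G: nx.Graph) -> list[set]:
    """Partition G into communities, largest first.

    Louvain is the default; CTX_GRAPH_COMMUNITY=cnm selects the legacy
    greedy-modularity (CNM) algorithm, which is slow on dense graphs.
    """
    algo = os.environ.get("CTX_GRAPH_COMMUNITY", "louvain").lower()
    if algo == "cnm":
        parts = greedy_modularity_communities(G, weight="weight")
    else:
        # Fixed seed keeps the partition deterministic across rebuilds.
        parts = louvain_communities(G, weight="weight", resolution=1.2, seed=42)
    return sorted((set(p) for p in parts), key=len, reverse=True)
```

A resolution above 1.0 biases Louvain toward more, smaller communities than the standard modularity objective would.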
## Querying the graph
### Via the dashboard
```bash
ctx-monitor serve # http://127.0.0.1:8765
```
Then open `/graph?slug=<entity-slug>&type=<entity-type>` for the
dashboard's interactive SVG neighborhood view, or
`/api/graph/<slug>.json?type=<entity-type>&hops=1&limit=40` for the
dashboard-shaped JSON. The `type` query is optional for unique slugs and
recommended for duplicate slugs such as `langgraph`. See the
[dashboard reference](dashboard.md) for the full route catalogue.
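
The JSON route can also be scripted. This is a minimal standard-library sketch, assuming `ctx-monitor serve` is listening on the default address shown above. `graph_api_url` and `fetch_neighborhood` are hypothetical helper names, and because the response shape is not documented here, the sketch returns the decoded JSON as-is.

```python
import json
from typing import Any, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8765"  # default ctx-monitor address


def graph_api_url(
    slug: str,
    entity_type: Optional[str] = None,
    hops: int = 1,
    limit: int = 40,
    base_url: str = BASE_URL,
) -> str:
    """Build /api/graph/<slug>.json; pass entity_type for duplicate slugs."""
    params: dict = {"hops": hops, "limit": limit}
    if entity_type:
        params = {"type": entity_type, **params}
    return f"{base_url}/api/graph/{quote(slug)}.json?{urlencode(params)}"


def fetch_neighborhood(slug: str, entity_type: Optional[str] = None) -> Any:
    """GET the dashboard-shaped JSON for one entity's neighborhood."""
    with urlopen(graph_api_url(slug, entity_type), timeout=10) as resp:
        return json.load(resp)
```

For a duplicate slug such as `langgraph`, call `fetch_neighborhood("langgraph", "skill")` so the server can disambiguate.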
### Via Python
```python
import json
from pathlib import Path

from networkx.readwrite import node_link_graph

graph_path = Path("~/.claude/skill-wiki/graphify-out/graph.json").expanduser()
raw = json.loads(graph_path.read_text(encoding="utf-8"))

# Legacy exports use "links"; newer ones use "edges". The `edges=` keyword
# needs a recent NetworkX (older releases spell it `link=`).
edges_key = "links" if "links" in raw else "edges"
G = node_link_graph(raw, edges=edges_key)

# 79,958 nodes, 1,778,069 edges
print(G.number_of_nodes(), G.number_of_edges())

# Find the 10 entities most strongly related to 'fastapi-pro' by edge weight
seed = "skill:fastapi-pro"
neighbors = sorted(
    G.neighbors(seed),
    key=lambda n: G[seed][n]["weight"],
    reverse=True,
)[:10]
for n in neighbors:
    edge = G[seed][n]
    shared = edge.get("shared_tags", [])
    label = G.nodes[n].get("label", n)
    print(f"  w={edge['weight']:.3f} {label:<40} {shared[:3]}")
```
The node-link JSON schema's edges key is auto-detected (legacy
NetworkX 2.x used `"links"`; current versions default to `"edges"`).
The helper `resolve_graph.load_graph()` does this for you.
### Via recommendation paths
The graph backs two recommendation paths:
- Execution recommendation surfaces (`ctx.recommend_bundle`, MCP
`ctx__recommend_bundle`, generic harness tools, Claude Code hook
suggestions, and repo-scan advisory output) share
`ctx.core.resolve.recommendations.recommend_by_tags` for skills,
agents, and MCP servers. That engine ranks candidates by
slug-token matches, tag overlap, graph degree, and semantic-cache
signals when available. Imported skill results are normal `skill` nodes with
detail URLs, install commands, duplicate
hints, gated micro-skill loaders when over the line threshold, and
quality/security metadata. If an older
extracted wiki has the skill index JSON but no graph nodes for
those records, the same recommender falls back to the index file.
- Harness recommendations are a separate path for custom/API/local
model onboarding (`ctx-init --model-mode custom ...`) and
`ctx-harness-install`. They use the same graph filtered to
`harness` nodes and the higher harness match floor from `config.json`.
- Repository scans still start from stack detections, then turn that profile
into the same tag/query bundle used by the execution recommender. If a
shipped graph is unavailable, scan output falls back to the legacy installed
skill resolver so a plain profile scan remains useful. Harnesses are
intentionally not emitted from repo scans or Claude Code hook bundles.
This split is intentional: execution surfaces need identical ranking and a
small top-K, while harness choice changes the model runtime itself and belongs
in an explicit onboarding/install flow.
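
The split can be pictured as one scoring function over the four signals listed above (slug-token matches, tag overlap, graph degree, semantic-cache signal), applied to two separate pools with different floors. This is a minimal sketch only: `Candidate`, `score_candidate`, `recommend`, the weights, and the floors are hypothetical, and the sketch does not reproduce `recommend_by_tags`.

```python
import math
from dataclasses import dataclass, field

# Hypothetical weights and floors, for illustration only.
WEIGHTS = {"slug": 3.0, "tag": 2.0, "degree": 0.5, "semantic": 2.5}
EXECUTION_FLOOR = 1.0
HARNESS_FLOOR = 2.5  # stands in for the higher harness match floor


@dataclass
class Candidate:
    node_id: str
    slug: str
    tags: set = field(default_factory=set)
    degree: int = 0
    semantic: float = 0.0  # 0.0 when no semantic-cache signal is available


def score_candidate(c: Candidate, query_tags: set) -> float:
    slug_hits = len(set(c.slug.split("-")) & query_tags)
    tag_hits = len(c.tags & query_tags)
    if not (slug_hits or tag_hits or c.semantic):
        return 0.0  # graph degree alone never makes a candidate relevant
    return (
        WEIGHTS["slug"] * slug_hits
        + WEIGHTS["tag"] * tag_hits
        + WEIGHTS["degree"] * math.log1p(c.degree)  # dampen hub nodes
        + WEIGHTS["semantic"] * c.semantic
    )


def recommend(cands, query_tags, top_k=5, harness=False):
    """Rank one candidate pool: harness nodes only, or everything else."""
    floor = HARNESS_FLOOR if harness else EXECUTION_FLOOR
    pool = [c for c in cands if c.node_id.startswith("harness:") == harness]
    scored = [(score_candidate(c, query_tags), c.node_id) for c in pool]
    ranked = sorted((s for s in scored if s[0] >= floor), reverse=True)
    return [node_id for _, node_id in ranked[:top_k]]
```

Keeping harnesses in their own pool mirrors the rule above that repo scans and hook bundles never emit harnesses.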
## Modular graph and wiki packs
ctx is moving the graph and LLM-wiki from "one huge tarball is the mutable
truth" to this operational model:
```text
immutable base graph pack
+ small graph overlay packs
+ tombstones for deletes
+ immutable base wiki pack
+ small wiki overlay packs
-> merged reader for dashboard, search, recommendations, and repair tools
```
The user-facing behavior is intended to stay the same: recommendations,
dashboard graph views, wiki pages, and harness setup still read the merged
graph/wiki as one catalog. The difference is how updates are stored. Adding or
updating one skill, agent, MCP server, or harness can write a small overlay pack
instead of forcing a full graph and wiki rebuild.
This makes normal updates faster and safer:
- **Faster adds/updates** - the worker writes a page overlay and, when the
vector index exists, attaches ANN-based graph edges for the changed entity.
- **Efficient deletes** - deletes become tombstones in graph/wiki overlays, so
removed entities disappear from merged reads without rewriting the base pack.
- **Crash isolation** - overlay packs and compaction manifests are written
atomically; a failed update does not corrupt the base snapshot.
- **Same recommendation contract** - merged graph reads preserve edge metadata,
scores, provenance, and tombstones before resolver/dashboard code sees the
graph.
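
A merged read over a base pack, ordered overlays, and tombstones can be sketched as a last-writer-wins fold. This illustrates the model above and is not ctx's pack format: the in-memory dict shapes and `merge_packs` are hypothetical, and real packs also carry edge metadata, provenance, and manifests.

```python
from typing import Iterable


def merge_packs(base: dict, overlays: Iterable[dict]) -> dict:
    """Fold overlays onto an immutable base graph and return a new merged view.

    Each pack is {"nodes": {id: attrs}, "edges": {(a, b): attrs},
    "tombstones": set_of_node_ids}. Later overlays win; a tombstoned node
    disappears together with every edge that touches it. The base dicts are
    copied, never mutated.
    """
    nodes = dict(base.get("nodes", {}))
    edges = dict(base.get("edges", {}))
    for overlay in overlays:
        nodes.update(overlay.get("nodes", {}))
        edges.update(overlay.get("edges", {}))
        for node_id in overlay.get("tombstones", ()):
            nodes.pop(node_id, None)
            edges = {k: v for k, v in edges.items() if node_id not in k}
    # Drop any edge whose endpoints are no longer present.
    edges = {k: v for k, v in edges.items() if k[0] in nodes and k[1] in nodes}
    return {"nodes": nodes, "edges": edges}
```

Because the fold copies the base before applying overlays, a failed or partial overlay never touches the base snapshot, which is the property the crash-isolation bullet relies on.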
Pack directories live inside the installed wiki:
```text
~/.claude/skill-wiki/
graphify-out/packs/base-<export-id>/
graphify-out/packs/overlay-<id>/
wiki-packs/base-<export-id>/
wiki-packs/overlay-<id>/
```
Overlay packs are the normal local-update path. Full rebuilds are still needed
when you intentionally refresh the release artifact, change global scoring
configuration, rebuild communities, or compact a long chain of overlays into a
new base snapshot.
To stage a coordinated graph+wiki compaction without mutating the active wiki:
```bash
ctx-pack-compact compact \
--wiki-path ~/.claude/skill-wiki \
--base-export-id <new-export-id> \
--staging-dir /tmp/ctx-pack-stage \
--json
```
The staging directory contains a top-level `pack-compaction-manifest.json` plus
staged graph and wiki base packs. Promotion validates the top-level manifest,
graph pack manifest, wiki pack manifest, checksums, matching export IDs, and
graph/wiki entity consistency before replacing active packs:
```bash
ctx-pack-compact validate \
--staged-graph-packs-dir /tmp/ctx-pack-stage/graph-packs \
--staged-wiki-packs-dir /tmp/ctx-pack-stage/wiki-packs \
--require-compaction-manifest \
--json
```
```bash
ctx-pack-compact promote \
--wiki-path ~/.claude/skill-wiki \
--staged-graph-packs-dir /tmp/ctx-pack-stage/graph-packs \
--staged-wiki-packs-dir /tmp/ctx-pack-stage/wiki-packs \
--json
```
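
The checksum part of promotion validation can be sketched with the standard library. The manifest layout used here (a `manifest.json` with an `export_id` and a `files` map from relative path to SHA-256 hex digest) is an assumption for illustration. The real pack manifest schema is not documented on this page, and `verify_staged_pack` is hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large packs don't load into memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_staged_pack(pack_dir: Path, expected_export_id: str) -> list[str]:
    """Return a list of problems; an empty list means the pack looks promotable."""
    manifest = json.loads((pack_dir / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    if manifest.get("export_id") != expected_export_id:
        problems.append(f"export id mismatch: {manifest.get('export_id')!r}")
    for rel, digest in manifest.get("files", {}).items():
        target = pack_dir / rel
        if not target.is_file():
            problems.append(f"missing file: {rel}")
        elif sha256_file(target) != digest:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Returning every problem at once, instead of raising on the first, makes a single validation run useful as a review report.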
Promotion refreshes the SQLite dashboard/recommendation graph store by default.
Use `--graph-store-db <path>` to refresh a non-default store, or
`--no-graph-store-refresh` only when you plan to rebuild it separately:
```bash
ctx-pack-compact validate \
--wiki-path ~/.claude/skill-wiki \
--json
ctx-graph-store build \
--graph-dir ~/.claude/skill-wiki/graphify-out \
--db ~/.claude/skill-wiki/graphify-out/graph-store.sqlite3
ctx-graph-store validate \
--db ~/.claude/skill-wiki/graphify-out/graph-store.sqlite3
```
### LLM-wiki design references
ctx follows Karpathy's LLM-wiki pattern. We also reviewed
[`nashsu/llm_wiki`](https://github.com/nashsu/llm_wiki) as a design reference
for source traceability, persistent ingest queues, graph insights, and
budgeted token/vector/graph retrieval. That repository is GPLv3, while ctx is
MIT, so ctx can use those ideas as product inspiration but must not copy or
vendor its code or assets.
## Rebuilding
After you add a skill, agent, MCP server, or harness entity page:
```bash
ctx-wiki-worker --wiki ~/.claude/skill-wiki --limit 1
```
The `entity-upsert` worker path validates the queued page hash, updates the
wiki index, and, when a persisted semantic vector index exists, runs a
best-effort ANN attach into `graphify-out/entity-overlays.jsonl`. That overlay
lets the runtime resolver connect a new or updated entity to existing graph
neighbors without recomputing global all-pairs similarity. When modular wiki
packs exist, the same write is mirrored into a wiki overlay pack. The worker
also queues a graph-store refresh so dashboard and resolver reads see the merged
view. The entity markdown page or wiki page overlay remains the source of truth.
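
The ANN attach step boils down to a nearest-neighbor query against the persisted vectors. This is a minimal brute-force sketch of what a flat index (presumably what the `numpy-flat` backend below refers to) computes, written in pure Python so it runs anywhere. `attach_neighbors`, `k`, and `min_sim` are hypothetical, and the real worker writes its results to `graphify-out/entity-overlays.jsonl` in a format not shown here.

```python
import heapq
import math


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def attach_neighbors(new_vec, index, k=10, min_sim=0.5, exclude=None):
    """Return up to k (node_id, cosine) pairs at or above min_sim, best first.

    index maps node_id -> embedding. Every stored vector is compared (exact
    search), which is what a flat index trades speed for.
    """
    q = _normalize(new_vec)
    scored = (
        (sum(a * b for a, b in zip(q, _normalize(vec))), node_id)
        for node_id, vec in index.items()
        if node_id != exclude
    )
    best = heapq.nlargest(k, scored)
    return [(node_id, sim) for sim, node_id in best if sim >= min_sim]
```

Only the new entity's row is scored against the index, so the cost is linear in the number of stored vectors rather than the all-pairs cost of a full rebuild.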
For manual review or debugging:
```bash
ctx-incremental-attach calibrate \
--graph ~/.claude/skill-wiki/graphify-out/graph.json
ctx-incremental-attach attach \
--index-dir ~/.claude/skill-wiki/.embedding-cache/graph/vector-index \
--overlay ~/.claude/skill-wiki/graphify-out/entity-overlays.jsonl \
--node-id skill:fastapi-review \
--type skill \
--label fastapi-review \
--text-file ~/.claude/skill-wiki/entities/skills/fastapi-review.md \
--dry-run
```
Shadow-gate a persisted index before trusting a new ANN backend, changed
thresholds, or a large attach workflow:
```bash
ctx-incremental-shadow \
--index-dir ~/.claude/skill-wiki/.embedding-cache/graph/vector-index \
--graph ~/.claude/skill-wiki/graphify-out/graph.json \
--sample-size 100 \
--min-overlap 0.85
```
The shadow command pretends sampled existing nodes are new, compares the
incremental attach result to the batch graph's semantic neighbors, and reports
precision, recall, top-5/top-10/top-20 agreement, score deltas, and bad
examples. A failing gate means you should either tune the thresholds or run a
full graph rebuild before shipping.
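
For one sampled node, the comparison can be sketched as set overlap between the incremental neighbors and the batch graph's neighbors. The metric definitions below are one plausible reading of "precision, recall, and top-k agreement"; `shadow_metrics` and `passes_gate` are hypothetical, and which metric `--min-overlap` actually gates is an assumption.

```python
def shadow_metrics(incremental, batch, ks=(5, 10, 20)):
    """Compare ranked neighbor lists for one sampled node.

    incremental: neighbor ids from the attach path, best first.
    batch: neighbor ids from the full graph build, best first.
    """
    inc_set, batch_set = set(incremental), set(batch)
    hits = len(inc_set & batch_set)
    metrics = {
        "precision": hits / len(inc_set) if inc_set else 0.0,
        "recall": hits / len(batch_set) if batch_set else 0.0,
    }
    for k in ks:
        top_inc, top_batch = set(incremental[:k]), set(batch[:k])
        denom = min(k, len(top_batch)) or 1
        metrics[f"top{k}"] = len(top_inc & top_batch) / denom
    return metrics


def passes_gate(per_node, min_overlap=0.85, key="top10"):
    """Average one agreement metric over the sample and compare to the gate."""
    if not per_node:
        return False
    mean = sum(m[key] for m in per_node) / len(per_node)
    return mean >= min_overlap
```

An empty sample fails the gate rather than passing vacuously, which is the safer default before trusting a new backend.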
If the vector index is missing, rebuild it without repacking artifacts:
```bash
ctx-wiki-graphify \
--wiki-dir ~/.claude/skill-wiki \
--incremental \
--graph-only \
--semantic-vector-index numpy-flat
```
Then drain pending entity-upsert work with `ctx-wiki-worker --wiki
~/.claude/skill-wiki`. This is the current repair path for "build index" and
"attach pending" without adding another command surface.
Before publishing graph artifacts, run the full rebuild/export path:
```bash
ctx-wiki-graphify # rebuild entity graph + communities
```
The pre-commit hook (`.githooks/pre-commit`) does **not** rebuild or
repack graph artifacts from `~/.claude/skill-wiki/`; that local wiki can
contain private entities. It refreshes cheap README stats when relevant
checked-in files are staged and warns when entity sources changed. Run
`ctx-wiki-graphify`, validate, repack, and stage the artifacts explicitly
for skill, agent, MCP server, or harness releases.
Graphify exports stage and validate each generated artifact before atomic
promotion. `graph.json`, `graph-delta.json`, `communities.json`,
`graph-report.md`, and `graph-export-manifest.json` each get a sibling
`*.promotion.json` file with candidate, current, and `last_good` hashes plus
rollback metadata. The manifest is promoted last, so a crash between artifact
promotion and manifest promotion is detected as an incomplete export and the
next run rebuilds instead of trusting mixed graph files.
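
The stage-then-promote pattern relies on `os.replace` being an atomic rename when source and target are on the same filesystem. This is a minimal sketch of promoting artifacts with a sidecar record and writing the manifest last. The sidecar fields reuse the names above (`candidate`, `current`, `last_good`), but their exact layout, the `<name>.promotion.json` naming, and `promote_artifact` / `promote_export` are assumptions.

```python
import hashlib
import json
import os
from pathlib import Path


def _sha256(path: Path):
    return hashlib.sha256(path.read_bytes()).hexdigest() if path.exists() else None


def _atomic_write(path: Path, data: bytes) -> None:
    """Write to a temp sibling, fsync, then atomically rename over the target."""
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)


def promote_artifact(target: Path, candidate: bytes) -> None:
    """Replace target with candidate bytes and record hashes in a sidecar."""
    sidecar = target.with_name(target.name + ".promotion.json")
    previous = json.loads(sidecar.read_text()) if sidecar.exists() else {}
    current = _sha256(target)
    record = {
        "candidate": hashlib.sha256(candidate).hexdigest(),
        "current": current,
        "last_good": current or previous.get("last_good"),
    }
    _atomic_write(target, candidate)
    _atomic_write(sidecar, json.dumps(record, indent=2).encode())


def promote_export(out_dir: Path, artifacts: dict, manifest: dict) -> None:
    """Promote every artifact, then the manifest last, so a crash in between
    leaves a manifest that does not match the artifacts on disk."""
    for name, data in artifacts.items():
        promote_artifact(out_dir / name, data)
    promote_artifact(
        out_dir / "graph-export-manifest.json",
        json.dumps(manifest, sort_keys=True).encode(),
    )
```

A reader that compares the manifest against the artifact hashes can then detect a half-finished export and rebuild instead of loading mixed files.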
## Current artifact record
This page is intentionally current-state only. Older graph sizes made the public
page look stale even when the headline table was correct, so historical refresh
notes live in `CHANGELOG.md` instead of being repeated here.
The shipped artifact currently records **79,958 nodes**, **1,778,069 edges**,
**52 Louvain communities**, **1,088,763 semantic edges**, **474,837 tag edges**,
and **280,275 slug-token edges**. The current build is fully reproducible from
the wiki content and the checked-in graph build configuration.
## Pre-ship gates
Three advisory gates run before the tarball is repackaged. All three
produce review output and never auto-modify the inventory.
- **`ctx-dedup-check`** - flags entity pairs (skill <-> skill, skill <->
agent, skill <-> MCP, agent <-> agent, agent <-> MCP, MCP <-> MCP) at or
above 0.85 cosine similarity. Incremental: keeps a `dedup-state.json`
next to the embedding cache, so follow-up runs only re-check pairs
involving entities whose content changed. Allowlist support via
`.dedup-allowlist.txt`. The current snapshot has 15,976 findings,
most of which are within-MCP near-duplicates (multiple wrappers
around the same upstream service).
- **`ctx-tag-backfill`** - finds skills/agents with empty `tags:`
frontmatter and proposes a backfill drawn from slug tokens, body
keywords, and the existing tag vocabulary. Report-only by default;
pass `--apply` to write. Backfills are additive only.
- **`ctx-skillspector-audit`** - runs a static `--no-llm` check over
shipped skill bodies with
[NVIDIA SkillSpector](https://github.com/NVIDIA/SkillSpector), writes
`graph/skillspector-audit.jsonl.gz`, and stamps skill entity pages in the
full wiki artifact. This is a ctx-run check, not NVIDIA endorsement or
certification.
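
The incremental behavior of `ctx-dedup-check` can be sketched as follows: hash each entity's content, re-score only pairs that involve a changed entity, and report pairs at or above 0.85 cosine unless they are allowlisted. `dedup_findings`, the state layout, and the `"a|b"` allowlist key format are hypothetical; the real `dedup-state.json` and `.dedup-allowlist.txt` formats are not documented here.

```python
import hashlib
import math
from itertools import combinations

THRESHOLD = 0.85  # cosine similarity at or above which a pair is flagged


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dedup_findings(entities, state, allowlist=frozenset()):
    """entities: {node_id: (text, embedding)}; state: {"hashes": {...},
    "findings": {"a|b": score}} from the previous run. Returns the new state."""
    hashes = {
        nid: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for nid, (text, _) in entities.items()
    }
    old_hashes = state.get("hashes", {})
    changed = {nid for nid, h in hashes.items() if old_hashes.get(nid) != h}
    # Keep old findings whose two entities still exist and are unchanged.
    findings = {
        key: score
        for key, score in state.get("findings", {}).items()
        if key not in allowlist
        and not (set(key.split("|")) & changed)
        and all(n in entities for n in key.split("|"))
    }
    for a, b in combinations(sorted(entities), 2):
        if a not in changed and b not in changed:
            continue  # both unchanged: the previous result still holds
        key = f"{a}|{b}"
        if key in allowlist:
            continue
        score = cosine(entities[a][1], entities[b][1])
        if score >= THRESHOLD:
            findings[key] = round(score, 4)
    return {"hashes": hashes, "findings": findings}
```

On a follow-up run where nothing changed, the loop skips every pair and the previous findings are carried forward unchanged.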