Spaces: Bani57/website
Andrej Janchevski committed on Apr 14

Commit acde928

1 Parent(s): c50d15c

docs(plan): add kg-anomaly correction inference endpoints plan

Plan for POST /kg-anomaly/correct and POST /kg-anomaly/continue wrapping
DiscreteDenoisingDiffusionKG. Documents model-loading strategy with
reconstructed dataset_infos (KG checkpoints only pickle cfg via
save_hyperparameters('cfg'), unlike MultiProxAn), SSE streaming for both
standard and multiprox modes, change detection, and KG subgraph rendering.

Files changed (1)

.claude/plans/backend_kg_anomaly.md +261 -0

.claude/plans/backend_kg_anomaly.md ADDED

	@@ -0,0 +1,261 @@

+# KG Anomaly Correction Inference Endpoints
+Implement `POST /kg-anomaly/correct` and `POST /kg-anomaly/continue` -- the final two inference endpoints.
+These wrap the DiGress KG diffusion model (`DiscreteDenoisingDiffusionKG`) for correcting/generating knowledge graph subgraph edges, following the same architecture as the MultiProxAn graph generation endpoints.
+## Context
+Discovery endpoints (`GET /kg-anomaly/datasets`, `GET /kg-anomaly/datasets/{id}/sample-subgraphs`) are already implemented, and checkpoint scanning already lives in `registry.py`. The two remaining endpoints perform the core inference: standard denoising and MultiProx step-by-step Gibbs sampling. Both stream their results over SSE (changed from the original JSON spec per user request).
+The MultiProxAn graph generation endpoints (`graphgen_inference.py`, `graph_generation.py` views, `registry.py` methods) are the direct template. Key differences for KG anomaly:
+- Model class: `DiscreteDenoisingDiffusionKG` (not `DiscreteDenoisingDiffusion`)
+- Inputs: subgraph nodes/edges from API (not generated from noise) -- X is given, only E is diffused
+- Extra model args: `X_index` (entity IDs), `inpaint_mask` (which edges to correct)
+- Task modes: "correct" (inpaint masked edges) vs "generate" (regenerate all edges)
+- Checkpoint only saves `cfg` via `save_hyperparameters('cfg')` -- must reconstruct `dataset_infos` from a COINs experiment + state_dict shapes
+- Response includes `changes` diff (before/after edge comparison)
+## Assumptions and Constraints
+- Checkpoints: `{dataset_id}.ckpt` (generate) and `{dataset_id}_correct.ckpt` (correct) in `src/research/COINs-KGGeneration/graph_generation/checkpoints/`
+- Only discrete KG models (no continuous variant for KG anomaly)
+- `loader.communities` (numpy array, entity_id -> community_id) is available on the lightweight Loader (not freed)
+- COINs experiment for the same dataset must load successfully (needed for kg_experiment in model constructor)
+- E tensor uses class 0 for "no edge", classes 1..N for actual relation types; API `relation_id` is 0-indexed for actual relations: `E_class = relation_id + 1`
+- `graph_generation/src` must be added to `sys.path` for bare imports (`from diffusion.noise_schedule import ...`, `from metrics.abstract_metrics import ...`)
+## Scope
+**Included:**
+- `POST /kg-anomaly/correct` (standard + multiprox modes, both SSE streaming)
+- `POST /kg-anomaly/continue` (multiprox continuation, SSE streaming)
+- KG subgraph rendering (PIL + networkx, directed graph with entity/relation labels, color-coded changes)
+- Change detection (before/after edge diff)
+- Model loading with reconstructed dataset_infos
+- sys.path update for `graph_generation/src`
+- API spec update (`docs/api.yaml`) to reflect SSE streaming
+- Postman collection update with auto-chaining test scripts
+- Backend README update
+**Excluded:**
+- Frontend implementation
+- Continuous model variant (only discrete exists for KG)
+## Design
+### Model Loading Strategy
+The checkpoint's `hyper_parameters` only contains `cfg` (from `save_hyperparameters('cfg')`). We must reconstruct all other constructor args:
+1. **Load checkpoint to CPU** with DDP patching (same as `_safe_load_lightning_checkpoint`)
+2. **Extract `cfg`** from `hyper_parameters`
+3. **Infer dims from state_dict shapes:**
+   - `Xdim_output = ckpt["state_dict"]["transition_model.u_x"].shape[1]`
+   - `Edim_output = ckpt["state_dict"]["transition_model.u_e"].shape[1]`
+   - `input_dims['X'] = ckpt["state_dict"]["model.mlp_in_X.0.weight"].shape[1]`
+   - `input_dims['E'] = ckpt["state_dict"]["model.mlp_in_E.0.weight"].shape[1]`
+   - `input_dims['y'] = ckpt["state_dict"]["model.mlp_in_y.0.weight"].shape[1]`
+   - `output_dims = {'X': Xdim_output, 'E': Edim_output, 'y': 0}`
+4. **Extract marginal distributions** from state_dict buffers:
+   - `node_types = ckpt["state_dict"]["transition_model.u_x"].squeeze(0)`
+   - `edge_types = ckpt["state_dict"]["transition_model.u_e"].squeeze(0)`
+5. **Load COINs experiment** via `_load_coins_experiment(dataset_id, "transe")` for the `kg_experiment`
+6. **Build mock `dataset_infos`** with dims, distributions, `nodes_dist`, and `datamodule.kg_experiment`
+7. **Build `extra_features`** from `cfg.model.extra_features` (ExtraFeatures or DummyExtraFeatures)
+8. **Construct model**, patch `save_hyperparameters` to no-op, load state_dict, move to device, eval mode
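Step 3 above can be sketched without loading a real checkpoint. The snippet below is only an illustration: it works on a plain mapping from state_dict key to shape tuple (in real code this would presumably be built as `{k: tuple(v.shape) for k, v in ckpt["state_dict"].items()}`), the helper name `infer_dims` is hypothetical, and the example shapes are made up. It also assumes the first layer of each `mlp_in_*` block is an `nn.Linear`, whose weight has shape `(out_features, in_features)`.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def infer_dims(shapes: Dict[str, Shape]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Recover input/output dims from parameter shapes (hypothetical helper)."""
    xdim_output = shapes["transition_model.u_x"][1]
    edim_output = shapes["transition_model.u_e"][1]
    input_dims = {
        # Linear weights are (out_features, in_features): index 1 is the input width.
        "X": shapes["model.mlp_in_X.0.weight"][1],
        "E": shapes["model.mlp_in_E.0.weight"][1],
        "y": shapes["model.mlp_in_y.0.weight"][1],
    }
    output_dims = {"X": xdim_output, "E": edim_output, "y": 0}
    return input_dims, output_dims


# Made-up shapes, for illustration only.
example_shapes = {
    "transition_model.u_x": (1, 4, 4),
    "transition_model.u_e": (1, 12, 12),
    "model.mlp_in_X.0.weight": (256, 10),
    "model.mlp_in_E.0.weight": (128, 12),
    "model.mlp_in_y.0.weight": (128, 13),
}
input_dims, output_dims = infer_dims(example_shapes)
```

Working from shapes alone keeps the dimension logic testable without a GPU or the research code on the path.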
+### Input Subgraph -> Tensor Conversion
+`build_kg_tensors(subgraph, dataset_id, loader, model)` -> `(X_given, E_given, y_given, X_index, X_c, n_nodes, is_bip, node_mask)`
+- **X_given** `(1, n, Xdim_output)`: one-hot from `type_id`. `X[0, i, type_id] = 1.0`
+- **E_given** `(1, n, n, Edim_output)`: init every pair to class 0 (no edge), i.e. `E[0, :, :, 0] = 1.0`. Then for each edge: `E[0, src, tgt, 0] = 0; E[0, src, tgt, relation_id + 1] = 1.0`
+- **y_given** `(1, 0)`: empty tensor (consistent with `kg_dataset.py` line 73)
+- **X_index** `(1, n)`: entity IDs from `nodes[i]["entity_id"]`
+- **X_c** `(1, n)`: `loader.communities[entity_id]` for each node
+- **n_nodes** `tensor([n])`, **is_bip** `tensor([n > 20])`, **node_mask** `ones(1, n, dtype=bool)`
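The `E_given` encoding is the part most prone to off-by-one mistakes, so here is a minimal sketch of it. Nested Python lists stand in for the torch tensor, the batch dimension is dropped, and the edge keys `source` and `target` are assumed names for the API fields (the plan only calls them `src` and `tgt`).

```python
from typing import Dict, List


def encode_edges(n: int, edges: List[Dict[str, int]], edim_output: int) -> List[List[List[float]]]:
    """Return an (n, n, edim_output) one-hot edge array as nested lists."""
    # Every ordered pair starts as class 0 ("no edge").
    E = [[[1.0] + [0.0] * (edim_output - 1) for _ in range(n)] for _ in range(n)]
    for edge in edges:
        src, tgt, rel = edge["source"], edge["target"], edge["relation_id"]
        E[src][tgt][0] = 0.0
        E[src][tgt][rel + 1] = 1.0  # E_class = relation_id + 1
    return E


# One directed edge 0 -> 1 with API relation_id 2, i.e. E class 3.
E_example = encode_edges(3, [{"source": 0, "target": 1, "relation_id": 2}], edim_output=4)
```

Because the graph is directed, only `E[src][tgt]` is set; the reverse pair keeps the "no edge" class.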
+### Change Detection
+`compute_changes(original_E_int, corrected_E_int, num_nodes, loader)` -> `{"edges": [...], "summary": {...}}`
+For each directed pair `(i, j)` where `i != j`:
+- `orig = original_E_int[i, j]`, `corr = corrected_E_int[i, j]` (integer class indices)
+- Both 0: skip (no edge in either)
+- Same nonzero: `"unchanged"`
+- orig=0, corr>0: `"added"`
+- orig>0, corr=0: `"removed"` (with `original_relation_id = orig - 1`)
+- Both nonzero, different: `"modified"` (with `original_relation_id = orig - 1`)
+Relation names resolved via `loader.dataset.get_inverted_name_maps()`. API `relation_id = E_class - 1`.
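The classification rules above can be sketched in pure Python. This is a simplified stand-in, not the planned function: it takes integer class matrices as nested lists, omits the `loader` argument and relation-name lookup, and the entry keys (`source`, `target`, `status`) are assumptions about the response shape.

```python
from typing import Any, Dict, List


def compute_changes_sketch(original: List[List[int]], corrected: List[List[int]],
                           num_nodes: int) -> Dict[str, Any]:
    """Diff two integer edge-class matrices (0 = no edge, k = relation k - 1)."""
    edges = []
    summary = {"unchanged": 0, "added": 0, "removed": 0, "modified": 0}
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i == j:
                continue
            orig, corr = original[i][j], corrected[i][j]
            if orig == 0 and corr == 0:
                continue  # no edge before or after
            if orig == corr:
                status = "unchanged"
            elif orig == 0:
                status = "added"
            elif corr == 0:
                status = "removed"
            else:
                status = "modified"
            entry = {"source": i, "target": j, "status": status}
            if corr > 0:
                entry["relation_id"] = corr - 1  # back to the 0-indexed API id
            if status in ("removed", "modified"):
                entry["original_relation_id"] = orig - 1
            edges.append(entry)
            summary[status] += 1
    return {"edges": edges, "summary": summary}


before = [[0, 1, 0], [0, 0, 2], [3, 0, 0]]
after = [[0, 1, 2], [0, 0, 0], [1, 0, 0]]
changes = compute_changes_sketch(before, after, num_nodes=3)
```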
+### KG Subgraph Rendering
+`render_kg_subgraph(X_int, E_int, num_nodes, X_index, dataset_id, loader, changes=None)` -> PIL.Image
+- PIL + networkx `spring_layout` (no matplotlib, same pattern as `_render_comm20`)
+- Directed graph (`nx.DiGraph`), directed arrows for edges
+- Node labels: entity names from `inv_nodes[entity_id]` (truncated)
+- Edge labels: relation names from `inv_relations[relation_id]`
+- Image size: 500x500
+- Color coding when `changes` provided:
+  - unchanged: gray `#888888`
+  - modified: orange `#e67e22`
+  - added: green `#27ae60`
+  - removed: red dashed `#e74c3c`
+### SSE Streaming (updated from original JSON spec)
+Both endpoints stream over SSE, like the graph generation endpoints, emitting `event: progress`, `event: preview`, and `event: result`. In standard mode the `result` event includes `original_image`, `corrected_image`, `chain_gif`, `changes`. In multiprox mode it includes `step`, `image`, `state`, `changes`.
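For reference, an SSE frame is an `event:` line, a `data:` line, and a blank line. The helper below is a hypothetical sketch of how one event could be serialized. The existing SSE helpers in the graph generation views may differ, and the payload fields used in the example (`step`, `total`) are placeholders.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event with a JSON payload."""
    # json.dumps emits a single line by default, so one data: field is enough.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


frame = format_sse("progress", {"step": 1, "total": 50})
```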
+### State Blob (MultiProx)
+Keys:
+```
+X_given, E, y, n_nodes, dataset_id, task, X_index, X_c, is_bip,
+original_E_int, T, n, m, t, t_prime, gibbs_chain_freq, inner_step, step
+```
+Note: the key is `X_given` (not `X`) because node features are fixed and never diffused. `E` is the `(1, M, n, n, Edim)` ensemble tensor.
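The round trip of such a state blob can be illustrated as follows. This is an assumption-heavy sketch: the actual pattern in `graphgen_inference.py` is not shown in this plan and may serialize tensors directly. Here the state is assumed to be JSON-serializable already (tensors converted to nested lists), and the function names carry a `_sketch` suffix to mark them as hypothetical.

```python
import base64
import json
import zlib


def encode_state_sketch(state: dict) -> str:
    """Compress a JSON-serializable state dict into a URL-safe base64 string."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def decode_state_sketch(blob: str) -> dict:
    """Inverse of encode_state_sketch."""
    raw = zlib.decompress(base64.urlsafe_b64decode(blob.encode("ascii")))
    return json.loads(raw.decode("utf-8"))


# Placeholder values; only a few of the keys listed above are shown.
state = {"dataset_id": "wordnet", "task": "correct", "step": 0, "X_index": [[3, 7, 11]]}
blob = encode_state_sketch(state)
restored = decode_state_sketch(blob)
```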
+## Implementation Steps
+### Step 1: Add `graph_generation/src` to sys.path
+**File:** `src/backend/research_api/settings.py`
+Add `_DIGRESS_KG_SRC = str(PROJECT_ROOT / "src" / "research" / "COINs-KGGeneration" / "graph_generation" / "src")` to the sys.path block. Required for bare imports in `diffusion_model_discrete_kg.py`:
+- `from diffusion.noise_schedule import ...`
+- `from metrics.abstract_metrics import ...`
+### Step 2: Create `src/backend/api/services/kg_anomaly_inference.py`
+New file mirroring `graphgen_inference.py`. Contains:
+- `build_kg_tensors()` -- input subgraph -> tensors
+- `compute_changes()` -- before/after edge diff
+- `render_kg_subgraph()` -- PIL + networkx directed graph rendering
+- `run_standard_correction()` -- generator yielding progress/preview/result events
+- `run_multiprox_correction_init()` -- generator yielding noise init progress + result with state
+- `run_multiprox_correction_step()` -- generator yielding gibbs + refine progress + result with updated state
+- `encode_state_blob()` / `decode_state_blob()` -- same pattern as graphgen
+- `_pil_to_b64()` / `_frames_to_gif_b64()` -- import from graphgen_inference (reuse)
+Standard correction generator logic (following research code `sample_batch`):
+```
+1. Move tensors to device
+2. Build node_mask, inpaint_mask (via get_inpaint_mask for "correct" task, all-ones for "generate")
+3. Sample initial noise for E only; apply inpaint mask: z_T.E * mask + E_given * (~mask)
+4. Set X = X_given (nodes are given, not diffused)
+5. Denoising loop s_int from T-1 to 0:
+   - model.sample_p_zs_given_zt(s, t, X, E, y, X_index, node_mask, inpaint_mask)
+   - yield progress + preview at frame intervals
+6. Collapse final: PlaceHolder(X, E, y).mask(node_mask, collapse=True)
+7. Compute changes, render original + corrected images, build GIF
+8. yield result event
+```
+MultiProx init/step follows same structure as graphgen but with:
+- X_index and inpaint_mask passed to `sample_p_zs_given_zt`
+- `model.apply_noise(..., inpaint_mask, gibbs=True)` for re-noising
+- Changes computed at each step for the result event
+### Step 3: Add model loading to `src/backend/api/services/registry.py`
+Add to `ModelRegistry.__init__`:
+```python
+self._kg_anomaly_models = {}  # (dataset_id, task) -> loaded eval-mode model
+```
+Add `_load_kg_anomaly_model(self, dataset_id, task)`:
+1. Check cache `_kg_anomaly_models[(dataset_id, task)]`
+2. Determine ckpt path: `DIGRESS_KG_DIR / checkpoints / {dataset_id}[_correct].ckpt`
+3. Load checkpoint to CPU with DDP patching
+4. Extract `cfg` from `hyper_parameters`
+5. Infer dims from state_dict shapes (transition_model buffers, model MLP weights)
+6. Load COINs experiment via `self._load_coins_experiment(dataset_id, "transe")`
+7. Build mock dataset_infos (MockDataModule with kg_experiment, MockDatasetInfos with dims/distributions)
+8. Build ExtraFeatures from cfg
+9. Construct `DiscreteDenoisingDiffusionKG(cfg, dataset_infos, None, None, None, extra_features, domain_features)`
+10. Load state_dict (strict=False), move to device, eval mode
+11. Cache and return
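Steps 1, 2, and 11 (cache lookup, checkpoint path, caching) can be sketched independently of the heavy loading logic. `ModelCache` and `checkpoint_path` are hypothetical names, the relative `DIGRESS_KG_DIR` value is a placeholder for the real setting, and the loader is injected so the sketch runs without torch. The suffix rule treats anything other than `"generate"` as the correction checkpoint, since the view maps the API task `"correct"` to the model task `"inpaint"`.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

# Placeholder; the real DIGRESS_KG_DIR comes from the backend settings.
DIGRESS_KG_DIR = Path("src/research/COINs-KGGeneration/graph_generation")


def checkpoint_path(dataset_id: str, task: str) -> Path:
    """{dataset_id}.ckpt for generate, {dataset_id}_correct.ckpt otherwise."""
    suffix = "" if task == "generate" else "_correct"
    return DIGRESS_KG_DIR / "checkpoints" / f"{dataset_id}{suffix}.ckpt"


class ModelCache:
    """Load each (dataset_id, task) model at most once."""

    def __init__(self, load_fn: Callable[[Path], Any]) -> None:
        self._load_fn = load_fn
        self._models: Dict[Tuple[str, str], Any] = {}

    def get(self, dataset_id: str, task: str) -> Any:
        key = (dataset_id, task)
        if key not in self._models:
            self._models[key] = self._load_fn(checkpoint_path(dataset_id, task))
        return self._models[key]
```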
+Add `kg_anomaly_correct_stream(self, ...)`:
+- Acquire inference lock eagerly
+- Load model, build tensors from subgraph
+- Return generator (standard or multiprox init), with lock released in finally block
+Add `kg_anomaly_continue_stream(self, state_b64)`:
+- Decode state blob eagerly, acquire lock, load model
+- Return generator wrapping `run_multiprox_correction_step`, with lock released in finally block
+### Step 4: Add views to `src/backend/api/views/kg_anomaly.py`
+Add `KgAnomalyCorrectView`:
+- Validate: dataset_id, sampling_mode, task (default "correct"), subgraph (nodes 2-20, edge indices valid), checkpoint availability
+- Map API task "correct" -> model task "inpaint", "generate" -> "generate"
+- For standard: clamp diffusion_steps [50, 1000], chain_frames [10, 30]
+- For multiprox: validate m, t, t_prime, gibbs_chain_freq (same rules as GraphGenGenerateView)
+- Return `_streaming_sse_response(gen)` (import SSE helpers from graph_generation views, or duplicate)
+Add `KgAnomalyContinueView`:
+- Validate state string
+- Return `_streaming_sse_response(gen)`
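The standard-mode validation rules listed for `KgAnomalyCorrectView` can be sketched as a pure function. This is illustrative only: the function name, the `source`/`target` edge keys, and the use of `ValueError` are assumptions, and the sketch requires `diffusion_steps` and `chain_frames` to be present rather than guessing at defaults the plan does not state.

```python
from typing import Any, Dict

TASK_MAP = {"correct": "inpaint", "generate": "generate"}  # API task -> model task


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def validate_standard_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Check the subgraph and normalize standard-mode parameters."""
    task = payload.get("task", "correct")
    if task not in TASK_MAP:
        raise ValueError(f"unknown task: {task!r}")
    nodes = payload["subgraph"]["nodes"]
    edges = payload["subgraph"]["edges"]
    n = len(nodes)
    if not 2 <= n <= 20:
        raise ValueError("subgraph must have between 2 and 20 nodes")
    for edge in edges:
        if not (0 <= edge["source"] < n and 0 <= edge["target"] < n):
            raise ValueError("edge index out of range")
    return {
        "model_task": TASK_MAP[task],
        "diffusion_steps": clamp(int(payload["diffusion_steps"]), 50, 1000),
        "chain_frames": clamp(int(payload["chain_frames"]), 10, 30),
    }


request = {
    "subgraph": {"nodes": [{}, {}, {}], "edges": [{"source": 0, "target": 2, "relation_id": 1}]},
    "diffusion_steps": 5000,
    "chain_frames": 3,
}
normalized = validate_standard_request(request)
```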
+### Step 5: Update `src/backend/api/urls.py`
+Add:
+```python
+path("kg-anomaly/correct", KgAnomalyCorrectView.as_view()),
+path("kg-anomaly/continue", KgAnomalyContinueView.as_view()),
+```
+### Step 6: Update `docs/api.yaml`
+Change `/kg-anomaly/correct` and `/kg-anomaly/continue` response schemas from regular JSON to SSE streaming (text/event-stream), matching the graph-generation endpoint pattern. Add SSE event schemas for progress, preview, and result events.
+### Step 7: Update `docs/postman/collection.json`
+Add auto-chaining test scripts to the multiprox correct request (parse SSE result event, save state to `{{multiprox_state}}`). Update continue request to use `{{multiprox_state}}`.
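The Postman test script itself would be JavaScript, but the parsing it has to perform can be sketched in Python for clarity. The sketch assumes `\n` line endings and a JSON payload on each `data:` line; `parse_sse` and `extract_state` are hypothetical names.

```python
import json
from typing import Iterator, Optional, Tuple


def parse_sse(body: str) -> Iterator[Tuple[str, dict]]:
    """Yield (event, payload) pairs from a complete SSE response body."""
    for block in body.split("\n\n"):
        event, data_lines = "message", []  # "message" is the SSE default event type
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            yield event, json.loads("\n".join(data_lines))


def extract_state(body: str) -> Optional[str]:
    """Return the state blob from the result event, if any."""
    for event, data in parse_sse(body):
        if event == "result":
            return data.get("state")
    return None


sample_body = (
    'event: progress\ndata: {"step": 1}\n\n'
    'event: result\ndata: {"step": 0, "state": "abc123"}\n\n'
)
multiprox_state = extract_state(sample_body)
```

The extracted value is what the collection would store in `{{multiprox_state}}` for the continue request.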
+### Step 8: Update `src/backend/README.md`
+Add `/kg-anomaly/correct` and `/kg-anomaly/continue` to the endpoint table, mark as streaming SSE.
+## Critical Files
+| File | Action | Purpose |
+|------|--------|---------|
+| `src/backend/research_api/settings.py` | Modify | Add `graph_generation/src` to sys.path |
+| `src/backend/api/services/kg_anomaly_inference.py` | Create | Core inference, rendering, change detection |
+| `src/backend/api/services/registry.py` | Modify | Model loading, stream orchestration, lock management |
+| `src/backend/api/views/kg_anomaly.py` | Modify | Add CorrectView and ContinueView |
+| `src/backend/api/urls.py` | Modify | Wire new routes |
+| `docs/api.yaml` | Modify | SSE streaming for kg-anomaly endpoints |
+| `docs/postman/collection.json` | Modify | Auto-chaining for multiprox |
+| `src/backend/README.md` | Modify | Endpoint documentation |
+## Research Code Referenced (read-only)
+| File | What we use |
+|------|-------------|
+| `src/research/COINs-KGGeneration/graph_generation/src/diffusion_model_discrete_kg.py` | `DiscreteDenoisingDiffusionKG` class, `sample_p_zs_given_zt`, `apply_noise` signatures |
+| `src/research/COINs-KGGeneration/graph_generation/src/utils.py` | `get_inpaint_mask`, `PlaceHolder` |
+| `src/research/COINs-KGGeneration/graph_generation/src/diffusion/diffusion_utils.py` | `sample_discrete_feature_noise` |
+| `src/research/COINs-KGGeneration/graph_generation/src/diffusion/extra_features.py` | `ExtraFeatures`, `DummyExtraFeatures` |
+| `src/research/COINs-KGGeneration/graph_generation/src/diffusion/distributions.py` | `DistributionNodes` |
+| `src/research/COINs-KGGeneration/graph_generation/src/datasets/abstract_dataset.py` | `AbstractDatasetInfos` for mock pattern |
+| `src/research/COINs-KGGeneration/graph_generation/src/main.py` | Reference loading sequence (lines 82-138) |
+| `src/research/COINs-KGGeneration/graph_completion/graphs/load_graph.py` | `loader.communities` array |
+## Verification
+1. `python manage.py runserver 8000` boots without errors (model scanning detects KG anomaly checkpoints)
+2. `GET /kg-anomaly/datasets` returns `available_tasks: ["generate", "correct"]` for each dataset
+3. Standard correct (wordnet, correct task): returns SSE stream with progress, preview frames, and result with `original_image`, `corrected_image`, `chain_gif`, `changes`
+4. Standard generate (wordnet, generate task): same but inpaint_mask is all-ones (all edges regenerated)
+5. MultiProx init (wordnet): returns SSE result with `state` blob, `step: 0`, `changes`
+6. MultiProx continue: state round-trips correctly, `step` increments, `round_complete`/`done` flags work
+7. Inference lock: second concurrent request returns 429 INFERENCE_BUSY
+8. Postman collection: all 4 KG anomaly inference requests pass with auto-chaining
+9. All 3 datasets (freebase, wordnet, nell) x 2 tasks (correct, generate) work