Bani57/website

Andrej Janchevski committed on Apr 13

Commit c50d15c (1 parent: c1b3cc7)

docs(postman): add thesis-optimal multiprox params and force-unlock endpoint

- QM9 multiprox: t=0.5, t'=0.004 (Table 4.3.1 best valid config)
- Comm20 multiprox: t=0.4, t'=0.1 (Table C.2.1 best orbit/degree)
- Add POST /debug/force-unlock request to Health folder
- Auto-chaining test scripts for multiprox state between init/continue
- Update README with SSE protocol and new endpoints
- Update backend_multiproxan plan to reflect current design decisions

Files changed (3)

.claude/plans/backend_multiproxan.md +233 -350
docs/postman/collection.json +180 -17
src/backend/README.md +31 -3

.claude/plans/backend_multiproxan.md CHANGED

@@ -13,14 +13,19 @@ The key challenge is that `sample_batch` / `sample_batch_gibbs` write to disk vi
 | `src/research/MultiProxAn/src/diffusion_model_discrete.py` | Strip all `wandb` imports and calls |
 | `src/research/MultiProxAn/src/diffusion_model.py` | Strip all `wandb` imports and calls (incl. one unguarded call) |
 | `src/research/MultiProxAn/src/utils.py` | Remove `import wandb` and `setup_wandb()` function |
 | `src/backend/research_api/settings.py` | Add `MultiProxAn/src/` to `sys.path` |
-| `src/backend/api/services/graphgen_inference.py` | **New file** — all inference logic + rendering |
-| `src/backend/api/services/registry.py` | Add `_graphgen_models` cache, `_load_graphgen_model`, `graphgen_generate`, `graphgen_continue` |
-| `src/backend/api/views/graph_generation.py` | Add `GraphGenGenerateView`, `GraphGenContinueView` |
-| `src/backend/api/urls.py` | Wire 2 new routes + update import |
-| `src/backend/requirements.txt` | Add `wandb`, `Pillow`, `overrides` |
-| `src/backend/README.md` | Mark generate/continue as implemented |
-| `docs/postman/collection.json` | Add 12 example requests |
 ## Model Differences: Discrete vs Continuous
@@ -30,60 +35,164 @@ The key challenge is that `sample_batch` / `sample_batch_gibbs` write to disk vi
 |---|---|---|
 | **`node_mask` dtype** | `bool` | `float32` (`.float()` required) |
 | **Initial noise** | `sample_discrete_feature_noise(limit_dist=model.limit_dist, node_mask=node_mask)` | `sample_feature_noise(X_size=(1,n,Xdim), E_size=(1,n,n,Edim), y_size=(1,ydim), node_mask=node_mask)` |
-| **`sample_p_zs_given_zt` returns** | `(sampled_s, discrete_sampled_s)` — 2-tuple; `discrete_sampled_s.X/E` are already collapsed integers | single `z_s` PlaceHolder with continuous floats |
-| **Chain frame render** | use `discrete_sampled_s.X/E` directly | `utils.unnormalize(z_s.X, z_s.E, z_s.y, model.norm_values, model.norm_biases, node_mask, collapse=True)` → `.X/.E` are integers |
-| **Final graph collapse** | `PlaceHolder(X=X, E=E, y=y).mask(node_mask, collapse=True)` | `model.sample_discrete_graph_given_z0(X, E, y, node_mask)` (runs another forward pass + unnormalize) |
 | **Gibbs ensemble aggregation** | `torch.median(X, dim=1).values` | `torch.mean(X, dim=1)` |
 The refinement loop in `run_multiprox_step` uses `sample_p_zs_given_zt` to update `cur_X/E/y` at each step; only the first element of the tuple (or the single return value) is needed there. `_collapse_final` is called once at the end for rendering.
 ## Research Code Reused (read-only)
 | Symbol | Location | How used |
 |---|---|---|
-| `DiscreteDenoisingDiffusion` | `src/diffusion_model_discrete.py` | Loaded via `load_from_checkpoint` for `model_type=discrete` |
-| `LiftedDenoisingDiffusion` | `src/diffusion_model.py` | Loaded via `load_from_checkpoint` for `model_type=continuous` |
 | `model.node_dist.sample_n(1, device)` | both models | Sample number of nodes |
-| `diffusion_utils.sample_discrete_feature_noise(limit_dist, node_mask)` | `src/diffusion/diffusion_utils.py` | Initial noise — discrete only |
-| `diffusion_utils.sample_feature_noise(X_size, E_size, y_size, node_mask)` | `src/diffusion/diffusion_utils.py` | Initial noise — continuous only |
-| `model.sample_p_zs_given_zt(s, t, X, E, y, node_mask)` | both models | One denoising step (return varies by model type — see table above) |
 | `model.apply_noise(X, E, y, node_mask, gibbs=True)` | both models | Re-apply Gibbs noise; uses `model.gibbs_fixed_t_2` internally |
-| `PlaceHolder.mask(node_mask, collapse=True)` | `src/utils.py` | Final collapse — discrete only |
-| `model.sample_discrete_graph_given_z0(X, E, y, node_mask)` | `diffusion_model.py` | Final collapse — continuous only |
-| `utils.unnormalize(X, E, y, norm_values, norm_biases, node_mask, collapse=True)` | `src/utils.py` | Chain frame rendering — continuous only |
-| `model.norm_values`, `model.norm_biases` | `diffusion_model.py` | Unnormalization factors — continuous only |
 ## Implementation
-### Step 0: Patch research code — remove `wandb`
-`wandb` is imported at module level in three files; the import alone causes `ImportError` at
-`load_from_checkpoint` time. One call in `diffusion_model.py` is **unguarded** (no `if wandb.run:`
-check) and would crash at inference even if the import somehow succeeded.
 **`src/research/MultiProxAn/src/diffusion_model_discrete.py`**:
 - Remove `import wandb` (line 9)
-- Remove the two `utils.setup_wandb(self.cfg)` call sites (inside `on_train_start` and `on_test_start`)
-- Remove all `if wandb.run: wandb.log(...)` / `wandb.run.summary[...]` blocks (5 blocks, all guarded — safe to delete entirely)
 **`src/research/MultiProxAn/src/diffusion_model.py`**:
 - Remove `import wandb` (line 9)
 - Remove the two `utils.setup_wandb(self.cfg)` call sites
-- Remove all guarded `if wandb.run: wandb.log(...)` blocks (5 blocks)
-- Remove the **unguarded** `wandb.log({...})` call at line 590 (inside `compute_val_loss` / `compute_test_loss`); the surrounding NLL computation and `return` must be preserved — only the `wandb.log(...)` statement is deleted
 **`src/research/MultiProxAn/src/utils.py`**:
-- Remove `import wandb` (line 7)
-- Remove the entire `setup_wandb(cfg)` function (lines 134–139)
-After these edits `wandb` is no longer needed and must be removed from `requirements.txt` (Step 6).
 ---
-### Step 1: `settings.py` — sys.path fix
-The research code uses two import styles simultaneously (`from diffusion.noise_schedule import ...` needs `MultiProxAn/src/` on path; `from src import utils` needs `MultiProxAn/` on path). Extend the existing loop at lines 9–13:
 ```python
 _MULTIPROXAN_SRC = str(PROJECT_ROOT / "src" / "research" / "MultiProxAn" / "src")
 for _path in (_COINS_KG_ROOT, _MULTIPROXAN_ROOT, _MULTIPROXAN_SRC):
@@ -95,108 +204,48 @@ for _path in (_COINS_KG_ROOT, _MULTIPROXAN_ROOT, _MULTIPROXAN_SRC):
 ### Step 2: New `graphgen_inference.py`
-Completely independent of Django/registry. Receives a loaded model and returns bytes/dicts.
 **Module structure:**
 ```
 graphgen_inference.py
-  ├── Constants: QM9_ATOM_TYPES, STATE_BLOB_MAX_BYTES, REQUIRED_STATE_KEYS
-  │
-  ├── # Model-type helpers (called by all three main functions)
-  ├── _is_discrete(model) → bool
-  ├── _build_node_mask(n_nodes, n_max, model) → bool or float32 tensor
-  ├── _sample_initial_noise(model, n_max, node_mask) → PlaceHolder
-  ├── _denoising_step(model, s_t, t_t, X, E, y, node_mask) → (X_soft, E_soft, y_soft, X_int, E_int)
-  ├── _gibbs_aggregate(model, X) → tensor   [median (discrete) or mean (continuous)]
-  ├── _collapse_final(model, X, E, y, node_mask) → (X_int, E_int)
-  │
-  ├── # Main inference functions
-  ├── run_standard_generation(model, num_nodes, diffusion_steps, chain_frames, dataset_id)
-  ├── run_multiprox_init(model, num_nodes, m, t, t_prime, dataset_id)
-  ├── run_multiprox_step(model, state_dict, dataset_id)
-  │
-  ├── # State serialisation
-  ├── encode_state_blob(state_dict) → str
-  ├── decode_state_blob(b64_str) → dict
-  │
-  └── # Visualisation
-      ├── render_graph(X_int, E_int, dataset_id) → PIL.Image
-      ├── _render_qm9(X_int, E_int) → PIL.Image   [RDKit]
-      ├── _render_comm20(X_int, E_int) → PIL.Image  [networkx + matplotlib Agg]
-      ├── _pil_to_b64(img) → str
-      └── _frames_to_gif_b64(frames) → str
 ```
 #### Model-type helpers
-```python
-def _is_discrete(model):
-    from diffusion_model_discrete import DiscreteDenoisingDiffusion
-    return isinstance(model, DiscreteDenoisingDiffusion)
-def _build_node_mask(n_nodes, n_max, model):
-    """bool for discrete, float32 for continuous."""
-    arange = torch.arange(n_max, device=n_nodes.device).unsqueeze(0)
-    mask = arange < n_nodes.unsqueeze(1)       # (1, n_max) bool
-    return mask if _is_discrete(model) else mask.float()
-def _sample_initial_noise(model, n_max, node_mask):
-    """Discrete: sample_discrete_feature_noise. Continuous: sample_feature_noise."""
-    if _is_discrete(model):
-        return diffusion_utils.sample_discrete_feature_noise(
-            limit_dist=model.limit_dist, node_mask=node_mask)
-    else:
-        bs = node_mask.shape[0]
-        return diffusion_utils.sample_feature_noise(
-            X_size=(bs, n_max, model.Xdim_output),
-            E_size=(bs, n_max, n_max, model.Edim_output),
-            y_size=(bs, model.ydim_output),
-            node_mask=node_mask)
-def _denoising_step(model, s_t, t_t, X, E, y, node_mask):
-    """Run one denoising step.
-    Returns (X_soft, E_soft, y_soft, X_int, E_int) where X/E_int are collapsed integer tensors
-    suitable for rendering. X/E_soft are the continuous activations to feed into the next step.
-    """
-    if _is_discrete(model):
-        sampled_s, discrete_s = model.sample_p_zs_given_zt(s_t, t_t, X, E, y, node_mask)
-        return sampled_s.X, sampled_s.E, sampled_s.y, discrete_s.X, discrete_s.E
-    else:
-        z_s = model.sample_p_zs_given_zt(s=s_t, t=t_t, X_t=X, E_t=E, y_t=y, node_mask=node_mask)
-        unnorm = utils.unnormalize(
-            z_s.X, z_s.E, z_s.y,
-            model.norm_values, model.norm_biases, node_mask, collapse=True)
-        return z_s.X, z_s.E, z_s.y, unnorm.X, unnorm.E
-def _gibbs_aggregate(model, X):
-    """Aggregate ensemble: median for discrete, mean for continuous."""
-    if _is_discrete(model):
-        return torch.median(X, dim=1).values
-    else:
-        return torch.mean(X, dim=1)
-def _collapse_final(model, X, E, y, node_mask):
-    """Collapse continuous activations to integer indices for rendering.
-    Discrete: PlaceHolder.mask(collapse=True). Continuous: sample_discrete_graph_given_z0.
-    Returns (X_int, E_int) tensors.
-    """
-    if _is_discrete(model):
-        final = PlaceHolder(X=X, E=E, y=y).mask(node_mask, collapse=True)
-        return final.X, final.E
-    else:
-        final = model.sample_discrete_graph_given_z0(X, E, y, node_mask)
-        return final.X, final.E
-```
 #### `run_standard_generation`
-Re-implements the denoising loop — do NOT call `sample_batch`. Uses `s_idx / diffusion_steps` for normalized time (valid because `sample_p_zs_given_zt` takes normalized floats in [0,1] regardless of `model.T`):
 ```python
 def run_standard_generation(model, num_nodes, diffusion_steps, chain_frames, dataset_id):
@@ -219,121 +268,31 @@ def run_standard_generation(model, num_nodes, diffusion_steps, chain_frames, dat
             step = diffusion_steps - 1 - s_idx
             if step % frame_interval == 0 or s_idx == 0:
                 gif_frames.append(render_graph(X_int[0, :n_max], E_int[0, :n_max, :n_max], dataset_id))
     X_final, E_final = _collapse_final(model, X, E, y, node_mask)
-    image_b64 = _pil_to_b64(render_graph(X_final[0, :n_max], E_final[0, :n_max, :n_max], dataset_id))
-    return image_b64, _frames_to_gif_b64(gif_frames), elapsed_ms
 ```
-#### API parameter → model attribute mapping
 | API param | Model attribute | Role |
 |---|---|---|
 | `t` | `model.gibbs_fixed_t_2` | Fixed noise level for the Gibbs chain; used by `apply_noise(gibbs=True)` and as `fixed_t_norm` in the inner denoising step |
-| `t_prime` | `model.gibbs_fixed_t_1` | Refinement target; `P = int((gibbs_fixed_t_2 − gibbs_fixed_t_1) × T) + 1` steps denoise from `t` down to `t_prime` |
 | `gibbs_chain_freq` | `model.gibbs_chain_freq` | Inner Gibbs steps per `/continue` call (same attribute name in checkpoint) |
 | `m` | `model.gibbs_M` | Ensemble size |
-| `n` | `model.gibbs_N` | Number of outer Gibbs rounds (full sweeps over M); session is complete when `step == n` |
-All parameters are always explicit in the API request — no fallback to checkpoint values. If the user changes any parameter on the frontend, it clears the state and starts a fresh `generate` call. The `apply_noise(gibbs=True)` call reads `self.gibbs_fixed_t_2` directly, so we must set `model.gibbs_fixed_t_2 = t` before calling it (and restore afterwards). This is safe because `_inference_lock` ensures single-threaded access.
 #### `run_multiprox_init`
-Initialises the M-member ensemble; no Gibbs steps run yet. `gibbs_chain_freq` controls how many inner
-Gibbs iterations run per `/continue` call — `⌈M / gibbs_chain_freq⌉` calls complete one outer round.
-Default `gibbs_chain_freq = max(1, m // 10)` (10% of ensemble size):
-```python
-def run_multiprox_init(model, num_nodes, n, m, t, t_prime, gibbs_chain_freq, dataset_id):
-    device = next(model.parameters()).device
-    n_nodes = model.node_dist.sample_n(1, device) if num_nodes is None else torch.tensor([num_nodes], ...)
-    n_max = n_nodes.item()
-    node_mask = _build_node_mask(n_nodes, n_max, model)
-    # Sample M independent initial noise graphs
-    z_samples = [_sample_initial_noise(model, n_max, node_mask) for _ in range(m)]
-    X = torch.stack([z.X for z in z_samples], dim=1)   # (1, M, n_max, Xdim)
-    E = torch.stack([z.E for z in z_samples], dim=1)
-    y = torch.stack([z.y for z in z_samples], dim=1)
-    # Step 0 image: aggregate ensemble → collapse → render
-    agg_X = _gibbs_aggregate(model, X)   # (1, n_max, Xdim)
-    agg_E = _gibbs_aggregate(model, E)
-    agg_y = _gibbs_aggregate(model, y.float())
-    X_int, E_int = _collapse_final(model, agg_X, agg_E, agg_y, node_mask)
-    image_b64 = _pil_to_b64(render_graph(X_int[0, :n_max], E_int[0, :n_max, :n_max], dataset_id))
-    state = {"X": X.cpu(), "E": E.cpu(), "y": y.cpu(), "n_nodes": n_nodes.cpu(),
-             "dataset_id": dataset_id, "model_type": None,  # filled by registry
-             "T": model.T, "n": n, "m": m, "t": t, "t_prime": t_prime,
-             "gibbs_chain_freq": gibbs_chain_freq, "inner_step": 0, "step": 0}
-    return state, image_b64, elapsed_ms
-```
 #### `run_multiprox_step`
-Runs `gibbs_chain_freq` inner Gibbs iterations, advancing `inner_step`. When `inner_step` reaches `m`
-(one full outer round), runs the t→t_prime refinement and resets. On intermediate calls (round
-still in progress), renders the raw ensemble aggregate — no refinement, faster response.
-```python
-def run_multiprox_step(model, state_dict, dataset_id):
-    # Unpack X (1,M,n_max,Xdim), E, y, n_nodes, t, t_prime, T, n, m, gibbs_chain_freq, inner_step, step
-    device = next(model.parameters()).device
-    X, E, y = state["X"].to(device), state["E"].to(device), state["y"].to(device)
-    n_nodes = state["n_nodes"].to(device)
-    n_max = X.shape[2]
-    node_mask = _build_node_mask(n_nodes, n_max, model)
-    fixed_t = t * torch.ones((1,1), dtype=torch.float, device=device)
-    fixed_s = fixed_t - (1.0 / T)
-    # How many inner Gibbs steps to run this call (may be less than gibbs_chain_freq at end of round)
-    steps_this_call = min(gibbs_chain_freq, m - inner_step)
-    with torch.no_grad():
-        for i in range(steps_this_call):
-            k = inner_step + i
-            avg_X = _gibbs_aggregate(model, X)   # (1, n_max, Xdim)
-            avg_E = _gibbs_aggregate(model, E)
-            avg_y = _gibbs_aggregate(model, y.float())
-            denoised_X, denoised_E, denoised_y, _, _ = _denoising_step(
-                model, fixed_s, fixed_t, avg_X, avg_E, avg_y, node_mask)
-            old_t2 = model.gibbs_fixed_t_2
-            model.gibbs_fixed_t_2 = t            # override per-request (lock held by registry)
-            noisy = model.apply_noise(denoised_X, denoised_E, denoised_y, node_mask, gibbs=True)
-            model.gibbs_fixed_t_2 = old_t2
-            X[:, k], E[:, k], y[:, k] = noisy["X_t"], noisy["E_t"], noisy["y_t"]
-        new_inner_step = inner_step + steps_this_call
-        round_complete = new_inner_step >= m
-        if round_complete:
-            new_inner_step = 0
-            new_step = step + 1
-        else:
-            new_step = step
-        done = round_complete and new_step >= n
-        # Refinement pass runs on every call — always produce a clean render.
-        # Uses the current ensemble aggregate (regardless of round_complete).
-        P = int((t - t_prime) * T) + 1
-        cur_X = _gibbs_aggregate(model, X)
-        cur_E = _gibbs_aggregate(model, E)
-        cur_y = _gibbs_aggregate(model, y.float())
-        for j in range(P):
-            s_ref = (t - (j + 1) / T) * torch.ones((1,1), dtype=torch.float, device=device)
-            t_ref = (t - j / T) * torch.ones((1,1), dtype=torch.float, device=device)
-            cur_X, cur_E, cur_y, _, _ = _denoising_step(
-                model, s_ref, t_ref, cur_X, cur_E, cur_y, node_mask)
-        X_int, E_int = _collapse_final(model, cur_X, cur_E, cur_y, node_mask)
-    image_b64 = _pil_to_b64(render_graph(X_int[0, :n_max], E_int[0, :n_max, :n_max], dataset_id))
-    updated_state = {**state_dict, "X": X.cpu(), "E": E.cpu(), "y": y.cpu(),
-                     "step": new_step, "inner_step": new_inner_step}
-    return updated_state, image_b64, round_complete, done, elapsed_ms
-```
-Response includes `round_complete: bool` (full M sweep done) and `done: bool` (`n` outer rounds complete — frontend should stop calling continue).
 **State blob (encode/decode):**
@@ -342,167 +301,88 @@ STATE_BLOB_MAX_BYTES = 10 * 1024 * 1024  # 10 MB
 REQUIRED_STATE_KEYS = {"X", "E", "y", "n_nodes", "dataset_id", "model_type", "T",
                        "n", "m", "t", "t_prime", "gibbs_chain_freq", "inner_step", "step"}
-# encode: torch.save(state_dict, BytesIO) → base64 string
-# decode: base64 → bytes → torch.load(weights_only=True) → validate keys + X.dim()==4, E.dim()==5
-#         Raise ValueError on bad base64, oversized blob, missing keys, wrong tensor dims.
-#         Registry converts ValueError → InvalidRequestError(400).
 ```
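The decode-side checks (bad base64, oversized blob, missing keys, each surfaced as `ValueError`) can be sketched as below. This is a minimal illustration under stated assumptions: `json` stands in for `torch.save` / `torch.load(weights_only=True)` so the example runs without torch, the `dumps`/`loads` parameters are hypothetical hooks, and the tensor-dimension checks are omitted.

```python
import base64
import binascii
import json

STATE_BLOB_MAX_BYTES = 10 * 1024 * 1024  # 10 MB
REQUIRED_STATE_KEYS = {"X", "E", "y", "n_nodes", "dataset_id", "model_type", "T",
                       "n", "m", "t", "t_prime", "gibbs_chain_freq", "inner_step", "step"}


def encode_state_blob(state, dumps=lambda s: json.dumps(s).encode("utf-8")):
    """Serialise the state and wrap it in base64 (json is a stand-in serialiser)."""
    return base64.b64encode(dumps(state)).decode("ascii")


def decode_state_blob(b64_str, loads=json.loads):
    """Decode and validate a state blob; every failure is raised as ValueError."""
    try:
        raw = base64.b64decode(b64_str, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("state is not valid base64") from exc
    if len(raw) > STATE_BLOB_MAX_BYTES:
        raise ValueError("state blob exceeds STATE_BLOB_MAX_BYTES")
    state = loads(raw)
    if not isinstance(state, dict):
        raise ValueError("state blob does not contain a dict")
    missing = REQUIRED_STATE_KEYS - state.keys()
    if missing:
        raise ValueError(f"state is missing keys: {sorted(missing)}")
    return state
```

Raising a single exception type keeps the caller simple: one `except ValueError` is enough to map any malformed blob to a 400 response.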
-**Visualization:**
-- `_render_qm9`: `RWMol`, `Atom(symbol)` from `QM9_ATOM_TYPES=["C","N","O","F"]`, `AddBond` with `{1:SINGLE, 2:DOUBLE, 3:TRIPLE, 4:AROMATIC}` (index 0 = no bond, skip), `MolToImage(size=(300,300))` → PIL
-- `_render_comm20`: `nx.Graph` from adjacency (edge index > 0 = present), `nx.draw_networkx` with matplotlib Agg backend, `fig.savefig(BytesIO, "png")` → PIL
-- `_pil_to_b64`: `img.save(BytesIO, "PNG")` → `"data:image/png;base64," + b64encode`
-- `_frames_to_gif_b64`: `frames[0].save(BytesIO, "GIF", save_all=True, append_images=..., duration=150, loop=0)` → `"data:image/gif;base64," + b64encode`
 ---
 ### Step 3: `registry.py` additions
-**New instance variable** in `ModelRegistry.__init__`:
 ```python
-self._graphgen_models = {}   # (dataset_id, model_type) -> loaded eval-mode model
 ```
-**`_load_graphgen_model(self, dataset_id, model_type)`** — lazy load with caching:
-```python
-# Defer imports inside method — sys.path is set at app startup, not at Django module import time
-if model_type == "discrete":
-    from diffusion_model_discrete import DiscreteDenoisingDiffusion as cls
-else:
-    from diffusion_model import LiftedDenoisingDiffusion as cls
-suffix = "_c" if model_type == "continuous" else ""
-ckpt_path = Path(settings.MULTIPROXAN_DIR) / "checkpoints" / f"{dataset_id}{suffix}.ckpt"
-model = cls.load_from_checkpoint(
-    str(ckpt_path), map_location="cpu",
-    train_metrics=None, sampling_metrics=None, visualization_tools=None,
-)
-model.eval()
-self._graphgen_models[(dataset_id, model_type)] = model
-```
-**Dataset loader / hyperparameter restoration notes:**
-`save_hyperparameters(ignore=['train_metrics', 'sampling_metrics'])` saves `dataset_infos`,
-`extra_features`, `domain_features`, and `visualization_tools` into the checkpoint. The three `None`
-overrides are sufficient — the remaining hparams are restored from the checkpoint via pickle.
-**Why unpickling is safe without data files:** Python pickle reconstructs objects by restoring
-`__dict__` without running `__init__` and without any file I/O. The pre-computed tensors inside
-`dataset_infos` (`n_nodes`, `node_types`, `edge_types`, `nodes_dist`) all restore correctly. The
-stored datamodule reference is inert at inference time — we never call `train_dataloader()` or any
-method that touches data files.
-**Why file paths resolve correctly on any deployment:** all datamodules compute their data root as:
-```python
-base_path = pathlib.Path(os.path.realpath(__file__)).parents[2]  # → MultiProxAn/
-root_path = os.path.join(base_path, self.datadir)                # → MultiProxAn/data/qm9
-```
-`__file__` is the source file on the current machine, so paths are always relative to the research
-code location. `self.datadir` (a relative string like `"data/qm9"`) pickles and restores as-is.
-No path patching is needed.
-`extra_features` and `domain_features` are lightweight stateless callables with no file I/O —
-they unpickle cleanly.
-**`graphgen_generate(self, dataset_id, model_type, sampling_mode, num_nodes, diffusion_steps, chain_frames, multiprox_params)`**:
-```python
-# Acquire _inference_lock (non-blocking) → raise InferenceBusy() if locked
-# Load model via _load_graphgen_model (may raise ModelUnavailable)
-# standard:  run_standard_generation → return {dataset_id, model_type, sampling_mode, image, chain_gif, inference_time_ms}
-# multiprox: run_multiprox_init → state["model_type"] = model_type → encode_state_blob
-#            → return {step:0, image, state, inference_time_ms}
-# finally: release lock
-```
-**`graphgen_continue(self, state_b64)`**:
-```python
-# decode_state_blob OUTSIDE lock (fail-fast) → raise InvalidRequestError on ValueError
-# Acquire _inference_lock → raise InferenceBusy() if locked
-# Load model via _load_graphgen_model(state["dataset_id"], state["model_type"])
-# run_multiprox_step → encode_state_blob → return {step, image, state, inference_time_ms}
-# finally: release lock
-```
 ---
 ### Step 4: `views/graph_generation.py` additions
-**`GraphGenGenerateView.post`** — validation + dispatch:
 - Require `dataset_id`, `model_type`, `sampling_mode`
-- `dataset_id ∈ GRAPHGEN_DATASETS`, `model_type ∈ {"discrete","continuous"}`, `sampling_mode ∈ {"standard","multiprox"}`
-- Check `registry.graphgen_checkpoints_available.get(dataset_id, [])` contains `model_type` → `ModelUnavailable(503)` if not
 - `diffusion_steps` clamped to [50, 1000], `chain_frames` to [10, 30]
 - `num_nodes`: optional int, validated against `GRAPHGEN_DATASETS[dataset_id]["max_nodes"]`
-- multiprox only: `multiprox_params` required; validate `0 <= t_prime <= t <= 1`, `2 <= m <= 100` (default: `100`), `n >= 1` (default: `10`), `1 <= gibbs_chain_freq <= m` (default: `max(1, m // 10)`)
-- Call `registry.graphgen_generate(...)` → `Response(result)`
-**`GraphGenContinueView.post`** — minimal validation:
 - Require `state` (non-empty string)
-- Call `registry.graphgen_continue(state_b64)` → `Response(result)`
 ---
-### Step 5: `urls.py`
-```python
-# Update import line 12:
-from api.views.graph_generation import (
-    GraphGenDatasetsView, GraphGenSamplingModesView,
-    GraphGenGenerateView, GraphGenContinueView,
-)
-# Add after "graph-generation/sampling-modes" route:
-path("graph-generation/generate", GraphGenGenerateView.as_view()),
-path("graph-generation/continue", GraphGenContinueView.as_view()),
-```
 ---
-### Step 6: `requirements.txt` — new dependencies
-| Package | Why needed | Already present? |
-|---|---|---|
-| `Pillow` | PIL `Image`, `BytesIO` ops in `graphgen_inference.py` rendering | Transitive dep of `matplotlib` and `torchvision`, but not explicit — add for clarity |
-| `overrides` | Pulled in by research code transitively | Present in research `requirements.txt`; not in backend — add to be safe |
-`wandb` is **not** added — it is stripped from the research code in Step 0. Pin `overrides==7.3.1` to match the research repo. Add under a `# MultiProxAn graph generation` comment.
-**Python 3.9 compatibility verdict:** No issues. Research code syntax requires Python ≥ 3.6 only (f-strings; no walrus operator, match statements, or `f"{var=}"` debug syntax). PyTorch 2.0.1 and pytorch-lightning 2.0.4 both officially support Python 3.8–3.11. `torch.load(weights_only=True)` in our state blob decoder is safe on torch 2.0.1 (default is `False`; `True` is supported for plain tensor dicts since torch 1.13). The research code's own `torch.load` calls (checkpoint files, dataset caches) do not use `weights_only` — this is fine for our trusted local files on torch 2.0.1, which does not warn or break on the missing kwarg.
 ---
-### Step 7: README + Postman
-**`README.md`**: Remove "not yet implemented" annotations from generate and continue rows in the endpoint table.
-**`docs/postman/collection.json`**: Add 12 new requests to the graph-generation folder, covering all combinations of dataset × model_type × sampling_mode × endpoint. `continue` applies to multiprox only.
-**Standard / generate (4 requests):**
-1. `Standard QM9 Discrete` — POST generate `{dataset_id:"qm9", model_type:"discrete", sampling_mode:"standard", diffusion_steps:50, chain_frames:10}`
-2. `Standard QM9 Continuous` — POST generate `{dataset_id:"qm9", model_type:"continuous", sampling_mode:"standard", diffusion_steps:50, chain_frames:10}`
-3. `Standard comm20 Discrete` — POST generate `{dataset_id:"comm20", model_type:"discrete", sampling_mode:"standard", diffusion_steps:500, chain_frames:10}`
-4. `Standard comm20 Continuous` — POST generate `{dataset_id:"comm20", model_type:"continuous", sampling_mode:"standard", diffusion_steps:500, chain_frames:10}`
-**MultiProx / generate (4 requests):**
-5. `MultiProx QM9 Discrete Init` — POST generate `{dataset_id:"qm9", model_type:"discrete", sampling_mode:"multiprox", multiprox_params:{n:10, m:100, t:0.5, t_prime:0.1}}`
-6. `MultiProx QM9 Continuous Init` — POST generate `{dataset_id:"qm9", model_type:"continuous", sampling_mode:"multiprox", multiprox_params:{n:10, m:100, t:0.5, t_prime:0.1}}`
-7. `MultiProx comm20 Discrete Init` — POST generate `{dataset_id:"comm20", model_type:"discrete", sampling_mode:"multiprox", multiprox_params:{n:10, m:100, t:0.5, t_prime:0.1}}`
-8. `MultiProx comm20 Continuous Init` — POST generate `{dataset_id:"comm20", model_type:"continuous", sampling_mode:"multiprox", multiprox_params:{n:10, m:100, t:0.5, t_prime:0.1}}`
-**MultiProx / continue (4 requests — one per dataset × model_type):**
-9. `MultiProx QM9 Discrete Continue` — POST continue `{state:"{{graphgen_state_qm9_discrete}}"}`
-10. `MultiProx QM9 Continuous Continue` — POST continue `{state:"{{graphgen_state_qm9_continuous}}"}`
-11. `MultiProx comm20 Discrete Continue` — POST continue `{state:"{{graphgen_state_comm20_discrete}}"}`
-12. `MultiProx comm20 Continuous Continue` — POST continue `{state:"{{graphgen_state_comm20_continuous}}"}`
-Each `continue` request uses a distinct collection variable (set manually from the corresponding init response's `state` field).
 ---
@@ -510,16 +390,19 @@ Each `continue` request uses a distinct collection variable (set manually from t
 | Risk | Mitigation |
 |---|---|
-| `weights_only=True` in `torch.load` blocks primitive scalars | State dict contains only tensors + str/int/float; PyTorch 2.x whitelist covers these. If a version tightens further, fall back to `weights_only=False` with a comment explaining the controlled trust boundary (blob originates from our own server response). |
-| `load_from_checkpoint` raises on pickle restoration | Unlikely — Python pickle reconstructs objects without file I/O and datamodule paths resolve via `__file__` relative to the research code. If it does occur, inspect `ckpt['hyper_parameters']` to identify which object fails to deserialize. |
 | `model.gibbs_fixed_t_2` attribute override | Only safe because `_inference_lock` ensures single-threaded inference. Save/restore pattern is used. |
-| matplotlib not thread-safe | Rendering only called inside `_inference_lock`, so effectively single-threaded. |
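The non-blocking lock and the save/restore override from the table above can be combined as follows. This is a minimal sketch under assumptions: `InferenceBusy` and `_inference_lock` are named in the plan, but their definitions here, along with `run_locked` and `override_attr`, are hypothetical.

```python
import threading
from contextlib import contextmanager


class InferenceBusy(Exception):
    """Raised when another inference request already holds the lock."""


_inference_lock = threading.Lock()


@contextmanager
def override_attr(obj, name, value):
    """Temporarily set obj.<name>, restoring the old value even on error."""
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, old)


def run_locked(fn):
    """Run fn() under the inference lock without waiting for it."""
    if not _inference_lock.acquire(blocking=False):
        raise InferenceBusy()
    try:
        return fn()
    finally:
        _inference_lock.release()
```

Wrapping the override in `try`/`finally` means an exception inside `apply_noise` would not leave the per-request `t` on the shared model.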
 ## Verification
-1. Django shell: `from diffusion_model_discrete import DiscreteDenoisingDiffusion` → no `ModuleNotFoundError`
-2. `GET /graph-generation/datasets` still returns `available_model_types` (regression)
-3. `POST /graph-generation/generate` (standard, comm20, `diffusion_steps=50, chain_frames=10`) → `image` starts with `data:image/png;base64,`, `chain_gif` starts with `data:image/gif;base64,`
-4. `POST /graph-generation/generate` (multiprox, comm20) → `step=0`, `state` is non-empty base64 string
-5. `POST /graph-generation/continue` with `state` from step 4 → `step=1`, different `image`, new `state`
-6. Error paths: unknown `dataset_id` → 400, corrupted `state` → 400, concurrent requests → 429, `t_prime > t` → 400

 | `src/research/MultiProxAn/src/diffusion_model_discrete.py` | Strip all `wandb` imports and calls |
 | `src/research/MultiProxAn/src/diffusion_model.py` | Strip all `wandb` imports and calls (incl. one unguarded call) |
 | `src/research/MultiProxAn/src/utils.py` | Remove `import wandb` and `setup_wandb()` function |
+| `src/research/MultiProxAn/src/analysis/spectre_utils.py` | Wrap `graph_tool`, `pyemd`, `pygsp`, `dist_helper` imports in try/except |
+| `src/research/MultiProxAn/src/metrics/molecular_metrics.py` | Guard optional metric imports with try/except |
+| `src/research/MultiProxAn/src/metrics/molecular_metrics_discrete.py` | Guard optional metric imports with try/except |
+| `src/research/MultiProxAn/src/metrics/train_metrics.py` | Guard optional metric imports with try/except |
 | `src/backend/research_api/settings.py` | Add `MultiProxAn/src/` to `sys.path` |
+| `src/backend/api/services/graphgen_inference.py` | **New file** -- all inference logic + rendering |
+| `src/backend/api/services/registry.py` | Add `_graphgen_models` cache, `_safe_load_lightning_checkpoint`, `graphgen_generate_stream`, `graphgen_continue_stream`, `force_release_inference_lock` |
+| `src/backend/api/views/graph_generation.py` | Add `GraphGenGenerateView`, `GraphGenContinueView`, SSE streaming helpers |
+| `src/backend/api/views/health.py` | Add inference lock status to health endpoint, add `ForceUnlockView` |
+| `src/backend/api/urls.py` | Wire 3 new routes (generate, continue, debug/force-unlock) |
+| `src/backend/requirements.txt` | Add `Pillow`, `overrides` |
+| `src/backend/README.md` | Mark generate/continue as implemented, document SSE protocol |
+| `docs/postman/collection.json` | Add 12 example requests, auto-chaining test scripts |
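The try/except guards listed for `spectre_utils.py` and the metrics modules could follow a pattern like the one below. This is a minimal sketch: `optional_import` and `require` are hypothetical helpers, not functions in the research code, and the actual patch may simply wrap plain `import` statements.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Evaluation-only dependency; the inference path should not need it.
graph_tool = optional_import("graph_tool")


def require(module, name):
    """Fail at call time, not import time, when a metric needs a missing package."""
    if module is None:
        raise RuntimeError(f"{name} is required for this metric but is not installed")
    return module
```

With this shape, importing the module always succeeds, and only code paths that actually compute the affected metrics fail when the package is absent.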
 ## Model Differences: Discrete vs Continuous
 |---|---|---|
 | **`node_mask` dtype** | `bool` | `float32` (`.float()` required) |
 | **Initial noise** | `sample_discrete_feature_noise(limit_dist=model.limit_dist, node_mask=node_mask)` | `sample_feature_noise(X_size=(1,n,Xdim), E_size=(1,n,n,Edim), y_size=(1,ydim), node_mask=node_mask)` |
+| **`sample_p_zs_given_zt` returns** | `(sampled_s, discrete_sampled_s)` -- 2-tuple; `discrete_sampled_s.X/E` are already collapsed integers | single `z_s` PlaceHolder with continuous floats |
+| **Chain frame render** | use `discrete_sampled_s.X/E` directly (`.long()` required -- see below) | `utils.unnormalize(z_s.X, z_s.E, z_s.y, model.norm_values, model.norm_biases, node_mask, collapse=True)` -> `.X/.E` are integers |
+| **Final graph collapse** | `PlaceHolder(X=X, E=E, y=y).mask(node_mask, collapse=True)` (`.long()` required) | `model.sample_discrete_graph_given_z0(X, E, y, node_mask)` (runs another forward pass + unnormalize) |
 | **Gibbs ensemble aggregation** | `torch.median(X, dim=1).values` | `torch.mean(X, dim=1)` |
+**Critical `.long()` fix**: The discrete model's `sample_p_zs_given_zt` calls `.type_as(y_t)` internally, which can cast collapsed integer indices to float. Both `_denoising_step` and `_collapse_final` must apply `.long()` on the discrete path to prevent `TypeError: list indices must be integers or slices, not float` in the rendering functions.
 The refinement loop in `run_multiprox_step` uses `sample_p_zs_given_zt` to update `cur_X/E/y` at each step; only the first element of the tuple (or the single return value) is needed there. `_collapse_final` is called once at the end for rendering.
 ## Research Code Reused (read-only)
 | Symbol | Location | How used |
 |---|---|---|
+| `DiscreteDenoisingDiffusion` | `src/diffusion_model_discrete.py` | Loaded via `_safe_load_lightning_checkpoint` for `model_type=discrete` |
+| `LiftedDenoisingDiffusion` | `src/diffusion_model.py` | Loaded via `_safe_load_lightning_checkpoint` for `model_type=continuous` |
 | `model.node_dist.sample_n(1, device)` | both models | Sample number of nodes |
+| `diffusion_utils.sample_discrete_feature_noise(limit_dist, node_mask)` | `src/diffusion/diffusion_utils.py` | Initial noise -- discrete only |
+| `diffusion_utils.sample_feature_noise(X_size, E_size, y_size, node_mask)` | `src/diffusion/diffusion_utils.py` | Initial noise -- continuous only |
+| `model.sample_p_zs_given_zt(s, t, X, E, y, node_mask)` | both models | One denoising step (return varies by model type -- see table above) |
 | `model.apply_noise(X, E, y, node_mask, gibbs=True)` | both models | Re-apply Gibbs noise; uses `model.gibbs_fixed_t_2` internally |
+| `PlaceHolder.mask(node_mask, collapse=True)` | `src/utils.py` | Final collapse -- discrete only |
+| `model.sample_discrete_graph_given_z0(X, E, y, node_mask)` | `diffusion_model.py` | Final collapse -- continuous only |
+| `utils.unnormalize(X, E, y, norm_values, norm_biases, node_mask, collapse=True)` | `src/utils.py` | Chain frame rendering -- continuous only |
+| `model.norm_values`, `model.norm_biases` | `diffusion_model.py` | Unnormalization factors -- continuous only |
+## Design Decisions
+### Streaming Protocol: Server-Sent Events (SSE)
+Both generate and continue endpoints use **SSE** (`text/event-stream`) instead of plain JSON or NDJSON. This enables real-time progress streaming with preview images in Postman's native SSE viewer.
+**Event types:**
+- `event: progress` -- metadata (phase, step, total_steps, elapsed_ms)
+- `event: preview` -- base64 PNG image of the current graph state (separate event so Postman shows clean image updates)
+- `event: result` -- final JSON payload (image, chain_gif, state, timing)
+**SSE helpers** in `views/graph_generation.py`:
+```python
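+import json
+
+from django.http import StreamingHttpResponse
+
+def _streaming_sse_response(gen):
+    # Stream events as they are produced; the headers below discourage buffering.
+    resp = StreamingHttpResponse(_sse_iter(gen), content_type="text/event-stream")
+    resp["Cache-Control"] = "no-cache"
+    resp["X-Accel-Buffering"] = "no"  # disable nginx proxy buffering
+    return resp
+
+def _sse_iter(gen):
+    for event in gen:
+        etype = event.get("type", "message")
+        # Pull the base64 preview out so it is sent as its own `preview` event.
+        preview = event.pop("preview", None)
+        yield f"event: {etype}\ndata: {json.dumps(event, separators=(',', ':'))}\n\n"
+        if preview:
+            yield f"event: preview\ndata: {preview}\n\n"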
+```
+All three inference generators (`run_standard_generation`, `run_multiprox_init`, `run_multiprox_step`) yield progress dicts with optional `preview` keys, then a final `result` dict.
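The wire format produced by the helpers above can be checked without Django. This is a sketch under stated assumptions: `sse_frames` is a hypothetical stand-in for `_sse_iter`, and the demo events only imitate the shape of the progress/result dicts described in this section.

```python
import json

def sse_frames(events):
    """Yield SSE text frames, splitting any `preview` into its own event."""
    for event in events:
        event = dict(event)  # copy so the caller's dict is not mutated
        preview = event.pop("preview", None)
        payload = json.dumps(event, separators=(",", ":"))
        yield f"event: {event.get('type', 'message')}\ndata: {payload}\n\n"
        if preview:
            yield f"event: preview\ndata: {preview}\n\n"

# Hypothetical events; the base64 strings are placeholders, not real PNG data.
demo_events = [
    {"type": "progress", "phase": "denoise", "step": 1, "total_steps": 2,
     "preview": "aGVsbG8="},
    {"type": "result", "image": "aGVsbG8=", "inference_time_ms": 12},
]
frames = list(sse_frames(demo_events))
```

Each frame ends with a blank line, which is what terminates an event in the SSE format; compact JSON and base64 contain no raw newlines, so a single `data:` line per event is enough here.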
+### Safe Checkpoint Loading: `_safe_load_lightning_checkpoint`
+`load_from_checkpoint` must **not** be used for MultiProxAn models. The comm20 checkpoints were trained with DDP (DistributedDataParallel), causing three cascading failures:
+1. **DDP `__setstate__`/`__getstate__`** -- unpickling checkpoint hparams tries to reconstruct DDP wrappers, which requires an active process group. Fix: monkey-patch `__setstate__` and `__getstate__` during `torch.load`.
+2. **`save_hyperparameters` deepcopy crash** -- Lightning's `save_hyperparameters` deepcopies hparams containing DDP-wrapped datamodule objects, causing a hard process crash (exit code -1). Fix: patch `save_hyperparameters` to a no-op during model construction.
+3. **CUDA OOM** -- the full checkpoint (with heavy pickled hparams objects) crashes if loaded directly to GPU. Fix: `map_location="cpu"` first, then `model.to(device)` after state_dict loading.
+The solution in `registry.py`:
+```python
+def _safe_load_lightning_checkpoint(cls, ckpt_path):
+    # 1. Patch DDP setstate/getstate to simple dict update/return
+    # 2. torch.load(ckpt_path, map_location="cpu", weights_only=False)
+    # 3. Extract hyper_parameters, null out train_metrics/sampling_metrics/visualization_tools
+    # 4. Patch save_hyperparameters to no-op
+    # 5. Construct model via cls(**hparams)
+    # 6. load_state_dict(strict=False), del ckpt, model.to(device), model.eval()
+```
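Step 4 of the outline (patching `save_hyperparameters` to a no-op during construction) can be sketched with the standard library alone. `FakeModule`, `Uncopyable`, and `construct_without_hparam_copy` are hypothetical stand-ins; the real failure is described above as a hard process crash, whereas this sketch simulates it with an exception so the pattern is observable.

```python
import copy
from unittest import mock

class Uncopyable:
    """Simulates an hparams object that cannot be deep-copied."""
    def __deepcopy__(self, memo):
        raise RuntimeError("cannot deepcopy")

class FakeModule:
    """Stand-in for a Lightning module (not the real class)."""
    def __init__(self, **hparams):
        self.save_hyperparameters(hparams)
        self.hparams_seen = sorted(hparams)

    def save_hyperparameters(self, hparams):
        copy.deepcopy(hparams)  # the step that fails on uncopyable objects

def construct_without_hparam_copy(cls, hparams):
    """Build `cls(**hparams)` while `save_hyperparameters` is a no-op."""
    noop = lambda self, *args, **kwargs: None
    with mock.patch.object(cls, "save_hyperparameters", noop):
        return cls(**hparams)  # the original method is restored on exit

model = construct_without_hparam_copy(FakeModule, {"cfg": Uncopyable(), "T": 500})
```

Using a context manager keeps the patch scoped to construction, so other code that relies on the original method is unaffected afterwards.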
+### Graph Rendering: PIL + networkx (no matplotlib)
+`_render_comm20` uses **PIL + networkx `spring_layout`** instead of matplotlib. Matplotlib's GUI backend initializes in the calling thread and crashes with exit code -1 when called from Django's request threads on Windows. Even with `matplotlib.use("Agg")`, the backend initialization races with Django's threading model.
+```python
+def _render_comm20(X_int, E_int):
+    import networkx as nx
+    from PIL import Image, ImageDraw
+    # Build networkx Graph from adjacency matrix
+    # spring_layout(G, seed=42) for deterministic positions
+    # PIL drawing: lines for edges, ellipses for nodes (#2ecc71 green, #1a7a42 outline)
+    # Returns PIL.Image (300x300)
+```
+QM9 rendering uses RDKit's `MolToImage` (no matplotlib dependency).
+### Inference Lock Management
+A single `threading.Lock` (`_inference_lock`) protects all inference endpoints (COINs predict, graphgen generate, graphgen continue). Design choices:
+- **Non-blocking acquire** -- returns 429 `INFERENCE_BUSY` immediately if locked
+- **Owner tracking** -- `_inference_lock_owner` stores a description string (e.g. `"graphgen_generate comm20/discrete/standard"`) for debugging
+- **Lock held by generator** -- the lock is acquired eagerly (before the generator starts) and released in the generator's `finally` block, so it is held for the entire streaming duration
+- **Force-release endpoint** -- `POST /debug/force-unlock` (DEBUG mode only) for stuck locks from killed clients. Lock status is also exposed in the health endpoint's `inference_lock` field
+All three lock acquisition sites follow the same pattern:
+```python
+if not self._inference_lock.acquire(blocking=False):
+    raise InferenceBusy()
+self._inference_lock_owner = f"<endpoint> <context>"
+try:
+    model = self._load_<model>(...)
+except Exception:
+    self._inference_lock_owner = None
+    self._inference_lock.release()
+    raise
+def _gen():
+    try:
+        ...yield events...
+    finally:
+        self._inference_lock_owner = None
+        self._inference_lock.release()
+return _gen()
+```
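A runnable version of this pattern, reduced to the standard library, might look like the following. The `Registry` class and its `stream` method are illustrative stand-ins for the registry methods; only `_inference_lock`, `_inference_lock_owner`, and `InferenceBusy` follow names used in this plan.

```python
import threading

class InferenceBusy(Exception):
    """Raised when another inference holds the lock (mapped to HTTP 429)."""

class Registry:
    def __init__(self):
        self._inference_lock = threading.Lock()
        self._inference_lock_owner = None

    def stream(self, owner, n_events):
        # Acquire eagerly so a busy lock fails before any bytes are streamed.
        if not self._inference_lock.acquire(blocking=False):
            raise InferenceBusy()
        self._inference_lock_owner = owner

        def _gen():
            try:
                for step in range(1, n_events + 1):
                    yield {"type": "progress", "step": step}
            finally:
                # Runs on normal exhaustion and on close() of a started generator.
                self._inference_lock_owner = None
                self._inference_lock.release()

        return _gen()

registry = Registry()
gen = registry.stream("demo", 3)
first = next(gen)
try:
    registry.stream("second", 1)
    busy = False
except InferenceBusy:
    busy = True
gen.close()  # simulates the client disconnecting mid-stream
released = not registry._inference_lock.locked()
```

Closing the generator raises `GeneratorExit` at the suspended `yield`, which is what triggers the `finally` block and frees the lock.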
+### Research Code Import Guards
+Several research code modules import heavy optional dependencies (`graph_tool`, `pyemd`, `pygsp`, `dist_helper`) at module level. These are needed only for metric computation during training, not inference. Since they are transitively imported during checkpoint unpickling (via pickled datamodule/metric objects), they must be guarded with `try/except ImportError` to avoid crashing on systems where these packages are not installed.
+Files to patch: `spectre_utils.py`, `molecular_metrics.py`, `molecular_metrics_discrete.py`, `train_metrics.py`.
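The guard itself is small. A sketch of the idea, written as a helper so it can be exercised with any module name; `optional_import`, `require`, and the missing module name are hypothetical, and the patched files may simply wrap their top-level `import` statements in `try/except ImportError` instead:

```python
import importlib

def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

def require(module, name):
    """Fail at call time (not import time) when an optional dependency is absent."""
    if module is None:
        raise RuntimeError(f"{name} is required for this metric but is not installed")
    return module

json_mod = optional_import("json")                   # stdlib, always present
missing = optional_import("not_a_real_metric_dep")   # deliberately absent name
```

Deferring the failure to call time means checkpoint unpickling can import the metric modules, while training-only code paths still report a clear error if they are ever reached.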
 ## Implementation
+### Step 0: Patch research code -- remove `wandb`
+`wandb` is imported at module level in three files; when `wandb` is not installed, the import alone raises `ImportError` at checkpoint load time. One call in `diffusion_model.py` is **unguarded** (no `if wandb.run:` check) and would crash at inference even if the import succeeded.
 **`src/research/MultiProxAn/src/diffusion_model_discrete.py`**:
 - Remove `import wandb` (line 9)
+- Remove the two `utils.setup_wandb(self.cfg)` call sites
+- Remove all `if wandb.run: wandb.log(...)` / `wandb.run.summary[...]` blocks
 **`src/research/MultiProxAn/src/diffusion_model.py`**:
 - Remove `import wandb` (line 9)
 - Remove the two `utils.setup_wandb(self.cfg)` call sites
+- Remove all guarded `if wandb.run:` blocks + one unguarded `wandb.log(...)` call
 **`src/research/MultiProxAn/src/utils.py`**:
+- Remove `import wandb` and the entire `setup_wandb(cfg)` function
 ---
+### Step 0b: Guard heavy optional imports in research code
+Wrap the `graph_tool`, `pyemd`, `pygsp`, and `dist_helper` imports in `try/except ImportError` in:
+- `spectre_utils.py` -- all four imports
+- `molecular_metrics.py`, `molecular_metrics_discrete.py`, `train_metrics.py` -- cascading metric deps
+---
+### Step 1: `settings.py` -- sys.path fix
+Both `MultiProxAn/` and `MultiProxAn/src/` must be on `sys.path`:
 ```python
 _MULTIPROXAN_SRC = str(PROJECT_ROOT / "src" / "research" / "MultiProxAn" / "src")
 for _path in (_COINS_KG_ROOT, _MULTIPROXAN_ROOT, _MULTIPROXAN_SRC):
 ### Step 2: New `graphgen_inference.py`
+Completely independent of Django/registry. Receives a loaded model and yields progress/result dicts.
 **Module structure:**
 ```
 graphgen_inference.py
+  +-- Constants: QM9_ATOM_TYPES, STATE_BLOB_MAX_BYTES, REQUIRED_STATE_KEYS
+  |
+  +-- # Model-type helpers
+  +-- _is_discrete(model) -> bool
+  +-- _build_node_mask(n_nodes, n_max, model) -> bool or float32 tensor
+  +-- _sample_initial_noise(model, n_max, node_mask) -> PlaceHolder
+  +-- _denoising_step(model, s_t, t_t, X, E, y, node_mask) -> (X_soft, E_soft, y_soft, X_int, E_int)
+  +-- _gibbs_aggregate(model, X) -> tensor   [median (discrete) or mean (continuous)]
+  +-- _collapse_final(model, X, E, y, node_mask) -> (X_int, E_int)
+  |
+  +-- # Main inference generators (yield progress dicts, then result dict)
+  +-- run_standard_generation(model, num_nodes, diffusion_steps, chain_frames, dataset_id)
+  +-- run_multiprox_init(model, num_nodes, n, m, t, t_prime, gibbs_chain_freq, dataset_id)
+  +-- run_multiprox_step(model, state_dict, dataset_id)
+  |
+  +-- # State serialisation
+  +-- encode_state_blob(state_dict) -> str
+  +-- decode_state_blob(b64_str) -> dict
+  |
+  +-- # Visualisation
+      +-- render_graph(X_int, E_int, dataset_id) -> PIL.Image
+      +-- _render_qm9(X_int, E_int) -> PIL.Image   [RDKit]
+      +-- _render_comm20(X_int, E_int) -> PIL.Image  [PIL + networkx spring_layout]
+      +-- _pil_to_b64(img) -> str
+      +-- _frames_to_gif_b64(frames) -> str
 ```
 #### Model-type helpers
+`_denoising_step` returns `(X_soft, E_soft, y_soft, X_int, E_int)`. For discrete models, `.long()` is applied to `discrete_s.X` and `discrete_s.E` to counteract the `.type_as(y_t)` float cast. For continuous models, `utils.unnormalize` with `collapse=True` produces the integer tensors.
+`_collapse_final` also applies `.long()` on the discrete path for the same reason.
 #### `run_standard_generation`
+Re-implements the denoising loop -- does NOT call `sample_batch`. Uses `s_idx / diffusion_steps` for normalized time. Yields `progress` events at each step (with `preview` at frame intervals) and a final `result` event with `image` and `chain_gif`.
 ```python
 def run_standard_generation(model, num_nodes, diffusion_steps, chain_frames, dataset_id):
             step = diffusion_steps - 1 - s_idx
             if step % frame_interval == 0 or s_idx == 0:
                 gif_frames.append(render_graph(X_int[0, :n_max], E_int[0, :n_max, :n_max], dataset_id))
+            yield {"type": "progress", "phase": "denoise", "step": step + 1, ...}
     X_final, E_final = _collapse_final(model, X, E, y, node_mask)
+    yield {"type": "result", "image": ..., "chain_gif": ..., "inference_time_ms": ...}
 ```
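The time indexing and frame selection in the excerpt above can be traced without a model. This is a sketch under assumptions: `frame_interval = max(1, diffusion_steps // chain_frames)` and `t_norm = (s_idx + 1) / diffusion_steps` are guesses at the elided parts, and `denoising_schedule` is a hypothetical helper.

```python
def denoising_schedule(diffusion_steps, chain_frames):
    """Yield one row per denoising step, from the noisiest to the cleanest."""
    frame_interval = max(1, diffusion_steps // chain_frames)  # assumed formula
    for s_idx in reversed(range(diffusion_steps)):
        s_norm = s_idx / diffusion_steps        # normalized target time
        t_norm = (s_idx + 1) / diffusion_steps  # normalized current time (assumed)
        step = diffusion_steps - 1 - s_idx      # 0-based progress counter
        capture = step % frame_interval == 0 or s_idx == 0
        yield {"step": step + 1, "s_norm": s_norm, "t_norm": t_norm, "capture": capture}

schedule = list(denoising_schedule(diffusion_steps=500, chain_frames=20))
captured = [row["step"] for row in schedule if row["capture"]]
```

Under these assumptions the `s_idx == 0` clause guarantees the final, fully denoised state is always rendered, even when the last step does not fall on a frame boundary.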
+#### API parameter -> model attribute mapping
 | API param | Model attribute | Role |
 |---|---|---|
 | `t` | `model.gibbs_fixed_t_2` | Fixed noise level for the Gibbs chain; used by `apply_noise(gibbs=True)` and as `fixed_t_norm` in the inner denoising step |
+| `t_prime` | `model.gibbs_fixed_t_1` | Refinement target; `P = int((gibbs_fixed_t_2 - gibbs_fixed_t_1) * T) + 1` steps denoise from `t` down to `t_prime` |
 | `gibbs_chain_freq` | `model.gibbs_chain_freq` | Inner Gibbs steps per `/continue` call (same attribute name in checkpoint) |
 | `m` | `model.gibbs_M` | Ensemble size |
+| `n` | `model.gibbs_N` | Number of outer Gibbs rounds; session is complete when `step == n` |
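As a worked example of the refinement step count in the `t_prime` row, the formula can be evaluated directly. The helper name is hypothetical, and `T = 500` is an assumption taken from the `diffusion_steps` value in the Postman requests.

```python
def refinement_steps(t, t_prime, T):
    """Number of denoising steps from noise level t down to t_prime."""
    return int((t - t_prime) * T) + 1

# Comm20 multiprox values (t=0.4, t'=0.1) with T assumed to be 500.
comm20_steps = refinement_steps(t=0.4, t_prime=0.1, T=500)  # 151
```

Because `int()` truncates, a product that lands just below an integer due to floating-point rounding yields one step fewer than the exact arithmetic would suggest.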
+All parameters are always explicit in the API request -- no fallback to checkpoint values. The `apply_noise(gibbs=True)` call reads `self.gibbs_fixed_t_2` directly, so we must set `model.gibbs_fixed_t_2 = t` before calling it (and restore afterwards). This is safe because `_inference_lock` ensures single-threaded access.
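The set-then-restore step can be expressed as a small context manager. This is a sketch, not the project's code: `override_attr` is a hypothetical helper and `SimpleNamespace` stands in for a loaded model.

```python
from contextlib import contextmanager
from types import SimpleNamespace

@contextmanager
def override_attr(obj, name, value):
    """Temporarily set obj.<name> to value, restoring the old value on exit."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, previous)  # restored even if the body raises

# Stand-in model; only the attribute name comes from the text above.
model = SimpleNamespace(gibbs_fixed_t_2=0.9)
with override_attr(model, "gibbs_fixed_t_2", 0.4):
    inside = model.gibbs_fixed_t_2  # the value apply_noise(gibbs=True) would read
after = model.gibbs_fixed_t_2
```

The `finally` clause matters here: without it, an exception during inference would leave the cached model carrying one request's `t` into the next.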
 #### `run_multiprox_init`
+Initialises the M-member ensemble. Yields `progress` events during noise sampling, then a `result` event with the step-0 image and state dict. Default `gibbs_chain_freq = max(1, m // 10)`.
 #### `run_multiprox_step`
+Runs `gibbs_chain_freq` inner Gibbs iterations. Yields `progress` events during Gibbs phase (with preview of ensemble aggregate) and refinement phase (t -> t_prime denoising). The refinement pass always runs to produce a clean render. Returns `round_complete` and `done` flags.
 **State blob (encode/decode):**
 REQUIRED_STATE_KEYS = {"X", "E", "y", "n_nodes", "dataset_id", "model_type", "T",
                        "n", "m", "t", "t_prime", "gibbs_chain_freq", "inner_step", "step"}
+# encode: torch.save(state_dict, BytesIO) -> base64 string
+# decode: base64 -> bytes -> torch.load(weights_only=False) -> validate keys + X.dim()==4, E.dim()==5
 ```
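The encode/validate/decode flow can be sketched without PyTorch. Here `pickle` is a stand-in for `torch.save` / `torch.load`, the tensor-dimension checks are omitted, and applying the 10 MB limit to the base64 string (rather than the decoded bytes) is an assumption.

```python
import base64
import io
import pickle

STATE_BLOB_MAX_BYTES = 10 * 1024 * 1024  # 10 MB limit; exact constant assumed
REQUIRED_STATE_KEYS = {"X", "E", "y", "n_nodes", "dataset_id", "model_type", "T",
                       "n", "m", "t", "t_prime", "gibbs_chain_freq", "inner_step", "step"}

def encode_state_blob(state_dict):
    buf = io.BytesIO()
    pickle.dump(state_dict, buf)  # stand-in for torch.save(state_dict, buf)
    return base64.b64encode(buf.getvalue()).decode("ascii")

def decode_state_blob(b64_str):
    if len(b64_str) > STATE_BLOB_MAX_BYTES:
        raise ValueError("state blob too large")
    try:
        raw = base64.b64decode(b64_str, validate=True)
        state = pickle.loads(raw)  # stand-in for torch.load(..., weights_only=False)
    except Exception as exc:
        raise ValueError("corrupted state blob") from exc
    if not isinstance(state, dict):
        raise ValueError("state blob is not a dict")
    missing = REQUIRED_STATE_KEYS - set(state)
    if missing:
        raise ValueError(f"state blob missing keys: {sorted(missing)}")
    return state

blob = encode_state_blob({key: 0 for key in REQUIRED_STATE_KEYS})
roundtrip = decode_state_blob(blob)
```

Raising a single exception type for every malformed input makes it straightforward for the view to map a bad `state` to a 400 response.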
 ---
 ### Step 3: `registry.py` additions
+**`_safe_load_lightning_checkpoint(cls, ckpt_path)`** -- top-level function, used by `_load_graphgen_model`.
+**New instance variables** in `ModelRegistry.__init__`:
 ```python
+self._inference_lock = threading.Lock()
+self._inference_lock_owner = None   # description string for debugging
+self._graphgen_models = {}          # (dataset_id, model_type) -> loaded eval-mode model
 ```
+**`force_release_inference_lock(self)`** -- emergency release for stuck locks.
+**`_load_graphgen_model(self, dataset_id, model_type)`** -- lazy load with caching. Uses `_safe_load_lightning_checkpoint` instead of `load_from_checkpoint`.
+**`graphgen_generate_stream(self, ...)`** -- acquires lock, loads model, returns a generator. The generator yields SSE-compatible dicts and releases the lock in `finally`. Encodes state blob for multiprox mode.
+**`graphgen_continue_stream(self, state_b64)`** -- decodes state blob eagerly (fail-fast before lock), acquires lock, loads model, returns generator.
 ---
 ### Step 4: `views/graph_generation.py` additions
+**`GraphGenGenerateView.post`** -- validation + dispatch:
 - Require `dataset_id`, `model_type`, `sampling_mode`
+- `dataset_id` in `GRAPHGEN_DATASETS`, `model_type` in `{"discrete","continuous"}`, `sampling_mode` in `{"standard","multiprox"}`
+- Check `registry.graphgen_checkpoints_available` contains `model_type` -> `ModelUnavailable(503)` if not
 - `diffusion_steps` clamped to [50, 1000], `chain_frames` to [10, 30]
 - `num_nodes`: optional int, validated against `GRAPHGEN_DATASETS[dataset_id]["max_nodes"]`
+- multiprox only: `multiprox_params` required; validate `0 < t_prime <= t <= 1`, `2 <= m <= 100`, `n >= 1`, `1 <= gibbs_chain_freq <= m`
+- Return `_streaming_sse_response(gen)`
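The numeric rules in the list above translate into a few comparisons. A minimal sketch, assuming `t`, `t_prime`, `m`, and `n` are required keys and that failures surface as `ValueError` (the view may use its own exception types); `clamp` and `validate_multiprox_params` are hypothetical helpers:

```python
def clamp(value, low, high):
    """Clamp value into [low, high], as done for diffusion_steps and chain_frames."""
    return max(low, min(high, value))

def validate_multiprox_params(params):
    """Return a normalised copy of multiprox_params or raise ValueError."""
    missing = {"t", "t_prime", "m", "n"} - set(params)
    if missing:
        raise ValueError(f"missing multiprox_params: {sorted(missing)}")
    t, t_prime = float(params["t"]), float(params["t_prime"])
    m, n = int(params["m"]), int(params["n"])
    # Default taken from the run_multiprox_init description: max(1, m // 10).
    freq = int(params.get("gibbs_chain_freq", max(1, m // 10)))
    if not (0 < t_prime <= t <= 1):
        raise ValueError("require 0 < t_prime <= t <= 1")
    if not (2 <= m <= 100):
        raise ValueError("require 2 <= m <= 100")
    if n < 1:
        raise ValueError("require n >= 1")
    if not (1 <= freq <= m):
        raise ValueError("require 1 <= gibbs_chain_freq <= m")
    return {"t": t, "t_prime": t_prime, "m": m, "n": n, "gibbs_chain_freq": freq}

validated = validate_multiprox_params({"n": 10, "m": 100, "t": 0.4, "t_prime": 0.1})
```

Validating before the lock is acquired keeps malformed requests from ever occupying the inference slot.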
+**`GraphGenContinueView.post`** -- minimal validation:
 - Require `state` (non-empty string)
+- Return `_streaming_sse_response(gen)`
+**SSE helpers** (`_streaming_sse_response`, `_sse_iter`) -- convert generator dicts to SSE event stream with anti-buffering headers. Preview images split into separate `event: preview` events.
 ---
+### Step 5: `views/health.py` additions
+**`HealthView`** -- include `inference_lock` status (`locked`, `owner`) in response.
+**`ForceUnlockView`** -- DEBUG-only POST endpoint at `debug/force-unlock`.
 ---
+### Step 6: `urls.py`
+```python
+path("graph-generation/generate", GraphGenGenerateView.as_view()),
+path("graph-generation/continue", GraphGenContinueView.as_view()),
+path("debug/force-unlock", ForceUnlockView.as_view()),
+```
 ---
+### Step 7: `requirements.txt` -- new dependencies
+| Package | Why needed |
+|---|---|
+| `Pillow` | PIL rendering in `graphgen_inference.py` |
+| `overrides` | Transitive research code dependency |
+`wandb` is **not** added -- stripped from research code.
+---
+### Step 8: README + Postman
+**`README.md`**: Document SSE streaming protocol with three event types. Add force-unlock endpoint to endpoint table. Update health endpoint description to include lock status.
+**`docs/postman/collection.json`**:
+- 12 graph-generation requests (4 standard generate, 4 multiprox init, 4 continue variants consolidated into 1 using `{{multiprox_state}}`)
+- **Auto-chaining test scripts**: all 4 multiprox init requests have post-response scripts that parse the SSE `result` event and save the `state` field to the `multiprox_state` collection variable. The consolidated continue endpoint reads `{{multiprox_state}}` and also updates it from its result, enabling repeated continue calls.
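For clients other than Postman, the same extraction logic as the test scripts can be written in Python. This is a sketch that mirrors the script's line-based parsing; `extract_state` is a hypothetical helper and the sample body is made up for illustration.

```python
import json

def extract_state(sse_body):
    """Return the `state` field of the first `result` event, or None."""
    lines = sse_body.split("\n")
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "event: result":
            data_line = lines[i + 1]
            if data_line.startswith("data: "):
                data_line = data_line[len("data: "):]
            try:
                result = json.loads(data_line)
            except json.JSONDecodeError:
                return None
            return result.get("state") if isinstance(result, dict) else None
    return None

# Made-up response body with one progress event and one result event.
sample_body = (
    'event: progress\ndata: {"type":"progress","step":1}\n\n'
    'event: result\ndata: {"type":"result","state":"QUJD","step":0}\n\n'
)
state = extract_state(sample_body)
```

Like the Postman script, this stops at the first `result` event and assumes its payload fits on a single `data:` line.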
 ---
 | Risk | Mitigation |
 |---|---|
+| `weights_only=False` in `torch.load` for state blobs | The blob is expected to originate from our own server response (generated via `encode_state_blob`), but it round-trips through the client as opaque base64 and its origin is not verified. Size limit (10 MB) and key validation provide only basic bounds checking. |
+| `_safe_load_lightning_checkpoint` bypasses Lightning's loading | Manual load + `load_state_dict(strict=False)` is correct for inference. Missing keys (if any) default to initialized values. |
 | `model.gibbs_fixed_t_2` attribute override | Only safe because `_inference_lock` ensures single-threaded inference. Save/restore pattern is used. |
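+| SSE connection dropped mid-stream | Generator's `finally` block releases the lock. If a started generator is closed or garbage-collected without completing, Python still runs the `finally` block; a generator that is never iterated never enters its `try`, so the eagerly acquired lock is not released by this path. |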
+| Research code imports heavy optional deps at module level | Guarded with `try/except ImportError` in spectre_utils.py, molecular_metrics*.py, train_metrics.py. |
 ## Verification
+1. Django shell: `from diffusion_model_discrete import DiscreteDenoisingDiffusion` -- no `ModuleNotFoundError`
+2. `GET /graph-generation/datasets` returns `available_model_types` (regression check)
+3. All 4 standard generate combinations (qm9/comm20 x discrete/continuous) return SSE streams with progress, preview, and result events
+4. All 4 multiprox init combinations return step-0 image and state blob
+5. Continue endpoint chains correctly -- state updates across init -> continue x 3
+6. Force-unlock endpoint releases stuck locks in DEBUG mode
+7. Health endpoint shows lock status and owner
+8. Error paths: unknown `dataset_id` -> 400, corrupted `state` -> 400, concurrent requests -> 429, `t_prime > t` -> 400

docs/postman/collection.json CHANGED

@@ -8,6 +8,10 @@
     {
       "key": "base_url",
       "value": "http://localhost:8000/api/v1"
     }
   ],
   "item": [
@@ -52,6 +56,24 @@
             },
             "description": "List the 3 research methods."
           }
         }
       ]
     },
@@ -362,60 +384,201 @@
       "name": "Graph Generation - Inference",
       "item": [
         {
-          "name": "POST /graph-generation/generate (standard)",
           "request": {
             "method": "POST",
-            "header": [
-              { "key": "Content-Type", "value": "application/json" }
-            ],
             "body": {
               "mode": "raw",
-              "raw": "{\n  \"dataset_id\": \"qm9\",\n  \"model_type\": \"discrete\",\n  \"sampling_mode\": \"standard\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"chain_frames\": 20\n}"
             },
             "url": {
               "raw": "{{base_url}}/graph-generation/generate",
               "host": ["{{base_url}}"],
               "path": ["graph-generation", "generate"]
             },
-            "description": "Standard denoising graph generation. Returns final image + chain GIF."
           }
         },
         {
-          "name": "POST /graph-generation/generate (multiprox)",
           "request": {
             "method": "POST",
-            "header": [
-              { "key": "Content-Type", "value": "application/json" }
-            ],
             "body": {
               "mode": "raw",
-              "raw": "{\n  \"dataset_id\": \"comm20\",\n  \"model_type\": \"discrete\",\n  \"sampling_mode\": \"multiprox\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"multiprox_params\": {\n    \"m\": 10,\n    \"t\": 0.5,\n    \"t_prime\": 0.1\n  }\n}"
             },
             "url": {
               "raw": "{{base_url}}/graph-generation/generate",
               "host": ["{{base_url}}"],
               "path": ["graph-generation", "generate"]
             },
-            "description": "MultiProx generation. Returns step 0 + state for continue calls."
           }
         },
         {
           "name": "POST /graph-generation/continue",
           "request": {
             "method": "POST",
-            "header": [
-              { "key": "Content-Type", "value": "application/json" }
-            ],
             "body": {
               "mode": "raw",
-              "raw": "{\n  \"state\": \"<paste state from previous response>\"\n}"
             },
             "url": {
               "raw": "{{base_url}}/graph-generation/continue",
               "host": ["{{base_url}}"],
               "path": ["graph-generation", "continue"]
             },
-            "description": "Advance MultiProx chain by one step. Paste the state from the previous response."
           }
         }
       ]

     {
       "key": "base_url",
       "value": "http://localhost:8000/api/v1"
+    },
+    {
+      "key": "multiprox_state",
+      "value": ""
     }
   ],
   "item": [
             },
             "description": "List the 3 research methods."
           }
+        },
+        {
+          "name": "POST /debug/force-unlock",
+          "request": {
+            "method": "POST",
+            "header": [
+              {
+                "key": "Content-Type",
+                "value": "application/json"
+              }
+            ],
+            "url": {
+              "raw": "{{base_url}}/debug/force-unlock",
+              "host": ["{{base_url}}"],
+              "path": ["debug", "force-unlock"]
+            },
+            "description": "Force-release a stuck inference lock. Only available in DEBUG mode."
+          }
         }
       ]
     },
       "name": "Graph Generation - Inference",
       "item": [
         {
+          "name": "POST /graph-generation/generate (standard, QM9, discrete)",
           "request": {
             "method": "POST",
+            "header": [{ "key": "Content-Type", "value": "application/json" }],
             "body": {
               "mode": "raw",
+              "raw": "{\n  \"dataset_id\": \"qm9\",\n  \"model_type\": \"discrete\",\n  \"sampling_mode\": \"standard\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"chain_frames\": 30\n}"
             },
             "url": {
               "raw": "{{base_url}}/graph-generation/generate",
               "host": ["{{base_url}}"],
               "path": ["graph-generation", "generate"]
             },
+            "description": "Standard denoising on QM9 molecules (discrete model). Returns final PNG image + chain GIF."
           }
         },
         {
+          "name": "POST /graph-generation/generate (standard, QM9, continuous)",
           "request": {
             "method": "POST",
+            "header": [{ "key": "Content-Type", "value": "application/json" }],
+            "body": {
+              "mode": "raw",
+              "raw": "{\n  \"dataset_id\": \"qm9\",\n  \"model_type\": \"continuous\",\n  \"sampling_mode\": \"standard\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"chain_frames\": 30\n}"
+            },
+            "url": {
+              "raw": "{{base_url}}/graph-generation/generate",
+              "host": ["{{base_url}}"],
+              "path": ["graph-generation", "generate"]
+            },
+            "description": "Standard denoising on QM9 molecules (continuous/lifted model). Returns final PNG image + chain GIF."
+          }
+        },
+        {
+          "name": "POST /graph-generation/generate (standard, comm20, discrete)",
+          "request": {
+            "method": "POST",
+            "header": [{ "key": "Content-Type", "value": "application/json" }],
+            "body": {
+              "mode": "raw",
+              "raw": "{\n  \"dataset_id\": \"comm20\",\n  \"model_type\": \"discrete\",\n  \"sampling_mode\": \"standard\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"chain_frames\": 30\n}"
+            },
+            "url": {
+              "raw": "{{base_url}}/graph-generation/generate",
+              "host": ["{{base_url}}"],
+              "path": ["graph-generation", "generate"]
+            },
+            "description": "Standard denoising on Community20 graphs (discrete model). Returns final PNG image + chain GIF."
+          }
+        },
+        {
+          "name": "POST /graph-generation/generate (standard, comm20, continuous)",
+          "request": {
+            "method": "POST",
+            "header": [{ "key": "Content-Type", "value": "application/json" }],
+            "body": {
+              "mode": "raw",
+              "raw": "{\n  \"dataset_id\": \"comm20\",\n  \"model_type\": \"continuous\",\n  \"sampling_mode\": \"standard\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"chain_frames\": 30\n}"
+            },
+            "url": {
+              "raw": "{{base_url}}/graph-generation/generate",
+              "host": ["{{base_url}}"],
+              "path": ["graph-generation", "generate"]
+            },
+            "description": "Standard denoising on Community20 graphs (continuous/lifted model). Returns final PNG image + chain GIF."
+          }
+        },
+        {
+          "name": "POST /graph-generation/generate (multiprox init, QM9, discrete)",
+          "event": [
+            {
+              "listen": "test",
+              "script": {
+                "type": "text/javascript",
+                "exec": ["// Extract state from the SSE result event and store as collection variable", "var body = pm.response.text();", "var lines = body.split('\\n');", "for (var i = 0; i < lines.length; i++) {", "    if (lines[i].trim() === 'event: result' && i + 1 < lines.length) {", "        var dataLine = lines[i + 1].replace(/^data: /, '');", "        try {", "            var result = JSON.parse(dataLine);", "            if (result.state) {", "                pm.collectionVariables.set('multiprox_state', result.state);", "                console.log('State saved to {{multiprox_state}} (' + result.state.length + ' chars)');", "            }", "        } catch (e) { console.log('Failed to parse result event: ' + e); }", "        break;", "    }", "}"]
+              }
+            }
+          ],
+          "request": {
+            "method": "POST",
+            "header": [{ "key": "Content-Type", "value": "application/json" }],
+            "body": {
+              "mode": "raw",
+              "raw": "{\n  \"dataset_id\": \"qm9\",\n  \"model_type\": \"discrete\",\n  \"sampling_mode\": \"multiprox\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"multiprox_params\": {\n    \"n\": 10,\n    \"m\": 100,\n    \"t\": 0.5,\n    \"t_prime\": 0.004,\n    \"gibbs_chain_freq\": 10\n  }\n}"
+            },
+            "url": {
+              "raw": "{{base_url}}/graph-generation/generate",
+              "host": ["{{base_url}}"],
+              "path": ["graph-generation", "generate"]
+            },
+            "description": "MultiProx Gibbs init on QM9 (discrete). Best params from thesis Table 4.3.1: t=50%, t'=0.4% of T. Returns step 0 image + state blob. State is auto-saved to {{multiprox_state}}."
+          }
+        },
+        {
+          "name": "POST /graph-generation/generate (multiprox init, QM9, continuous)",
+          "event": [
+            {
+              "listen": "test",
+              "script": {
+                "type": "text/javascript",
+                "exec": ["var body = pm.response.text();", "var lines = body.split('\\n');", "for (var i = 0; i < lines.length; i++) {", "    if (lines[i].trim() === 'event: result' && i + 1 < lines.length) {", "        var dataLine = lines[i + 1].replace(/^data: /, '');", "        try {", "            var result = JSON.parse(dataLine);", "            if (result.state) { pm.collectionVariables.set('multiprox_state', result.state); }", "        } catch (e) {}", "        break;", "    }", "}"]
+              }
+            }
+          ],
+          "request": {
+            "method": "POST",
+            "header": [{ "key": "Content-Type", "value": "application/json" }],
+            "body": {
+              "mode": "raw",
+              "raw": "{\n  \"dataset_id\": \"qm9\",\n  \"model_type\": \"continuous\",\n  \"sampling_mode\": \"multiprox\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"multiprox_params\": {\n    \"n\": 10,\n    \"m\": 100,\n    \"t\": 0.5,\n    \"t_prime\": 0.004,\n    \"gibbs_chain_freq\": 10\n  }\n}"
+            },
+            "url": {
+              "raw": "{{base_url}}/graph-generation/generate",
+              "host": ["{{base_url}}"],
+              "path": ["graph-generation", "generate"]
+            },
+            "description": "MultiProx Gibbs init on QM9 (continuous/lifted). Best params from thesis Table 4.3.1: t=50%, t'=0.4% of T. Returns step 0 image + state blob. State is auto-saved to {{multiprox_state}}."
+          }
+        },
+        {
+          "name": "POST /graph-generation/generate (multiprox init, comm20, discrete)",
+          "event": [
+            {
+              "listen": "test",
+              "script": {
+                "type": "text/javascript",
+                "exec": ["var body = pm.response.text();", "var lines = body.split('\\n');", "for (var i = 0; i < lines.length; i++) {", "    if (lines[i].trim() === 'event: result' && i + 1 < lines.length) {", "        var dataLine = lines[i + 1].replace(/^data: /, '');", "        try {", "            var result = JSON.parse(dataLine);", "            if (result.state) { pm.collectionVariables.set('multiprox_state', result.state); }", "        } catch (e) {}", "        break;", "    }", "}"]
+              }
+            }
+          ],
+          "request": {
+            "method": "POST",
+            "header": [{ "key": "Content-Type", "value": "application/json" }],
             "body": {
               "mode": "raw",
+              "raw": "{\n  \"dataset_id\": \"comm20\",\n  \"model_type\": \"discrete\",\n  \"sampling_mode\": \"multiprox\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"multiprox_params\": {\n    \"n\": 10,\n    \"m\": 100,\n    \"t\": 0.4,\n    \"t_prime\": 0.1,\n    \"gibbs_chain_freq\": 10\n  }\n}"
             },
             "url": {
               "raw": "{{base_url}}/graph-generation/generate",
               "host": ["{{base_url}}"],
               "path": ["graph-generation", "generate"]
             },
+            "description": "MultiProx Gibbs init on Community20 (discrete). Best params from thesis Table C.2.1: t=40%, t'=10% of T. Returns step 0 image + state blob. State is auto-saved to {{multiprox_state}}."
+          }
+        },
+        {
+          "name": "POST /graph-generation/generate (multiprox init, comm20, continuous)",
+          "event": [
+            {
+              "listen": "test",
+              "script": {
+                "type": "text/javascript",
+                "exec": ["var body = pm.response.text();", "var lines = body.split('\\n');", "for (var i = 0; i < lines.length; i++) {", "    if (lines[i].trim() === 'event: result' && i + 1 < lines.length) {", "        var dataLine = lines[i + 1].replace(/^data: /, '');", "        try {", "            var result = JSON.parse(dataLine);", "            if (result.state) { pm.collectionVariables.set('multiprox_state', result.state); }", "        } catch (e) {}", "        break;", "    }", "}"]
+              }
+            }
+          ],
+          "request": {
+            "method": "POST",
+            "header": [{ "key": "Content-Type", "value": "application/json" }],
+            "body": {
+              "mode": "raw",
+              "raw": "{\n  \"dataset_id\": \"comm20\",\n  \"model_type\": \"continuous\",\n  \"sampling_mode\": \"multiprox\",\n  \"num_nodes\": null,\n  \"diffusion_steps\": 500,\n  \"multiprox_params\": {\n    \"n\": 10,\n    \"m\": 100,\n    \"t\": 0.4,\n    \"t_prime\": 0.1,\n    \"gibbs_chain_freq\": 10\n  }\n}"
+            },
+            "url": {
+              "raw": "{{base_url}}/graph-generation/generate",
+              "host": ["{{base_url}}"],
+              "path": ["graph-generation", "generate"]
+            },
+            "description": "MultiProx Gibbs init on Community20 (continuous/lifted). Best params from thesis Table C.2.1: t=40%, t'=10% of T. Returns step 0 image + state blob. State is auto-saved to {{multiprox_state}}."
           }
         },
         {
           "name": "POST /graph-generation/continue",
+          "event": [
+            {
+              "listen": "test",
+              "script": {
+                "type": "text/javascript",
+                "exec": ["// Update state for chaining multiple continue calls", "var body = pm.response.text();", "var lines = body.split('\\n');", "for (var i = 0; i < lines.length; i++) {", "    if (lines[i].trim() === 'event: result' && i + 1 < lines.length) {", "        var dataLine = lines[i + 1].replace(/^data: /, '');", "        try {", "            var result = JSON.parse(dataLine);", "            if (result.state) {", "                pm.collectionVariables.set('multiprox_state', result.state);", "                console.log('State updated (done=' + result.done + ', step=' + result.step + ')');", "            }", "        } catch (e) {}", "        break;", "    }", "}"]
+              }
+            }
+          ],
           "request": {
             "method": "POST",
+            "header": [{ "key": "Content-Type", "value": "application/json" }],
             "body": {
               "mode": "raw",
+              "raw": "{\n  \"state\": \"{{multiprox_state}}\"\n}"
             },
             "url": {
               "raw": "{{base_url}}/graph-generation/continue",
               "host": ["{{base_url}}"],
               "path": ["graph-generation", "continue"]
             },
+            "description": "Advance MultiProx chain by gibbs_chain_freq inner steps. Uses {{multiprox_state}} from the last init/continue call. Can be fired repeatedly to chain steps."
           }
         }
       ]

src/backend/README.md CHANGED Viewed

@@ -45,6 +45,7 @@ The API is served at `http://localhost:8000/api/v1/`.
 | `DJANGO_SECRET_KEY` | `dev-insecure-key-change-in-production` | Django secret key. **Set in production.** |
 | `DJANGO_DEBUG` | `True` | Enable debug mode. Set to `False` in production. |
 | `DJANGO_ALLOWED_HOSTS` | `localhost,127.0.0.1` | Comma-separated allowed hosts. |
 ## Startup Sequence
@@ -65,8 +66,9 @@ All endpoints are prefixed with `/api/v1/`.
 | Method | Path | Description |
 |---|---|---|
-| `GET` | `/health` | Service health + model availability |
 | `GET` | `/methods` | List the 3 research methods |
 ### COINs — KG Reasoning
@@ -86,8 +88,8 @@ All endpoints are prefixed with `/api/v1/`.
 |---|---|---|
 | `GET` | `/graph-generation/datasets` | List graph types with node/edge types |
 | `GET` | `/graph-generation/sampling-modes` | Sampling strategies with parameter specs |
-| `POST` | `/graph-generation/generate` | Generate a graph (not yet implemented) |
-| `POST` | `/graph-generation/continue` | Continue MultiProx generation (not yet implemented) |
 ### KG Anomaly Correction
@@ -98,6 +100,32 @@ All endpoints are prefixed with `/api/v1/`.
 | `POST` | `/kg-anomaly/correct` | Run correction (not yet implemented) |
 | `POST` | `/kg-anomaly/continue` | Continue MultiProx correction (not yet implemented) |
 ## Project Structure
 ```

 | `DJANGO_SECRET_KEY` | `dev-insecure-key-change-in-production` | Django secret key. **Set in production.** |
 | `DJANGO_DEBUG` | `True` | Enable debug mode. Set to `False` in production. |
 | `DJANGO_ALLOWED_HOSTS` | `localhost,127.0.0.1` | Comma-separated allowed hosts. |
+| `TORCH_DEVICE` | Auto (`cuda:0` if available, else `cpu`) | PyTorch device for model inference. |
 ## Startup Sequence
 | Method | Path | Description |
 |---|---|---|
+| `GET` | `/health` | Service health + model availability + inference lock status |
 | `GET` | `/methods` | List the 3 research methods |
+| `POST` | `/debug/force-unlock` | Release stuck inference lock (debug mode only) |
 ### COINs — KG Reasoning
 |---|---|---|
 | `GET` | `/graph-generation/datasets` | List graph types with node/edge types |
 | `GET` | `/graph-generation/sampling-modes` | Sampling strategies with parameter specs |
+| `POST` | `/graph-generation/generate` | **Streaming SSE.** Generate a graph (standard denoising or MultiProx Gibbs init) |
+| `POST` | `/graph-generation/continue` | **Streaming SSE.** Advance a MultiProx Gibbs session by one step |
 ### KG Anomaly Correction
 | `POST` | `/kg-anomaly/correct` | Run correction (not yet implemented) |
 | `POST` | `/kg-anomaly/continue` | Continue MultiProx correction (not yet implemented) |
+## Streaming Inference Protocol (SSE)
+The graph generation endpoints (`/generate`, `/continue`) return **Server-Sent Events** (`text/event-stream`). Three event types are emitted:
+**`event: progress`** — phase/step metadata (no images):
+```
+event: progress
+data: {"type":"progress","phase":"denoise","step":42,"total_steps":500,"elapsed_ms":2100}
+```
+**`event: preview`** — base64 PNG of the graph's current state, emitted at key frames:
+```
+event: preview
+data: data:image/png;base64,...
+```
+Preview frequency: `denoise` emits at `chain_frames` intervals (~30 over 500 steps), `gibbs` emits every inner step, `refine` emits every ~10% of steps.
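A client that wants to display or save a preview frame has to turn the `data:image/png;base64,...` payload back into raw bytes. The following is a minimal sketch of that decoding step, not the project's own client code; the helper name `decode_data_uri` and the stand-in payload are hypothetical and exist only for illustration.

```python
import base64
from typing import Tuple

# Every valid PNG file begins with this 8-byte signature.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a `data:<mime>;base64,<payload>` URI into (mime type, raw bytes).

    Hypothetical helper: it only handles the base64 form shown in the
    preview event example and rejects anything else.
    """
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


# Stand-in for a real preview payload: just the PNG signature plus filler bytes.
sample_uri = "data:image/png;base64," + base64.b64encode(PNG_MAGIC + b"demo").decode("ascii")
mime, raw = decode_data_uri(sample_uri)
```

With a real preview event, `raw` could be written to a `.png` file or handed to an image library; checking that it starts with `PNG_MAGIC` is a cheap sanity test before doing so.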
+**`event: result`** — final payload with image, chain GIF, and timing:
+```
+event: result
+data: {"type":"result","dataset_id":"qm9","model_type":"discrete","sampling_mode":"standard","image":"data:image/png;base64,...","chain_gif":"data:image/gif;base64,...","inference_time_ms":25000}
+```
+Phases: `denoise` (standard generation loop), `noise_init` (multiprox init noise sampling), `gibbs` (multiprox inner Gibbs steps), `refine` (multiprox refinement denoising).
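The Postman test scripts in this commit scan the response body for `event: result` and store `result.state` for the next `/continue` call. The same logic can be sketched in Python as below. This is an illustration under assumptions, not the backend's or the collection's actual code: it assumes standard SSE framing (events separated by a blank line), and the function names and sample stream are hypothetical. The `state`, `done`, and `step` fields are the ones the Postman script reads from the result payload.

```python
import json
from typing import Dict, Iterator, List, Optional


def iter_sse_events(text: str) -> Iterator[Dict[str, str]]:
    """Yield {"event": ..., "data": ...} for each blank-line-delimited SSE block."""
    event = "message"  # SSE default event name when no `event:` field is given
    data: List[str] = []
    # Append one empty line so a stream without a trailing blank line still flushes.
    for line in text.replace("\r\n", "\n").split("\n") + [""]:
        if line == "":
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment line, ignored
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, a single leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)


def extract_result(text: str) -> Optional[dict]:
    """Return the parsed JSON payload of the first `result` event, if any."""
    for ev in iter_sse_events(text):
        if ev["event"] == "result":
            return json.loads(ev["data"])
    return None


# Hypothetical stream shaped like the examples above (values are made up).
sample = (
    'event: progress\n'
    'data: {"type":"progress","phase":"gibbs","step":1,"total_steps":10,"elapsed_ms":120}\n'
    '\n'
    'event: preview\n'
    'data: data:image/png;base64,AAAA\n'
    '\n'
    'event: result\n'
    'data: {"type":"result","state":"opaque-state-blob","done":false,"step":10}\n'
    '\n'
)
result = extract_result(sample)
event_names = [ev["event"] for ev in iter_sse_events(sample)]
```

To chain MultiProx steps, a client would send `result["state"]` back as the `state` field of the next `/graph-generation/continue` request body, mirroring what the collection does with `{{multiprox_state}}`.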
 ## Project Structure
 ```