| ---
|
| title: FlowProt
|
| sdk: docker
|
| app_port: 7860
|
| license: mit
|
| ---
|
|
|
| # FlowProt Hugging Face Space (Docker MVP)
|
|
|
| **Live Space:** [alibtsd/FlowProt](https://huggingface.co/spaces/alibtsd/FlowProt)
|
|
|
| This repository is set up for a **remote-first Hugging Face Space deployment** of FlowProt.
|
| The Space exposes the following inference modes:
|
|
|
| - `unconditional` generation (enabled)
|
| - `classifier`-guided generation (enabled when a classifier checkpoint is configured)
|
| - `conditional` generation (feature-flagged via `app.enable_conditional`, disabled by default)
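
The gating described above can be sketched in a few lines. This is a minimal illustration, not the Space's actual code: `available_modes` and its parameters are hypothetical names, and only the mode names and the `app.enable_conditional` flag come from this README.

```python
from typing import Optional


def available_modes(classifier_ckpt: Optional[str], enable_conditional: bool = False) -> list[str]:
    """Return the inference modes the UI would offer (hypothetical helper)."""
    modes = ["unconditional"]  # always enabled
    if classifier_ckpt:  # classifier guidance needs a configured checkpoint
        modes.append("classifier")
    if enable_conditional:  # mirrors app.enable_conditional (disabled by default)
        modes.append("conditional")
    return modes
```

With no classifier checkpoint and the flag left at its default, only `unconditional` would be offered.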
|
|
|
| ## Workflow
|
|
|
| The UI is organised into four tabs that follow the design lifecycle:
|
|
|
| 1. **Generate** - choose a mode, length, number of samples, seed, and sampling timesteps, then run inference. Classifier and conditional controls appear only for the relevant mode.
|
| 2. **View** - inspect generated backbones in interactive Mol* 3D, step through the generation trajectory, and overlay aligned backbone vs ESMFold comparisons.
|
| 3. **Analyze** - run ProteinMPNN + ESMFold self-consistency, browse metrics and leaderboards, and export filtered results.
|
| 4. **Advanced** - reload recent runs, manage the saved demo example, and view diagnostics/health.
|
|
|
| A status badge at the top reports whether the model is loaded, whether self-consistency tooling is available, and the active device. A "Load demo example" button gives first-time visitors a one-click result.
|
|
|
| ## Architecture
|
|
|
| - `app.py`: Gradio UI only (request handling + user-visible errors + health display)
|
| - `inference.py`: preprocessing/prediction/postprocessing orchestration
|
| - `model_loader.py`: env-driven artifact resolution + model loading with defensive logging
|
| - `config.yaml`: runtime defaults for UI limits and sampling parameters
|
|
|
| ## Viewer Experience
|
|
|
| - `Mol*` is the primary viewer for generated structures.
|
| - The sample selector (View tab) shows the **first** selected sample in 3D; all selected samples feed self-consistency. A live label clarifies the current selection.
|
| - **Trajectory playback**: step through, or play back, the backbone (`bb_traj`) or denoised (`x0_traj`) trajectory of the selected sample with a built-in frame slider.
|
| - The viewer supports an **aligned compare flow**: pick a generated backbone and a folded sample from self-consistency, then load an aligned overlay in the same Mol* panel. Clicking a row in the Analyze metrics table loads the overlay automatically.
|
| - Viewer rendering is iframe-isolated to avoid Gradio layout conflicts.
|
|
|
| ## Inference Controls
|
|
|
| - Optional per-run seed override is available in the UI for reproducible sampling (`inference.seed` remains the fallback default).
|
| - The number of sampling timesteps is adjustable per run (more steps run slower but typically improve sample quality).
|
| - Generated `sample.pdb` files are always returned in the main output panel.
|
| - Optional trajectory downloads can be enabled from the UI to expose `bb_traj` and `x0_traj` artifacts.
|
| - A run can be bundled into a single downloadable `.zip` from the Generate tab.
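
The seed fallback described in the first bullet could look roughly like the sketch below. `resolve_seed` is a hypothetical name, and rejecting negative seeds is an assumption rather than documented behaviour; `default_seed` stands in for the `inference.seed` value from `config.yaml`.

```python
from typing import Optional


def resolve_seed(ui_value: Optional[str], default_seed: int) -> int:
    """Pick the per-run seed, falling back to the configured default (hypothetical helper)."""
    text = (ui_value or "").strip()
    if not text:
        return default_seed  # empty UI field: use inference.seed
    seed = int(text)  # raises ValueError on non-integer input
    if seed < 0:
        raise ValueError("seed must be non-negative")  # assumption, not from the README
    return seed
```

For example, `resolve_seed("", 123)` falls back to `123`, while `resolve_seed(" 7 ", 123)` uses the override.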
|
|
|
| ## Conditional Generation (feature-flagged)
|
|
|
| When `app.enable_conditional: true`, the Generate tab exposes a conditional workflow:
|
|
|
| - Upload a reference PDB (chain A) to define fixed backbone positions.
|
| - Specify fixed residues as ranges/indices (for example `10-40,55,60-62`); leaving it empty fixes all residues.
|
| - Optionally apply classifier guidance during conditional sampling.
|
|
|
| Protein length is taken from the uploaded reference structure. Enable this flag only after validating end-to-end with a real checkpoint and reference structure.
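
As an illustration of the fixed-residue syntax, the sketch below expands a spec such as `10-40,55,60-62` into explicit indices. It assumes 1-based, inclusive ranges, which this README does not state, and `parse_fixed_residues` is a hypothetical name rather than the app's own helper.

```python
def parse_fixed_residues(spec: str, length: int) -> list[int]:
    """Expand a spec like '10-40,55,60-62' into sorted residue indices.

    Assumes 1-based, inclusive ranges (an assumption, not confirmed by the app).
    """
    spec = spec.strip()
    if not spec:
        return list(range(1, length + 1))  # empty spec fixes every residue
    indices: set[int] = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        if "-" in token:
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(token)
        if start > end or start < 1 or end > length:
            raise ValueError(f"invalid residue range {token!r} for length {length}")
        indices.update(range(start, end + 1))
    return sorted(indices)
```

Here `length` would come from the uploaded reference structure, so out-of-range indices are rejected early instead of failing during sampling.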
|
|
|
| ## Run History
|
|
|
| Each successful run is recorded in `space_outputs/run_history.json` (mode, seed, sample count, timesteps, sample paths). The Advanced tab lets you reload a recent run's samples and any saved self-consistency results without re-running inference.
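
A run-history file of this kind can be maintained with a small append helper. The sketch below is illustrative only: `append_run`, the `keep_last` cap, and the record keys used in the usage note are assumptions, since the README lists the recorded fields but not the exact JSON schema.

```python
import json
from pathlib import Path


def append_run(history_path: Path, record: dict, keep_last: int = 20) -> list[dict]:
    """Append one run record to a JSON list on disk and return the updated list."""
    runs = json.loads(history_path.read_text()) if history_path.exists() else []
    runs.append(record)
    runs = runs[-keep_last:]  # assumption: only the most recent runs are retained
    history_path.parent.mkdir(parents=True, exist_ok=True)
    history_path.write_text(json.dumps(runs, indent=2))
    return runs
```

A call might look like `append_run(Path("space_outputs/run_history.json"), {"mode": "unconditional", "seed": 123, "num_samples": 1, "timesteps": 100, "sample_paths": []})`, where the key names are hypothetical.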
|
|
|
| ## Self-Consistency Metrics
|
|
|
| The UI includes a self-consistency action that runs:
|
|
|
| 1. `ProteinMPNN` sequence design on a generated sample PDB
|
| 2. optional `ESMFold` structure prediction for designed sequences
|
| 3. metric computation against the generated reference structure:
|
|    - `scTM`
|
|    - `scRMSD`
|
|
|
| The self-consistency panel also includes:
|
|
|
| - fixed plot axis ranges (not metric filtering):
|
|   - `scTM`: `[0, 1]`
|
|   - `scRMSD`: `[0, 10]`
|
|   - `ESMFold mean pLDDT`: `[0, 1]`
|
| - ranked leaderboards:
|
|   - top-N by highest `scTM`
|
|   - top-N by lowest `scRMSD`
|
| - sample labels in tables/plots use descriptive run/length/sample names while canonical source paths stay in backend metadata.
|
| - per-sequence table defaults to showing all evaluated rows; optional view filters can narrow by:
|
|   - sample label (multi-select)
|
|   - sample label text query
|
|   - minimum `scTM`
|
|   - maximum `scRMSD`
|
|   - minimum `ESMFold mean pLDDT`
|
|
|
| When multiple samples are selected in the sample selector, self-consistency runs for all selected samples and reports a single aggregated metrics view across those samples. A progress bar reports per-sample and per-sequence folding progress.
|
| Status logs include selected/processed sample counts, requested sequences per sample, folding mode, artifact counts, and a concise failed-sample preview when partial failures happen.
|
|
|
| The scatter plots show per-point hover tooltips (sample label and metric values). The current (filtered) metrics view can be exported to CSV from the Analyze tab.
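
The view filters, leaderboards, and CSV export described in this section can be approximated with plain Python. This is a sketch under assumptions: the function names and the row keys (`label`, `scTM`, `scRMSD`, `plddt`) are hypothetical and may not match the columns the app actually writes.

```python
import csv
import io


def filter_rows(rows, min_sctm=None, max_scrmsd=None, min_plddt=None, label_query=""):
    """Return the rows that pass every active filter; the input list is not modified."""
    query = label_query.lower()
    kept = []
    for row in rows:
        if min_sctm is not None and row["scTM"] < min_sctm:
            continue
        if max_scrmsd is not None and row["scRMSD"] > max_scrmsd:
            continue
        if min_plddt is not None and row["plddt"] < min_plddt:
            continue
        if query and query not in row["label"].lower():
            continue
        kept.append(row)
    return kept


def leaderboard(rows, metric, top_n=5, highest=True):
    """Top-N rows by one metric, e.g. highest scTM or lowest scRMSD."""
    return sorted(rows, key=lambda r: r[metric], reverse=highest)[:top_n]


def to_csv(rows):
    """Serialise the current (possibly filtered) view to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because `filter_rows` builds a new list, resetting the filters only requires showing the original rows again, which matches the non-destructive filtering described above.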
|
|
|
| Artifacts are written per run under the selected sample directory in a timestamped `self_consistency/` folder.
|
|
|
| ## Saved Example Case
|
|
|
| You can preserve a generated sample as a reusable demo case:
|
|
|
| - Click **Save selected sample as example** to copy the selected sample directory into:
|
|   - `examples/flowprot_space_example/sample/`
|
| - If no sample is selected, the app falls back to the most recent generated sample under `space_outputs/`.
|
| - The app also writes:
|
|   - `examples/flowprot_space_example/manifest.json`
|
| - Click **Load saved example** to view that sample in Mol* without running inference.
|
| - If the example contains `self_consistency/*/sc_results.csv`, the app auto-loads the latest self-consistency summary/table/plots and leaderboards.
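
Locating the latest self-consistency summary can be sketched with `pathlib`. The helper name `latest_sc_results` is hypothetical, and the sketch assumes the timestamped folder names sort chronologically as strings, which this README does not guarantee.

```python
from pathlib import Path
from typing import Optional


def latest_sc_results(sample_dir: Path) -> Optional[Path]:
    """Return the newest self_consistency/*/sc_results.csv under a sample directory."""
    candidates = sorted(sample_dir.glob("self_consistency/*/sc_results.csv"))
    # Assumption: timestamped folder names sort chronologically as plain strings.
    return candidates[-1] if candidates else None
```

If the folder names were not sortable this way, comparing modification times with `Path.stat().st_mtime` would be an alternative.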
|
|
|
| ## Model Artifact Configuration
|
|
|
| Configure exactly one artifact source using environment variables:
|
|
|
| 1. Local checkpoint path:
|
|    - `FLOWPROT_CKPT_PATH=/path/to/model.ckpt`
|
| 2. Local checkpoint directory:
|
|    - `FLOWPROT_CKPT_DIR=/path/to/ckpt_dir`
|
|    - optional: `FLOWPROT_CKPT_FILENAME=epoch.ckpt` (default)
|
| 3. Hugging Face Hub snapshot:
|
|    - `FLOWPROT_HF_REPO_ID=<org-or-user>/<repo>`
|
|    - `FLOWPROT_CKPT_FILENAME=<relative/path/in/repo>.ckpt`
|
|    - optional: `FLOWPROT_HF_REVISION=<branch-or-commit>`
|
|    - optional for private repos: `HF_TOKEN=<token>`
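
The "exactly one source" rule can be checked before any loading starts. The sketch below is an assumption about how such a check might look, not the behaviour of `model_loader.py`, which may instead apply a precedence order; `resolve_artifact_source` and `SOURCE_VARS` are hypothetical names.

```python
from typing import Mapping

# Environment variable that identifies each artifact source (names from this README).
SOURCE_VARS = {
    "path": "FLOWPROT_CKPT_PATH",
    "dir": "FLOWPROT_CKPT_DIR",
    "hub": "FLOWPROT_HF_REPO_ID",
}


def resolve_artifact_source(env: Mapping[str, str]) -> str:
    """Return 'path', 'dir', or 'hub'; raise unless exactly one source is set."""
    configured = [kind for kind, var in SOURCE_VARS.items() if env.get(var, "").strip()]
    if len(configured) != 1:
        raise ValueError(f"expected exactly one artifact source, found {configured or 'none'}")
    return configured[0]
```

In practice this could be called as `resolve_artifact_source(os.environ)`. `FLOWPROT_CKPT_FILENAME` is deliberately not part of the check because it is shared by the directory and Hub sources.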
|
|
|
| Optional overrides:
|
|
|
| - `FLOWPROT_CKPT_CONFIG_PATH=/path/to/config.yaml`
|
| - `FLOWPROT_CKPT_CONFIG_FILENAME=config.yaml` (default)
|
| - `FLOWPROT_APP_CONFIG=/app/config.yaml`
|
| - `FLOWPROT_DEVICE=auto|cpu|cuda|cuda:0`
|
| - `FLOWPROT_LOG_LEVEL=INFO|DEBUG|WARNING|ERROR`
|
| - `FLOWPROT_PMPNN_DIR=/app/model/ProteinMPNN`
|
| - `FLOWPROT_PMPNN_WEIGHTS_DIR=/app/model/ProteinMPNN/vanilla_model_weights`
|
| - `FLOWPROT_PMPNN_MODEL_NAME=v_48_020`
|
| - `FLOWPROT_SC_DEVICE=auto|cpu|cuda|cuda:0`
|
| - `FLOWPROT_SC_SEED=123`
|
| - `FLOWPROT_ESMFOLD_MODEL_ID=facebook/esmfold_v1`
|
| - `FLOWPROT_VIEW_EXAMPLE_ON_STARTUP=true|false` (default `false`)
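
Values such as `FLOWPROT_DEVICE` and `FLOWPROT_VIEW_EXAMPLE_ON_STARTUP` need light parsing before use. The following sketch shows one way to do that. `resolve_device` and `parse_bool` are hypothetical helpers, and accepting `1`/`yes`/`on` as true is an assumption beyond the documented `true|false`.

```python
import re

# Accepts the documented forms: auto, cpu, cuda, or cuda:<index>.
_DEVICE_PATTERN = re.compile(r"^(auto|cpu|cuda(:\d+)?)$")


def resolve_device(value: str, cuda_available: bool) -> str:
    """Validate a device string and resolve 'auto' to 'cuda' or 'cpu'."""
    device = value.strip().lower()
    if not _DEVICE_PATTERN.match(device):
        raise ValueError(f"unsupported device {value!r}")
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    return device  # explicit choices are returned unchanged


def parse_bool(value: str, default: bool = False) -> bool:
    """Interpret 'true'/'false' style flags; an empty value yields the default."""
    text = value.strip().lower()
    if not text:
        return default
    return text in {"1", "true", "yes", "on"}  # extra spellings are an assumption
```

In a PyTorch-based app, `cuda_available` would typically be `torch.cuda.is_available()`. It is passed in as a plain boolean here so the sketch stays dependency-free.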
|
|
|
| ## Local Run (Import/UI only)
|
|
|
| ```bash
|
| pip install -r requirements.txt
|
| python app.py
|
| ```
|
|
|
| If no checkpoint is configured, the UI still starts and shows model-load failures in the status panel.
|
|
|
| For local checkpoint testing:
|
|
|
| ```bash
|
| export FLOWPROT_CKPT_DIR=model/ckpt
|
| export FLOWPROT_CKPT_FILENAME=epoch29.ckpt
|
| python app.py
|
| ```
|
|
|
| PowerShell equivalent:
|
|
|
| ```powershell
|
| $env:FLOWPROT_CKPT_DIR='model/ckpt'
|
| $env:FLOWPROT_CKPT_FILENAME='epoch29.ckpt'
|
| python app.py
|
| ```
|
|
|
| ## Smoke Checks
|
|
|
| Use `smoke_checks.py` to run explicit validation tiers:
|
|
|
| ```bash
|
| python smoke_checks.py --tier import
|
| python smoke_checks.py --tier handlers
|
| python smoke_checks.py --tier model-load
|
| python smoke_checks.py --tier inference --length 64 --num-samples 1
|
| ```
|
|
|
| The `handlers` tier validates pure UI helper logic (seed parsing, fixed-residue parsing, self-consistency filtering, status badge) without requiring a model checkpoint.
|
|
|
| The `inference` tier accepts an optional `--seed` for reproducible runs:
|
|
|
| ```bash
|
| python smoke_checks.py --tier inference --length 64 --num-samples 1 --seed 123
|
| ```
|
|
|
| ### Validation Matrix (local vs remote)
|
|
|
| - `import`: local-capable (no checkpoint required)
|
| - `handlers`: local-capable (no checkpoint required)
|
| - `model-load`: generally remote-target unless real artifacts are configured locally
|
| - `inference`: remote-target for production confidence (GPU/artifact/runtime parity)
|
|
|
| ## Validation Checklist (Local vs Hugging Face Space)
|
|
|
| Use this checklist to separate what should pass locally from what should be verified in the Space logs/runtime:
|
|
|
| 1. Local import health
|
|    - `python smoke_checks.py --tier import`
|
|    - expected: app modules import cleanly in your local env with dependencies installed.
|
| 2. Local UI startup
|
|    - `python app.py`
|
|    - expected: Gradio launches; diagnostics panel is collapsed by default.
|
| 3. Model load and inference
|
|    - optional locally when real artifacts are available.
|
|    - otherwise verify remotely in Space logs with configured artifacts/device.
|
| 4. Self-consistency flow
|
|    - run from the UI against one generated sample and confirm:
|
|      - per-sequence table initially shows all rows
|
|      - applying view filters narrows rows non-destructively
|
|      - reset restores unfiltered view
|
| 5. Aligned compare viewer
|
|    - after self-consistency produces folded structures, select a backbone and a folded row, then click **Load aligned compare view**
|
|    - expected: overlay loads in Mol* and the compare status reports RMSD before and after alignment.
|
|
|
| ## Docker Space Startup
|
|
|
| The Docker image starts with:
|
|
|
| ```bash
|
| python app.py
|
| ```
|
|
|
| Space health and model load status are visible in the Gradio UI.
|
|
|