	---
	title: FlowProt
	sdk: docker
	app_port: 7860
	license: mit
	---

	# FlowProt Hugging Face Space (Docker MVP)

	Live Space: [alibtsd/FlowProt](https://huggingface.co/spaces/alibtsd/FlowProt)

	This repository is set up for a remote-first Hugging Face Space deployment of FlowProt.
	The Space exposes the following inference modes:

	- `unconditional` generation (enabled)
	- `classifier`-guided generation (enabled when a classifier checkpoint is configured)
	- `conditional` generation (feature-flagged via `app.enable_conditional`, disabled by default)

	## Workflow

	The UI is organised into four tabs that follow the design lifecycle:

	1. Generate - choose a mode, length, number of samples, seed, and sampling timesteps, then run inference. Classifier and conditional controls appear only for the relevant mode.
	2. View - inspect generated backbones in interactive Mol* 3D, step through the generation trajectory, and overlay aligned backbone vs ESMFold comparisons.
	3. Analyze - run ProteinMPNN + ESMFold self-consistency, browse metrics and leaderboards, and export filtered results.
	4. Advanced - reload recent runs, manage the saved demo example, and view diagnostics/health.

	A status badge at the top reports whether the model is loaded, whether self-consistency tooling is available, and the active device. A "Load demo example" button gives first-time visitors a one-click result.

	## Architecture

	- `app.py`: Gradio UI only (request handling + user-visible errors + health display)
	- `inference.py`: preprocessing/prediction/postprocessing orchestration
	- `model_loader.py`: env-driven artifact resolution + model loading with defensive logging
	- `config.yaml`: runtime defaults for UI limits and sampling parameters

	## Viewer Experience

	- `Mol*` is the primary viewer for generated structures.
	- The sample selector (View tab) shows the first selected sample in 3D; all selected samples feed self-consistency. A live label clarifies the current selection.
	- Trajectory playback: step through, or play back, the backbone (`bb_traj`) or denoised (`x0_traj`) trajectory of the selected sample with a built-in frame slider.
	- The viewer supports an aligned compare flow: pick a generated backbone and a folded sample from self-consistency, then load an aligned overlay in the same Mol* panel. Clicking a row in the Analyze metrics table loads the overlay automatically.
	- Viewer rendering is iframe-isolated to avoid Gradio layout conflicts.

	## Inference Controls

	- Optional per-run seed override is available in the UI for reproducible sampling (`inference.seed` remains the fallback default).
	- The number of sampling timesteps is adjustable per run (more steps generally improve sample quality at the cost of longer runtime).
	- Generated `sample.pdb` files are always returned in the main output panel.
	- Optional trajectory downloads can be enabled from the UI to expose the `bb_traj` and `x0_traj` artifacts.
	- A run can be bundled into a single downloadable `.zip` from the Generate tab.
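
The seed fallback described above can be sketched as follows. This is a minimal illustration, not the app's actual handler: `resolve_seed` is a hypothetical helper, the raw textbox value is assumed to arrive as a string, and rejecting negative seeds is an assumption.

```python
from typing import Optional


def resolve_seed(ui_value: Optional[str], default_seed: int) -> int:
    """Return the per-run seed, falling back to the configured default.

    `ui_value` is the raw text from a (hypothetical) seed textbox; an empty
    or missing value means "use the config default" (`inference.seed`).
    """
    if ui_value is None or not ui_value.strip():
        return default_seed
    try:
        seed = int(ui_value.strip())
    except ValueError as exc:
        raise ValueError(f"Seed must be an integer, got {ui_value!r}") from exc
    if seed < 0:
        # Assumption: negative seeds are rejected rather than wrapped.
        raise ValueError("Seed must be non-negative")
    return seed
```

Keeping this logic free of Gradio and model imports is what would let a check like the `handlers` smoke tier exercise it without a checkpoint.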

	## Conditional Generation (feature-flagged)

	When `app.enable_conditional: true`, the Generate tab exposes a conditional workflow:

	- Upload a reference PDB (chain A) to define fixed backbone positions.
	- Specify fixed residues as ranges/indices (for example `10-40,55,60-62`); leaving it empty fixes all residues.
	- Optionally apply classifier guidance during conditional sampling.

	Protein length is taken from the uploaded reference structure. Enable this flag only after validating end-to-end with a real checkpoint and reference structure.
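
A fixed-residue specification such as `10-40,55,60-62` could be parsed along these lines. This is a sketch under stated assumptions: `parse_fixed_residues` is a hypothetical name, indices are treated as 1-based positions within the reference chain, and out-of-range values raise an error. The real parser may differ on any of these points.

```python
def parse_fixed_residues(spec: str, length: int) -> list[int]:
    """Parse a spec such as "10-40,55,60-62" into sorted 1-based indices.

    An empty spec fixes every residue, matching the behaviour described above.
    """
    if not spec.strip():
        return list(range(1, length + 1))
    indices: set[int] = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        start_text, sep, end_text = token.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start
        if start > end:
            raise ValueError(f"Range start exceeds end: {token!r}")
        if start < 1 or end > length:
            raise ValueError(f"Residue out of bounds for length {length}: {token!r}")
        indices.update(range(start, end + 1))
    return sorted(indices)
```

For a 100-residue reference, `parse_fixed_residues("10-40,55,60-62", 100)` returns 35 indices (31 + 1 + 3).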

	## Run History

	Each successful run is recorded in `space_outputs/run_history.json` (mode, seed, sample count, timesteps, sample paths). The Advanced tab lets you reload a recent run's samples and any saved self-consistency results without re-running inference.
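
As a rough sketch of how such a history file can be maintained: the helper name `append_run_record`, the record keys, and the `max_entries` cap below are all illustrative assumptions, not the app's actual schema.

```python
import json
from pathlib import Path


def append_run_record(history_path: Path, record: dict, max_entries: int = 50) -> list:
    """Append one run record to a JSON list on disk and return the updated list."""
    history = []
    if history_path.exists():
        history = json.loads(history_path.read_text(encoding="utf-8"))
    history.append(record)
    history = history[-max_entries:]  # assumption: keep only the most recent runs
    history_path.parent.mkdir(parents=True, exist_ok=True)
    history_path.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return history


# Example record with hypothetical keys mirroring the fields listed above.
example_record = {
    "mode": "unconditional",
    "seed": 123,
    "num_samples": 1,
    "timesteps": 100,
    "sample_paths": ["space_outputs/run_0/sample_0/sample.pdb"],
}
```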

	## Self-Consistency Metrics

	The UI includes a self-consistency action that runs:

	1. `ProteinMPNN` sequence design on a generated sample PDB
	2. optional `ESMFold` structure prediction for designed sequences
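	3. metric computation against the generated reference structure:
	   - `scTM`: TM-score between the generated backbone and the refolded structure (higher is better)
	   - `scRMSD`: RMSD between the generated backbone and the refolded structure (lower is better)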
	The self-consistency panel also includes:

	- fixed plot axis ranges (not metric filtering):
	  - `scTM`: `[0, 1]`
	  - `scRMSD`: `[0, 10]`
	  - `ESMFold mean pLDDT`: `[0, 1]`
	- ranked leaderboards:
	  - top-N by highest `scTM`
	  - top-N by lowest `scRMSD`
	- sample labels in tables/plots use descriptive run/length/sample names while canonical source paths stay in backend metadata.
	- per-sequence table defaults to showing all evaluated rows; optional view filters can narrow by:
	  - sample label (multi-select)
	  - sample label text query
	  - minimum `scTM`
	  - maximum `scRMSD`
	  - minimum `ESMFold mean pLDDT`
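
The view filters and leaderboards listed above amount to simple predicates and sorts over the per-sequence rows. The sketch below assumes each row is a dict with hypothetical keys `label`, `scTM`, `scRMSD`, and `plddt`; the app's real column names and data structures are not shown in this README.

```python
from typing import Optional


def filter_rows(rows, min_sctm: Optional[float] = None,
                max_scrmsd: Optional[float] = None,
                min_plddt: Optional[float] = None,
                label_query: str = ""):
    """Return a new list of rows passing every active filter (input is untouched)."""
    query = label_query.strip().lower()
    kept = []
    for row in rows:
        if min_sctm is not None and row["scTM"] < min_sctm:
            continue
        if max_scrmsd is not None and row["scRMSD"] > max_scrmsd:
            continue
        if min_plddt is not None and row["plddt"] < min_plddt:
            continue
        if query and query not in row["label"].lower():
            continue
        kept.append(row)
    return kept


def leaderboards(rows, top_n: int = 5):
    """Rank rows by highest scTM and, separately, by lowest scRMSD."""
    by_sctm = sorted(rows, key=lambda r: r["scTM"], reverse=True)[:top_n]
    by_scrmsd = sorted(rows, key=lambda r: r["scRMSD"])[:top_n]
    return by_sctm, by_scrmsd


# Illustrative rows only; the values are made up for the example.
EXAMPLE_ROWS = [
    {"label": "run1_len64_s0", "scTM": 0.82, "scRMSD": 1.4, "plddt": 0.88},
    {"label": "run1_len64_s1", "scTM": 0.41, "scRMSD": 6.9, "plddt": 0.52},
    {"label": "run2_len80_s0", "scTM": 0.67, "scRMSD": 2.8, "plddt": 0.74},
]
```

Because `filter_rows` builds a new list, resetting the filters is just a matter of showing the original rows again, which matches the non-destructive behaviour the UI is described as having.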

	When multiple samples are selected in the sample selector, self-consistency runs for all selected samples and reports a single aggregated metrics view across those samples. A progress bar reports per-sample and per-sequence folding progress.
	Status logs include selected/processed sample counts, requested sequences per sample, folding mode, artifact counts, and a concise failed-sample preview when partial failures happen.

	The scatter plots show per-point hover tooltips (sample label and metric values). The current (filtered) metrics view can be exported to CSV from the Analyze tab.

	Artifacts are written per run under the selected sample directory in a timestamped `self_consistency/` folder.

	## Saved Example Case

	You can preserve a generated sample as a reusable demo case:

	- Click "Save selected sample as example" to copy the selected sample directory into:
	  - `examples/flowprot_space_example/sample/`
	- If no sample is selected, the app falls back to the most recent generated sample under `space_outputs/`.
	- The app also writes:
	  - `examples/flowprot_space_example/manifest.json`
	- Click "Load saved example" to view that sample in Mol* without running inference.
	- If the example contains `self_consistency/*/sc_results.csv`, the app auto-loads the latest self-consistency summary/table/plots and leaderboards.
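
Locating the latest `self_consistency/*/sc_results.csv` can be sketched with `pathlib`. The helper name `latest_sc_results` is hypothetical, and the sketch assumes the timestamped folder names sort lexicographically (for example `20240101_120000`); the actual naming scheme is not documented here.

```python
from pathlib import Path
from typing import Optional


def latest_sc_results(sample_dir: Path) -> Optional[Path]:
    """Return the newest `self_consistency/*/sc_results.csv`, or None if absent.

    Assumes run folder names are timestamps that sort lexicographically.
    """
    candidates = sorted(
        sample_dir.glob("self_consistency/*/sc_results.csv"),
        key=lambda p: p.parent.name,  # order by the timestamped folder name
    )
    return candidates[-1] if candidates else None
```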

	## Model Artifact Configuration

	Configure exactly one artifact source using environment variables:

	1. Local checkpoint path:
	   - `FLOWPROT_CKPT_PATH=/path/to/model.ckpt`
	2. Local checkpoint directory:
	   - `FLOWPROT_CKPT_DIR=/path/to/ckpt_dir`
	   - optional: `FLOWPROT_CKPT_FILENAME=epoch.ckpt` (default)
	3. Hugging Face Hub snapshot:
	   - `FLOWPROT_HF_REPO_ID=<org-or-user>/<repo>`
	   - `FLOWPROT_CKPT_FILENAME=<relative/path/in/repo>.ckpt`
	   - optional: `FLOWPROT_HF_REVISION=<branch-or-commit>`
	   - optional for private repos: `HF_TOKEN=<token>`
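
The "exactly one source" rule can be expressed as a small check over the environment. This is only a sketch: `resolve_artifact_source` is a hypothetical function, and the real `model_loader.py` may apply a precedence order instead of raising. Treating `FLOWPROT_CKPT_FILENAME` as required for the Hub source is also an assumption drawn from the list above.

```python
import os
from typing import Mapping, Optional


def resolve_artifact_source(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick the single configured checkpoint source from environment variables."""
    env = os.environ if env is None else env
    sources = {
        "path": env.get("FLOWPROT_CKPT_PATH"),
        "dir": env.get("FLOWPROT_CKPT_DIR"),
        "hub": env.get("FLOWPROT_HF_REPO_ID"),
    }
    configured = [kind for kind, value in sources.items() if value]
    if len(configured) != 1:
        raise ValueError(
            f"Expected exactly one artifact source, found {len(configured)}: {configured}"
        )
    kind = configured[0]
    resolved = {"kind": kind, "location": sources[kind]}
    if kind == "dir":
        # Default filename taken from the list above.
        resolved["filename"] = env.get("FLOWPROT_CKPT_FILENAME", "epoch.ckpt")
    elif kind == "hub":
        filename = env.get("FLOWPROT_CKPT_FILENAME")
        if not filename:
            raise ValueError("FLOWPROT_CKPT_FILENAME is required for a Hub source")
        resolved["filename"] = filename
        resolved["revision"] = env.get("FLOWPROT_HF_REVISION")  # None means default branch
    return resolved
```

Passing a plain dict as `env` keeps the function easy to test without touching the process environment.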

	Optional overrides:

	- `FLOWPROT_CKPT_CONFIG_PATH=/path/to/config.yaml`
	- `FLOWPROT_CKPT_CONFIG_FILENAME=config.yaml` (default)
	- `FLOWPROT_APP_CONFIG=/app/config.yaml`
	- `FLOWPROT_DEVICE=auto|cpu|cuda|cuda:0`
	- `FLOWPROT_LOG_LEVEL=INFO|DEBUG|WARNING|ERROR`
	- `FLOWPROT_PMPNN_DIR=/app/model/ProteinMPNN`
	- `FLOWPROT_PMPNN_WEIGHTS_DIR=/app/model/ProteinMPNN/vanilla_model_weights`
	- `FLOWPROT_PMPNN_MODEL_NAME=v_48_020`
	- `FLOWPROT_SC_DEVICE=auto|cpu|cuda|cuda:0`
	- `FLOWPROT_SC_SEED=123`
	- `FLOWPROT_ESMFOLD_MODEL_ID=facebook/esmfold_v1`
	- `FLOWPROT_VIEW_EXAMPLE_ON_STARTUP=true|false` (default `false`)

	## Local Run (Import/UI only)

	```bash
	pip install -r requirements.txt
	python app.py
	```

	If no checkpoint is configured, the UI still starts and shows model-load failures in the status panel.

	For local checkpoint testing:

	```bash
	export FLOWPROT_CKPT_DIR=model/ckpt
	export FLOWPROT_CKPT_FILENAME=epoch29.ckpt
	python app.py
	```

	PowerShell equivalent:

	```powershell
	$env:FLOWPROT_CKPT_DIR='model/ckpt'
	$env:FLOWPROT_CKPT_FILENAME='epoch29.ckpt'
	python app.py
	```

	## Smoke Checks

	Use `smoke_checks.py` to run explicit validation tiers:

	```bash
	python smoke_checks.py --tier import
	python smoke_checks.py --tier handlers
	python smoke_checks.py --tier model-load
	python smoke_checks.py --tier inference --length 64 --num-samples 1
	```

	The `handlers` tier validates pure UI helper logic (seed parsing, fixed-residue parsing, self-consistency filtering, status badge) without requiring a model checkpoint.

	The `inference` tier accepts an optional `--seed` for reproducible runs:

	```bash
	python smoke_checks.py --tier inference --length 64 --num-samples 1 --seed 123
	```
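
For orientation, a tiered CLI with these flags could be declared with `argparse` roughly as below. This is not the contents of `smoke_checks.py`: the flag names come from the commands above, but the defaults and the `build_parser` helper are assumptions.

```python
import argparse

# Tier names as used in the commands above.
TIERS = ("import", "handlers", "model-load", "inference")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Tiered smoke checks (sketch)")
    parser.add_argument("--tier", choices=TIERS, required=True)
    parser.add_argument("--length", type=int, default=64)
    parser.add_argument("--num-samples", type=int, default=1)
    parser.add_argument("--seed", type=int, default=None)
    return parser
```

Restricting `--tier` to a fixed set of choices makes a mistyped tier fail immediately with a usage message instead of silently running nothing.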

	### Validation Matrix (local vs remote)

	- `import`: local-capable (no checkpoint required)
	- `handlers`: local-capable (no checkpoint required)
	- `model-load`: generally remote-target unless real artifacts are configured locally
	- `inference`: remote-target for production confidence (GPU/artifact/runtime parity)

	## Validation Checklist (Local vs Hugging Face Space)

	Use this checklist to separate what should pass locally vs what should be verified from Space logs/runtime:

	1. Local import health
	   - `python smoke_checks.py --tier import`
	   - expected: app modules import cleanly in your local env with dependencies installed.
	2. Local UI startup
	   - `python app.py`
	   - expected: Gradio launches; diagnostics panel is collapsed by default.
	3. Model load and inference
	   - optional locally when real artifacts are available.
	   - otherwise verify remotely in Space logs with configured artifacts/device.
	4. Self-consistency flow
	   - run from the UI against one generated sample and confirm:
	     - per-sequence table initially shows all rows
	     - applying view filters narrows rows non-destructively
	     - reset restores the unfiltered view
	5. Aligned compare viewer
	   - after self-consistency produces folded structures, select a backbone and a folded row, then click "Load aligned compare view"
	   - expected: overlay loads in Mol* and compare status reports before/after RMSD.

	## Docker Space Startup

	The Docker image starts with:

	```bash
	python app.py
	```

	Space health and model load status are visible in the Gradio UI.