Phase B: Boltz post-eval scaffold (graceful cpu-basic fallback)
- Add `spaces>=0.30` to requirements (no-op decorator on cpu-basic)
- Gate torch/boltz behind comments in requirements (uncomment when flipping to ZeroGPU)
- eval_boltz._predict_*: split ImportError from runtime errors with actionable message
- HF_TOKEN secret set out-of-band for hidden-tasks dataset access
- README documents Phase B activation checklist + 4-phase pipeline status table
- README.md +31 -0
- eval_boltz.py +33 -6
- requirements.txt +10 -0
README.md
CHANGED
|
@@ -38,3 +38,34 @@ Novelty, and Diversity. See the *About* tab for the full methodology and the
|
| 38 |
- **Guidance Effect** — Paired comparison of the same LLM in unguided (atomic tools) vs guided (composite workflows) mode
|
| 39 |
- **Depth Gap** — Forced-depth and low-diversity intervention results
|
| 40 |
- **About** — Methodology, submission guide, and citation info
|
| 41 |
+
|
| 42 |
+
## Backend pipeline phases
|
| 43 |
+
|
| 44 |
+
Submission processing runs in 4 admin-controlled phases:
|
| 45 |
+
|
| 46 |
+
| Phase | Step | Status | Notes |
|
| 47 |
+
|---|---|---|---|
|
| 48 |
+
| **A** | Dispatch tasks → CPU scoring | live | HTTP POST to submitter endpoint, validate, score 5/6 components |
|
| 49 |
+
| **B** | Boltz-2 structure verification | code-ready | Needs ZeroGPU hardware + uncommented `torch`/`boltz` deps |
|
| 50 |
+
| **C** | LLM judge panel (28-pt hybrid) | live | 3-judge PoLL with self-exclusion, requires API key secrets |
|
| 51 |
+
| **D** | Finalize + publish to leaderboard | live | Aggregates hybrid scores, writes back to submissions dataset |
|
| 52 |
+
|
| 53 |
+
### Phase B activation checklist
|
| 54 |
+
|
| 55 |
+
To wire up Boltz-2 verification on this Space:
|
| 56 |
+
|
| 57 |
+
1. **Switch hardware** in HF Space settings → Hardware → `zero-a10g`
|
| 58 |
+
(requires HF Pro / Enterprise).
|
| 59 |
+
2. **Edit `requirements.txt`** and uncomment the two lines:
|
| 60 |
+
```
|
| 61 |
+
torch>=2.2
|
| 62 |
+
boltz>=0.4
|
| 63 |
+
```
|
| 64 |
+
3. **Verify secrets** are set: `HF_TOKEN` (private dataset),
|
| 65 |
+
`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`,
|
| 66 |
+
`DEEPSEEK_API_KEY`.
|
| 67 |
+
4. Restart the Space. The first build will pull ~2GB of CUDA wheels.
|
| 68 |
+
|
| 69 |
+
On `cpu-basic` hardware the Phase B predictors return a structured
|
| 70 |
+
failure dict with `success=False` and an actionable error message
|
| 71 |
+
instead of crashing the dispatcher.
|
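A caller of the Phase B predictors could branch on the structured failure dict roughly as follows. This is a minimal sketch, not code from this commit: `summarize_phase_b` is a hypothetical helper, and it assumes that a result without a `success` key counts as a successful prediction.

```python
from typing import Any


def summarize_phase_b(result: dict[str, Any]) -> str:
    """Turn a predictor result into a one-line status for an admin log.

    Hypothetical helper: a missing "success" key is treated as success,
    which is an assumption about the predictors' happy-path output.
    """
    if not result.get("success", True):
        return f"Phase B skipped: {result.get('error', 'unknown error')}"
    return f"Phase B ok: pLDDT={result['pLDDT']:.1f}, pTM={result['pTM']:.2f}"


# Shape of the dict returned by _predict_monomer on cpu-basic in this commit.
cpu_basic_result = {
    "pLDDT": 0.0,
    "pTM": 0.0,
    "success": False,
    "error": "Boltz / torch not available on this Space.",
}
print(summarize_phase_b(cpu_basic_result))
```

Checking `success` before reading the metrics matters because the failure dict fills every metric with `0.0`, which would otherwise look like a real but very poor prediction.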
eval_boltz.py
CHANGED
|
@@ -7,6 +7,14 @@ Two prediction modes:
|
|
| 7 |
- Complex: Binding tasks (binder + target) -> ipTM, i_pAE
|
| 8 |
|
| 9 |
Batch chunking respects ZeroGPU time limits (~180-240s per burst).
|
| 10 |
"""
|
| 11 |
|
| 12 |
from __future__ import annotations
|
|
@@ -28,16 +36,29 @@ MAX_GPU_TIME = 240 # safety margin under 300s ZeroGPU limit
|
|
| 28 |
# ---------------------------------------------------------------------------
|
| 29 |
|
| 30 |
|
| 31 |
def _predict_monomer(sequence: str) -> dict[str, float]:
|
| 32 |
"""Predict structure of a single protein sequence using Boltz.
|
| 33 |
|
| 34 |
Returns:
|
| 35 |
-
Dict with: pLDDT, pTM (or
|
| 36 |
"""
|
| 37 |
try:
|
| 38 |
-
import torch
|
| 39 |
from boltz import Boltz
|
| 40 |
-
|
| 41 |
model = Boltz.from_pretrained("boltz2")
|
| 42 |
result = model.predict(sequence)
|
| 43 |
|
|
@@ -61,12 +82,18 @@ def _predict_complex(
|
|
| 61 |
"""Predict complex structure and binding metrics using Boltz.
|
| 62 |
|
| 63 |
Returns:
|
| 64 |
-
Dict with: ipTM, i_pAE, pLDDT, pTM (or
|
| 65 |
"""
|
| 66 |
try:
|
| 67 |
-
import torch
|
| 68 |
from boltz import Boltz
|
| 69 |
-
|
| 70 |
model = Boltz.from_pretrained("boltz2")
|
| 71 |
result = model.predict([binder_seq, target_seq])
|
| 72 |
|
|
|
|
| 7 |
- Complex: Binding tasks (binder + target) -> ipTM, i_pAE
|
| 8 |
|
| 9 |
Batch chunking respects ZeroGPU time limits (~180-240s per burst).
|
| 10 |
+
|
| 11 |
+
Phase B activation checklist (must all be true to actually run Boltz):
|
| 12 |
+
1. HF Space hardware switched to a GPU tier (zero-a10g recommended).
|
| 13 |
+
2. requirements.txt has `torch` and `boltz` uncommented.
|
| 14 |
+
3. HF_TOKEN secret set on the Space (for the private hidden-tasks dataset).
|
| 15 |
+
On a cpu-basic Space the predictors return a structured failure dict
|
| 16 |
+
with `success=False` and an actionable error message rather than
|
| 17 |
+
crashing the dispatcher.
|
| 18 |
"""
|
| 19 |
|
| 20 |
from __future__ import annotations
|
|
|
|
| 36 |
# ---------------------------------------------------------------------------
|
| 37 |
|
| 38 |
|
| 39 |
+
_BOLTZ_NOT_INSTALLED = (
|
| 40 |
+
"Boltz / torch not available on this Space. To enable Phase B, "
|
| 41 |
+
"switch the Space hardware to ZeroGPU (zero-a10g) and uncomment the "
|
| 42 |
+
"torch + boltz lines in requirements.txt."
|
| 43 |
+
)
|
| 44 |
+
|
| 45 |
+
|
| 46 |
def _predict_monomer(sequence: str) -> dict[str, float]:
|
| 47 |
"""Predict structure of a single protein sequence using Boltz.
|
| 48 |
|
| 49 |
Returns:
|
| 50 |
+
Dict with: pLDDT, pTM (or a structured failure dict).
|
| 51 |
"""
|
| 52 |
try:
|
| 53 |
+
import torch # noqa: F401
|
| 54 |
from boltz import Boltz
|
| 55 |
+
except ImportError:
|
| 56 |
+
logger.warning(_BOLTZ_NOT_INSTALLED)
|
| 57 |
+
return {
|
| 58 |
+
"pLDDT": 0.0, "pTM": 0.0,
|
| 59 |
+
"success": False, "error": _BOLTZ_NOT_INSTALLED,
|
| 60 |
+
}
|
| 61 |
+
try:
|
| 62 |
model = Boltz.from_pretrained("boltz2")
|
| 63 |
result = model.predict(sequence)
|
| 64 |
|
|
|
|
| 82 |
"""Predict complex structure and binding metrics using Boltz.
|
| 83 |
|
| 84 |
Returns:
|
| 85 |
+
Dict with: ipTM, i_pAE, pLDDT, pTM (or a structured failure dict).
|
| 86 |
"""
|
| 87 |
try:
|
| 88 |
+
import torch # noqa: F401
|
| 89 |
from boltz import Boltz
|
| 90 |
+
except ImportError:
|
| 91 |
+
logger.warning(_BOLTZ_NOT_INSTALLED)
|
| 92 |
+
return {
|
| 93 |
+
"pLDDT": 0.0, "pTM": 0.0, "ipTM": 0.0, "i_pAE": 0.0,
|
| 94 |
+
"success": False, "error": _BOLTZ_NOT_INSTALLED,
|
| 95 |
+
}
|
| 96 |
+
try:
|
| 97 |
model = Boltz.from_pretrained("boltz2")
|
| 98 |
result = model.predict([binder_seq, target_seq])
|
| 99 |
|
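Both predictors in this diff repeat the same import guard and a near-identical failure literal. One possible way to factor that out is sketched below. The names `load_optional` and `failure_dict` are hypothetical and do not appear in this commit; the sketch only assumes the failure shape shown above (zeroed metrics plus `success` and `error`).

```python
import importlib
from typing import Any, Iterable, Optional


def load_optional(module_names: Iterable[str]) -> tuple[Optional[list], Optional[str]]:
    """Import each module; return (modules, None) or (None, error message)."""
    modules = []
    for name in module_names:
        try:
            modules.append(importlib.import_module(name))
        except ImportError as exc:
            # Only a missing dependency is handled here; runtime errors
            # raised later by the model stay separate, as in the diff.
            return None, f"{name} not importable: {exc}"
    return modules, None


def failure_dict(metric_names: Iterable[str], message: str) -> dict[str, Any]:
    """Build the structured failure result: zeroed metrics plus status fields."""
    result: dict[str, Any] = {name: 0.0 for name in metric_names}
    result["success"] = False
    result["error"] = message
    return result


# Usage: a monomer predictor would zero pLDDT and pTM when imports fail.
modules, error = load_optional(["no_such_module_for_phase_b"])
if error is not None:
    print(failure_dict(["pLDDT", "pTM"], error))
```

Passing the metric names in keeps the monomer and complex variants consistent without duplicating the literal.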
requirements.txt
CHANGED
|
@@ -9,3 +9,13 @@ datasets>=2.16
|
| 9 |
anthropic>=0.75
|
| 10 |
openai>=1.40
|
| 11 |
google-genai>=0.3
|
| 12 |
+
|
| 13 |
+
# Phase B (Boltz post-eval). The `spaces` shim is safe on any hardware
|
| 14 |
+
# tier; the `@spaces.GPU(...)` decorator is a no-op on cpu-basic and
|
| 15 |
+
# provisions ZeroGPU on zero-a10g. Boltz-2 + torch require an actual
|
| 16 |
+
# CUDA build, so they are gated: uncomment ONLY after switching the
|
| 17 |
+
# Space hardware to a GPU tier (zero-a10g recommended) — otherwise pip
|
| 18 |
+
# will pull ~2GB of CUDA wheels onto a CPU image and the build fails.
|
| 19 |
+
spaces>=0.30
|
| 20 |
+
# torch>=2.2 # ZeroGPU only — uncomment after hardware flip
|
| 21 |
+
# boltz>=0.4 # ZeroGPU only — uncomment after hardware flip
|
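The comment above describes `@spaces.GPU(...)` as safe on any hardware tier. For running the same module outside a Space, where the `spaces` package may not be installed at all, a local stand-in could look like the sketch below. The fallback `gpu` decorator and the `run_burst` function are hypothetical names for illustration, not part of this commit.

```python
try:
    import spaces  # present on the Space via requirements.txt

    gpu = spaces.GPU
except ImportError:
    def gpu(func=None, **_kwargs):
        """No-op stand-in supporting both @gpu and @gpu(duration=...)."""
        if func is None:
            # Called with keyword arguments: return a pass-through decorator.
            return lambda f: f
        return func


@gpu(duration=240)  # mirrors MAX_GPU_TIME from eval_boltz.py
def run_burst(sequences):
    """Placeholder for one GPU burst; here it only measures sequence lengths."""
    return [len(seq) for seq in sequences]


print(run_burst(["MKT", "A"]))
```

The stand-in accepts and ignores keyword arguments so the decorated code reads the same with or without the real package.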