---
license: mit
library_name: pytorch
tags:
  - boolean-networks
  - neuro-symbolic
  - program-induction
  - gene-regulatory-networks
  - systems-biology
  - active-learning
---

# ABLE: Active Boolean Learning Engine

Model weights accompanying the paper **"ABLE: Choosing Perturbation
Experiments to Recover Gene Logic"** (AI for Science Workshop at ICML
2026).

ABLE is a neuro-symbolic pipeline for recovering executable Boolean
regulatory rules from perturbation-state transition data, with
support-conditional uniqueness certificates and active experiment
planning. This repo hosts the paper's released checkpoints. The public
code lives in a companion package (`able-public`); see the
reproducibility README there for install and reproduction commands.

## Contents

| File | Size (bytes) | SHA-256 | Purpose |
|---|---:|---|---|
| `checkpoint_n50_ncf_best.pt` | 24,097,458 | `57c968490a2f1535582cc009fc38f659b6fe4b56f89bf72c9bcfb285640a0c8d` | Main 50-variable NCF-pointer proposer. Used for BBM (Table 2, Figs. 2/3/4/6), Ablation A (Table 9 row), and all default evaluation commands in the public README. |
| `checkpoint_n15_ncf_best.pt` | 23,965,466 | `26cdef1bb4bfb39fbb4c278d2f40528c1328664a80c22c97ee99a901fe4a34f0` | 15-variable NCF-pointer proposer used for Table 1 (four curated biological networks). |
| `checkpoint_n50_unconstrained_best.pt` | 25,312,058 | `03510ef826edce9a53cfa87049abf77cd17ea564e87ef4f06167d19e5b952f83` | Ablation B: 50-variable NCF-free decoder variant (unconstrained truth-table head), used only for Appendix Table 9 / Ablation B. See provenance note below. |

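All three are PyTorch checkpoint dictionaries (not bare state dicts)
saved via
`torch.save({"model_state_dict": ..., "optimizer_state_dict": ...,
"config": ..., "step": ..., "best_metric": ...}, path)`; the model
weights sit under the `"model_state_dict"` key. Load them with
`torch.load(path, map_location=..., weights_only=False)`. Because
`weights_only=False` unpickles arbitrary Python objects, load only files
whose SHA-256 matches the table above.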
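As a minimal sketch of the loading step, the snippet below reads one
checkpoint and summarizes its metadata. The helper names
`load_checkpoint` and `summarize_checkpoint` are hypothetical and not
part of `able-public`; only the dictionary keys and the `torch.load`
arguments come from the description above. Rebuilding the model and
calling `load_state_dict` needs the model class from the companion
package and is not shown.

```python
from pathlib import Path
from typing import Any, Dict, Union


def load_checkpoint(path: Union[str, Path], device: str = "cpu") -> Dict[str, Any]:
    """Load one released checkpoint dictionary onto `device` (hypothetical helper)."""
    # Imported here so summarize_checkpoint can be used without torch installed.
    import torch

    # weights_only=False unpickles arbitrary objects: use it on trusted files only.
    return torch.load(path, map_location=device, weights_only=False)


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Return the non-tensor metadata plus a count of entries in the state dict."""
    return {
        "step": ckpt.get("step"),
        "best_metric": ckpt.get("best_metric"),
        "config": ckpt.get("config"),
        "num_state_entries": len(ckpt.get("model_state_dict", {})),
    }
```

Typical use would be
`summarize_checkpoint(load_checkpoint("checkpoints/checkpoint_n50_ncf_best.pt"))`,
assuming the file has already been downloaded to `checkpoints/`.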

## Training recipe (reference)

- Synthetic streaming dataset of k-junta Boolean networks (see
  `NCFStreamingDataset` in the paper codebase).
- Transformer backbone: `d_model=256`, `n_heads=8`, 4 encoder + 2 decoder
  layers, pointer dim 64.
- `num_steps=300000`, AdamW with `lr=1e-4`, `weight_decay=1e-5`.
- `n=50` runs: `num_obs=200`, `noise_rate=0.05`, mixture noise schedule,
  `batch_size=16`.
- `n=15` run: `num_obs=60`, `batch_size=64`.
- Seed 42; single-GPU training.

Exact configs are embedded in each `.pt` under the `"config"` key, and
are also committed alongside the public training scripts.
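For orientation, the recipe can be written down as a small dataclass and
compared against the `"config"` entry of a loaded checkpoint. This is a
sketch only: the values are copied from the list above, but the class
`RecipeSketch`, the helper `diff_config`, and the field names marked
"hypothetical" are illustrative and may not match the keys actually
stored in the checkpoints. The mixture noise schedule is omitted, and
the noise rate for the `n=15` run is left unset because it is not stated
above.

```python
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class RecipeSketch:
    d_model: int = 256
    n_heads: int = 8
    num_encoder_layers: int = 4   # hypothetical field name
    num_decoder_layers: int = 2   # hypothetical field name
    pointer_dim: int = 64         # hypothetical field name
    num_steps: int = 300_000
    lr: float = 1e-4
    weight_decay: float = 1e-5
    seed: int = 42                # hypothetical field name
    num_obs: int = 200
    batch_size: int = 16
    noise_rate: Optional[float] = 0.05


N50 = RecipeSketch()
# The n=15 run changes only the values listed for it; its noise rate is not stated.
N15 = replace(N50, num_obs=60, batch_size=64, noise_rate=None)


def diff_config(expected: Dict[str, Any], embedded: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (expected, embedded)} for shared keys whose values differ."""
    return {
        key: (expected[key], embedded[key])
        for key in expected
        if key in embedded and expected[key] != embedded[key]
    }
```

Calling `diff_config(asdict(N50), ckpt["config"])` on a loaded checkpoint
would then list any shared keys that disagree with the recipe, assuming
the embedded config is a plain dictionary.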

## Provenance note for `checkpoint_n50_unconstrained_best.pt`

The original post-paper checkpoint for the Ablation B (`unconstrained`)
variant was unrecoverable at release time. The file in this repo is a
**retrain** produced from the same committed training script and
configuration (seed 42, same `DEFAULT_CONFIG`). It reproduces the
paper's expected ablation regime on the synthetic held-out eval
(`transition_acc` fluctuating within `[0.014, 0.022]`,
`tt_bit_acc ~= 0.836`, `regulator_set_f1 ~= 0.60`,
`functional_agreement ~= 0.92`) but will **not be byte-identical** to
the artifact that originally produced the paper's Appendix Table 9 /
Ablation B numbers, because synthetic data streaming is sensitive to
dataloader-order PRNG draws. Downstream BBM Lift-Cert numbers are
expected to be statistically equivalent but may differ within
run-to-run noise. Bit-exact reproduction of the paper table is therefore
not possible with this file; if you need exact Ablation B numbers, rerun
the Lift-Cert pipeline against this checkpoint and report the refreshed
numbers.
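If you retrain or re-evaluate the Ablation B variant yourself, a quick
sanity check against the regime quoted above might look like the sketch
below. The reference values are the ones stated in this note; the
function name `regime_report` and the tolerance `tol=0.02` are
assumptions made for illustration, not thresholds from the paper.

```python
from typing import Dict

# Reference values quoted in the provenance note.
TRANSITION_ACC_RANGE = (0.014, 0.022)
REFERENCE = {
    "tt_bit_acc": 0.836,
    "regulator_set_f1": 0.60,
    "functional_agreement": 0.92,
}


def regime_report(metrics: Dict[str, float], tol: float = 0.02) -> Dict[str, bool]:
    """Flag, per metric, whether it falls in the quoted ablation regime.

    `tol` is a hypothetical absolute tolerance around each point reference.
    """
    low, high = TRANSITION_ACC_RANGE
    report = {"transition_acc": low <= metrics["transition_acc"] <= high}
    for name, ref in REFERENCE.items():
        report[name] = abs(metrics[name] - ref) <= tol
    return report
```

A metric flagged `False` does not by itself mean the retrain is wrong; it
only signals that the run sits outside the window chosen here.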

## Intended use

- Reproduction of the ICML-2026 AI4Science paper numbers. The companion
  CLI `able-download-checkpoints` consumes this repo.
- Research extensions on k-junta Boolean-network recovery from
  perturbation transitions (neuro-symbolic, active-learning, and
  certificate-style work).

## Limitations

- Trained on **synthetic** Boolean networks matched to the paper's
  structural priors (max-indegree 6, mean-indegree ~2.5, NCF-majority
  distributional prior). Out-of-distribution biological networks may
  require retraining or domain adaptation.
- The Ablation B checkpoint (`*_unconstrained_*`) is meaningful only as
  a control: it removes the NCF prior from the decoder head. It is
  **not** the recommended proposer for downstream work.
- The decoder consumes quantised occupancy statistics, not raw state
  trajectories; inference pipelines must feed data through the paired
  preprocessing code in `able-public`.

## Download

The companion code package is available at https://github.com/phuayj/able.
Install it and run the bundled checkpoint downloader:

```bash
git clone https://github.com/phuayj/able.git
cd able
pip install -e .
able-download-checkpoints --output-dir checkpoints
```

This places all three checkpoint files under `checkpoints/`. No
authentication is required for downloads.
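After downloading, the files can be checked against the SHA-256 digests
listed under Contents. The sketch below uses only the Python standard
library; the helper names `sha256_of` and `verify_dir` are hypothetical,
and whether `able-download-checkpoints` already performs such a check is
not stated here.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# Digests copied from the Contents table.
EXPECTED_SHA256 = {
    "checkpoint_n50_ncf_best.pt":
        "57c968490a2f1535582cc009fc38f659b6fe4b56f89bf72c9bcfb285640a0c8d",
    "checkpoint_n15_ncf_best.pt":
        "26cdef1bb4bfb39fbb4c278d2f40528c1328664a80c22c97ee99a901fe4a34f0",
    "checkpoint_n50_unconstrained_best.pt":
        "03510ef826edce9a53cfa87049abf77cd17ea564e87ef4f06167d19e5b952f83",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dir(directory: Union[str, Path],
               expected: Dict[str, str] = EXPECTED_SHA256) -> Dict[str, bool]:
    """Map each expected file name to True if it exists and its digest matches."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of(path) == want
    return results
```

Running `verify_dir("checkpoints")` returns one boolean per file; a
`False` entry means the file is missing or its digest differs from the
table.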

## Citation

See `CITATION.cff` in the paper codebase.

## License

MIT (weights released alongside the paper code).