SynPlanner-data
Data repository for SynPlanner, an open-source tool for retrosynthetic planning.
Quick start
```bash
pip install SynPlanner
synplan download_preset --preset synplanner-article --save_to synplan_data
```
Repository structure
```
SynPlanner-data/
├── policy/                  Reaction rules + policy network weights
│   └── {architecture}/
│       └── {rules_version}/
│           ├── reaction_rules.tsv
│           ├── pipeline.yaml
│           └── {weights_version}/
│               ├── ranking_policy.ckpt
│               └── filtering_policy.ckpt
│
├── value/                   Value network weights
│   └── {architecture}/
│       └── {version}/
│           ├── value_network.ckpt
│           └── meta.yaml
│
├── building_blocks/         Building block sets
│   └── {name}/
│       ├── building_blocks.tsv
│       └── meta.yaml
│
├── reaction_data/           Reaction data pipeline (raw → standardized → filtered)
│   └── {source}/
│       ├── raw/
│       ├── standardized/{YYYY-MM-DD}/
│       └── filtered/{YYYY-MM-DD}/
│
├── training_data/           Per-network training inputs
│   ├── ranking_policy/{YYYY-MM-DD}/
│   ├── filtering_policy/{YYYY-MM-DD}/
│   └── value_network/{YYYY-MM-DD}/
│
├── presets/                 Ready-to-use preset definitions
│   └── {name}.yaml
│
└── benchmarks/              Benchmark target sets
    └── sascore/
```
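The layout is regular enough that file paths can be built from a few parameters. The helpers below are a hypothetical sketch (policy_dir, policy_weights, value_network and building_blocks are not SynPlanner functions); they only encode the tree above and assume a local copy that mirrors it under a root such as synplan_data.

```python
from pathlib import Path

# Hypothetical helpers: they encode the repository layout only and are
# not part of the SynPlanner API.


def policy_dir(root: Path, architecture: str, rules_version: str) -> Path:
    """Directory holding reaction_rules.tsv and pipeline.yaml."""
    return root / "policy" / architecture / rules_version


def policy_weights(root: Path, architecture: str, rules_version: str,
                   weights_version: str) -> dict[str, Path]:
    """Ranking and filtering checkpoints for one weights version."""
    base = policy_dir(root, architecture, rules_version) / weights_version
    return {
        "ranking_policy": base / "ranking_policy.ckpt",
        "filtering_policy": base / "filtering_policy.ckpt",
    }


def value_network(root: Path, architecture: str, version: str) -> Path:
    """Value network checkpoint for one architecture/version pair."""
    return root / "value" / architecture / version / "value_network.ckpt"


def building_blocks(root: Path, name: str) -> Path:
    """Building block TSV for a named set."""
    return root / "building_blocks" / name / "building_blocks.tsv"
```

For example, policy_weights(Path("synplan_data"), "supervised_gcn", "v1", "v1") yields the same relative checkpoint paths that the synplanner-article preset lists.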
Versioning
Model components (policy/, value/, building_blocks/) are versioned by name:
- Architecture family: policy/supervised_gcn/, policy/transformer/
- Rules version: policy/supervised_gcn/v1/ (a new rule extraction gets a new version)
- Weights version: policy/supervised_gcn/v1/v1/ (retrained weights get a new inner version)
- Value network: value/supervised_gcn/v1/ (a retrained network becomes v2/)
Pipeline data (reaction_data/, training_data/) is versioned by date (YYYY-MM-DD).
A new processing protocol gets a new date directory alongside the existing ones.
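Because the date directories use zero-padded YYYY-MM-DD names, they sort chronologically as plain strings, so finding the newest processing run needs no date arithmetic. A minimal sketch with hypothetical function names; it also skips names that look like dates but are not valid calendar dates (e.g. 2024-13-01).

```python
import re
from datetime import date
from pathlib import Path

DATE_NAME = re.compile(r"\d{4}-\d{2}-\d{2}")


def _is_date_name(name: str) -> bool:
    """True for names like 2024-12-31 that are also valid calendar dates."""
    if not DATE_NAME.fullmatch(name):
        return False
    try:
        date.fromisoformat(name)
    except ValueError:
        return False
    return True


def dated_versions(parent: Path) -> list[Path]:
    """YYYY-MM-DD subdirectories of `parent`, oldest first."""
    dirs = [p for p in parent.iterdir() if p.is_dir() and _is_date_name(p.name)]
    # Zero-padded ISO dates sort chronologically as plain strings.
    return sorted(dirs, key=lambda p: p.name)


def latest_version(parent: Path) -> Path:
    """Most recent dated directory under `parent`."""
    versions = dated_versions(parent)
    if not versions:
        raise FileNotFoundError(f"no YYYY-MM-DD directories under {parent}")
    return versions[-1]
```

Pointed at, say, reaction_data/uspto/standardized/ in a local copy of the repository, latest_version returns the directory of the most recent protocol while older ones stay untouched.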
Presets
Presets are YAML files that map logical names to exact file paths:
```yaml
name: synplanner-article
files:
  reaction_rules: policy/supervised_gcn/v1/reaction_rules.tsv
  ranking_policy: policy/supervised_gcn/v1/v1/ranking_policy.ckpt
  filtering_policy: policy/supervised_gcn/v1/v1/filtering_policy.ckpt
  value_network: value/supervised_gcn/v1/value_network.ckpt
  building_blocks: building_blocks/emolecules-salt-ln/building_blocks.tsv
```
Download a preset (Python):
```python
from synplan.utils.loading import download_preset
paths = download_preset("synplanner-article", save_to="synplan_data")
```
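If you work with a preset file directly instead of through download_preset, its files mapping can be checked and turned into concrete paths once the YAML has been parsed (for example with PyYAML's yaml.safe_load). The helper below is hypothetical, not part of synplan; it assumes the paths are relative to a single data root and, by default, that the preset defines the five keys used by synplanner-article, which other presets may not.

```python
from pathlib import Path

# Logical names used by the synplanner-article preset; other presets may differ.
ARTICLE_KEYS = frozenset(
    {"reaction_rules", "ranking_policy", "filtering_policy",
     "value_network", "building_blocks"}
)


def resolve_preset(preset: dict, root: Path,
                   required: frozenset[str] = ARTICLE_KEYS,
                   check_exists: bool = True) -> dict[str, Path]:
    """Map each logical name in preset["files"] to a path under `root`."""
    files = preset.get("files") or {}
    missing = set(required) - set(files)
    if missing:
        raise KeyError(f"preset {preset.get('name')!r} lacks {sorted(missing)}")
    resolved = {key: root / rel for key, rel in files.items()}
    if check_exists:
        # Fail early instead of at model-loading time.
        absent = [str(p) for p in resolved.values() if not p.is_file()]
        if absent:
            raise FileNotFoundError(f"missing files: {absent}")
    return resolved
```

Checking all files up front means a partial download is reported once, with every missing path, rather than one failure per component.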
Available data
policy/supervised_gcn/v1
GCN-based policy trained on filtered USPTO with 24k extracted reaction rules.
| File | Description | Size |
|---|---|---|
| reaction_rules.tsv | Reaction rules in SMARTS format | 8.6 MB |
| v1/ranking_policy.ckpt | Ranking policy network | 157 MB |
| v1/filtering_policy.ckpt | Filtering policy network | 298 MB |
value/supervised_gcn/v1
GCN-based value network trained via self-learning on ChEMBL targets.
| File | Description | Size |
|---|---|---|
| value_network.ckpt | Value network | 15 MB |
building_blocks/emolecules-salt-ln
Standardized building blocks from eMolecules, Sigma-Aldrich, and Lancaster.
| File | Description | Size |
|---|---|---|
| building_blocks.tsv | 186k molecules (SMILES + price columns) | 6.9 MB |
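Retrosynthetic search generally stops expanding a precursor once it is found in the building block set, so fast membership tests matter. A minimal sketch that loads one column of the TSV into a Python set; it assumes SMILES is the first tab-separated column and that the file has one header row (both are assumptions, so check the file or its meta.yaml before relying on them). The function name is hypothetical.

```python
import csv
from pathlib import Path


def load_building_blocks(path: Path, smiles_column: int = 0,
                         has_header: bool = True) -> set[str]:
    """Read one column of a TSV file into a set of SMILES strings."""
    smiles: set[str] = set()
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)  # skip the header row (assumed present)
        for row in reader:
            # Skip blank lines and rows too short to hold the SMILES column.
            if len(row) > smiles_column and row[smiles_column].strip():
                smiles.add(row[smiles_column].strip())
    return smiles
```

Plain string lookups only match when query SMILES are standardized the same way as the file, so queries should go through the same standardization before the `in` check.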
reaction_data/uspto
USPTO reaction data pipeline.
| File | Description | Size |
|---|---|---|
| raw/uspto_full_mapped.smi.zip | Original USPTO (1.48M reactions) | 165 MB |
| standardized/2024-12-31/ | Standardized reactions + config | to be added |
| filtered/2024-12-31/ | Filtered reactions + config | to be added |
training_data
| Directory | Description | Size |
|---|---|---|
| filtering_policy/2024-12-31/ | 600k ChEMBL + COCONUT molecules | 17 MB |
| value_network/2024-12-31/ | 28k ChEMBL + COCONUT targets | 720 KB |
| ranking_policy/2024-12-31/ | Config + README (uses reaction_data) | n/a |
benchmarks/sascore
700 target molecules split into 7 SAScore bins (1.5–8.5). This set is downloaded separately, not as part of a preset.
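The bin boundaries are not stated here. If the bins are equal-width, 7 bins spanning 1.5–8.5 are each (8.5 - 1.5) / 7 = 1.0 units wide. The sketch below assumes exactly that, with half-open bins and the upper edge 8.5 folded into the last bin; sascore_bin and bin_edges are hypothetical names.

```python
LOW, HIGH, N_BINS = 1.5, 8.5, 7
WIDTH = (HIGH - LOW) / N_BINS  # 1.0 under the equal-width assumption


def sascore_bin(score: float) -> int:
    """0-based bin index for an SAScore in [LOW, HIGH]."""
    if not LOW <= score <= HIGH:
        raise ValueError(f"SAScore {score} outside [{LOW}, {HIGH}]")
    # Bins are [LOW + i*WIDTH, LOW + (i+1)*WIDTH); HIGH itself joins the last bin.
    return min(int((score - LOW) // WIDTH), N_BINS - 1)


def bin_edges(i: int) -> tuple[float, float]:
    """Lower and upper SAScore bounds of bin i."""
    return LOW + i * WIDTH, LOW + (i + 1) * WIDTH
```

Under this assumption a target with SAScore 3.2 falls in bin 1, covering 2.5 to 3.5.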
Citation
If you use this data, please cite:
Akhmetshin, T.; Zankov, D.; Gantzer, P.; Babadeev, D.; Pinigina, A.; Madzhidov, T.; Varnek, A. SynPlanner: An End-to-End Tool for Synthesis Planning. J. Chem. Inf. Model. 2025, 65 (1), 15–21. DOI: 10.1021/acs.jcim.4c02004