SynPlanner-data

Data repository for SynPlanner — an open-source tool for retrosynthetic planning.

Quick start

pip install SynPlanner
synplan download_preset --preset synplanner-article --save_to synplan_data

Repository structure

SynPlanner-data/
├── policy/                        Reaction rules + policy network weights
│   └── {architecture}/
│       └── {rules_version}/
│           ├── reaction_rules.tsv
│           ├── pipeline.yaml
│           └── {weights_version}/
│               ├── ranking_policy.ckpt
│               └── filtering_policy.ckpt
│
├── value/                         Value network weights
│   └── {architecture}/
│       └── {version}/
│           ├── value_network.ckpt
│           └── meta.yaml
│
├── building_blocks/               Building block sets
│   └── {name}/
│       ├── building_blocks.tsv
│       └── meta.yaml
│
├── reaction_data/                 Reaction data pipeline (raw → standardized → filtered)
│   └── {source}/
│       ├── raw/
│       ├── standardized/{YYYY-MM-DD}/
│       └── filtered/{YYYY-MM-DD}/
│
├── training_data/                 Per-network training inputs
│   ├── ranking_policy/{YYYY-MM-DD}/
│   ├── filtering_policy/{YYYY-MM-DD}/
│   └── value_network/{YYYY-MM-DD}/
│
├── presets/                       Ready-to-use preset definitions
│   └── {name}.yaml
│
└── benchmarks/                    Benchmark target sets
    └── sascore/

Versioning

Model components (policy/, value/, building_blocks/) are versioned by name:

Architecture family: policy/supervised_gcn/, policy/transformer/
Rules version: policy/supervised_gcn/v1/ (a new rule extraction gets a new version directory)
Weights version: policy/supervised_gcn/v1/v1/ (retrained weights get a new inner version directory under the same rules version)
Value network: value/supervised_gcn/v1/ (a retrained network goes in v2/)

Pipeline data (reaction_data/, training_data/) is versioned by date (YYYY-MM-DD). A new processing protocol gets a new date directory, which is added alongside the existing ones rather than replacing them.
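
Because date directories accumulate side by side, a script that wants the most recent pipeline output has to choose one. The sketch below shows one possible way to do that with the standard library only. The helper name `latest_dated_dir` is hypothetical and not part of SynPlanner; it assumes directory names are plain YYYY-MM-DD dates, as in the tree above.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def latest_dated_dir(parent: Path) -> Optional[Path]:
    """Return the subdirectory of `parent` whose name is the most recent date.

    Hypothetical helper (not a SynPlanner API). Subdirectories whose names
    do not parse as ISO dates are ignored.
    """
    dated = []
    for child in parent.iterdir():
        if not child.is_dir():
            continue
        try:
            stamp = date.fromisoformat(child.name)
        except ValueError:
            continue  # not a date-named directory, e.g. raw/
        dated.append((stamp, child))
    if not dated:
        return None
    # Compare on the parsed date, not on the raw string
    return max(dated, key=lambda item: item[0])[1]
```

For example, `latest_dated_dir(Path("reaction_data/uspto/standardized"))` would return the newest standardized snapshot in a local copy of the repository, or `None` if no date directory exists yet.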

Presets

Presets are YAML files that map logical names to exact file paths:

name: synplanner-article
files:
  reaction_rules: policy/supervised_gcn/v1/reaction_rules.tsv
  ranking_policy: policy/supervised_gcn/v1/v1/ranking_policy.ckpt
  filtering_policy: policy/supervised_gcn/v1/v1/filtering_policy.ckpt
  value_network: value/supervised_gcn/v1/value_network.ckpt
  building_blocks: building_blocks/emolecules-salt-ln/building_blocks.tsv
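
To illustrate how such a mapping can be consumed, here is a minimal sketch that resolves the logical names against a local copy of the repository and reports missing files. It assumes the preset has already been parsed into a plain dictionary (for instance with a YAML parser); `resolve_preset` is a hypothetical helper, not a function provided by SynPlanner.

```python
from pathlib import Path
from typing import Dict, Mapping


def resolve_preset(preset: Mapping, root: Path) -> Dict[str, Path]:
    """Map each logical name in `preset["files"]` to a path under `root`.

    Hypothetical helper (not a SynPlanner API). Raises FileNotFoundError
    listing every relative path that does not exist as a file.
    """
    resolved = {}
    missing = []
    for name, relative in preset["files"].items():
        path = root / relative
        resolved[name] = path
        if not path.is_file():
            missing.append(relative)
    if missing:
        raise FileNotFoundError(f"missing preset files: {missing}")
    return resolved


# The preset shown above, written out as the dictionary a YAML parser would give
preset = {
    "name": "synplanner-article",
    "files": {
        "reaction_rules": "policy/supervised_gcn/v1/reaction_rules.tsv",
        "ranking_policy": "policy/supervised_gcn/v1/v1/ranking_policy.ckpt",
        "filtering_policy": "policy/supervised_gcn/v1/v1/filtering_policy.ckpt",
        "value_network": "value/supervised_gcn/v1/value_network.ckpt",
        "building_blocks": "building_blocks/emolecules-salt-ln/building_blocks.tsv",
    },
}
```

Calling `resolve_preset(preset, Path("synplan_data"))` would then return one path per logical name, assuming the files were saved under `synplan_data` with the same relative layout.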

Download a preset (Python):

from synplan.utils.loading import download_preset

paths = download_preset("synplanner-article", save_to="synplan_data")

Available data

policy/supervised_gcn/v1

GCN-based policy trained on filtered USPTO with 24k extracted reaction rules.

File	Description	Size
`reaction_rules.tsv`	Reaction rules in SMARTS format	8.6 MB
`v1/ranking_policy.ckpt`	Ranking policy network	157 MB
`v1/filtering_policy.ckpt`	Filtering policy network	298 MB

value/supervised_gcn/v1

GCN-based value network trained via self-learning on ChEMBL targets.

File	Description	Size
`value_network.ckpt`	Value network	15 MB

building_blocks/emolecules-salt-ln

Standardized building blocks from eMolecules, Sigma-Aldrich, and Lancaster.

File	Description	Size
`building_blocks.tsv`	186k molecules (SMILES + price columns)	6.9 MB

reaction_data/uspto

USPTO reaction data pipeline.

File	Description	Size
`raw/uspto_full_mapped.smi.zip`	Original USPTO (1.48M reactions)	165 MB
`standardized/2024-12-31/`	Standardized reactions + config	to be added
`filtered/2024-12-31/`	Filtered reactions + config	to be added

training_data

Directory	Description	Size
`filtering_policy/2024-12-31/`	600k ChEMBL + COCONUT molecules	17 MB
`value_network/2024-12-31/`	28k ChEMBL + COCONUT targets	720 KB
`ranking_policy/2024-12-31/`	Config + README (uses reaction_data)	—

benchmarks/sascore

700 target molecules split into 7 SAScore bins (1.5–8.5). Downloaded separately from presets.
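
As a rough illustration of the binning, the sketch below assigns a SAScore to one of 7 bins over the 1.5 to 8.5 range. It assumes the bins are equal-width (1.0 each) and half-open, with the upper bound 8.5 included in the last bin; the actual bin edges used for the benchmark are not stated here, and `sascore_bin` is a hypothetical name.

```python
def sascore_bin(score: float, low: float = 1.5, high: float = 8.5, n_bins: int = 7) -> int:
    """Return the zero-based bin index for `score`.

    Assumption: equal-width bins [low, low + w), ..., with `high` itself
    counted in the last bin. Scores outside [low, high] raise ValueError.
    """
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside [{low}, {high}]")
    width = (high - low) / n_bins
    index = int((score - low) / width)
    # Clamp so that score == high lands in the last bin, not one past it
    return min(index, n_bins - 1)
```

Under these assumptions a molecule with SAScore 5.0 falls in bin index 3, which covers 4.5 to 5.5.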

Citation

If you use this data, please cite:

Akhmetshin, T.; Zankov, D.; Gantzer, P.; Babadeev, D.; Pinigina, A.; Madzhidov, T.; Varnek, A. SynPlanner: An End-to-End Tool for Synthesis Planning. J. Chem. Inf. Model. 2025, 65 (1), 15–21. DOI: 10.1021/acs.jcim.4c02004
