SynPlanner-data

Data repository for SynPlanner, an open-source tool for retrosynthetic planning.

Quick start

pip install SynPlanner
synplan download_preset --preset synplanner-article --save_to synplan_data

Repository structure

SynPlanner-data/
├── policy/                        Reaction rules + policy network weights
│   └── {architecture}/
│       └── {rules_version}/
│           ├── reaction_rules.tsv
│           ├── pipeline.yaml
│           └── {weights_version}/
│               ├── ranking_policy.ckpt
│               └── filtering_policy.ckpt
│
├── value/                         Value network weights
│   └── {architecture}/
│       └── {version}/
│           ├── value_network.ckpt
│           └── meta.yaml
│
├── building_blocks/               Building block sets
│   └── {name}/
│       ├── building_blocks.tsv
│       └── meta.yaml
│
├── reaction_data/                 Reaction data pipeline (raw → standardized → filtered)
│   └── {source}/
│       ├── raw/
│       ├── standardized/{YYYY-MM-DD}/
│       └── filtered/{YYYY-MM-DD}/
│
├── training_data/                 Per-network training inputs
│   ├── ranking_policy/{YYYY-MM-DD}/
│   ├── filtering_policy/{YYYY-MM-DD}/
│   └── value_network/{YYYY-MM-DD}/
│
├── presets/                       Ready-to-use preset definitions
│   └── {name}.yaml
│
└── benchmarks/                    Benchmark target sets
    └── sascore/

Versioning

Model components (policy/, value/, building_blocks/) are versioned by name:

  • Architecture family: policy/supervised_gcn/, policy/transformer/
  • Rules version: policy/supervised_gcn/v1/ (a new rule extraction gets a new version)
  • Weights version: policy/supervised_gcn/v1/v1/ (retrained weights get a new inner version)
  • Value network: value/supervised_gcn/v1/ (a retrained network goes in v2/)
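
The naming scheme above can be sketched as a small path builder. This is only an illustration of the directory convention: `policy_paths` is a hypothetical helper, not part of the SynPlanner API, and the file names are taken from the repository structure shown earlier.

```python
from pathlib import PurePosixPath


def policy_paths(architecture: str, rules_version: str, weights_version: str) -> dict:
    """Compose repository-relative paths for one policy release.

    Hypothetical helper: it only mirrors the layout
    policy/{architecture}/{rules_version}/{weights_version}/.
    """
    rules_dir = PurePosixPath("policy") / architecture / rules_version
    weights_dir = rules_dir / weights_version
    return {
        # Rules live at the rules-version level ...
        "reaction_rules": rules_dir / "reaction_rules.tsv",
        # ... while network weights live one level deeper.
        "ranking_policy": weights_dir / "ranking_policy.ckpt",
        "filtering_policy": weights_dir / "filtering_policy.ckpt",
    }


paths = policy_paths("supervised_gcn", "v1", "v1")
print(paths["ranking_policy"])  # policy/supervised_gcn/v1/v1/ranking_policy.ckpt
```

Retraining the weights on the same rules would then change only the last argument (for example `"v2"`), leaving the `reaction_rules` path untouched.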

Pipeline data (reaction_data/, training_data/) is versioned by date (YYYY-MM-DD). Each new processing protocol gets a new date directory alongside the existing ones.
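
Because date directories accumulate side by side, a consumer usually needs to pick one. The sketch below selects the most recent one from a local copy. It assumes the directory names are ISO dates as described above; `latest_dated_dir` is a hypothetical helper, not a SynPlanner function.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def latest_dated_dir(parent: Path) -> Optional[Path]:
    """Return the subdirectory of `parent` with the most recent ISO date name.

    Entries that are not directories, or whose names do not parse as a
    date (for example raw/), are ignored. Returns None if nothing matches.
    """
    dated = []
    for child in parent.iterdir():
        if not child.is_dir():
            continue
        try:
            stamp = date.fromisoformat(child.name)
        except ValueError:
            continue  # not a date-named directory
        dated.append((stamp, child))
    if not dated:
        return None
    return max(dated, key=lambda item: item[0])[1]
```

For a local copy, `latest_dated_dir(Path("synplan_data/reaction_data/uspto/standardized"))` would return the newest protocol directory, assuming the download keeps the repository layout.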

Presets

Presets are YAML files that map logical names to exact file paths:

name: synplanner-article
files:
  reaction_rules: policy/supervised_gcn/v1/reaction_rules.tsv
  ranking_policy: policy/supervised_gcn/v1/v1/ranking_policy.ckpt
  filtering_policy: policy/supervised_gcn/v1/v1/filtering_policy.ckpt
  value_network: value/supervised_gcn/v1/value_network.ckpt
  building_blocks: building_blocks/emolecules-salt-ln/building_blocks.tsv
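
As an illustration of how such a mapping can be used, the sketch below resolves each logical name against a local data directory and reports which files are absent. It makes two assumptions: the preset has already been loaded into a plain dict (for example with a YAML parser), and the local directory mirrors the repository layout. `resolve_preset` is a hypothetical helper, not part of SynPlanner.

```python
from pathlib import Path

# The preset from above, written as the dict a YAML loader would produce.
PRESET = {
    "name": "synplanner-article",
    "files": {
        "reaction_rules": "policy/supervised_gcn/v1/reaction_rules.tsv",
        "ranking_policy": "policy/supervised_gcn/v1/v1/ranking_policy.ckpt",
        "filtering_policy": "policy/supervised_gcn/v1/v1/filtering_policy.ckpt",
        "value_network": "value/supervised_gcn/v1/value_network.ckpt",
        "building_blocks": "building_blocks/emolecules-salt-ln/building_blocks.tsv",
    },
}


def resolve_preset(preset: dict, root: str):
    """Map logical names to local paths and list the names whose file is missing.

    Assumes `root` mirrors the repository layout (an assumption, not
    something the preset format guarantees).
    """
    base = Path(root)
    resolved = {name: base / rel for name, rel in preset["files"].items()}
    missing = [name for name, path in resolved.items() if not path.is_file()]
    return resolved, missing
```

Calling `resolve_preset(PRESET, "synplan_data")` returns the path mapping together with the logical names that still need to be downloaded.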

Download a preset (Python):

from synplan.utils.loading import download_preset

paths = download_preset("synplanner-article", save_to="synplan_data")

Available data

policy/supervised_gcn/v1

GCN-based policy trained on filtered USPTO with 24k extracted reaction rules.

File                      Description                      Size
reaction_rules.tsv        Reaction rules in SMARTS format  8.6 MB
v1/ranking_policy.ckpt    Ranking policy network           157 MB
v1/filtering_policy.ckpt  Filtering policy network         298 MB

value/supervised_gcn/v1

GCN-based value network trained via self-learning on ChEMBL targets.

File                Description    Size
value_network.ckpt  Value network  15 MB

building_blocks/emolecules-salt-ln

Standardized building blocks from eMolecules, Sigma-Aldrich, and Lancaster.

File                 Description                             Size
building_blocks.tsv  186k molecules (SMILES + price columns)  6.9 MB
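
A minimal sketch for reading the SMILES column with the standard library follows. The exact file layout is an assumption here: the code treats the file as tab-separated with SMILES in the first column, and leaves it to the caller to say whether a header row is present. `read_first_column` is a hypothetical helper.

```python
import csv


def read_first_column(path, skip_header=False):
    """Return the first tab-separated field of every non-empty row.

    Assumption: SMILES strings are in the first column. Quoting is
    disabled so that field contents are read verbatim.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if skip_header:
            next(reader, None)  # drop the header row if the file has one
        return [row[0] for row in reader if row]
```

Checking the first returned value against a known SMILES string is a quick way to find out whether `skip_header` should be set for a given file.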

reaction_data/uspto

USPTO reaction data pipeline.

File                           Description                       Size
raw/uspto_full_mapped.smi.zip  Original USPTO (1.48M reactions)  165 MB
standardized/2024-12-31/       Standardized reactions + config   to be added
filtered/2024-12-31/           Filtered reactions + config       to be added

training_data

Directory                     Description                           Size
filtering_policy/2024-12-31/  600k ChEMBL + COCONUT molecules       17 MB
value_network/2024-12-31/     28k ChEMBL + COCONUT targets          720 KB
ranking_policy/2024-12-31/    Config + README (uses reaction_data)  -

benchmarks/sascore

700 target molecules split into 7 SAScore bins (1.5–8.5). Downloaded separately from presets.
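
To make the binning concrete: seven bins over the range 1.5 to 8.5 have a width of 1.0 if the bins are equal-width. The sketch below assigns a score to a bin under that assumption. Equal widths and the edge convention (half-open bins, with the top edge included in the last bin) are assumptions for illustration, not details confirmed by the benchmark files.

```python
LOW, HIGH, N_BINS = 1.5, 8.5, 7
WIDTH = (HIGH - LOW) / N_BINS  # 1.0 under the equal-width assumption


def sascore_bin(score: float) -> int:
    """Return the 0-based bin index for an SAScore in [1.5, 8.5].

    Bins are treated as [low, high), except the last one, which also
    includes the upper bound 8.5 (assumed convention).
    """
    if not LOW <= score <= HIGH:
        raise ValueError(f"score {score} is outside [{LOW}, {HIGH}]")
    return min(int((score - LOW) / WIDTH), N_BINS - 1)


print(sascore_bin(1.5))  # 0
print(sascore_bin(5.0))  # 3
print(sascore_bin(8.5))  # 6
```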

Citation

If you use this data, please cite:

Akhmetshin, T.; Zankov, D.; Gantzer, P.; Babadeev, D.; Pinigina, A.; Madzhidov, T.; Varnek, A. SynPlanner: An End-to-End Tool for Synthesis Planning. J. Chem. Inf. Model. 2025, 65 (1), 15–21. DOI: 10.1021/acs.jcim.4c02004
