Chromophore Spectral Property Predictor (7 Properties)

A Chemprop v2 multi-component MPNN model that predicts 7 spectroscopic properties of organic chromophores from molecular structure (SMILES) and solvent.

Model Description

  • Architecture: MulticomponentMPNN (depth=4, hidden=400, FFN=400, dropout=0.15)
  • Parameters: 1.1M
  • Framework: Chemprop 2.2.1 (PyTorch Lightning)
  • Inputs: Chromophore SMILES + Solvent SMILES
  • Training data: 20,502 chromophore-solvent pairs from the Scientific Data publication by Joung et al. (2020)
  • Training: 100 epochs on NVIDIA A100 GPU, masked loss for missing values
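The masked loss mentioned above can be illustrated with a minimal sketch. This is not Chemprop's implementation. It only shows the idea that a missing target contributes nothing to the loss. Representing missing values as NaN and the function name `masked_mse` are assumptions made for illustration, and the numbers are invented.

```python
import math

def masked_mse(preds, targets):
    """Mean squared error over entries whose target is present (not NaN).

    Hypothetical helper for illustration; rows are samples, columns are properties.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(preds, targets):
        for p, t in zip(pred_row, target_row):
            if math.isnan(t):
                continue  # missing label: excluded from the loss
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0

nan = float("nan")
# Two samples, two properties; the first sample lacks its second label.
preds = [[400.0, 0.5], [500.0, 0.9]]
targets = [[410.0, nan], [500.0, 0.7]]
loss = masked_mse(preds, targets)
print(loss)
```

Only three of the four entries enter the average here, so a chromophore-solvent pair with a partially measured property set can still be used for training.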

Predicted Properties

Property                   MAE      R²       Test samples
Absorption max (nm)        15.40    0.9503   1,739
Emission max (nm)          18.31    0.9212   1,847
Quantum yield              0.1232   0.6754   1,377
Absorption FWHM (cm⁻¹)     462.2    0.8247   664
Emission FWHM (cm⁻¹)       383.6    0.7576   1,091
log(ε/mol⁻¹ dm³ cm⁻¹)      0.1403   0.8582   817
Lifetime (ns)              4.84     0.0834   685

Note: Lifetime predictions have a very low R² (0.08) and should not be relied upon.
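For readers who want to reproduce metrics of this kind on their own data, the following is a minimal sketch of MAE and R². The model card does not state how its metrics were computed, so the standard definitions are assumed here. The function names and the example values are hypothetical and are not model output.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Invented absorption maxima (nm), for illustration only.
y_true = [400.0, 450.0, 500.0, 550.0]
y_pred = [410.0, 440.0, 505.0, 545.0]

print(mae(y_true, y_pred), r2(y_true, y_pred))
```

An R² close to 0, as reported for lifetime, means the predictions explain little more of the variance than always predicting the mean would.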

Usage

Installation

pip install "chemprop>=2.0"

Prediction

chemprop predict \
  -i input.csv \
  --model-paths best.pt \
  -o predictions.csv \
  -s Chromophore Solvent

Input CSV format

Chromophore,Solvent
CCN(CC)c1ccc2c(C)cc(=O)oc2c1,CCO
Nc1ccc2c(C(F)(F)F)cc(=O)oc2c1,CC#N
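A file in this format can be produced with Python's standard `csv` module. The sketch below builds the CSV text in memory. The helper name `build_input_csv` is hypothetical, and the two rows are the examples shown above.

```python
import csv
import io

def build_input_csv(pairs):
    """Return CSV text with a Chromophore,Solvent header and one row per pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Chromophore", "Solvent"])
    writer.writerows(pairs)
    return buffer.getvalue()

# (chromophore SMILES, solvent SMILES) pairs
pairs = [
    ("CCN(CC)c1ccc2c(C)cc(=O)oc2c1", "CCO"),
    ("Nc1ccc2c(C(F)(F)F)cc(=O)oc2c1", "CC#N"),
]

csv_text = build_input_csv(pairs)
print(csv_text, end="")
```

To save the result for the CLI, write `csv_text` to `input.csv`, for example with `open("input.csv", "w", newline="")`.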

Python API

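import torch
from lightning import pytorch as pl
from chemprop.models import MulticomponentMPNN
from chemprop.data import MoleculeDatapoint, MoleculeDataset, MulticomponentDataset, build_dataloader
from chemprop.featurizers import SimpleMoleculeMolGraphFeaturizer

# Load the trained multi-component model (restores hyperparameters and weights)
model = MulticomponentMPNN.load_from_file("best.pt", map_location="cpu")
model.eval()

# One (chromophore SMILES, solvent SMILES) pair per row
pairs = [("CCN(CC)c1ccc2c(C)cc(=O)oc2c1", "CCO")]

# Build one MoleculeDataset per component, then combine them
featurizer = SimpleMoleculeMolGraphFeaturizer()
datasets = [
    MoleculeDataset([MoleculeDatapoint.from_smi(pair[i]) for pair in pairs], featurizer)
    for i in range(2)
]
loader = build_dataloader(MulticomponentDataset(datasets), shuffle=False)

# Run inference; each output row holds the 7 predicted properties for one pair
trainer = pl.Trainer(logger=False, enable_progress_bar=False, accelerator="cpu", devices=1)
preds = torch.cat(trainer.predict(model, loader))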

Training Details

  • Dataset: DB for chromophore_Sci_Data_rev03.csv (20,836 raw entries, 20,502 after cleaning)
  • Data cleaning: Removed 314 duplicates, 17 entries with negative Stokes shift, 20 entries with invalid solvent
  • Missing values: Handled via masked loss (Chemprop native support)
  • Split: Random 80/10/10 (train: 16,401 / val: 2,050 / test: 2,051)
  • Optimizer: Adam with warmup (5 epochs) + cosine decay
  • Hardware: NVIDIA A100-SXM4-40GB, RAIDEN HPC Cluster (RIKEN)
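The warmup plus cosine decay schedule listed above can be sketched as follows. The model card gives only the warmup length (5 epochs) and the total number of epochs (100). The linear warmup shape, the peak and minimum learning rates, and the function name `lr_at_epoch` are assumptions for illustration, not the values used in training.

```python
import math

def lr_at_epoch(epoch, total_epochs=100, warmup_epochs=5, peak_lr=1e-3, min_lr=0.0):
    """Learning rate for a 0-indexed epoch: linear warmup, then cosine decay.

    peak_lr and min_lr are hypothetical placeholders.
    """
    if epoch < warmup_epochs:
        # Ramp linearly up to peak_lr over the warmup epochs.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Fraction of the decay phase completed, from 0 toward 1.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(epoch) for epoch in range(100)]
print(schedule[0], schedule[4], schedule[50], schedule[99])
```

The rate rises during the first five epochs, reaches its peak, and then falls smoothly toward the minimum by the final epoch.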

Limitations

  • Quantum yield predictions have moderate accuracy (MAE=0.12, R²=0.68) due to inherent measurement noise and limited structural information
  • Lifetime predictions are unreliable (R²<0.1); predicting this property likely requires information beyond SMILES, such as quantum mechanical calculations
  • Best performance on common solvents (DCM, acetonitrile, toluene); rare solvents may have higher error
  • Training data is from literature compilations with varying experimental conditions

Citation

If you use this model, please cite:

@article{joung2020experimental,
  title={Experimental database of optical properties of organic compounds},
  author={Joung, Joonyoung F and Han, Minhi and Jeong, Minseok and Park, Sungnam},
  journal={Scientific Data},
  volume={7},
  pages={295},
  year={2020}
}

@article{heid2024chemprop,
  title={Chemprop: A Machine Learning Package for Chemical Property Prediction},
  author={Heid, Esther and others},
  journal={Journal of Chemical Information and Modeling},
  volume={64},
  number={1},
  pages={9--17},
  year={2024}
}