Papers
arxiv:2511.21747

QuantumChem-200K: A Large-Scale Open Organic Molecular Dataset for Quantum-Chemistry Property Screening and Language Model Benchmarking

Published on Nov 23, 2025
Authors:
,

Abstract

A large-scale quantum-chemical dataset and fine-tuned language model enable accurate prediction of photophysical properties for accelerated discovery of photoinitiators in two-photon polymerization.

AI-generated summary

The discovery of next-generation photoinitiators for two-photon polymerization (TPP) is hindered by the absence of large, open datasets containing the quantum-chemical and photophysical properties required to model photodissociation and excited-state behavior. Existing molecular datasets typically provide only basic physicochemical descriptors and therefore cannot support data-driven screening or AI-assisted design of photoinitiators. To address this gap, we introduce QuantumChem-200K, a large-scale dataset of over 200,000 organic molecules annotated with eleven quantum-chemical properties, including two-photon absorption (TPA) cross sections, TPA spectral ranges, singlet-triplet intersystem crossing (ISC) energies, toxicity and synthetic accessibility scores, hydrophilicity, solubility, boiling point, molecular weight, and aromaticity. These values are computed using a hybrid workflow that integrates density function theory (DFT), semi-empirical excited-state methods, atomistic quantum solvers, and neural-network predictors. Using QuantumChem-200K, we fine tune the open-source Qwen2.5-32B large language model to create a chemistry AI assistant capable of forward property prediction from SMILES. Benchmarking on 3000 unseen molecules from VQM24 and ZINC20 demonstrates that domain-specific fine-tuning significantly improves accuracy over GPT-4o, Llama-3.1-70B, and the base Qwen2.5-32B model, particularly for TPA and ISC predictions central to photoinitiator design. QuantumChem-200K and the corresponding AI assistant together provide the first scalable platform for high-throughput, LLM-driven photoinitiator screening and accelerated discovery of photosensitive materials.

Community

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.21747 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.21747 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.