| # Model documentation & parameters | |
| **Algorithm Version**: Which model version to use. | |
| **Maximal sequence length**: The maximal number of SMILES tokens in the generated molecule. | |
| **Number of samples**: How many samples should be generated (between 1 and 50). | |
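The three parameters above can be sketched as a small request object that is validated before any generation happens. This is a minimal illustration, not the Space's actual code: `GenerationRequest`, its defaults, and the `"v0"` version string are hypothetical names chosen here, and the GT4SD call inside `generate` (the `PolymerBlocksGenerator` configuration with a `generated_length` field, and `PolymerBlocks.sample`) is an assumption that should be checked against the GT4SD documentation.

```python
from dataclasses import dataclass

# Bounds on the number of samples, taken from the parameter description above.
MIN_SAMPLES, MAX_SAMPLES = 1, 50


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container for the three user-facing parameters."""

    algorithm_version: str = "v0"   # assumed version name
    max_sequence_length: int = 100  # assumed default, in SMILES tokens
    num_samples: int = 10           # assumed default

    def __post_init__(self) -> None:
        # Reject requests outside the documented 1-50 range.
        if not MIN_SAMPLES <= self.num_samples <= MAX_SAMPLES:
            raise ValueError(
                f"num_samples must be between {MIN_SAMPLES} and {MAX_SAMPLES}"
            )
        # A generated molecule needs at least one token.
        if self.max_sequence_length < 1:
            raise ValueError("max_sequence_length must be a positive integer")


def generate(request: GenerationRequest) -> list:
    """Sample molecules with GT4SD (assumed API; requires gt4sd installed)."""
    # Imported lazily so the validation above works without gt4sd.
    from gt4sd.algorithms.generation.polymer_blocks import (
        PolymerBlocks,
        PolymerBlocksGenerator,
    )

    configuration = PolymerBlocksGenerator(
        algorithm_version=request.algorithm_version,
        generated_length=request.max_sequence_length,
    )
    algorithm = PolymerBlocks(configuration=configuration)
    return list(algorithm.sample(request.num_samples))
```

With GT4SD installed, `generate(GenerationRequest(num_samples=5))` would be expected to return a list of SMILES strings. Validating up front keeps an out-of-range request from triggering a model load.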
| # Model card -- PolymerBlocks | |
| **Model Details**: *PolymerBlocks* is a sequence-based molecular generator tuned to generate blocks of polymers (e.g., catalysts and monomers). The model relies on a Variational Autoencoder architecture as described in [Born et al. (2021; *iScience*)](https://www.sciencedirect.com/science/article/pii/S2589004221002376). | |
| **Developers**: Matteo Manica and colleagues from IBM Research. | |
| **Distributors**: Original authors' code integrated into GT4SD. | |
| **Model date**: Not yet published. | |
| **Model version**: Only the initial model version is available. The model was pre-trained on 500K compounds from PubChem and then fine-tuned on SMILES strings representing the monomers and catalysts collected in the database presented in [Park et al. (2022)](https://doi.org/10.26434/chemrxiv-2022-811rl). | |
| **Model type**: A sequence-based molecular generator tuned to generate blocks of polymers (e.g., catalysts and monomers). | |
| **Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**: The sequence-based model is a standard GRU-based VAE trained to reconstruct the SMILES representations of molecules. Given the nature of the pre-training and fine-tuning data, the model is biased toward generating molecules that resemble the catalysts and monomers employed in ring-opening polymerization. | |
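For orientation, a standard VAE is trained by maximizing the evidence lower bound (ELBO). The source does not state the exact loss used for PolymerBlocks (for example, any KL weighting or annealing), so what follows is only the textbook form. The symbols are introduced here for illustration: $x = (x_1, \dots, x_T)$ is a SMILES token sequence, $z$ the latent code, $q_\phi$ the encoder, $p_\theta$ the autoregressive decoder, and $p(z)$ a standard normal prior.

$$
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big),
\qquad
\log p_\theta(x \mid z) = \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t}, z)
$$

The first term rewards accurate token-by-token reconstruction of the input SMILES. The second keeps the encoder's posterior close to the prior, so that new molecules can be generated by decoding samples $z \sim p(z)$.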
| **Paper or other resource for more information**: Details on the model used and code can be found in [Born et al. (2021; *iScience*)](https://www.sciencedirect.com/science/article/pii/S2589004221002376). | |
| **License**: MIT | |
| **Where to send questions or comments about the model**: Open an issue on the [GT4SD repository](https://github.com/GT4SD/gt4sd-core). | |
| **Intended Use. Use cases that were envisioned during development**: Chemical research, in particular the discovery of monomers and catalysts for polymerization. | |
| **Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes. | |
| **Out-of-scope use cases**: Production-level inference and the generation of molecules with harmful properties. | |
| **Metrics**: N.A. | |
| **Datasets**: See the description under **Model version**. | |
| **Ethical Considerations**: Unclear; please consult the original authors with any questions. | |
| **Caveats and Recommendations**: Unclear; please consult the original authors with any questions. | |
| Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596). | |
| ## Citation | |
| ```bib | |
| @article{manica2022gt4sd, | |
| title={GT4SD: Generative Toolkit for Scientific Discovery}, | |
| author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others}, | |
| journal={arXiv preprint arXiv:2207.03928}, | |
| year={2022} | |
| } | |
| ``` | |