	# Model documentation & parameters

	Algorithm Version: Which model version to use.

	Maximal sequence length: The maximal number of SMILES tokens in the generated molecule.

	Number of samples: How many samples should be generated (between 1 and 50).
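As a rough illustration of how these three parameters interact, the sketch below validates the documented sample range and measures sequence length in SMILES tokens. It is a minimal sketch under stated assumptions: `GenerationParameters`, `count_smiles_tokens`, and `within_length` are hypothetical names, the default values are placeholders, and the regular expression is a simplified tokenizer that may differ from the vocabulary the model actually uses.

```python
import re
from dataclasses import dataclass

# Simplified SMILES tokenizer (an assumption, not the model's own vocabulary):
# bracket atoms, two-letter halogens, two-digit ring closures, then any single character.
_TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def count_smiles_tokens(smiles: str) -> int:
    """Count the tokens of a SMILES string under the simplified pattern above."""
    return len(_TOKEN_PATTERN.findall(smiles))


@dataclass(frozen=True)
class GenerationParameters:
    """Hypothetical container mirroring the three documented parameters."""

    algorithm_version: str = "v0"       # placeholder default, not taken from the card
    maximal_sequence_length: int = 100  # placeholder default, counted in SMILES tokens
    number_of_samples: int = 10         # placeholder default, must lie in [1, 50]

    def __post_init__(self) -> None:
        if self.maximal_sequence_length < 1:
            raise ValueError("maximal_sequence_length must be positive")
        if not 1 <= self.number_of_samples <= 50:
            raise ValueError("number_of_samples must be between 1 and 50")


def within_length(smiles: str, params: GenerationParameters) -> bool:
    """Check that a molecule does not exceed the maximal token count."""
    return count_smiles_tokens(smiles) <= params.maximal_sequence_length
```

Counting tokens rather than characters matters because a bracket atom such as `[NH4+]` or a halogen such as `Cl` spans several characters but is a single token.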



	# Model card -- PolymerBlocks

	Model Details: PolymerBlocks is a sequence-based molecular generator tuned to generate blocks of polymers (e.g., catalysts and monomers). The model relies on a Variational Autoencoder architecture as described in [Born et al. (2021; iScience)](https://www.sciencedirect.com/science/article/pii/S2589004221002376).

	Developers: Matteo Manica and colleagues from IBM Research.

	Distributors: Original authors' code integrated into GT4SD.

	Model date: Not yet published.

	Model version: Only the initial model version is available. The model was pre-trained on 500K compounds from PubChem and then fine-tuned on SMILES strings of monomers and catalysts from the database presented in [Park et al. (2022)](https://doi.org/10.26434/chemrxiv-2022-811rl).

	Model type: A sequence-based molecular generator tuned to generate blocks of polymers (e.g., catalysts and monomers).

	Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: The sequence-based model is a standard GRU-based VAE trained to reconstruct the SMILES representations of molecules. Given the nature of the pre-training and fine-tuning data, the model is biased toward generating molecules that resemble the catalysts and monomers employed in ring-opening polymerization.
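The card does not spell out the training objective, so the following is only a textbook sketch of what a VAE trained to reconstruct token sequences typically optimizes: a reconstruction term plus a KL term that pulls the latent posterior toward a standard normal prior. All function names are hypothetical, and `kl_weight` is an assumed knob that the card does not mention; the objective actually used is described in Born et al. (2021).

```python
import math
import random
from typing import List, Sequence


def sample_latent(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)) for a diagonal Gaussian posterior."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def reconstruction_nll(token_probabilities: Sequence[float]) -> float:
    """Negative log-likelihood of the true SMILES tokens under the decoder."""
    return -sum(math.log(p) for p in token_probabilities)


def vae_loss(
    token_probabilities: Sequence[float],
    mu: Sequence[float],
    log_var: Sequence[float],
    kl_weight: float = 1.0,  # assumed hyperparameter, not stated in the card
) -> float:
    """Reconstruction term plus weighted KL term (negative ELBO when kl_weight == 1)."""
    return reconstruction_nll(token_probabilities) + kl_weight * kl_to_standard_normal(mu, log_var)
```

Here `token_probabilities` stands for the probabilities the decoder assigns to each correct token of the input SMILES string; in a GRU-based decoder these would come from a softmax over the vocabulary at every step.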

	Paper or other resource for more information: Details on the model used and code can be found in [Born et al. (2021; iScience)](https://www.sciencedirect.com/science/article/pii/S2589004221002376).

	License: MIT

	Where to send questions or comments about the model: Open an issue on [GT4SD repository](https://github.com/GT4SD/gt4sd-core).

	Intended Use. Use cases that were envisioned during development: Chemical research, in particular the discovery of monomers and catalysts for polymerization.

	Primary intended uses/users: Researchers and computational chemists using the model for model comparison or research exploration purposes.

	Out-of-scope use cases: Production-level inference, producing molecules with harmful properties.

	Metrics: N.A.

	Datasets: See the description under Model version.

	Ethical Considerations: Unclear; please consult the original authors in case of questions.

	Caveats and Recommendations: Unclear; please consult the original authors in case of questions.

	Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596).

	## Citation

	```bib
	@article{manica2022gt4sd,
	title={GT4SD: Generative Toolkit for Scientific Discovery},
	author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},
	journal={arXiv preprint arXiv:2207.03928},
	year={2022}
	}
	```
