## References to read

[1] Prein, T., E. Pan, T. Doerr, E. Olivetti, and J. L. M. Rupp. 2024. "MTEncoder: A Transformer-Based Framework for Materials Representation Learning." *Materials Today*. https://openreview.net/pdf?id=wug7i3O7y1.

<br>

[2] Schmidt, J., H.-C. Wang, T. F. T. Cerqueira, S. Botti, A. H. Romero, and M. A. L. Marques. 2024. "Improving Machine-Learning Models in Materials Science through Large Datasets." *Materials Today Physics*. https://www.sciencedirect.com/science/article/pii/S2542529324002360.

# MTEncoder (SyntMTE)

## Overview

MTEncoder is a transformer-based model for encoding materials' elemental compositions into dense vector representations. Each material is tokenized into:

- Individual element tokens (e.g., Na, Fe, O)
- A special `Compound` token (`[CPD]`) that aggregates elemental information

These tokens are fed into a transformer encoder, which produces context-rich embeddings. The embedding of the `[CPD]` token serves as the learned representation of the material and is passed through an MLP head to predict various properties [1].
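
To make the data flow concrete, here is a minimal PyTorch sketch of this kind of composition encoder. It is an illustrative sketch, not the authors' implementation: the class name `CompositionEncoder`, every hyperparameter, and the way stoichiometric fractions enter the model (a learned linear projection added to the element embeddings) are assumptions.

```python
import torch
import torch.nn as nn

class CompositionEncoder(nn.Module):
    """Minimal MTEncoder-style sketch (hypothetical, for illustration):
    element tokens plus a [CPD] token pass through a transformer encoder;
    the final [CPD] embedding feeds an MLP property head."""

    def __init__(self, n_elements=103, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.CPD = 0  # index 0 reserved for the special [CPD] token
        self.embed = nn.Embedding(n_elements + 1, d_model)
        # One plausible way to inject stoichiometry (an assumption, not
        # necessarily the paper's scheme): project the molar fraction and
        # add it to the element embedding.
        self.frac_proj = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Sequential(  # MLP head on the [CPD] embedding
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, 1)
        )

    def forward(self, elems, fracs):
        # elems: (batch, n_tokens) element indices, position 0 = [CPD]
        # fracs: (batch, n_tokens) molar fractions, 0.0 in the [CPD] slot
        x = self.embed(elems) + self.frac_proj(fracs.unsqueeze(-1))
        h = self.encoder(x)  # context-rich embeddings for every token
        cpd = h[:, 0]        # [CPD] slot = learned material representation
        return cpd, self.head(cpd)

# Example: Fe2O3 -> [CPD], Fe (Z=26), O (Z=8), with fractions 0.4 and 0.6.
elems = torch.tensor([[0, 26, 8]])
fracs = torch.tensor([[0.0, 0.4, 0.6]])
rep, pred = CompositionEncoder()(elems, fracs)
print(rep.shape, pred.shape)  # torch.Size([1, 256]) torch.Size([1, 1])
```
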
## Pretraining Tasks

MTEncoder is pretrained on the Alexandria dataset [2] across 12 tasks:
| Pretraining Objective |
|-----------------------|
| Stress |
| Band Gap (Direct) |
| Band Gap (Indirect) |
| Density of States at Fermi Level |
| Energy Above Hull |
| Formation Energy |
| Corrected Total Energy |
| Phase Separation Energy |
| Number of Atomic Sites |
| Total Magnetic Moment |
| Crystal Space Group |
| Masked Element Reconstruction (Self-Supervised) |

*Table: Pretraining objectives for MTEncoder (drawn from the Alexandria materials dataset).*
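
To illustrate how twelve objectives can share one backbone, the sketch below attaches a separate head per task to the shared `[CPD]` embedding and sums the per-task losses. It assumes access to both the `[CPD]` embedding and the per-token encoder states (the `h` tensor inside the earlier hypothetical `CompositionEncoder`). The task keys, loss functions, and equal weighting are illustrative assumptions; treating space-group prediction as 230-way classification and masked-element reconstruction as token-level cross-entropy is one plausible reading of the table, not the paper's stated recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical task keys for the ten regression objectives in the table.
REGRESSION_TASKS = [
    "stress", "band_gap_direct", "band_gap_indirect", "dos_fermi",
    "e_above_hull", "formation_energy", "corrected_total_energy",
    "phase_separation_energy", "n_sites", "total_magnetic_moment",
]

class MultiTaskHeads(nn.Module):
    """One head per pretraining objective, all reading the shared encoder."""

    def __init__(self, d_model=256, n_space_groups=230, n_elements=103):
        super().__init__()
        self.reg = nn.ModuleDict({t: nn.Linear(d_model, 1) for t in REGRESSION_TASKS})
        self.spg = nn.Linear(d_model, n_space_groups)  # classification head
        self.mlm = nn.Linear(d_model, n_elements + 1)  # masked-element head

    def forward(self, cpd, token_states):
        # cpd: (batch, d_model); token_states: (batch, n_tokens, d_model)
        preds = {t: head(cpd).squeeze(-1) for t, head in self.reg.items()}
        preds["space_group"] = self.spg(cpd)
        preds["masked_elems"] = self.mlm(token_states)  # per-token logits
        return preds

def multitask_loss(preds, targets, masked_labels, mask):
    """Sum of per-task losses; equal weighting is an assumption."""
    loss = sum(F.mse_loss(preds[t], targets[t]) for t in REGRESSION_TASKS)
    loss = loss + F.cross_entropy(preds["space_group"], targets["space_group"])
    # Self-supervised term: reconstruct element identity at masked positions.
    loss = loss + F.cross_entropy(preds["masked_elems"][mask], masked_labels)
    return loss
```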