## References to read

[1] Prein, T., E. Pan, T. Doerr, E. Olivetti, and J. L. M. Rupp. 2024. "MTEncoder: A Transformer-Based Framework for Materials Representation Learning." *Materials Today*. https://openreview.net/pdf?id=wug7i3O7y1.

<br>

[2] Schmidt, J., H.-C. Wang, T. F. T. Cerqueira, S. Botti, A. H. Romero, and M. A. L. Marques. 2024. "Improving Machine-Learning Models in Materials Science through Large Datasets." *Materials Today Physics*. https://www.sciencedirect.com/science/article/pii/S2542529324002360.

# MTEncoder (SyntMTE)

## Overview

MTEncoder is a transformer-based model for encoding materials' elemental compositions into dense vector representations. Each material is tokenized into:

- Individual element tokens (e.g., Na, Fe, O)
- A special `Compound` token (`[CPD]`) that aggregates elemental information

These tokens are fed into a transformer encoder, which produces context-rich embeddings. The embedding of the `[CPD]` token serves as the learned representation of the material and is passed through an MLP head to predict various properties [1].
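
To make the data flow concrete, here is a minimal PyTorch sketch of this kind of composition encoder. It is an illustrative sketch, not the authors' implementation: the class name `CompositionEncoder`, every hyperparameter, and the way stoichiometric fractions enter the model (a learned linear projection added to the element embeddings) are assumptions.

```python
import torch
import torch.nn as nn

class CompositionEncoder(nn.Module):
    """Minimal MTEncoder-style sketch (hypothetical, for illustration):
    element tokens plus a [CPD] token pass through a transformer encoder;
    the final [CPD] embedding feeds an MLP property head."""

    def __init__(self, n_elements=103, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.CPD = 0  # index 0 reserved for the special [CPD] token
        self.embed = nn.Embedding(n_elements + 1, d_model)
        # One plausible way to inject stoichiometry (an assumption, not
        # necessarily the paper's scheme): project the molar fraction and
        # add it to the element embedding.
        self.frac_proj = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Sequential(  # MLP head on the [CPD] embedding
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, 1)
        )

    def forward(self, elems, fracs):
        # elems: (batch, n_tokens) element indices, position 0 = [CPD]
        # fracs: (batch, n_tokens) molar fractions, 0.0 in the [CPD] slot
        x = self.embed(elems) + self.frac_proj(fracs.unsqueeze(-1))
        h = self.encoder(x)  # context-rich embeddings for every token
        cpd = h[:, 0]        # [CPD] slot = learned material representation
        return cpd, self.head(cpd)

# Example: Fe2O3 -> [CPD], Fe (Z=26), O (Z=8), with fractions 0.4 and 0.6.
elems = torch.tensor([[0, 26, 8]])
fracs = torch.tensor([[0.0, 0.4, 0.6]])
rep, pred = CompositionEncoder()(elems, fracs)
print(rep.shape, pred.shape)  # torch.Size([1, 256]) torch.Size([1, 1])
```
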
## Pretraining Tasks

MTEncoder is pretrained on the Alexandria dataset [2] across 12 tasks:
| Pretraining Objective |
|-----------------------|
| Stress |
| Band Gap (Direct) |
| Band Gap (Indirect) |
| Density of States at Fermi Level |
| Energy Above Hull |
| Formation Energy |
| Corrected Total Energy |
| Phase Separation Energy |
| Number of Atomic Sites |
| Total Magnetic Moment |
| Crystal Space Group |
| Masked Element Reconstruction (Self-Supervised) |

*Table: Pretraining objectives for MTEncoder (drawn from the Alexandria materials dataset).*
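
To illustrate how twelve objectives can share one backbone, the sketch below attaches a separate head per task to the shared `[CPD]` embedding and sums the per-task losses. It assumes access to both the `[CPD]` embedding and the per-token encoder states (the `h` tensor inside the earlier hypothetical `CompositionEncoder`). The task keys, loss functions, and equal weighting are illustrative assumptions; treating space-group prediction as 230-way classification and masked-element reconstruction as token-level cross-entropy is one plausible reading of the table, not the paper's stated recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical task keys for the ten regression objectives in the table.
REGRESSION_TASKS = [
    "stress", "band_gap_direct", "band_gap_indirect", "dos_fermi",
    "e_above_hull", "formation_energy", "corrected_total_energy",
    "phase_separation_energy", "n_sites", "total_magnetic_moment",
]

class MultiTaskHeads(nn.Module):
    """One head per pretraining objective, all reading the shared encoder."""

    def __init__(self, d_model=256, n_space_groups=230, n_elements=103):
        super().__init__()
        self.reg = nn.ModuleDict({t: nn.Linear(d_model, 1) for t in REGRESSION_TASKS})
        self.spg = nn.Linear(d_model, n_space_groups)  # classification head
        self.mlm = nn.Linear(d_model, n_elements + 1)  # masked-element head

    def forward(self, cpd, token_states):
        # cpd: (batch, d_model); token_states: (batch, n_tokens, d_model)
        preds = {t: head(cpd).squeeze(-1) for t, head in self.reg.items()}
        preds["space_group"] = self.spg(cpd)
        preds["masked_elems"] = self.mlm(token_states)  # per-token logits
        return preds

def multitask_loss(preds, targets, masked_labels, mask):
    """Sum of per-task losses; equal weighting is an assumption."""
    loss = sum(F.mse_loss(preds[t], targets[t]) for t in REGRESSION_TASKS)
    loss = loss + F.cross_entropy(preds["space_group"], targets["space_group"])
    # Self-supervised term: reconstruct element identity at masked positions.
    loss = loss + F.cross_entropy(preds["masked_elems"][mask], masked_labels)
    return loss
```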