References to read
[1] Prein, T., E. Pan, T. Doerr, E. Olivetti, and J. L. M. Rupp. 2024. "MTEncoder: A Transformer-Based Framework for Materials Representation Learning." Materials Today. https://openreview.net/pdf?id=wug7i3O7y1.
[2] Schmidt, Jonathan, Hai-Chen Wang, Tiago F. T. Cerqueira, Silvana Botti, Aldo H. Romero, and Miguel A. L. Marques. 2024. "Improving Machine-Learning Models in Materials Science through Large Datasets." Materials Today Physics. https://www.sciencedirect.com/science/article/pii/S2542529324002360.
MTEncoder (SyntMTE)
Overview
MTEncoder is a transformer-based model for encoding materials’ elemental compositions into dense vector representations. Each material is tokenized into:
- Individual element tokens (e.g., Na, Fe, O)
- A special compound token ([CPD]) that aggregates elemental information

These tokens are fed into a transformer encoder, which produces context-rich embeddings. The embedding of the [CPD] token serves as the learned representation of the material and is passed through an MLP head to predict various properties [1].
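The sketch below illustrates this encode-then-predict pipeline in PyTorch. It is not the authors' implementation: the `CompositionEncoder` class, the toy vocabulary, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of an MTEncoder-style composition encoder.
# NOT the authors' code; names and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class CompositionEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # MLP head that maps the [CPD] embedding to one property value.
        self.head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); position 0 holds the [CPD] token.
        h = self.encoder(self.embed(token_ids))
        cpd = h[:, 0]           # [CPD] embedding = material representation
        return self.head(cpd)   # predicted property

# Hypothetical vocabulary: the [CPD] token plus element symbols.
vocab = {"[CPD]": 0, "Na": 1, "Fe": 2, "O": 3}
tokens = torch.tensor([[vocab["[CPD]"], vocab["Na"], vocab["Fe"], vocab["O"]]])
model = CompositionEncoder(vocab_size=len(vocab))
print(model(tokens).shape)  # torch.Size([1, 1])
```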
Pretraining Tasks
MTEncoder is pretrained on the Alexandria dataset [2] across 12 tasks:
| Pretraining Objective |
|---|
| Stress |
| Band Gap (Direct) |
| Band Gap (Indirect) |
| Density of States at Fermi Level |
| Energy Above Hull |
| Formation Energy |
| Corrected Total Energy |
| Phase Separation Energy |
| Number of Atomic Sites |
| Total Magnetic Moment |
| Crystal Space Group |
| Masked Element Reconstruction (Self-Supervised) |
Table: Pretraining objectives for MTEncoder; targets for the supervised tasks are computed from the Alexandria dataset [2].
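As a rough illustration of how these twelve objectives can share one encoder, the sketch below attaches a prediction head per task to the shared [CPD] embedding and sums the per-task losses. The head design, the equal loss weighting, and all names (`heads`, `mlm_head`, `multitask_loss`) are assumptions for illustration, not details from the paper.

```python
# Sketch of multi-task pretraining over the objectives in the table above.
# Hypothetical names; loss weighting and head design are assumptions.
import torch
import torch.nn as nn

d_model = 128
REGRESSION_TASKS = [
    "stress", "band_gap_direct", "band_gap_indirect", "dos_fermi",
    "e_above_hull", "formation_energy", "corrected_total_energy",
    "phase_separation_energy", "n_sites", "total_magnetic_moment",
]
N_SPACE_GROUPS = 230   # classification over the 230 crystal space groups
VOCAB_SIZE = 120       # element vocabulary incl. special tokens (assumed)

# One head per task, all reading the shared [CPD] embedding.
heads = nn.ModuleDict({t: nn.Linear(d_model, 1) for t in REGRESSION_TASKS})
heads["space_group"] = nn.Linear(d_model, N_SPACE_GROUPS)
mlm_head = nn.Linear(d_model, VOCAB_SIZE)  # masked-element reconstruction

def multitask_loss(cpd_emb, token_embs, targets, masked_pos, masked_ids):
    """Sum of per-task losses; equal weighting is an assumption."""
    loss = torch.tensor(0.0)
    for t in REGRESSION_TASKS:
        loss = loss + nn.functional.mse_loss(
            heads[t](cpd_emb).squeeze(-1), targets[t]
        )
    loss = loss + nn.functional.cross_entropy(
        heads["space_group"](cpd_emb), targets["space_group"]
    )
    # Reconstruct the identities of masked element tokens (self-supervised).
    loss = loss + nn.functional.cross_entropy(
        mlm_head(token_embs[masked_pos]), masked_ids
    )
    return loss
```

Sharing a single encoder across all twelve objectives is what pushes the [CPD] embedding toward a general-purpose material representation rather than one specialized to a single property.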