References to read
[1] Prein, T., E. Pan, T. Doerr, E. Olivetti, and J. L. M. Rupp. 2024. "MTEncoder: A Transformer-Based Framework for Materials Representation Learning." Materials Today. https://openreview.net/pdf?id=wug7i3O7y1.
[2] Schmidt, Jonathan, Hai-Chen Wang, Tiago F. T. Cerqueira, Silvana Botti, Aldo H. Romero, and Miguel A. L. Marques. 2024. "Improving Machine-Learning Models in Materials Science through Large Datasets." Materials Today Physics. https://www.sciencedirect.com/science/article/pii/S2542529324002360.
MTEncoder (SyntMTE)
Overview
MTEncoder is a transformer-based model for encoding materials’ elemental compositions into dense vector representations. Each material is tokenized into:
- Individual element tokens (e.g., Na, Fe, O)
- A special compound token ([CPD]) that aggregates elemental information

These tokens are fed into a transformer encoder, which produces context-rich embeddings. The embedding of the [CPD] token serves as the learned representation of the material and is passed through an MLP head to predict various properties [1].
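The sketch below illustrates this encode-then-predict pipeline in PyTorch. It is not the authors' implementation: the `CompositionEncoder` class, the toy vocabulary, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of an MTEncoder-style composition encoder.
# NOT the authors' code; names and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class CompositionEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # MLP head that maps the [CPD] embedding to one property value.
        self.head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); position 0 holds the [CPD] token.
        h = self.encoder(self.embed(token_ids))
        cpd = h[:, 0]           # [CPD] embedding = material representation
        return self.head(cpd)   # predicted property

# Hypothetical vocabulary: the [CPD] token plus element symbols.
vocab = {"[CPD]": 0, "Na": 1, "Fe": 2, "O": 3}
tokens = torch.tensor([[vocab["[CPD]"], vocab["Na"], vocab["Fe"], vocab["O"]]])
model = CompositionEncoder(vocab_size=len(vocab))
print(model(tokens).shape)  # torch.Size([1, 1])
```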
Pretraining Tasks
MTEncoder is pretrained on the Alexandria dataset [2] across 12 tasks:
| Pretraining Objective |
|---|
| Stress |
| Band Gap (Direct) |
| Band Gap (Indirect) |
| Density of States at Fermi Level |
| Energy Above Hull |
| Formation Energy |
| Corrected Total Energy |
| Phase Separation Energy |
| Number of Atomic Sites |
| Total Magnetic Moment |
| Crystal Space Group |
| Masked Element Reconstruction (Self-Supervised) |
Table: Pretraining objectives for MTEncoder; targets for the supervised tasks are computed from the Alexandria dataset [2].
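As a rough illustration of how these twelve objectives can share one encoder, the sketch below attaches a prediction head per task to the shared [CPD] embedding and sums the per-task losses. The head design, the equal loss weighting, and all names (`heads`, `mlm_head`, `multitask_loss`) are assumptions for illustration, not details from the paper.

```python
# Sketch of multi-task pretraining over the objectives in the table above.
# Hypothetical names; loss weighting and head design are assumptions.
import torch
import torch.nn as nn

d_model = 128
REGRESSION_TASKS = [
    "stress", "band_gap_direct", "band_gap_indirect", "dos_fermi",
    "e_above_hull", "formation_energy", "corrected_total_energy",
    "phase_separation_energy", "n_sites", "total_magnetic_moment",
]
N_SPACE_GROUPS = 230   # classification over the 230 crystal space groups
VOCAB_SIZE = 120       # element vocabulary incl. special tokens (assumed)

# One head per task, all reading the shared [CPD] embedding.
heads = nn.ModuleDict({t: nn.Linear(d_model, 1) for t in REGRESSION_TASKS})
heads["space_group"] = nn.Linear(d_model, N_SPACE_GROUPS)
mlm_head = nn.Linear(d_model, VOCAB_SIZE)  # masked-element reconstruction

def multitask_loss(cpd_emb, token_embs, targets, masked_pos, masked_ids):
    """Sum of per-task losses; equal weighting is an assumption."""
    loss = torch.tensor(0.0)
    for t in REGRESSION_TASKS:
        loss = loss + nn.functional.mse_loss(
            heads[t](cpd_emb).squeeze(-1), targets[t]
        )
    loss = loss + nn.functional.cross_entropy(
        heads["space_group"](cpd_emb), targets["space_group"]
    )
    # Reconstruct the identities of masked element tokens (self-supervised).
    loss = loss + nn.functional.cross_entropy(
        mlm_head(token_embs[masked_pos]), masked_ids
    )
    return loss
```

Sharing a single encoder across all twelve objectives is what pushes the [CPD] embedding toward a general-purpose material representation rather than one specialized to a single property.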