## References to read

[1] Prein, T., Pan, E., Doerr, T., Olivetti, E., and Rupp, J. L. M. 2024. "MTEncoder: A Transformer-Based Framework for Materials Representation Learning." *Materials Today*. https://openreview.net/pdf?id=wug7i3O7y1
[2] Schmidt, J., Wang, H.-C., Cerqueira, T. F. T., Botti, S., Romero, A. H., and Marques, M. A. L. 2024. "Improving Machine-Learning Models in Materials Science through Large Datasets." Journal to be determined. https://www.sciencedirect.com/science/article/pii/S2542529324002360

# MTEncoder (SyntMTE)

## Overview

MTEncoder is a transformer-based model that encodes a material's elemental composition into a dense vector representation. Each material is tokenized into:

- Individual element tokens (e.g., Na, Fe, O)
- A special `Compound` token (`[CPD]`) that aggregates elemental information

These tokens are fed into a transformer encoder, which produces context-rich embeddings. The embedding of the `[CPD]` token serves as the learned representation of the material and is passed through an MLP head to predict various properties [1]. A minimal illustrative sketch of this pattern is given after the pretraining table below.

## Pretraining Tasks

MTEncoder is pretrained on the Alexandria dataset [2] across 12 tasks:

| Pretraining Objective |
|---|
| Stress |
| Band Gap (Direct) |
| Band Gap (Indirect) |
| Density of States at Fermi Level |
| Energy Above Hull |
| Formation Energy |
| Corrected Total Energy |
| Phase Separation Energy |
| Number of Atomic Sites |
| Total Magnetic Moment |
| Crystal Space Group |
| Masked Element Reconstruction (Self-Supervised) |

*Table: Pretraining objectives for MTEncoder, drawn from the Alexandria materials dataset.*
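To make the Overview concrete, here is a minimal PyTorch sketch of the composition-encoding pattern described above: element tokens plus a `[CPD]` token pass through a transformer encoder, and the contextualized `[CPD]` embedding feeds an MLP head. All names, layer sizes, and the way stoichiometric fractions enter the model are illustrative assumptions, not the published implementation.

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary: a special [CPD] token plus one id per chemical element.
CPD_ID = 0
ELEMENT_IDS = {"H": 1, "O": 2, "Na": 3, "Fe": 4}  # illustrative subset

class MTEncoderSketch(nn.Module):
    """Sketch of a composition encoder in the spirit of MTEncoder [1].

    Element tokens and a [CPD] token are embedded, passed through a
    transformer encoder, and the contextualized [CPD] embedding is fed
    to an MLP head. Sizes are illustrative, not the paper's values.
    """

    def __init__(self, vocab_size: int = 128, d_model: int = 256,
                 n_heads: int = 8, n_layers: int = 4, n_targets: int = 1):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Assumption: stoichiometric fractions (e.g., Na at 2/3) enter as a
        # learned projection added to the token embedding.
        self.frac_proj = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                  nn.Linear(d_model, n_targets))

    def forward(self, token_ids: torch.Tensor, fractions: torch.Tensor):
        # token_ids: (batch, seq) with token_ids[:, 0] == CPD_ID
        # fractions: (batch, seq, 1); the [CPD] position gets 0.0
        x = self.token_emb(token_ids) + self.frac_proj(fractions)
        h = self.encoder(x)
        cpd = h[:, 0]  # contextualized [CPD] embedding = material representation
        return self.head(cpd), cpd, h

# Usage: encode Na2O as {Na: 2/3, O: 1/3}, prefixed with [CPD].
model = MTEncoderSketch()
tokens = torch.tensor([[CPD_ID, ELEMENT_IDS["Na"], ELEMENT_IDS["O"]]])
fracs = torch.tensor([[[0.0], [2 / 3], [1 / 3]]])
pred, material_vec, token_states = model(tokens, fracs)
print(pred.shape, material_vec.shape)  # torch.Size([1, 1]) torch.Size([1, 256])
```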
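The 12 pretraining objectives mix scalar regression (e.g., formation energy, band gaps), classification (crystal space group, with its 230 possible classes), and self-supervised masked element reconstruction. Below is a hedged sketch of how such a multi-task loss could be assembled over the shared `[CPD]` embedding and per-token states from the encoder sketched above; equal task weighting and the head shapes are simplifying assumptions, not the published configuration.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHeads(nn.Module):
    """Sketch of multi-task pretraining heads over a shared encoder output."""

    def __init__(self, d_model: int = 256, vocab_size: int = 128,
                 regression_tasks=("formation_energy", "band_gap_direct",
                                   "energy_above_hull")):
        super().__init__()
        # One scalar regression head per continuous target (subset shown).
        self.reg_heads = nn.ModuleDict(
            {name: nn.Linear(d_model, 1) for name in regression_tasks})
        # Crystal space group classification: 230 space groups.
        self.spacegroup_head = nn.Linear(d_model, 230)
        # Masked element reconstruction: per-token logits over the vocabulary.
        self.mask_head = nn.Linear(d_model, vocab_size)

    def forward(self, cpd, token_states):
        # cpd: (B, d_model); token_states: (B, S, d_model) from the encoder
        reg = {k: head(cpd).squeeze(-1) for k, head in self.reg_heads.items()}
        return reg, self.spacegroup_head(cpd), self.mask_head(token_states)

def multitask_loss(reg_preds, reg_targets, sg_logits, sg_target,
                   mask_logits, masked_ids, mask_positions):
    # Equal weighting across tasks is an illustrative choice.
    loss = sum(F.mse_loss(reg_preds[k], reg_targets[k]) for k in reg_preds)
    loss = loss + F.cross_entropy(sg_logits, sg_target)
    # Reconstruction loss is computed only at the masked positions
    # (mask_positions: boolean (B, S); masked_ids: true ids at those spots).
    loss = loss + F.cross_entropy(mask_logits[mask_positions], masked_ids)
    return loss
```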