Update README.md
README.md
CHANGED
|
@@ -1,7 +1,6 @@
|
|
| 1 |
---
|
| 2 |
license: cc-by-4.0
|
| 3 |
---
|
| 4 |
-
|
| 5 |
# Description
|
| 6 |
|
| 7 |
DPA-3.2-5M is trained using a multitask strategy on the [OpenLAM datasets V2](https://www.aissquare.com/datasets/detail?pageType=datasets&name=OpenLAM-TrainingSet-v2&id=391).
|
|
@@ -121,31 +120,44 @@ For more advanced usages like fine-tuning and zero-shot inference, please refer
|
|
| 121 |
|
| 122 |
*Note: If you are unsure which model branch to use, we recommend starting with OMat24, which performs well across a wide range of systems, including inorganic materials, catalytic systems, and molecular systems. If the accuracy of the OMat24 branch does not meet your needs for catalytic applications, consider trying the OC20M branch. For molecular systems, we recommend the OMol25 branch. For 2D inorganic materials, we recommend the Alex2D branch, and for inorganic materials under high pressure, the MPGen_OpenCSP branch. For alloy systems, we recommend the Alloy_APEX branch.*
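The recommendations in the note can be condensed into a small lookup. The sketch below is only an illustration: the function name `recommend_branch` and the category keywords (`catalysis`, `molecules`, `2d`, `high-pressure`, `alloys`) are hypothetical labels invented here and are not `dp` options. Only the branch names and the `dp --pt freeze` invocation come from this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a system category to the branch recommended in the note.
# The category keywords are illustrative assumptions, not part of dp.
recommend_branch() {
  case "$1" in
    catalysis)     echo "OC20M" ;;          # suggested when OMat24 is not accurate enough
    molecules)     echo "OMol25" ;;
    2d)            echo "Alex2D" ;;
    high-pressure) echo "MPGen_OpenCSP" ;;
    alloys)        echo "Alloy_APEX" ;;
    *)             echo "OMat24" ;;         # recommended starting point
  esac
}

branch="$(recommend_branch "${1:-general}")"
# Print the freeze command instead of running it, so the sketch works without dp installed.
echo "dp --pt freeze -c DPA-3.2-5M.pt -o frozen_model.pth --model-branch ${branch}"
```

Saved under a name of your choice and called with `molecules` as its argument, the script prints the freeze command for the OMol25 branch. Any unrecognized category falls back to OMat24.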
|
| 123 |
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
|
| 138 |
-
|
|
| 139 |
-
|
|
| 140 |
-
|
|
| 141 |
-
|
|
| 142 |
-
|
|
| 143 |
-
|
|
| 144 |
-
|
|
| 145 |
-
|
|
| 146 |
-
|
|
| 147 |
-
|
|
| 148 |
-
|
|
| 149 |
|
| 150 |
```
|
| 151 |
|
|
|
|
| 1 |
---
|
| 2 |
license: cc-by-4.0
|
| 3 |
---
|
|
|
|
| 4 |
# Description
|
| 5 |
|
| 6 |
DPA-3.2-5M is trained using a multitask strategy on the [OpenLAM datasets V2](https://www.aissquare.com/datasets/detail?pageType=datasets&name=OpenLAM-TrainingSet-v2&id=391).
|
|
|
|
| 120 |
|
| 121 |
*Note: If you are unsure which model branch to use, we recommend starting with OMat24, which performs well across a wide range of systems, including inorganic materials, catalytic systems, and molecular systems. If the accuracy of the OMat24 branch does not meet your needs for catalytic applications, consider trying the OC20M branch. For molecular systems, we recommend the OMol25 branch. For 2D inorganic materials, we recommend the Alex2D branch, and for inorganic materials under high pressure, the MPGen_OpenCSP branch. For alloy systems, we recommend the Alloy_APEX branch.*
|
| 122 |
|
| 123 |
+
A model branch can also be selected by one of its aliases, as shown below:
|
| 124 |
+
|
| 125 |
+
```bash
|
| 126 |
+
dp --pt freeze -c DPA-3.2-5M.pt -o frozen_model.pth --model-branch water
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
which is equivalent to
|
| 130 |
+
|
| 131 |
+
```bash
|
| 132 |
+
dp --pt freeze -c DPA-3.2-5M.pt -o frozen_model.pth --model-branch H2O_H2O-PD
|
| 133 |
+
|
| 134 |
+
```
|
| 135 |
+
|
| 136 |
+
| Model Branch | Alias | Element Coverage | Description | Computation Level | References |
|
| 137 |
+
| ------------------- | ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| 138 |
+
| OMat24 | "Default","Materials", "Omat24", "materials", "omat24" | H, He, Li, Be, B, C, N, O, F, Ne, Na, Mg, Al, Si, P, S, Cl, Ar, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Br, Kr, Rb, Sr, Y, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Cd, In, Sn, Sb, Te, I, Xe, Cs, Ba, La, Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Hf, Ta, W, Re, Os, Ir, Pt, Au, Hg, Tl, Pb, Bi, Ac, Th, Pa, U, Np, Pu | OMat24 is a large-scale open dataset containing over 110 million DFT calculations spanning diverse structures and compositions. It is designed to support AI-driven materials discovery by providing broad and deep coverage of chemical space. | PBE (+U)/PAW | https://arxiv.org/html/2410.12771v1 |
|
| 139 |
+
| Alloy\_APEX | "Alloys", "Alloy_tongqi", "Li2025APEX", "alloys" | Li, Be, Na, Mg, Al, Si, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, Sr, Y, Zr, Nb, Mo, Ru, Rh, Pd, Ag, Cd, In, Sn, La, Ce, Pr, Nd, Sm, Gd, Tb, Dy, Ho, Er, Tm, Lu, Hf, Ta, W, Re, Os, Ir, Pt, Au, Pb | The dataset covers elemental and multi-component alloy systems composed of 53 metallic elements, including structures with defects such as vacancies, interstitials, and surfaces. These configurations are generated for single elements, compounds, solid solutions, and their defects over 50–3000 K and −0.5–5 GPa. | PBE/Norm-conserving, 1360 eV, 0.15 Å⁻¹ | https://www.nature.com/articles/s41524-025-01580-y |
|
| 140 |
+
| Alex2D | "2DMaterials", "2dmaterials" | H, Li, Be, B, C, N, O, F, Na, Mg, Al, Si, P, S, Cl, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Br, Rb, Sr, Y, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Cd, In, Sn, Sb, Te, I, Cs, Ba, La, Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Hf, Ta, W, Re, Os, Ir, Pt, Au, Hg, Tl, Pb, Bi, Ac, Th, Pa, U, Np, Pu | This dataset contains approximately 6,500 novel two-dimensional materials generated through a symmetry-based combinatorial method that systematically fills Wyckoff positions under constraints of charge neutrality and electronegativity balance. The resulting structures span over 30 stoichiometries, exhibit diverse tiling patterns and polymorphisms, and all lie within 250 meV/atom of the thermodynamic convex hull. | PBE/PAW, 520 eV, 0.4 Å⁻¹ | https://iopscience.iop.org/article/10.1088/2053-1583/accc43 |
|
| 141 |
+
| OC20M | "Catalysis", "catalysis" | H, B, C, N, O, Na, Al, Si, P, S, Cl, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Rb, Sr, Y, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Cd, In, Sn, Sb, Te, Cs, Hf, Ta, W, Re, Os, Ir, Pt, Au, Hg, Tl, Pb, Bi | OC20M is a subset of the OC20 dataset, which contains over 1.2 million DFT relaxations and approximately 265 million single-point evaluations covering diverse catalyst surfaces and adsorbates involving C, N, and O species. | rPBE/PAW, 350 eV | Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, et al. Open Catalyst 2020 (OC20) dataset and community challenges. ACS Catalysis, 11(10):6059–6072, 2021. |
|
| 142 |
+
| ODAC23 | | H, Li, Be, B, C, N, O, F, Na, Mg, Al, Si, P, S, Cl, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Br, Sr, Y, Zr, Nb, Mo, Ru, Rh, Pd, Ag, Cd, Sn, Sb, Te, I, Cs, Ba, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Lu, Hf, W, Re, Pt, Au, Hg, Bi, Th, U, Np | The dataset contains over 38 million quantum chemistry calculations on thousands of metal-organic frameworks (MOFs) interacting with carbon dioxide and water. It provides comprehensive data to support machine learning-driven development of MOFs for direct air capture (DAC) applications. | PBE-D3/PAW, 600 eV, gamma only | https://pubs.acs.org/doi/10.1021/acscentsci.3c01629 |
|
| 143 |
+
| Domains\_Alloy | "Dai2023Alloy" | Li, Be, Na, Mg, Al, Si, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, Sr, Y, Zr, Nb, Mo, Ru, Rh, Pd, Ag, Cd, In, Sn, La, Ce, Pr, Nd, Sm, Gd, Tb, Dy, Ho, Er, Tm, Lu, Hf, Ta, W, Re, Os, Ir, Pt, Au, Pb | The dataset contains structure-energy-force-virial data for 53 typical metallic elements in alloy systems, including \~9000 intermetallic compounds and FCC, BCC, HCP structures. It consists of two parts: DFT-generated relaxed and deformed structures, and randomly distorted structures covering pure metals, solid solutions, and intermetallics with vacancies. | PBE/Norm-conserving, 1360 eV, 0.094 Å⁻¹ | https://aissquare.com/datasets/detail?pageType=datasets&name=Alloy\_DPA\_v1\_0&id=147 |
|
| 144 |
+
| OC22 | | H, Li, Be, C, N, O, Na, Mg, Al, Si, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Rb, Sr, Y, Zr, Nb, Mo, Ru, Rh, Pd, Ag, Cd, In, Sn, Sb, Te, Cs, Ba, Ce, Lu, Hf, Ta, W, Re, Os, Ir, Pt, Au, Hg, Tl, Pb, Bi | The OC22 dataset contains DFT relaxation trajectories and single-point calculations covering a diverse set of oxide materials, adsorbates, and coverages relevant to Oxygen Evolution Reaction catalysts. It provides a large-scale, open benchmark for training machine learning models on total energy and forces predictions for oxide electrocatalysts. | PBE (+U)/PAW, 500 eV | https://pubs.acs.org/doi/10.1021/acscatal.2c05426 |
|
| 145 |
+
| MPTrj | "MP_traj_v024_alldata_mixu" | H, He, Li, Be, B, C, N, O, F, Ne, Na, Mg, Al, Si, P, S, Cl, Ar, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Br, Kr, Rb, Sr, Y, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Cd, In, Sn, Sb, Te, I, Xe, Cs, Ba, La, Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Hf, Ta, W, Re, Os, Ir, Pt, Au, Hg, Tl, Pb, Bi, Ac, Th, Pa, U, Np, Pu | The MPtrj dataset contains DFT trajectory data for 145,923 compounds from the Materials Project, curated by filtering GGA and GGA+U calculations for consistency, convergence, and energy quality. It includes non-deprecated, non-duplicate structures with verified settings, enabling reliable machine learning on energy and force predictions across a broad materials space. | PBE (+U)/PAW, 520 eV, 0.04 Å⁻¹ | https://www.nature.com/articles/s42256-023-00716-3 |
|
| 146 |
+
| Organic\_Reactions | "Li2025General" | H, C, N, O | The dataset consists of over 17 million semi-empirical energy-labeled non-equilibrium structures along reaction pathways involving C, H, O, and N, generated using NEB and structural alignment. It is complemented by a fine-tuning dataset of over 200,000 DFT-labeled structures selected via active learning to support the development of reactive machine learning potentials. | GFN2-xTB | Li B, Mi S, Xiao J, Zhang D, Zhang S, Zhang J, et al. General reactive machine learning potentials for CHON elements. ChemRxiv. 2025; doi:10.26434/chemrxiv-2025-1d293-v2 This content is a preprint and has not been peer-reviewed. |
|
| 147 |
+
| SSE\_ABACUS | "Shi2024SSE" | Li, B, O, Al, Si, P, S, Cl, Sc, Ga, Ge, As, Se, Br, Y, Zr, In, Sn, Sb, I, Dy, Ho, Er, Tm, Yb, Lu, Ta | This dataset can be used to study and predict the properties and behavior of solid-state electrolytes under various conditions, such as different temperatures and pressures. This can help researchers and engineers design better materials for use in energy storage devices, such as batteries and supercapacitors. | PBE-sol, 100Ry | https://aissquare.com/datasets/detail?pageType=datasets&name=SSE-abacus&id=260 |
|
| 148 |
+
| Domains\_SSE\_PBE | "Huang2021Deep-PBE" | Li, Si, P, S, Ge, Sn | The dataset consists of interatomic potentials and simulation data for Li10GeP2S12-type solid-state electrolytes, including Li10GeP2S12, Li10SiP2S12, and Li10SnP2S12. It covers diffusion processes across a wide temperature range and large system sizes, incorporating effects of thermal expansion, configurational disorder, and density functional variations. | PBE/PAW, 650 eV, 0.26 Å⁻¹ | Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, and Weinan E. Deep potential generation scheme and simulation protocol for the Li10GeP2S12-type superionic conductors. The Journal of Chemical Physics, 154(9):094703, 2021. |
|
| 149 |
+
| Electrolyte | "Shi2024Electrolyte" | H, Li, C, O, F, P | The dataset includes samples of systems containing Li, P, F, C, H, and O, specifically focused on lithium hexafluorophosphate and various carbonate solvents. It covers a wide range of temperatures from 0 to 450 K and pressures from 0 to 1 GPa, with LiPF6 concentrations between 0.8 and 1.2 mol/L. | PBE-D3, 800 Ry | https://www.aissquare.com/datasets/detail?name=Electrolyte&id=216&pageType=datasets |
|
| 150 |
+
| Domains\_SemiCond | "Liu2024Machine" | B, C, N, Al, Si, P, S, Zn, Ga, Ge, As, Se, Cd, In, Sb, Te | The dataset covers 19 semiconductors ranging from group IIB to VIA: Si, Ge, SiC, BAs, BN, AlN, AlP, AlAs, InP, InAs, InSb, GaN, GaP, GaAs, CdTe, InTe, CdSe, ZnS, and CdS. | PBE/LCAO, 1360 eV, 0.151 Å⁻¹ | https://pubs.acs.org/doi/full/10.1021/acs.jctc.3c01320 |
|
| 151 |
+
| Domains\_Anode | "Zhang2023Cathode" | Fe, Co, Li, O, Cr, Ni, Na, Mn | The dataset contains O3-type layered oxide cathodes (LixTMO2 and NaxTMO2, TM = Ni, Mn, Fe, Co, Cr) generated from \~300 bulk systems. It spans compositions with x = 0, 0.5, 1, includes Jahn-Teller distorted structures, and covers temperatures from 50 K to 1250 K and pressures from 0 to 3000 bar. | PBE (+U)/PAW, 520 eV, 0.25 Å⁻¹ | https://aissquare.com/datasets/detail?pageType=datasets&name=Cathode%28Anode%29\_DPA\_v1\_0&id=130 |
|
| 152 |
+
| Domains\_Cluster | "Gong2023Cluster" | Al, Si, Ni, Cu, Ru, Pd, Ag, Pt, Au | The dataset contains structure-to-energy-force labels for 31 mono- and multi-metallic clusters. It covers diverse elemental combinations relevant to catalysis, namely, Au, Ag, Cu, Pt, Pd, Ni, Si, Al, Ru, AuAg, AuCu, AgCu, AuPt, AgPt, CuPt, AuPd, AgPd, CuPd, AuNi, AgNi, CuNi, PtPd, PtNi, NiPd, AgCuPt, AuAgCu, AuAgPd, AuAgPt, AuCuPd, AuCuPt, PtPdNi. | PBE-D3/TZV2P, 400-1000 Ry | https://aissquare.com/datasets/detail?pageType=datasets&name=Cluster\_DPA\_v1\_0&id=131 |
|
| 153 |
+
| Hybrid\_Perovskite | "Tuo2023Hybrid" | H, C, N, I, Pb | Dataset for the lead-based organic-inorganic hybrid perovskites MAPbI3 and FAPbI3. It covers temperatures of 50–800 K and pressures below 1 GPa. | PBE-D3(BJ)/PAW, 500 eV, 0.16 Å⁻¹ | [https://doi.org/10.1002/adfm.202301663](https://doi.org/10.1002/adfm.202301663) |
|
| 154 |
+
| Domains\_FerroEle | "UniPero" | O, Na, Mg, K, Ca, Ti, Zn, Sr, Zr, Nb, In, Ba, Hf, Pb, Bi | The dataset includes perovskite oxides with increasing chemical complexity, ranging from simple three-element systems like PbTiO3, BaTiO3, and SrTiO3 to complex solid solutions such as Pb(Mg1/3Nb2/3)O3 and PIN-PMN-PT. It covers around 200 compositions involving 14 different metal elements. | PBEsol/LCAO, 100 Ry, 0.189 Å⁻¹ | Jing Wu, Jiyuan Yang, Yuan-Jinsheng Liu, Duo Zhang, Yudi Yang, Yuzhi Zhang, Linfeng Zhang, Shi Liu, et al. Universal interatomic potential for perovskite oxides. Physical Review B, 108(18):L180104, 2023. |
|
| 155 |
+
| H2O\_H2O\_PD | "water", "Zhang2021Phase" | H, O | Water phase diagram, covering from low temperature and pressure to about 2400 K and 50 GPa, excluding the vapor stability region. | SCAN/PAW, 1500 eV, 0.5 Å⁻¹ | Linfeng Zhang, Han Wang, Roberto Car, and Weinan E. Phase diagram of a deep potential water model. Physical Review Letters, 126(23):236001, 2021. |
|
| 156 |
+
| Others\_In2Se3 | "Wu2021Accurate" | Se, In | The dataset contains diverse monolayer α‑In2Se3 configurations, which supports the development of a high-accuracy deep neural network potential capable of reproducing thermodynamic properties, polarization switching pathways, domain-wall kinetics, and a temperature-driven phase transition. | PBE/PAW, 600 eV, 2\*2\*2 kpt | Jing Wu, Liyi Bai, Jiawei Huang, Liyang Ma, Jian Liu, and Shi Liu. Accurate force field of two-dimensional ferroelectrics from deep learning. Physical Review B, 104(17):174107, 2021. |
|
| 157 |
+
| Metals\_AlMgCu | "Metals_AlMgCu", "Jiang2021Accurate" | Al, Mg, Cu | For Al-Mg-Cu alloy. | PBE/PAW, 650 eV, 0.1 Å⁻¹ | Wanrun Jiang, Yuzhi Zhang, Linfeng Zhang, and Han Wang. Accurate deep potential model for the Al–Cu–Mg alloy in the full concentration space. Chinese Physics B, 30(5):050706, 2021. |
|
| 158 |
+
| Metals\_AgAu\_PBED3 | "Wang2021Generalizable" | Ag, Au | Ag-Au nanoalloys. | PBE-D3/PAW, 650 eV, 0.1 Å⁻¹ | YiNan Wang, LinFeng Zhang, Ben Xu, XiaoYang Wang, and Han Wang. A generalizable machine learning potential of Ag–Au nanoalloys and its application to surface reconstruction, segregation and diffusion. Modelling and Simulation in Materials Science and Engineering, 30(2):025003, 2021. |
|
| 159 |
+
| MPGen\_OpenCSP | "OpenCSP" , "opencsp" | H, He, Li, Be, B, C, N, O, F, Ne, Na, Mg, Al, Si, P, S, Cl, Ar, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Br, Kr, Rb, Sr, Y, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Cd, In, Sn, Sb, Te, I, Xe, Cs, Ba, La, Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Hf, Ta, W, Re, Os, Ir, Pt, Au, Hg, Tl, Pb, Bi | The OpenCSP dataset was constructed by relaxing CALYPSO-proposed structures to pressure-constrained local minima on the potential energy surface, ensuring direct applicability to CSP tasks. All data were obtained through single-point ABACUS calculations, with structures generated using the DP-GEN concurrent learning framework, which samples relaxation trajectories under randomly sampled pressure conditions. | PBE/PAW, 1360 eV, 0.15 | [https://arxiv.org/abs/2509.10293](https://arxiv.org/abs/2509.10293) |
|
| 160 |
+
| OMol25 | "Molecules", "molecules", "omol25" | H, He, Li, Be, B, C, N, O, F, Ne, Na, Mg, Al, Si, P, S, Cl, Ar, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Br, Kr, Rb, Sr, Y, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Cd, In, Sn, Sb, Te, I, Xe, Cs, Ba, La, Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Hf, Ta, W, Re, Os, Ir, Pt, Au, Hg, Tl, Pb, Bi | Open Molecules 2025 (OMol25) Dataset is a large-scale resource for training molecular chemistry machine learning models. OMol25 comprises over 100 million DFT single-point calculations containing up to 350 atoms at a high level of DFT theory. | ωB97M-V/def2-TZVPD | [https://arxiv.org/abs/2505.08762](https://arxiv.org/abs/2505.08762) |
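
The Alias column can also be read as a plain lookup from alias to branch name. The sketch below mirrors a subset of the rows above so that a wrapper script could validate user input before calling `dp`. The function name `resolve_alias` is hypothetical. `dp` resolves aliases itself when given `--model-branch`, so this is an illustration of the mapping and not a description of how `dp` implements it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve a subset of the aliases from the table to branch names.
# Only rows copied from the table are listed; any other input is passed through unchanged.
resolve_alias() {
  case "$1" in
    Default|Materials|Omat24|materials|omat24) echo "OMat24" ;;
    Alloys|Alloy_tongqi|Li2025APEX|alloys)     echo "Alloy_APEX" ;;
    2DMaterials|2dmaterials)                   echo "Alex2D" ;;
    Catalysis|catalysis)                       echo "OC20M" ;;
    OpenCSP|opencsp)                           echo "MPGen_OpenCSP" ;;
    Molecules|molecules|omol25)                echo "OMol25" ;;
    *)                                         echo "$1" ;;  # not in this subset
  esac
}

# Show what a given alias would expand to (defaults to "materials" if no argument is given).
echo "alias '${1:-materials}' -> branch '$(resolve_alias "${1:-materials}")'"
```

Branches without an alias in the table, such as ODAC23 and OC22, fall through the last pattern and are returned as typed.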
|
| 161 |
|
| 162 |
```
|
| 163 |
|