add example training script
- 2023-08-14-mace-universal.sbatch +59 -0
- README.md +8 -2
2023-08-14-mace-universal.sbatch
ADDED
```bash
#!/bin/bash
#SBATCH -C gpu
#SBATCH -G 40
#SBATCH -N 10
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4
#SBATCH --time=6:00:00
#SBATCH --time-min=02:00:00
#SBATCH --error=%x-%j.err
#SBATCH --output=%x-%j.out
#SBATCH --requeue
#SBATCH --exclusive
#SBATCH --open-mode=append

exp_name=$(basename "$SLURM_SUBMIT_DIR")

srun python run_train.py \
    --name=$exp_name \
    --train_file="train.h5" \
    --valid_file="valid.h5" \
    --statistics_file="statistics.json" \
    --energy_weight=1 \
    --forces_weight=1 \
    --eval_interval=1 \
    --config_type_weights='{"Default":1.0}' \
    --E0s='average' \
    --error_table='PerAtomMAE' \
    --stress_key='stress' \
    --model="ScaleShiftMACE" \
    --MLP_irreps="64x0e" \
    --interaction_first="RealAgnosticResidualInteractionBlock" \
    --interaction="RealAgnosticResidualInteractionBlock" \
    --num_interactions=2 \
    --num_channels=128 \
    --max_ell=3 \
    --hidden_irreps='64x0e + 64x1o + 64x2e' \
    --num_cutoff_basis=10 \
    --lr=1e-2 \
    --correlation=3 \
    --r_max=6.0 \
    --num_radial_basis=10 \
    --scaling='rms_forces_scaling' \
    --distributed \
    --num_workers=4 \
    --batch_size=10 \
    --valid_batch_size=30 \
    --max_num_epochs=500 \
    --patience=250 \
    --amsgrad \
    --weight_decay=1e-8 \
    --ema \
    --ema_decay=0.999 \
    --default_dtype="float32" \
    --clip_grad=100 \
    --device=cuda \
    --seed=3 \
    --save_cpu \
    --restart_latest &
wait  # keep the batch script alive until the backgrounded srun finishes
```
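The resource request in the script header assumes one training rank per GPU. A quick consistency check of that geometry (our own sketch, not part of the commit; the variable names are illustrative):

```shell
# Slurm geometry from the script header: 10 nodes x 4 GPUs/node = 40 GPUs,
# with --ntasks=40 so each distributed rank drives exactly one GPU.
nodes=10            # SBATCH -N
gpus=40             # SBATCH -G
tasks_per_node=4    # SBATCH --ntasks-per-node
ntasks=40           # SBATCH --ntasks

if [ $((nodes * tasks_per_node)) -eq "$gpus" ] && [ "$ntasks" -eq "$gpus" ]; then
    echo "geometry OK: one rank per GPU"
else
    echo "geometry mismatch" >&2
    exit 1
fi
```

If you scale the job up or down, keep these four values consistent, since `--distributed` training expects one process per GPU.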
|
README.md
CHANGED
````diff
@@ -79,11 +79,17 @@ If you use the pretrained models in this repository, please cite all the followi
 }
 ```
 
-# Training
+# Training Guide
 
 ## Training Data
 
 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-
-## Training Procedure
+For now, please download MPTrj data from [figshare](https://figshare.com/articles/dataset/Materials_Project_Trjectory_MPtrj_Dataset/23713842). We may upload to HuggingFace Datasets in the future.
+
+## Fine-tuning
+
+<!-- This should link to a Training Procedure Card, perhaps with a short stub of information on what the training procedure is all about as well as documentation related to hyperparameters or additional training details. -->
+
+We provide an example multi-GPU training script [2023-08-14-mace-universal.sbatch](https://huggingface.co/cyrusyc/mace-universal/blob/main/2023-08-14-mace-universal.sbatch), which uses 40 A100s on NERSC Perlmutter. Please see the MACE `multi-gpu` branch for more detailed instructions.
+
````
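Since the script names the run after its submission directory, submitting from a descriptively named directory labels the experiment. A minimal sketch of that workflow (the directory name `mace-mptrj-run` and the commented-out `sbatch` call are illustrative, not from this commit):

```shell
# The sbatch script sets exp_name=$(basename "$SLURM_SUBMIT_DIR");
# the directory you submit from therefore becomes the experiment name.
mkdir -p mace-mptrj-run && cd mace-mptrj-run
exp_name=$(basename "$PWD")
echo "experiment name: $exp_name"
# sbatch ../2023-08-14-mace-universal.sbatch   # submit from here on Perlmutter
```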