# ViSNet

## Reference

Yusong Wang, Tong Wang, Shaoning Li, Xinheng He, Mingyu Li, Zun Wang, Nanning Zheng, Bin Shao, and Tie-Yan Liu.
Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing.
Nature Communications, 15(1), January 2024. ISSN: 2041-1723.
URL: https://dx.doi.org/10.1038/s41467-023-43720-2.

## Hyperparameters, model configurations and training strategies

### Model architecture
| Parameter           | Value     | Description                                                               |
|---------------------|-----------|---------------------------------------------------------------------------|
| `num_layers`        | `4`       | Number of ViSNet layers.                                                  |
| `num_channels`      | `128`     | Number of channels.                                                       |
| `l_max`             | `2`       | Highest harmonic order included in the spherical harmonics series.        |
| `num_heads`         | `8`       | Number of heads in the attention block.                                   |
| `num_rbf`           | `32`      | Number of radial basis functions in the embedding block.                  |
| `trainable_rbf`     | `False`   | Whether to add learnable weights to the radial embedding basis functions. |
| `activation`        | `silu`    | Activation function for the output block.                                 |
| `attn_activation`   | `silu`    | Activation function for the attention block.                              |
| `vecnorm_type`      | `None`    | Type of the vector norm.                                                  |
| `atomic_energies`   | `average` | Treatment of the atomic energies.                                         |
| `avg_num_neighbors` | `None`    | Mean number of neighbors.                                                 |
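As a rough illustration of the embedding block, the snippet below expands an interatomic distance into `num_rbf = 32` radial basis values within the 5 Å cutoff. This is a minimal Gaussian-basis sketch only; the basis family, widths, and cutoff envelope used by the actual model may differ.

```python
import math

def radial_basis(r, num_rbf=32, cutoff=5.0):
    """Expand a distance r (in Angstrom) into Gaussian radial basis values.

    A generic sketch with evenly spaced centers on [0, cutoff]; the exact
    basis used by ViSNet's embedding block may differ.
    """
    centers = [cutoff * k / (num_rbf - 1) for k in range(num_rbf)]
    width = cutoff / (num_rbf - 1)  # spacing between adjacent centers
    return [math.exp(-(((r - c) / width) ** 2)) for c in centers]

# a 2.3 Angstrom distance becomes a 32-dimensional feature vector
features = radial_basis(2.3)
```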
### Training

| Parameter                 | Value  | Description                                       |
|---------------------------|--------|---------------------------------------------------|
| `num_epochs`              | `220`  | Number of epochs to run.                          |
| `ema_decay`               | `0.99` | The EMA decay rate.                               |
| `eval_num_graphs`         | `None` | Number of validation set graphs to evaluate on.   |
| `use_ema_params_for_eval` | `True` | Whether to use the EMA parameters for evaluation. |
### Optimizer

| Parameter                         | Value           | Description                                                          |
|-----------------------------------|-----------------|----------------------------------------------------------------------|
| `init_learning_rate`              | `0.0001`        | Initial learning rate.                                               |
| `peak_learning_rate`              | `0.0001`        | Peak learning rate.                                                  |
| `final_learning_rate`             | `0.0001`        | Final learning rate.                                                 |
| `weight_decay`                    | `0`             | Weight decay.                                                        |
| `warmup_steps`                    | `4000`          | Number of optimizer warm-up steps.                                   |
| `transition_steps`                | `360000`        | Number of optimizer transition steps.                                |
| `grad_norm`                       | `500`           | Maximum global norm used for gradient clipping.                      |
| `num_gradient_accumulation_steps` | `1`             | Steps to accumulate before taking an optimizer step.                 |
| `algorithm`                       | `optax.amsgrad` | The AMSGrad optimizer.                                               |
| `b1`                              | `0.9`           | Exponential decay rate tracking the first moment of past gradients.  |
| `b2`                              | `0.999`         | Exponential decay rate tracking the second moment of past gradients. |
| `eps`                             | `1e-8`          | Constant added to the denominator outside the square root.           |
| `eps_root`                        | `0.0`           | Constant added to the denominator inside the square root.            |
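To make the roles of `b1`, `b2`, `eps`, and `eps_root` concrete, here is a pure-Python sketch of a single AMSGrad update on a scalar parameter. The training code itself uses `optax.amsgrad`; this toy version only mirrors the update rule (with `eps` outside the square root and `eps_root` inside it) and omits gradient clipping, weight decay, and the learning-rate schedule.

```python
import math

def amsgrad_step(param, grad, state, lr=1e-4, b1=0.9, b2=0.999,
                 eps=1e-8, eps_root=0.0):
    """One AMSGrad update for a scalar parameter (pure-Python sketch)."""
    m, v, v_hat = state
    m = b1 * m + (1 - b1) * grad           # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    v_hat = max(v_hat, v)                  # AMSGrad: non-decreasing second moment
    # eps_root sits inside the square root, eps outside, as in the table above
    param = param - lr * m / (math.sqrt(v_hat + eps_root) + eps)
    return param, (m, v, v_hat)

# usage: a few steps on the toy loss L(w) = w**2, whose gradient is 2*w
w, state = 1.0, (0.0, 0.0, 0.0)
for _ in range(3):
    w, state = amsgrad_step(w, 2 * w, state)
```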
### Huber loss energy weight schedule

| Parameter              | Value                               | Description                                                                                       |
|------------------------|-------------------------------------|---------------------------------------------------------------------------------------------------|
| `schedule`             | `optax.piecewise_constant_schedule` | Piecewise constant schedule with scaled jumps at specific boundaries.                             |
| `init_value`           | `40`                                | Initial value.                                                                                    |
| `boundaries_and_scale` | `{115: 25}`                         | Dictionary of `{step: scale}`; the scale is multiplied into the schedule value at the given step. |
### Huber loss force weight schedule

| Parameter              | Value                               | Description                                                                                       |
|------------------------|-------------------------------------|---------------------------------------------------------------------------------------------------|
| `schedule`             | `optax.piecewise_constant_schedule` | Piecewise constant schedule with scaled jumps at specific boundaries.                             |
| `init_value`           | `1000`                              | Initial value.                                                                                    |
| `boundaries_and_scale` | `{115: 0.04}`                       | Dictionary of `{step: scale}`; the scale is multiplied into the schedule value at the given step. |
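The two loss-weight schedules interact: at boundary 115 the energy weight is scaled 40 × 25 = 1000 while the force weight is scaled 1000 × 0.04 = 40, so the loss emphasis flips from forces to energies. Below is a pure-Python sketch of such a piecewise-constant schedule; the scale is applied for counts past each boundary, mirroring optax's convention, but consult `optax.piecewise_constant_schedule` for the exact boundary semantics.

```python
def piecewise_constant(init_value, boundaries_and_scale):
    """Return a schedule function: init_value times every scale whose
    boundary has been passed (piecewise-constant schedule sketch)."""
    def schedule(count):
        value = init_value
        for boundary, scale in sorted(boundaries_and_scale.items()):
            if count > boundary:
                value *= scale
        return value
    return schedule

# the two weight schedules from the tables above
energy_weight = piecewise_constant(40, {115: 25})
force_weight = piecewise_constant(1000, {115: 0.04})
```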
### Dataset

| Parameter               | Value | Description                                 |
|-------------------------|-------|---------------------------------------------|
| `graph_cutoff_angstrom` | `5`   | Graph cutoff distance (in Å).               |
| `max_n_node`            | `32`  | Maximum number of nodes allowed in a batch. |
| `max_n_edge`            | `288` | Maximum number of edges allowed in a batch. |
| `batch_size`            | `16`  | Number of graphs in a batch.                |

This model was trained on the [SPICE2_curated dataset](https://huggingface.co/datasets/InstaDeepAI/SPICE2-curated).
## How to Use

For complete usage instructions and more information, please refer to our [documentation](https://instadeep.github.io/mlip).
## License summary

1. The Licensed Models are **only** available under this License for Non-Commercial Purposes.
2. You are permitted to reproduce, publish, share and adapt the Output generated by the Licensed Model only for Non-Commercial Purposes and in accordance with this License.
3. You may **not** use the Licensed Models or any of their Outputs:
    1. in connection with any Commercial Purposes, unless agreed by Us under a separate licence;
    2. to train, improve or otherwise influence the functionality or performance of any other third-party derivative model that is commercial or intended for a Commercial Purpose and is similar to the Licensed Models;
    3. to create models distilled or derived from the Outputs of the Licensed Models, unless such models are for Non-Commercial Purposes and open-sourced under the same license as the Licensed Models; or
    4. in violation of any applicable laws and regulations.