# ViSNet

## Reference

Yusong Wang, Tong Wang, Shaoning Li, Xinheng He, Mingyu Li, Zun Wang, Nanning Zheng, Bin Shao, and Tie-Yan Liu.
Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing.
Nature Communications, 15(1), January 2024. ISSN: 2041-1723.
URL: https://dx.doi.org/10.1038/s41467-023-43720-2.

## Hyperparameters, model configurations and training strategies

### Model architecture
| Parameter           | Value     | Description                                                               |
|---------------------|-----------|---------------------------------------------------------------------------|
| `num_layers`        | `4`       | Number of ViSNet layers.                                                  |
| `num_channels`      | `128`     | Number of channels.                                                       |
| `l_max`             | `2`       | Highest harmonic order included in the spherical harmonics series.        |
| `num_heads`         | `8`       | Number of heads in the attention block.                                   |
| `num_rbf`           | `32`      | Number of radial basis functions in the embedding block.                  |
| `trainable_rbf`     | `False`   | Whether to add learnable weights to the radial embedding basis functions. |
| `activation`        | `silu`    | Activation function for the output block.                                 |
| `attn_activation`   | `silu`    | Activation function for the attention block.                              |
| `vecnorm_type`      | `None`    | Type of the vector norm.                                                  |
| `atomic_energies`   | `average` | Treatment of the atomic energies.                                         |
| `avg_num_neighbors` | `None`    | Mean number of neighbors.                                                 |
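As a rough illustration of the embedding block, the snippet below expands an interatomic distance into `num_rbf = 32` radial basis values within the 5 Å cutoff. This is a minimal Gaussian-basis sketch only; the basis family, widths, and cutoff envelope used by the actual model may differ.

```python
import math

def radial_basis(r, num_rbf=32, cutoff=5.0):
    """Expand a distance r (in Angstrom) into Gaussian radial basis values.

    A generic sketch with evenly spaced centers on [0, cutoff]; the exact
    basis used by ViSNet's embedding block may differ.
    """
    centers = [cutoff * k / (num_rbf - 1) for k in range(num_rbf)]
    width = cutoff / (num_rbf - 1)  # spacing between adjacent centers
    return [math.exp(-(((r - c) / width) ** 2)) for c in centers]

# a 2.3 Angstrom distance becomes a 32-dimensional feature vector
features = radial_basis(2.3)
```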
### Training

| Parameter                 | Value  | Description                                       |
|---------------------------|--------|---------------------------------------------------|
| `num_epochs`              | `220`  | Number of epochs to run.                          |
| `ema_decay`               | `0.99` | The EMA decay rate.                               |
| `eval_num_graphs`         | `None` | Number of validation set graphs to evaluate on.   |
| `use_ema_params_for_eval` | `True` | Whether to use the EMA parameters for evaluation. |
### Optimizer

| Parameter                         | Value           | Description                                                          |
|-----------------------------------|-----------------|----------------------------------------------------------------------|
| `init_learning_rate`              | `0.0001`        | Initial learning rate.                                               |
| `peak_learning_rate`              | `0.0001`        | Peak learning rate.                                                  |
| `final_learning_rate`             | `0.0001`        | Final learning rate.                                                 |
| `weight_decay`                    | `0`             | Weight decay.                                                        |
| `warmup_steps`                    | `4000`          | Number of optimizer warm-up steps.                                   |
| `transition_steps`                | `360000`        | Number of optimizer transition steps.                                |
| `grad_norm`                       | `500`           | Maximum global norm used for gradient clipping.                      |
| `num_gradient_accumulation_steps` | `1`             | Steps to accumulate before taking an optimizer step.                 |
| `algorithm`                       | `optax.amsgrad` | The AMSGrad optimizer.                                               |
| `b1`                              | `0.9`           | Exponential decay rate tracking the first moment of past gradients.  |
| `b2`                              | `0.999`         | Exponential decay rate tracking the second moment of past gradients. |
| `eps`                             | `1e-8`          | Constant added to the denominator outside the square root.           |
| `eps_root`                        | `0.0`           | Constant added to the denominator inside the square root.            |
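To make the roles of `b1`, `b2`, `eps`, and `eps_root` concrete, here is a pure-Python sketch of a single AMSGrad update on a scalar parameter. The training code itself uses `optax.amsgrad`; this toy version only mirrors the update rule (with `eps` outside the square root and `eps_root` inside it) and omits gradient clipping, weight decay, and the learning-rate schedule.

```python
import math

def amsgrad_step(param, grad, state, lr=1e-4, b1=0.9, b2=0.999,
                 eps=1e-8, eps_root=0.0):
    """One AMSGrad update for a scalar parameter (pure-Python sketch)."""
    m, v, v_hat = state
    m = b1 * m + (1 - b1) * grad           # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    v_hat = max(v_hat, v)                  # AMSGrad: non-decreasing second moment
    # eps_root sits inside the square root, eps outside, as in the table above
    param = param - lr * m / (math.sqrt(v_hat + eps_root) + eps)
    return param, (m, v, v_hat)

# usage: a few steps on the toy loss L(w) = w**2, whose gradient is 2*w
w, state = 1.0, (0.0, 0.0, 0.0)
for _ in range(3):
    w, state = amsgrad_step(w, 2 * w, state)
```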
### Huber loss energy weight schedule

| Parameter              | Value                               | Description                                                                                       |
|------------------------|-------------------------------------|---------------------------------------------------------------------------------------------------|
| `schedule`             | `optax.piecewise_constant_schedule` | Piecewise constant schedule with scaled jumps at specific boundaries.                             |
| `init_value`           | `40`                                | Initial value.                                                                                    |
| `boundaries_and_scale` | `{115: 25}`                         | Dictionary of `{step: scale}`; the scale is multiplied into the schedule value at the given step. |
### Huber loss force weight schedule

| Parameter              | Value                               | Description                                                                                       |
|------------------------|-------------------------------------|---------------------------------------------------------------------------------------------------|
| `schedule`             | `optax.piecewise_constant_schedule` | Piecewise constant schedule with scaled jumps at specific boundaries.                             |
| `init_value`           | `1000`                              | Initial value.                                                                                    |
| `boundaries_and_scale` | `{115: 0.04}`                       | Dictionary of `{step: scale}`; the scale is multiplied into the schedule value at the given step. |
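The two loss-weight schedules interact: at boundary 115 the energy weight is scaled 40 × 25 = 1000 while the force weight is scaled 1000 × 0.04 = 40, so the loss emphasis flips from forces to energies. Below is a pure-Python sketch of such a piecewise-constant schedule; the scale is applied for counts past each boundary, mirroring optax's convention, but consult `optax.piecewise_constant_schedule` for the exact boundary semantics.

```python
def piecewise_constant(init_value, boundaries_and_scale):
    """Return a schedule function: init_value times every scale whose
    boundary has been passed (piecewise-constant schedule sketch)."""
    def schedule(count):
        value = init_value
        for boundary, scale in sorted(boundaries_and_scale.items()):
            if count > boundary:
                value *= scale
        return value
    return schedule

# the two weight schedules from the tables above
energy_weight = piecewise_constant(40, {115: 25})
force_weight = piecewise_constant(1000, {115: 0.04})
```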
### Dataset

| Parameter               | Value | Description                                 |
|-------------------------|-------|---------------------------------------------|
| `graph_cutoff_angstrom` | `5`   | Graph cutoff distance (in Å).               |
| `max_n_node`            | `32`  | Maximum number of nodes allowed in a batch. |
| `max_n_edge`            | `288` | Maximum number of edges allowed in a batch. |
| `batch_size`            | `16`  | Number of graphs in a batch.                |

This model was trained on the [SPICE2_curated dataset](https://huggingface.co/datasets/InstaDeepAI/SPICE2-curated).
## How to Use

For complete usage instructions and more information, please refer to our [documentation](https://instadeep.github.io/mlip).
## License summary

1. The Licensed Models are **only** available under this License for Non-Commercial Purposes.
2. You are permitted to reproduce, publish, share and adapt the Output generated by the Licensed Model only for Non-Commercial Purposes and in accordance with this License.
3. You may **not** use the Licensed Models or any of their Outputs:
    1. in connection with any Commercial Purposes, unless agreed by Us under a separate licence;
    2. to train, improve or otherwise influence the functionality or performance of any other third-party derivative model that is commercial or intended for a Commercial Purpose and is similar to the Licensed Models;
    3. to create models distilled or derived from the Outputs of the Licensed Models, unless such models are for Non-Commercial Purposes and open-sourced under the same license as the Licensed Models; or
    4. in violation of any applicable laws and regulations.