Commit 8d40725 · Create README.md · 1 Parent(s): a4693d4
README.md · ADDED · @@ -0,0 +1,25 @@
---
license: apache-2.0
datasets:
- jxie/guacamol
- AdrianM0/MUV
library_name: transformers
---
## Model Details

We introduce a suite of neural language model tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide recipes for fine-tuning these models in low-data settings using semi-supervised learning.
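
Since the card declares `library_name: transformers`, checkpoints from this suite should load through the standard Hugging Face API. Below is a minimal sketch, assuming a masked-LM head and using a placeholder repository ID (substitute the actual checkpoint name):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder repository ID: substitute the actual checkpoint for this model card.
model_id = "<org>/<smiles-bert-checkpoint>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Encode a SMILES string (aspirin) and inspect the token-level logits.
inputs = tokenizer("CC(=O)Oc1ccccc1C(=O)O", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```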

### Enumeration-aware Molecular Transformers
Introduces contrastive learning alongside multi-task regression and masked language modelling as pre-training objectives to inject enumeration knowledge into pre-trained language models.
#### a. Molecular Domain Adaptation (Contrastive Encoder-based)
##### i. Architecture

##### ii. Contrastive Learning
<img width="1418" alt="Screenshot 2023-04-22 at 11 54 23 AM" src="https://user-images.githubusercontent.com/6007894/233777069-439c18cc-77a2-4ae2-a81e-d7e94c30a6be.png">
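
The figure above summarises the objective. As a hypothetical sketch (not this repository's training code), an enumeration-aware contrastive term can be written as a symmetric InfoNCE loss over mean-pooled embeddings of canonical and enumerated SMILES of the same molecules:

```python
import torch
import torch.nn.functional as F

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def enumeration_contrastive_loss(z_canonical, z_enumerated, temperature=0.05):
    # z_canonical / z_enumerated: (batch, hidden) embeddings of the same molecules,
    # one from the canonical SMILES and one from a randomly enumerated SMILES.
    z1 = F.normalize(z_canonical, dim=-1)
    z2 = F.normalize(z_enumerated, dim=-1)
    logits = z1 @ z2.t() / temperature  # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Matching canonical/enumerated pairs are positives; every other molecule in
    # the batch serves as an in-batch negative (symmetric InfoNCE).
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```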

#### b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)
<img width="702" alt="Screenshot 2023-04-22 at 11 43 06 AM" src="https://user-images.githubusercontent.com/6007894/233776512-ab6cdeef-02f1-4076-9b76-b228cbf26456.png">
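
As an illustration of the denoising setup, one way to build training pairs is to enumerate a random SMILES with RDKit and ask the encoder-decoder to recover the canonical form; the helper below is a hypothetical sketch, not necessarily the scripts used here:

```python
from rdkit import Chem

def make_canonicalization_pair(smiles: str):
    """Build one (source, target) pair: enumerated SMILES -> canonical SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    source = Chem.MolToSmiles(mol, canonical=False, doRandom=True)  # noisy enumeration
    target = Chem.MolToSmiles(mol)                                  # canonical form
    return source, target

# The encoder-decoder is trained to "translate" the source back into the target.
print(make_canonicalization_pair("CC(=O)Oc1ccccc1C(=O)O"))
```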

### Pretraining steps for this model:

- Pretrain a BERT model with masked language modeling (masking proportion set to 15%) on the Guacamol dataset
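
A minimal sketch of this MLM pretraining step with 🤗 Transformers; the tokenizer name and the dataset's column name are assumptions, not the exact configuration behind this checkpoint:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, BertConfig, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("<smiles-tokenizer>")  # placeholder tokenizer
dataset = load_dataset("jxie/guacamol", split="train")

def tokenize(batch):
    # Assumes the dataset exposes SMILES strings in a "text" column.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# 15% of tokens are masked, matching the pretraining recipe above.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="smiles-bert-mlm", per_device_train_batch_size=64),
    data_collator=collator,
    train_dataset=tokenized,
)
trainer.train()
```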