Commit 8d40725 · Create README.md · 1 Parent(s): a4693d4
README.md · ADDED · @@ -0,0 +1,25 @@
---
license: apache-2.0
datasets:
- jxie/guacamol
- AdrianM0/MUV
library_name: transformers
---
## Model Details

We introduce a suite of neural language model tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide recipes for fine-tuning these models in low-data settings using semi-supervised learning.
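
Since the card declares `library_name: transformers`, checkpoints from this suite should load through the standard Hugging Face API. Below is a minimal sketch, assuming a masked-LM head and using a placeholder repository ID (substitute the actual checkpoint name):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder repository ID: substitute the actual checkpoint for this model card.
model_id = "<org>/<smiles-bert-checkpoint>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Encode a SMILES string (aspirin) and inspect the token-level logits.
inputs = tokenizer("CC(=O)Oc1ccccc1C(=O)O", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```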

### Enumeration-aware Molecular Transformers
Introduces contrastive learning alongside multi-task regression and masked language modelling as pre-training objectives to inject enumeration knowledge into pre-trained language models.
#### a. Molecular Domain Adaptation (Contrastive Encoder-based)
##### i. Architecture

##### ii. Contrastive Learning
<img width="1418" alt="Screenshot 2023-04-22 at 11 54 23 AM" src="https://user-images.githubusercontent.com/6007894/233777069-439c18cc-77a2-4ae2-a81e-d7e94c30a6be.png">
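
The figure above summarises the objective. As a hypothetical sketch (not this repository's training code), an enumeration-aware contrastive term can be written as a symmetric InfoNCE loss over mean-pooled embeddings of canonical and enumerated SMILES of the same molecules:

```python
import torch
import torch.nn.functional as F

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def enumeration_contrastive_loss(z_canonical, z_enumerated, temperature=0.05):
    # z_canonical / z_enumerated: (batch, hidden) embeddings of the same molecules,
    # one from the canonical SMILES and one from a randomly enumerated SMILES.
    z1 = F.normalize(z_canonical, dim=-1)
    z2 = F.normalize(z_enumerated, dim=-1)
    logits = z1 @ z2.t() / temperature  # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Matching canonical/enumerated pairs are positives; every other molecule in
    # the batch serves as an in-batch negative (symmetric InfoNCE).
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```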

#### b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)
<img width="702" alt="Screenshot 2023-04-22 at 11 43 06 AM" src="https://user-images.githubusercontent.com/6007894/233776512-ab6cdeef-02f1-4076-9b76-b228cbf26456.png">
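
As an illustration of the denoising setup, one way to build training pairs is to enumerate a random SMILES with RDKit and ask the encoder-decoder to recover the canonical form; the helper below is a hypothetical sketch, not necessarily the scripts used here:

```python
from rdkit import Chem

def make_canonicalization_pair(smiles: str):
    """Build one (source, target) pair: enumerated SMILES -> canonical SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    source = Chem.MolToSmiles(mol, canonical=False, doRandom=True)  # noisy enumeration
    target = Chem.MolToSmiles(mol)                                  # canonical form
    return source, target

# The encoder-decoder is trained to "translate" the source back into the target.
print(make_canonicalization_pair("CC(=O)Oc1ccccc1C(=O)O"))
```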

### Pretraining steps for this model:

- Pretrain a BERT model with masked language modeling (masking proportion set to 15%) on the Guacamol dataset
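
A minimal sketch of this MLM pretraining step with 🤗 Transformers; the tokenizer name and the dataset's column name are assumptions, not the exact configuration behind this checkpoint:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, BertConfig, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("<smiles-tokenizer>")  # placeholder tokenizer
dataset = load_dataset("jxie/guacamol", split="train")

def tokenize(batch):
    # Assumes the dataset exposes SMILES strings in a "text" column.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# 15% of tokens are masked, matching the pretraining recipe above.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="smiles-bert-mlm", per_device_train_batch_size=64),
    data_collator=collator,
    train_dataset=tokenized,
)
trainer.train()
```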