Commit 2d6db17 by Shoriful025 (verified) · 1 parent: d211c66

Create README.md

README.md ADDED
---
language: en
license: apache-2.0
tags:
- chemistry
- biology
- smiles
- toxicity
- molecular-prediction
---

# molecular_toxicity_predictor

## Overview
This model is a RoBERTa-based transformer (ChemBERTa) for binary classification of chemical compounds by potential molecular toxicity. It takes SMILES (Simplified Molecular Input Line Entry System) strings as input and predicts whether a compound is likely to be toxic to human cells.
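
A minimal inference sketch, assuming the weights are published as a Hugging Face sequence-classification checkpoint. The repository ID `your-namespace/molecular_toxicity_predictor` is a placeholder, and treating class index 1 as "toxic" is an assumption that this README does not confirm; check the checkpoint's `id2label` mapping before relying on it.

```python
import math


def softmax(logits):
    """Convert raw class logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toxic_probability(logits, toxic_index=1):
    """Probability of the 'toxic' class; toxic_index=1 is an assumption."""
    return softmax(logits)[toxic_index]


def predict_toxicity(smiles, model_id="your-namespace/molecular_toxicity_predictor"):
    """Score one SMILES string; model_id is a hypothetical placeholder."""
    # Imported lazily so the helpers above work without transformers/torch.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Tokenize the SMILES string and run a single forward pass.
    inputs = tokenizer(smiles, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return toxic_probability(logits)
```

With a real repository ID substituted, `predict_toxicity("CCO")` would return the model's toxicity probability for ethanol.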

## Model Architecture
The encoder is pre-trained on chemical structures with a BERT-style masked language modeling approach.

- **Input:** Tokenized SMILES sequences representing molecular graphs.
- **Architecture:** A RoBERTa encoder with 6 hidden layers (standard RoBERTa-base has 12), adapted to cheminformatics.
- **Vocabulary:** A custom BPE (Byte-Pair Encoding) tokenizer trained on 77 million molecules from the ZINC database.
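
To illustrate how BPE builds a vocabulary from SMILES strings, here is a toy character-level sketch. It is not the model's tokenizer: the three-molecule corpus and the function names are made up for illustration, and a production tokenizer is trained on a far larger corpus with a dedicated library.

```python
from collections import Counter


def most_frequent_pair(corpus):
    """Return the most common adjacent token pair, or None if there is none."""
    counts = Counter()
    for tokens in corpus:
        for pair in zip(tokens, tokens[1:]):
            counts[pair] += 1
    if not counts:
        return None
    return max(counts, key=counts.get)


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of pair with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_merges(smiles_list, num_merges):
    """Learn up to num_merges merge rules, starting from single characters."""
    corpus = [list(s) for s in smiles_list]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(corpus)
        if pair is None:
            break
        merges.append(pair)
        corpus = [merge_pair(tokens, pair) for tokens in corpus]
    return merges


def encode(smiles, merges):
    """Tokenize a SMILES string by applying the learned merges in order."""
    tokens = list(smiles)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


# Toy corpus: ethanol, ethylamine, propane.
merges = learn_merges(["CCO", "CCN", "CCC"], num_merges=1)
print(encode("CCO", merges))  # ['CC', 'O']
```

Frequent substructures such as carbon chains end up as single tokens, which shortens the sequences the encoder has to process.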

## Intended Use
- **Drug Discovery:** Early-stage screening of candidate molecules to filter out toxic compounds.
- **Regulatory Safety:** Preliminary safety assessment for industrial chemicals.
- **Environmental Health:** Predicting the impact of synthetic compounds on aquatic ecosystems.
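
An early-stage screening pass could be sketched as follows. The `score_fn` callable, the 0.5 threshold, and the example scores are all assumptions for illustration; in practice `score_fn` would wrap the model's predicted toxicity probability and the threshold would be tuned on validation data.

```python
def screen_candidates(candidates, score_fn, threshold=0.5):
    """Split SMILES strings into (kept, flagged) by predicted toxicity.

    score_fn maps a SMILES string to a probability in [0, 1]; it is a
    hypothetical stand-in for the model. threshold=0.5 is an assumption.
    """
    kept, flagged = [], []
    for smiles in candidates:
        probability = score_fn(smiles)
        target = flagged if probability >= threshold else kept
        target.append((smiles, probability))
    return kept, flagged


# Made-up scores standing in for model output.
fake_scores = {"CCO": 0.10, "c1ccccc1": 0.80}
kept, flagged = screen_candidates(fake_scores, fake_scores.get)
```

Flagged compounds are candidates for follow-up testing rather than confirmed toxicants, since the output is only a screening signal.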

## Limitations
- **Stereochemistry:** Limited ability to distinguish enantiomers or other stereoisomers, which may differ in toxicity.
- **Domain Gap:** May not generalize well to very large biological macromolecules (e.g., proteins or long peptides).
- **In Vitro vs. In Vivo:** Predictions reflect molecule-level interactions; the model does not simulate systemic metabolism or organ-specific toxicity (e.g., liver vs. kidney).
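
The stereochemistry limitation can be made concrete with a small string-level check. This is a sketch, not part of the model: the helper names are hypothetical, and stripping markers with a regular expression is only a diagnostic, not a chemistry-aware canonicalization (which would need a cheminformatics toolkit).

```python
import re

# '@' marks tetrahedral chirality; '/' and '\' mark double-bond geometry.
_STEREO_MARKERS = re.compile(r"[@/\\]")


def has_stereo_markers(smiles):
    """True if the SMILES string carries any stereochemistry annotation."""
    return bool(_STEREO_MARKERS.search(smiles))


def strip_stereo(smiles):
    """Remove stereo markers, for string comparison only."""
    return _STEREO_MARKERS.sub("", smiles)


# The two enantiomers of alanine differ only in the chirality tag.
enantiomer_a = "C[C@@H](C(=O)O)N"
enantiomer_b = "C[C@H](C(=O)O)N"

same_skeleton = strip_stereo(enantiomer_a) == strip_stereo(enantiomer_b)
```

A caller could use `has_stereo_markers` to flag inputs whose predictions deserve extra scrutiny, since the model may assign both enantiomers nearly the same score.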