eacortes committed on
Commit c852481 · verified · 1 Parent(s): 2a8d0a7

Upload README.md

Files changed (1)
  1. README.md +13 -13
README.md CHANGED
@@ -150,19 +150,6 @@ fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
  print(fill("c1ccccc1[MASK]"))
  ```
 
- ## Intended Use
- * Primary: Research and development for molecular property prediction, experimentation with pooling strategies, and as a foundational model for downstream applications.
- * Appropriate for: Binary / multi-class classification (e.g., toxicity, activity) and single-task or multi-task regression (e.g., solubility, clearance) after fine-tuning.
- * Not intended for generating novel molecules.
-
- ## Limitations
- - Out-of-domain performance may degrade for: very long (>128 token) SMILES, inorganic / organometallic compounds, polymers, or charged / enumerated tautomers are not well represented in training.
- - No guarantee of synthesizability, safety, or biological efficacy.
-
- ## Ethical Considerations & Responsible Use
- - Potential biases arise from training corpora skewed to drug-like space.
- - Do not deploy in clinical or regulatory settings without rigorous, domain-specific validation.
-
  ## Architecture
  - Backbone: ModernBERT
  - Hidden size: 768
@@ -293,6 +280,19 @@ Optimal parameters (per dataset) for the `MLM + DAPT + TAFT OPT` merged model:
 
  </details>
 
+ ## Intended Use
+ * Primary: Research and development for molecular property prediction, experimentation with pooling strategies, and as a foundational model for downstream applications.
+ * Appropriate for: Binary / multi-class classification (e.g., toxicity, activity) and single-task or multi-task regression (e.g., solubility, clearance) after fine-tuning.
+ * Not intended for generating novel molecules.
+
+ ## Limitations
+ - Out-of-domain performance may degrade for: very long (>128 token) SMILES, inorganic / organometallic compounds, polymers, or charged / enumerated tautomers are not well represented in training.
+ - No guarantee of synthesizability, safety, or biological efficacy.
+
+ ## Ethical Considerations & Responsible Use
+ - Potential biases arise from training corpora skewed to drug-like space.
+ - Do not deploy in clinical or regulatory settings without rigorous, domain-specific validation.
+
  ## Hardware
  Training and experiments were performed on 2 NVIDIA RTX 3090 GPUs.
 