Derify/ModChemBERT-MLM-DAPT


eacortes committed on Sep 24, 2025

Commit c852481 · verified · 1 parent: 2a8d0a7

Upload README.md

Files changed (1)

README.md +13 -13


@@ -150,19 +150,6 @@ fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
 print(fill("c1ccccc1[MASK]"))
 ```
-## Intended Use
-* Primary: Research and development for molecular property prediction, experimentation with pooling strategies, and as a foundational model for downstream applications.
-* Appropriate for: Binary / multi-class classification (e.g., toxicity, activity) and single-task or multi-task regression (e.g., solubility, clearance) after fine-tuning.
-* Not intended for generating novel molecules.
-## Limitations
-- Out-of-domain performance may degrade for: very long (>128 token) SMILES, inorganic / organometallic compounds, polymers, or charged / enumerated tautomers are not well represented in training.
-- No guarantee of synthesizability, safety, or biological efficacy.
-## Ethical Considerations & Responsible Use
-- Potential biases arise from training corpora skewed to drug-like space.
-- Do not deploy in clinical or regulatory settings without rigorous, domain-specific validation.
 ## Architecture
 - Backbone: ModernBERT
 - Hidden size: 768
@@ -293,6 +280,19 @@ Optimal parameters (per dataset) for the `MLM + DAPT + TAFT OPT` merged model:
 </details>
+## Intended Use
+* Primary: Research and development for molecular property prediction, experimentation with pooling strategies, and as a foundational model for downstream applications.
+* Appropriate for: Binary / multi-class classification (e.g., toxicity, activity) and single-task or multi-task regression (e.g., solubility, clearance) after fine-tuning.
+* Not intended for generating novel molecules.
+## Limitations
+- Out-of-domain performance may degrade for: very long (>128 token) SMILES, inorganic / organometallic compounds, polymers, or charged / enumerated tautomers are not well represented in training.
+- No guarantee of synthesizability, safety, or biological efficacy.
+## Ethical Considerations & Responsible Use
+- Potential biases arise from training corpora skewed to drug-like space.
+- Do not deploy in clinical or regulatory settings without rigorous, domain-specific validation.
 ## Hardware
 Training and experiments were performed on 2 NVIDIA RTX 3090 GPUs.