make instruction on org card more explicit
README.md
CHANGED
|
@@ -27,7 +27,39 @@ The models were pre-trained on SMILES strings from the [Enamine REAL Space](http
|
|
| 27 |
- **Tokenization**: [``Smirk``](https://eeg.engin.umich.edu/smirk/) tokenizer
|
| 28 |
|
| 29 |
|
| 30 |
-
|
| 31 |
|
| 32 |
```python
|
| 33 |
from transformers import AutoModel
|
|
@@ -35,7 +67,7 @@ from smirk import SmirkTokenizerFast
|
|
| 35 |
|
| 36 |
# Load the model
|
| 37 |
model = AutoModel.from_pretrained(
|
| 38 |
-
"
|
| 39 |
trust_remote_code=True
|
| 40 |
)
|
| 41 |
|
|
@@ -48,82 +80,59 @@ smiles_batch = [
|
|
| 48 |
results = model.predict(smiles_batch)
|
| 49 |
```
|
| 50 |
|
| 51 |
-
### Setting Up Your Environment
|
| 52 |
-
|
| 53 |
-
Create a virtual environment and install dependencies:
|
| 54 |
-
|
| 55 |
-
```bash
|
| 56 |
-
python -m venv .venv
|
| 57 |
-
source .venv/bin/activate # On Windows: .venv\Scripts\activate
|
| 58 |
-
pip install -r requirements.txt
|
| 59 |
-
```
|
| 60 |
-
|
| 61 |
-
> **Note**: SMIRK tokenizers require Rust to be installed. See the [Rust installation guide](https://www.rust-lang.org/tools/install) for details.
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
## Model Inputs and Outputs
|
| 65 |
-
|
| 66 |
-
### Inputs
|
| 67 |
-
- **SMILES strings**: Standard SMILES notation for molecular structures
|
| 68 |
-
- **Batch size**: Variable, automatically padded during inference
|
| 69 |
-
|
| 70 |
-
### Outputs
|
| 71 |
-
- **Predictions**: Task-specific numerical or categorical predictions
|
| 72 |
-
- **Format**: Dictionary with channel names and predicted values (if channels are configured), or raw tensor output
|
| 73 |
-
|
| 74 |
-
|
| 75 |
## Provided Models
|
| 76 |
|
| 77 |
### Pre-trained
|
| 78 |
-
- `mist-1.8B-dh61satt`: Flagship MIST model (MIST-1.8B)
|
| 79 |
-
- `mist-28M-ti624ev1`: Smaller MIST model (MIST-28M).
|
| 80 |
|
| 81 |
Below is a full list of finetuned variants hosted on Hugging Face:
|
| 82 |
### MoleculeNet Benchmark Models
|
| 83 |
|
| 84 |
-
| Folder
|
| 85 |
-
| ----------------------------
|
| 86 |
-
| mist-1.8B-fbdn8e35-bbbp
|
| 87 |
-
| mist-1.8B-1a4puhg2-hiv
|
| 88 |
-
| mist-1.8B-m50jgolp-bace
|
| 89 |
-
| mist-1.8B-uop1z0dc-tox21
|
| 90 |
-
| mist-1.8B-lu1l5ieh-clintox
|
| 91 |
-
| mist-1.8B-l1wfo7oa-sider *
|
| 92 |
-
| mist-1.8B-hxiygjsm-esol *
|
| 93 |
-
| mist-1.8B-iwqj2cld-freesolv | MIST-1.8B| MoleculeNet FreeSolv
|
| 94 |
-
| mist-1.8B-jvt4azpz-lipo
|
| 95 |
-
| mist-1.8B-8nd1ot5j-qm8
|
| 96 |
-
| mist-28M-3xpfhv48-bbbp
|
| 97 |
-
| mist-28M-8fh43gke-hiv
|
| 98 |
-
| mist-28M-8loj3bab-bace
|
| 99 |
-
| mist-28M-kw4ks27p-tox21
|
| 100 |
-
| mist-28M-97vfcykk-clintox
|
| 101 |
-
| mist-28M-z8qo16uy-sider
|
| 102 |
-
| mist-28M-kcwb9le5-esol
|
| 103 |
-
| mist-28M-0uiq7o7m-freesolv *
|
| 104 |
-
| mist-28M-xzr5ulva-lipo
|
| 105 |
-
| mist-28M-gzwqzpcr-qm8
|
| 106 |
-
| mist-26.9M-kkgx0omx-qm9
|
|
|
|
| 107 |
|
| 108 |
`*` Indicates models currently not available on Hugging Face due to storage limits.
|
| 109 |
|
| 110 |
#### QM9 Benchmark Models
|
| 111 |
Single-target models (MIST-1.8B encoder) are available for the following QM9 properties.
|
| 112 |
|
| 113 |
-
| Folder
|
| 114 |
-
| ---------------------------- | :------: | ----------------------------------------------------------------- |
|
| 115 |
-
| mist-1.8B-ez05expv-mu
|
| 116 |
-
| mist-1.8B-rcwary93-alpha *
|
| 117 |
-
| mist-1.8B-jmjosq12-homo *
|
| 118 |
-
| mist-1.8B-n14wshc9-lumo *
|
| 119 |
-
| mist-1.8B-kayun6v3-gap *
|
| 120 |
-
| mist-1.8B-xxe7t35e-r2 *
|
| 121 |
-
| mist-1.8B-6nmcwyrp-zpve
|
| 122 |
-
| mist-1.8B-a7akimjj-u0
|
| 123 |
-
| mist-1.8B-85f24xkj-u298
|
| 124 |
-
| mist-1.8B-3fbbz4is-h298
|
| 125 |
-
| mist-1.8B-09sntn03-g298
|
| 126 |
-
| mist-1.8B-j356b3nf-cv
|
| 127 |
|
| 128 |
`*` Indicates models currently not available on Hugging Face due to storage limits.
|
| 129 |
|
|
@@ -131,30 +140,32 @@ The single target (MIST-1.8B encoder) models for properties in QM9 are available
|
|
| 131 |
|
| 132 |
These models consist of a MIST-encoder and task network finetuned on a single dataset used in the applications demonstrated in the manuscript.
|
| 133 |
|
| 134 |
-
| Folder
|
| 135 |
-
| ------------------------- | :------: | ----------------------------------------------------------- |
|
| 136 |
-
| mist-26.9M-48kpooqf-odour
|
| 137 |
-
| mist-26.9M-6hk5coof-dn
|
| 138 |
-
| mist-26.9M-0vxdbm36-kt
|
| 139 |
-
| mist-26.9M-b302p09x-bp
|
| 140 |
-
| mist-26.9M-cyuo2xb6-fp
|
| 141 |
-
| mist-26.9M-y3ge5pf9-mp
|
| 142 |
|
| 143 |
### Finetuned Multi-Task Models
|
| 144 |
These are additional multi-target finetuned models consisting of a MIST encoder and task network.
|
| 145 |
-
|
| 146 |
-
|
|
| 147 |
-
|
|
| 148 |
-
| mist-
|
| 149 |
-
| mist-28M-
|
|
|
|
| 150 |
|
| 151 |
### Finetuned Mixture Models
|
| 152 |
|
| 153 |
These models consist of a MIST encoder and a physics-informed task network for mixture property prediction.
|
| 154 |
-
|
| 155 |
-
|
|
| 156 |
-
|
|
| 157 |
-
| mist-
|
|
|
|
| 158 |
|
| 159 |
## Citation
|
| 160 |
|
|
|
|
| 27 |
- **Tokenization**: [``Smirk``](https://eeg.engin.umich.edu/smirk/) tokenizer
|
| 28 |
|
| 29 |
|
| 30 |
+
|
| 31 |
+
## Model Inputs and Outputs
|
| 32 |
+
|
| 33 |
+
### Inputs
|
| 34 |
+
- **SMILES strings**: Standard SMILES notation for molecular structures
|
| 35 |
+
- **Batch size**: Variable, automatically padded during inference
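
Because the batch size is variable, a long list of SMILES strings can be split into fixed-size chunks before inference. The sketch below is only an illustration: `iter_batches`, the example molecules, and the batch size of 2 are hypothetical choices, not part of the MIST API.

```python
from typing import Iterator, List, Sequence


def iter_batches(smiles: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` SMILES strings."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(smiles), batch_size):
        yield list(smiles[start:start + batch_size])


# Hypothetical input: ethanol, benzene, and acetic acid.
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
batches = list(iter_batches(smiles_list, batch_size=2))
```

Each chunk could then be passed to `model.predict(...)` in turn, which keeps memory use bounded for large screening sets.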
|
| 36 |
+
|
| 37 |
+
### Outputs
|
| 38 |
+
- **Predictions**: Task-specific numerical or categorical predictions
|
| 39 |
+
- **Format**: Dictionary with channel names and predicted values (if channels are configured), or raw tensor output
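
Since the output is either a dictionary keyed by channel name or a raw tensor, downstream code may want a single shape to work with. The following is a minimal sketch under that assumption; `normalize_predictions`, `to_plain`, the `"prediction"` fallback key, and the `"logS"` channel name are hypothetical and not taken from the MIST code.

```python
from typing import Any, Dict, List


def to_plain(values: Any) -> List[Any]:
    """Convert a tensor-like object or a sequence into a plain Python list."""
    if hasattr(values, "tolist"):  # e.g. torch.Tensor or numpy.ndarray
        return values.tolist()
    return list(values)


def normalize_predictions(
    results: Any, default_channel: str = "prediction"
) -> Dict[str, List[Any]]:
    """Return predictions as {channel_name: values} for either output form."""
    if isinstance(results, dict):
        return {name: to_plain(vals) for name, vals in results.items()}
    # No channels configured: wrap the raw output under a fallback key.
    return {default_channel: to_plain(results)}


# Hypothetical outputs standing in for the two documented formats.
with_channels = normalize_predictions({"logS": [-0.77, -1.2]})
raw_output = normalize_predictions([0.1, 0.9])
```

The exact return type depends on the model variant, so check the individual model card before relying on a particular key or shape.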
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
## Quick Start
|
| 43 |
+
|
| 44 |
+
Tutorials are available in Google Colab:
|
| 45 |
+
- [Inference](https://colab.research.google.com/github/BattModels/mist-demo/blob/main/tutorials/molecular_property_prediction.ipynb)
|
| 46 |
+
- [Finetuning](https://colab.research.google.com/github/BattModels/mist-demo/blob/main/tutorials/run_finetuning.ipynb)
|
| 47 |
+
|
| 48 |
+
#### Running Locally
|
| 49 |
+
|
| 50 |
+
To run the model locally, create a virtual environment and install dependencies:
|
| 51 |
+
|
| 52 |
+
```bash
|
| 53 |
+
python -m venv .venv
|
| 54 |
+
source .venv/bin/activate # On Windows: .venv\Scripts\activate
|
| 55 |
+
pip install -r requirements.txt
|
| 56 |
+
```
|
| 57 |
+
> **Note**: SMIRK tokenizers require Rust to be installed. See the [Rust installation guide](https://www.rust-lang.org/tools/install) for details.
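
Before installing the dependencies, it can help to confirm that a Rust toolchain is visible on `PATH`. This is a small sketch that assumes a standard installation exposing the `rustc` and `cargo` executables; the helper name `rust_toolchain_available` is hypothetical.

```python
import shutil


def rust_toolchain_available() -> bool:
    """Return True if both `rustc` and `cargo` are found on PATH."""
    return all(shutil.which(tool) is not None for tool in ("rustc", "cargo"))


if rust_toolchain_available():
    print("Rust toolchain found.")
else:
    print("Rust not found; install it before running pip install.")
```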
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
Use the model!
|
| 61 |
+
For the full list of model IDs and properties, see the Provided Models section below.
|
| 62 |
+
For details on the specific input and output formats of each model variant, see its model card.
|
| 63 |
|
| 64 |
```python
|
| 65 |
from transformers import AutoModel
|
|
|
|
| 67 |
|
| 68 |
# Load the model
|
| 69 |
model = AutoModel.from_pretrained(
|
| 70 |
+
"mist-models/mist-{size}-{model_id}-{property}",
|
| 71 |
trust_remote_code=True
|
| 72 |
)
|
| 73 |
|
|
|
|
| 80 |
results = model.predict(smiles_batch)
|
| 81 |
```
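
The `mist-models/mist-{size}-{model_id}-{property}` placeholder can be filled in with plain string formatting. The helper below is a hypothetical convenience, not part of the MIST package; the example IDs are copied from the model tables in this README.

```python
from typing import Optional


def build_repo_id(
    size: str, model_id: str, prop: Optional[str] = None, org: str = "mist-models"
) -> str:
    """Fill in the `mist-{size}-{model_id}-{property}` template."""
    name = f"mist-{size}-{model_id}"
    if prop:  # pre-trained encoders have no property suffix
        name = f"{name}-{prop}"
    return f"{org}/{name}"


finetuned = build_repo_id("28M", "3xpfhv48", "bbbp")
pretrained = build_repo_id("1.8B", "dh61satt")
print(finetuned)   # mist-models/mist-28M-3xpfhv48-bbbp
print(pretrained)  # mist-models/mist-1.8B-dh61satt
```

Not every folder follows this pattern (for example `mist-conductivity-28M-2mpg8dcd`), so for those models copy the folder name from the tables directly.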
|
| 82 |
|
| 83 |
## Provided Models
|
| 84 |
|
| 85 |
### Pre-trained
|
| 86 |
+
- [`mist-1.8B-dh61satt`](https://huggingface.co/mist-models/mist-1.8B-dh61satt): Flagship MIST model (MIST-1.8B)
|
| 87 |
+
- [`mist-28M-ti624ev1`](https://huggingface.co/mist-models/mist-28M-ti624ev1): Smaller MIST model (MIST-28M)
|
| 88 |
|
| 89 |
Below is a full list of finetuned variants hosted on Hugging Face:
|
| 90 |
### MoleculeNet Benchmark Models
|
| 91 |
|
| 92 |
+
| Folder | Encoder | Dataset |
|
| 93 |
+
| ---------------------------------------------------------------------- | :-------: | ------------------------- |
|
| 94 |
+
| [mist-1.8B-fbdn8e35-bbbp](https://huggingface.co/mist-models/mist-1.8B-fbdn8e35-bbbp) | MIST-1.8B | MoleculeNet BBBP |
|
| 95 |
+
| [mist-1.8B-1a4puhg2-hiv](https://huggingface.co/mist-models/mist-1.8B-1a4puhg2-hiv) | MIST-1.8B | MoleculeNet HIV |
|
| 96 |
+
| [mist-1.8B-m50jgolp-bace](https://huggingface.co/mist-models/mist-1.8B-m50jgolp-bace) | MIST-1.8B | MoleculeNet BACE |
|
| 97 |
+
| [mist-1.8B-uop1z0dc-tox21](https://huggingface.co/mist-models/mist-1.8B-uop1z0dc-tox21) | MIST-1.8B | MoleculeNet Tox21 |
|
| 98 |
+
| [mist-1.8B-lu1l5ieh-clintox](https://huggingface.co/mist-models/mist-1.8B-lu1l5ieh-clintox) | MIST-1.8B | MoleculeNet ClinTox |
|
| 99 |
+
| mist-1.8B-l1wfo7oa-sider * | MIST-1.8B | MoleculeNet SIDER |
|
| 100 |
+
| mist-1.8B-hxiygjsm-esol * | MIST-1.8B | MoleculeNet ESOL |
|
| 101 |
+
| [mist-1.8B-iwqj2cld-freesolv](https://huggingface.co/mist-models/mist-1.8B-iwqj2cld-freesolv) | MIST-1.8B | MoleculeNet FreeSolv |
|
| 102 |
+
| [mist-1.8B-jvt4azpz-lipo](https://huggingface.co/mist-models/mist-1.8B-jvt4azpz-lipo) | MIST-1.8B | MoleculeNet Lipophilicity |
|
| 103 |
+
| [mist-1.8B-8nd1ot5j-qm8](https://huggingface.co/mist-models/mist-1.8B-8nd1ot5j-qm8) | MIST-1.8B | MoleculeNet QM8 |
|
| 104 |
+
| [mist-28M-3xpfhv48-bbbp](https://huggingface.co/mist-models/mist-28M-3xpfhv48-bbbp) | MIST-28M | MoleculeNet BBBP |
|
| 105 |
+
| [mist-28M-8fh43gke-hiv](https://huggingface.co/mist-models/mist-28M-8fh43gke-hiv) | MIST-28M | MoleculeNet HIV |
|
| 106 |
+
| [mist-28M-8loj3bab-bace](https://huggingface.co/mist-models/mist-28M-8loj3bab-bace) | MIST-28M | MoleculeNet BACE |
|
| 107 |
+
| [mist-28M-kw4ks27p-tox21](https://huggingface.co/mist-models/mist-28M-kw4ks27p-tox21) | MIST-28M | MoleculeNet Tox21 |
|
| 108 |
+
| [mist-28M-97vfcykk-clintox](https://huggingface.co/mist-models/mist-28M-97vfcykk-clintox) | MIST-28M | MoleculeNet ClinTox |
|
| 109 |
+
| [mist-28M-z8qo16uy-sider](https://huggingface.co/mist-models/mist-28M-z8qo16uy-sider) | MIST-28M | MoleculeNet SIDER |
|
| 110 |
+
| [mist-28M-kcwb9le5-esol](https://huggingface.co/mist-models/mist-28M-kcwb9le5-esol) | MIST-28M | MoleculeNet ESOL |
|
| 111 |
+
| mist-28M-0uiq7o7m-freesolv * | MIST-28M | MoleculeNet FreeSolv |
|
| 112 |
+
| [mist-28M-xzr5ulva-lipo](https://huggingface.co/mist-models/mist-28M-xzr5ulva-lipo) | MIST-28M | MoleculeNet Lipophilicity |
|
| 113 |
+
| [mist-28M-gzwqzpcr-qm8](https://huggingface.co/mist-models/mist-28M-gzwqzpcr-qm8) | MIST-28M | MoleculeNet QM8 |
|
| 114 |
+
| [mist-26.9M-kkgx0omx-qm9](https://huggingface.co/mist-models/mist-26.9M-kkgx0omx-qm9) | MIST-28M | MoleculeNet QM9 |
|
| 115 |
+
|
| 116 |
|
| 117 |
`*` Indicates models currently not available on Hugging Face due to storage limits.
|
| 118 |
|
| 119 |
#### QM9 Benchmark Models
|
| 120 |
Single-target models (MIST-1.8B encoder) are available for the following QM9 properties.
|
| 121 |
|
| 122 |
+
| Folder | Encoder | Target |
|
| 123 |
+
| ---------------------------------------------------------------------- | :-------: | ----------------------------------------------------------------- |
|
| 124 |
+
| [mist-1.8B-ez05expv-mu](https://huggingface.co/mist-models/mist-1.8B-ez05expv-mu) | MIST-1.8B | μ - Dipole moment (unit: D) |
|
| 125 |
+
| mist-1.8B-rcwary93-alpha * | MIST-1.8B | α - Isotropic polarizability (unit: Bohr^3) |
|
| 126 |
+
| mist-1.8B-jmjosq12-homo * | MIST-1.8B | HOMO - Highest occupied molecular orbital energy (unit: Hartree) |
|
| 127 |
+
| mist-1.8B-n14wshc9-lumo * | MIST-1.8B | LUMO - Lowest unoccupied molecular orbital energy (unit: Hartree) |
|
| 128 |
+
| mist-1.8B-kayun6v3-gap * | MIST-1.8B | Gap - Gap between HOMO and LUMO (unit: Hartree) |
|
| 129 |
+
| mist-1.8B-xxe7t35e-r2 * | MIST-1.8B | \<R2\> - Electronic spatial extent (unit: Bohr^2) |
|
| 130 |
+
| [mist-1.8B-6nmcwyrp-zpve](https://huggingface.co/mist-models/mist-1.8B-6nmcwyrp-zpve) | MIST-1.8B | ZPVE - Zero point vibrational energy (unit: Hartree) |
|
| 131 |
+
| [mist-1.8B-a7akimjj-u0](https://huggingface.co/mist-models/mist-1.8B-a7akimjj-u0) | MIST-1.8B | U0 - Internal energy at 0K (unit: Hartree) |
|
| 132 |
+
| [mist-1.8B-85f24xkj-u298](https://huggingface.co/mist-models/mist-1.8B-85f24xkj-u298) | MIST-1.8B | U298 - Internal energy at 298.15K (unit: Hartree) |
|
| 133 |
+
| [mist-1.8B-3fbbz4is-h298](https://huggingface.co/mist-models/mist-1.8B-3fbbz4is-h298) | MIST-1.8B | H298 - Enthalpy at 298.15K (unit: Hartree) |
|
| 134 |
+
| [mist-1.8B-09sntn03-g298](https://huggingface.co/mist-models/mist-1.8B-09sntn03-g298) | MIST-1.8B | G298 - Free energy at 298.15K (unit: Hartree) |
|
| 135 |
+
| [mist-1.8B-j356b3nf-cv](https://huggingface.co/mist-models/mist-1.8B-j356b3nf-cv) | MIST-1.8B | Cv - Heat capacity at 298.15K (unit: cal/(mol*K)) |
|
| 136 |
|
| 137 |
`*` Indicates models currently not available on Hugging Face due to storage limits.
|
| 138 |
|
|
|
|
| 140 |
|
| 141 |
These models consist of a MIST-encoder and task network finetuned on a single dataset used in the applications demonstrated in the manuscript.
|
| 142 |
|
| 143 |
+
| Folder | Encoder | Dataset |
|
| 144 |
+
| ---------------------------------------------------------------------- | :------: | ----------------------------------------------------------- |
|
| 145 |
+
| [mist-26.9M-48kpooqf-odour](https://huggingface.co/mist-models/mist-26.9M-48kpooqf-odour) | MIST-28M | Olfaction |
|
| 146 |
+
| [mist-26.9M-6hk5coof-dn](https://huggingface.co/mist-models/mist-26.9M-6hk5coof-dn) | MIST-28M | Donor Number |
|
| 147 |
+
| [mist-26.9M-0vxdbm36-kt](https://huggingface.co/mist-models/mist-26.9M-0vxdbm36-kt) | MIST-28M | Kamlet-Taft Solvatochromic Parameters |
|
| 148 |
+
| [mist-26.9M-b302p09x-bp](https://huggingface.co/mist-models/mist-26.9M-b302p09x-bp) | MIST-28M | Boiling Point (Part of Characteristic Temperatures Dataset) |
|
| 149 |
+
| [mist-26.9M-cyuo2xb6-fp](https://huggingface.co/mist-models/mist-26.9M-cyuo2xb6-fp) | MIST-28M | Flash Point (Part of Characteristic Temperatures Dataset) |
|
| 150 |
+
| [mist-26.9M-y3ge5pf9-mp](https://huggingface.co/mist-models/mist-26.9M-y3ge5pf9-mp) | MIST-28M | Melting Point (Part of Characteristic Temperatures Dataset) |
|
| 151 |
|
| 152 |
### Finetuned Multi-Task Models
|
| 153 |
These are additional multi-target finetuned models consisting of a MIST encoder and task network.
|
| 154 |
+
|
| 155 |
+
| Folder | Encoder | Dataset |
|
| 156 |
+
| ---------------------------------------------------------------------- | :------: | ------------------------------------- |
|
| 157 |
+
| [mist-26.9M-kkgx0omx-qm9](https://huggingface.co/mist-models/mist-26.9M-kkgx0omx-qm9) | MIST-28M | QM9 Dataset with SMILES randomization |
|
| 158 |
+
| [mist-28M-ttqcvt6fs-toxcast](https://huggingface.co/mist-models/mist-28M-ttqcvt6fs-toxcast) | MIST-28M | ToxCast |
|
| 159 |
+
| [mist-28M-yr1urd2c-muv](https://huggingface.co/mist-models/mist-28M-yr1urd2c-muv) | MIST-28M | Maximum Unbiased Validation (MUV) |
|
| 160 |
|
| 161 |
### Finetuned Mixture Models
|
| 162 |
|
| 163 |
These models consist of a MIST encoder and a physics-informed task network for mixture property prediction.
|
| 164 |
+
|
| 165 |
+
| Folder | Encoder | Dataset |
|
| 166 |
+
| ---------------------------------------------------------------------- | :------: | ----------------------------------------------- |
|
| 167 |
+
| [mist-conductivity-28M-2mpg8dcd](https://huggingface.co/mist-models/mist-conductivity-28M-2mpg8dcd) | MIST-28M | Ionic Conductivity |
|
| 168 |
+
| [mist-mixtures-zffffbex](https://huggingface.co/mist-models/mist-mixtures-zffffbex) | MIST-28M | Excess Density, Molar Volume and Molar Enthalpy |
|
| 169 |
|
| 170 |
## Citation
|
| 171 |
|