Space: mist-models/infer-attribute

anoushka2000 committed on Jan 14

Commit 9f072eb (verified) · 1 parent: 21201da

make instruction on org card more explicit

Files changed (1)

README.md +93 -82

@@ -27,7 +27,39 @@ The models were pre-trained on SMILES strings from the [Enamine REAL Space](http
 - **Tokenization**: [``Smirk``](https://eeg.engin.umich.edu/smirk/) tokenizer
-### Quick Start
 ```python
 from transformers import AutoModel
@@ -35,7 +67,7 @@ from smirk import SmirkTokenizerFast
 # Load the model
 model = AutoModel.from_pretrained(
-    "path/to/model",
     trust_remote_code=True
 )
@@ -48,82 +80,59 @@ smiles_batch = [
 results = model.predict(smiles_batch)
 ```
-### Setting Up Your Environment
-Create a virtual environment and install dependencies:
-```bash
-python -m venv .venv
-source .venv/bin/activate  # On Windows: .venv\Scripts\activate
-pip install -r requirements.txt
-```
-> **Note**: SMIRK tokenizers require Rust to be installed. See the [Rust installation guide](https://www.rust-lang.org/tools/install) for details.
-## Model Inputs and Outputs
-### Inputs
-- **SMILES strings**: Standard SMILES notation for molecular structures
-- **Batch size**: Variable, automatically padded during inference
-### Outputs
-- **Predictions**: Task-specific numerical or categorical predictions
-- **Format**: Dictionary with channel names and predicted values (if channels are configured), or raw tensor output
 ## Provided Models
 ### Pre-trained
-- `mist-1.8B-dh61satt`: Flagship MIST model (MIST-1.8B)
-- `mist-28M-ti624ev1`: Smaller MIST model (MIST-28M).
 Below is a full list of finetuned variants hosted on HuggingFace:
 ### MoleculeNet Benchmark Models
-| Folder                       | Encoder  | Dataset                              |
-| ---------------------------- | :------: | ------------------------------------ |
-| mist-1.8B-fbdn8e35-bbbp      | MIST-1.8B| MoleculeNet BBBP                     |
-| mist-1.8B-1a4puhg2-hiv       | MIST-1.8B| MoleculeNet HIV                      |
-| mist-1.8B-m50jgolp-bace      | MIST-1.8B| MoleculeNet BACE                     |
-| mist-1.8B-uop1z0dc-tox21     | MIST-1.8B| MoleculeNet Tox21                    |
-| mist-1.8B-lu1l5ieh-clintox   | MIST-1.8B| MoleculeNet ClinTox                  |
-| mist-1.8B-l1wfo7oa-sider *   | MIST-1.8B| MoleculeNet SIDER                    |
-| mist-1.8B-hxiygjsm-esol *    | MIST-1.8B| MoleculeNet ESOL                     |
-| mist-1.8B-iwqj2cld-freesolv  | MIST-1.8B| MoleculeNet FreeSolv                 |
-| mist-1.8B-jvt4azpz-lipo      | MIST-1.8B| MoleculeNet Lipophilicity            |
-| mist-1.8B-8nd1ot5j-qm8       | MIST-1.8B| MoleculeNet QM8                      |
-| mist-28M-3xpfhv48-bbbp       | MIST-28M | MoleculeNet BBBP                     |
-| mist-28M-8fh43gke-hiv        | MIST-28M | MoleculeNet HIV                      |
-| mist-28M-8loj3bab-bace       | MIST-28M | MoleculeNet BACE                     |
-| mist-28M-kw4ks27p-tox21      | MIST-28M | MoleculeNet Tox21                    |
-| mist-28M-97vfcykk-clintox    | MIST-28M | MoleculeNet ClinTox                  |
-| mist-28M-z8qo16uy-sider      | MIST-28M | MoleculeNet SIDER                    |
-| mist-28M-kcwb9le5-esol       | MIST-28M | MoleculeNet ESOL                     |
-| mist-28M-0uiq7o7m-freesolv * | MIST-28M | MoleculeNet FreeSolv                 |
-| mist-28M-xzr5ulva-lipo       | MIST-28M | MoleculeNet Lipophilicity            |
-| mist-28M-gzwqzpcr-qm8        | MIST-28M | MoleculeNet QM8                      |
-| mist-26.9M-kkgx0omx-qm9      | MIST-28M | MoleculeNet QM9                      |
 `*` indicates models not currently available on Hugging Face due to storage limits.
 #### QM9 Benchmark Models
 Single-target models (MIST-1.8B encoder) are available for the following QM9 properties.
-| Folder                       | Encoder  | Target                                                            |
-| ---------------------------- | :------: | ----------------------------------------------------------------- |
-| mist-1.8B-ez05expv-mu        | MIST-1.8B| μ - Dipole moment (unit: D)                                       |
-| mist-1.8B-rcwary93-alpha *   | MIST-1.8B| α - Isotropic polarizability (unit: Bohr^3)                       |
-| mist-1.8B-jmjosq12-homo *    | MIST-1.8B| HOMO - Highest occupied molecular orbital energy (unit: Hartree)  |
-| mist-1.8B-n14wshc9-lumo *    | MIST-1.8B| LUMO - Lowest unoccupied molecular orbital energy (unit: Hartree) |
-| mist-1.8B-kayun6v3-gap *     | MIST-1.8B| Gap - Gap between HOMO and LUMO (unit: Hartree)                   |
-| mist-1.8B-xxe7t35e-r2 *      | MIST-1.8B| \<R2\> - Electronic spatial extent (unit: Bohr^2)                 |
-| mist-1.8B-6nmcwyrp-zpve      | MIST-1.8B| ZPVE - Zero point vibrational energy (unit: Hartree)              |
-| mist-1.8B-a7akimjj-u0        | MIST-1.8B| U0 - Internal energy at 0K (unit: Hartree)                        |
-| mist-1.8B-85f24xkj-u298      | MIST-1.8B| U298 - Internal energy at 298.15K (unit: Hartree)                 |
-| mist-1.8B-3fbbz4is-h298      | MIST-1.8B| H298 - Enthalpy at 298.15K (unit: Hartree)                        |
-| mist-1.8B-09sntn03-g298      | MIST-1.8B| G298 - Free energy at 298.15K (unit: Hartree)                     |
-| mist-1.8B-j356b3nf-cv        | MIST-1.8B| Cv - Heat capacity at 298.15K (unit: cal/(mol*K))                 |
 `*` indicates models not currently available on Hugging Face due to storage limits.
@@ -131,30 +140,32 @@ The single target (MIST-1.8B encoder) models for properties in QM9 are available
 These models consist of a MIST-encoder and task network finetuned on a single dataset used in the applications demonstrated in the manuscript.
-| Folder                    | Encoder  | Dataset                                                     |
-| ------------------------- | :------: | ----------------------------------------------------------- |
-| mist-26.9M-48kpooqf-odour | MIST-28M | Olfaction                                                   |
-| mist-26.9M-6hk5coof-dn    | MIST-28M | Donor Number                                                |
-| mist-26.9M-0vxdbm36-kt    | MIST-28M | Kamlet-Taft Solvatochromic Parameters                       |
-| mist-26.9M-b302p09x-bp    | MIST-28M | Boiling Point (Part of Characteristic Temperatures Dataset) |
-| mist-26.9M-cyuo2xb6-fp    | MIST-28M | Flash Point (Part of Characteristic Temperatures Dataset)   |
-| mist-26.9M-y3ge5pf9-mp    | MIST-28M | Melting Point (Part of Characteristic Temperatures Dataset) |
 ### Finetuned Multi-Task Models
 These are additional multi-target finetuned models consisting of a MIST encoder and task network.
-| Folder                     | Encoder  | Dataset                                                     |
-| -------------------------- | :------: | ----------------------------------------------------------- |
-| mist-26.9M-kkgx0omx-qm9    | MIST-28M | QM9 Dataset with SMILES randomization                       |
-| mist-28M-ttqcvt6fs-toxcast | MIST-28M | ToxCast                                                     |
-| mist-28M-yr1urd2c-muv      | MIST-28M | Maximum Unbiased Validation (MUV)                           |
 ### Finetuned Mixture Models
 These models consist of a MIST-encoder and physics informed task network for mixture property prediction.
-| Folder                           | Encoder  | Dataset                                                     |
-| -------------------------------- | :------: | ----------------------------------------------------------- |
-| mist-conductivity-28M-2mpg8dcd   | MIST-28M | Ionic Conductivity                                          |
-| mist-mixtures-zffffbex           | MIST-28M | Excess Density, Molar Volume and Molar Enthalpy             |
 ## Citation

 - **Tokenization**: [``Smirk``](https://eeg.engin.umich.edu/smirk/) tokenizer
+## Model Inputs and Outputs
+### Inputs
+- **SMILES strings**: Standard SMILES notation for molecular structures
+- **Batch size**: Variable, automatically padded during inference
+### Outputs
+- **Predictions**: Task-specific numerical or categorical predictions
+- **Format**: Dictionary with channel names and predicted values (if channels are configured), or raw tensor output
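Because the output is either a dictionary keyed by channel name or a raw tensor, downstream code may want to normalize both cases into one shape. The following is a minimal sketch, not part of the README: `to_columns`, the `"output"` fallback name, and the `"logS"` channel name are hypothetical, and the helper only assumes that tensor-like values expose a `.tolist()` method.

```python
from collections.abc import Mapping


def to_columns(results, default_name="output"):
    """Return {channel_name: list_of_values} for either output shape.

    Hypothetical helper: `default_name` is an assumed label for the
    raw-tensor case, where no channel names are configured.
    """
    def as_list(values):
        # Tensors and NumPy arrays expose .tolist(); plain sequences do not.
        return values.tolist() if hasattr(values, "tolist") else list(values)

    if isinstance(results, Mapping):
        # Channels configured: keep the channel names as keys.
        return {name: as_list(values) for name, values in results.items()}
    # No channels configured: wrap the raw output under one assumed name.
    return {default_name: as_list(results)}


# Illustrative inputs only; real predictions come from model.predict(...).
print(to_columns({"logS": [-0.77, -3.1]}))
print(to_columns([[0.1], [0.9]]))
```

Check each model card for the actual channel names and output shape before relying on a helper like this.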
+## Quick Start
+Tutorials are available in Google Colab:
+- [Inference](https://colab.research.google.com/github/BattModels/mist-demo/blob/main/tutorials/molecular_property_prediction.ipynb)
+- [Finetuning](https://colab.research.google.com/github/BattModels/mist-demo/blob/main/tutorials/run_finetuning.ipynb)
+#### Running Locally
+To run the model locally, create a virtual environment and install dependencies:
+```bash
+python -m venv .venv
+source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+pip install -r requirements.txt
+```
+> **Note**: SMIRK tokenizers require Rust to be installed. See the [Rust installation guide](https://www.rust-lang.org/tools/install) for details.
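Since the note above says the tokenizer needs Rust, it can help to confirm a Rust toolchain is on `PATH` before running `pip install`. This is a small sketch under the assumption that a standard Rust install provides the `cargo` and `rustc` executables; `rust_available` is a hypothetical helper name.

```python
import shutil


def rust_available() -> bool:
    """Return True if both cargo and rustc are found on PATH.

    Assumption: a standard Rust installation puts both tools on PATH.
    """
    return all(shutil.which(tool) is not None for tool in ("cargo", "rustc"))


if rust_available():
    print("Rust toolchain found.")
else:
    print("Rust toolchain not found; see the Rust installation guide.")
```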
+Then load a model and run inference.
+For a full list of model IDs and properties, see the list of provided models below.
+For details on the input and output formats of each model variant, see its model card.
 ```python
 from transformers import AutoModel
 # Load the model
 model = AutoModel.from_pretrained(
+    "mist-models/mist-{size}-{model_id}-{property}",
     trust_remote_code=True
 )
 results = model.predict(smiles_batch)
 ```
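The `mist-models/mist-{size}-{model_id}-{property}` placeholder in the snippet above can be filled in programmatically. The sketch below uses a hypothetical helper, `mist_repo_id`, and takes its example values from the `mist-28M-kcwb9le5-esol` row of the MoleculeNet table. It only applies to entries that follow this naming pattern; the pre-trained and mixture models listed below use different names.

```python
def mist_repo_id(size: str, model_id: str, prop: str) -> str:
    """Build a repository ID from the README's naming template.

    Hypothetical helper; it does not check that the repository exists.
    """
    return f"mist-models/mist-{size}-{model_id}-{prop}"


# Values taken from the mist-28M-kcwb9le5-esol table entry.
repo_id = mist_repo_id("28M", "kcwb9le5", "esol")
print(repo_id)
```

The resulting string can be passed as the first argument to `AutoModel.from_pretrained`, together with `trust_remote_code=True` as shown in the README snippet.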
 ## Provided Models
 ### Pre-trained
+- [`mist-1.8B-dh61satt`](https://huggingface.co/mist-models/mist-1.8B-dh61satt): Flagship MIST model (MIST-1.8B)
+- [`mist-28M-ti624ev1`](https://huggingface.co/mist-models/mist-28M-ti624ev1): Smaller MIST model (MIST-28M).
 Below is a full list of finetuned variants hosted on HuggingFace:
 ### MoleculeNet Benchmark Models
+| Folder                                                                 | Encoder   | Dataset                   |
+| ---------------------------------------------------------------------- | :-------: | ------------------------- |
+| [mist-1.8B-fbdn8e35-bbbp](https://huggingface.co/mist-models/mist-1.8B-fbdn8e35-bbbp)           | MIST-1.8B | MoleculeNet BBBP          |
+| [mist-1.8B-1a4puhg2-hiv](https://huggingface.co/mist-models/mist-1.8B-1a4puhg2-hiv)            | MIST-1.8B | MoleculeNet HIV           |
+| [mist-1.8B-m50jgolp-bace](https://huggingface.co/mist-models/mist-1.8B-m50jgolp-bace)          | MIST-1.8B | MoleculeNet BACE          |
+| [mist-1.8B-uop1z0dc-tox21](https://huggingface.co/mist-models/mist-1.8B-uop1z0dc-tox21)        | MIST-1.8B | MoleculeNet Tox21         |
+| [mist-1.8B-lu1l5ieh-clintox](https://huggingface.co/mist-models/mist-1.8B-lu1l5ieh-clintox)    | MIST-1.8B | MoleculeNet ClinTox       |
+| mist-1.8B-l1wfo7oa-sider *      | MIST-1.8B | MoleculeNet SIDER         |
+| mist-1.8B-hxiygjsm-esol *        | MIST-1.8B | MoleculeNet ESOL          |
+| [mist-1.8B-iwqj2cld-freesolv](https://huggingface.co/mist-models/mist-1.8B-iwqj2cld-freesolv)  | MIST-1.8B | MoleculeNet FreeSolv      |
+| [mist-1.8B-jvt4azpz-lipo](https://huggingface.co/mist-models/mist-1.8B-jvt4azpz-lipo)          | MIST-1.8B | MoleculeNet Lipophilicity |
+| [mist-1.8B-8nd1ot5j-qm8](https://huggingface.co/mist-models/mist-1.8B-8nd1ot5j-qm8)            | MIST-1.8B | MoleculeNet QM8           |
+| [mist-28M-3xpfhv48-bbbp](https://huggingface.co/mist-models/mist-28M-3xpfhv48-bbbp)            | MIST-28M  | MoleculeNet BBBP          |
+| [mist-28M-8fh43gke-hiv](https://huggingface.co/mist-models/mist-28M-8fh43gke-hiv)              | MIST-28M  | MoleculeNet HIV           |
+| [mist-28M-8loj3bab-bace](https://huggingface.co/mist-models/mist-28M-8loj3bab-bace)            | MIST-28M  | MoleculeNet BACE          |
+| [mist-28M-kw4ks27p-tox21](https://huggingface.co/mist-models/mist-28M-kw4ks27p-tox21)          | MIST-28M  | MoleculeNet Tox21         |
+| [mist-28M-97vfcykk-clintox](https://huggingface.co/mist-models/mist-28M-97vfcykk-clintox)      | MIST-28M  | MoleculeNet ClinTox       |
+| [mist-28M-z8qo16uy-sider](https://huggingface.co/mist-models/mist-28M-z8qo16uy-sider)          | MIST-28M  | MoleculeNet SIDER         |
+| [mist-28M-kcwb9le5-esol](https://huggingface.co/mist-models/mist-28M-kcwb9le5-esol)            | MIST-28M  | MoleculeNet ESOL          |
+| mist-28M-0uiq7o7m-freesolv *  | MIST-28M  | MoleculeNet FreeSolv      |
+| [mist-28M-xzr5ulva-lipo](https://huggingface.co/mist-models/mist-28M-xzr5ulva-lipo)            | MIST-28M  | MoleculeNet Lipophilicity |
+| [mist-28M-gzwqzpcr-qm8](https://huggingface.co/mist-models/mist-28M-gzwqzpcr-qm8)              | MIST-28M  | MoleculeNet QM8           |
+| [mist-26.9M-kkgx0omx-qm9](https://huggingface.co/mist-models/mist-26.9M-kkgx0omx-qm9)          | MIST-28M  | MoleculeNet QM9           |
 `*` indicates models not currently available on Hugging Face due to storage limits.
 #### QM9 Benchmark Models
 Single-target models (MIST-1.8B encoder) are available for the following QM9 properties.
+| Folder                                                                 | Encoder   | Target                                                            |
+| ---------------------------------------------------------------------- | :-------: | ----------------------------------------------------------------- |
+| [mist-1.8B-ez05expv-mu](https://huggingface.co/mist-models/mist-1.8B-ez05expv-mu)               | MIST-1.8B | μ - Dipole moment (unit: D)                                       |
+| mist-1.8B-rcwary93-alpha *                                             | MIST-1.8B | α - Isotropic polarizability (unit: Bohr^3)                       |
+| mist-1.8B-jmjosq12-homo *                                              | MIST-1.8B | HOMO - Highest occupied molecular orbital energy (unit: Hartree)  |
+| mist-1.8B-n14wshc9-lumo *                                              | MIST-1.8B | LUMO - Lowest unoccupied molecular orbital energy (unit: Hartree) |
+| mist-1.8B-kayun6v3-gap *                                               | MIST-1.8B | Gap - Gap between HOMO and LUMO (unit: Hartree)                   |
+| mist-1.8B-xxe7t35e-r2 *                                                | MIST-1.8B | \<R2\> - Electronic spatial extent (unit: Bohr^2)                 |
+| [mist-1.8B-6nmcwyrp-zpve](https://huggingface.co/mist-models/mist-1.8B-6nmcwyrp-zpve)          | MIST-1.8B | ZPVE - Zero point vibrational energy (unit: Hartree)              |
+| [mist-1.8B-a7akimjj-u0](https://huggingface.co/mist-models/mist-1.8B-a7akimjj-u0)              | MIST-1.8B | U0 - Internal energy at 0K (unit: Hartree)                        |
+| [mist-1.8B-85f24xkj-u298](https://huggingface.co/mist-models/mist-1.8B-85f24xkj-u298)          | MIST-1.8B | U298 - Internal energy at 298.15K (unit: Hartree)                 |
+| [mist-1.8B-3fbbz4is-h298](https://huggingface.co/mist-models/mist-1.8B-3fbbz4is-h298)          | MIST-1.8B | H298 - Enthalpy at 298.15K (unit: Hartree)                        |
+| [mist-1.8B-09sntn03-g298](https://huggingface.co/mist-models/mist-1.8B-09sntn03-g298)          | MIST-1.8B | G298 - Free energy at 298.15K (unit: Hartree)                     |
+| [mist-1.8B-j356b3nf-cv](https://huggingface.co/mist-models/mist-1.8B-j356b3nf-cv)              | MIST-1.8B | Cv - Heat capacity at 298.15K (unit: cal/(mol*K))                 |
 `*` indicates models not currently available on Hugging Face due to storage limits.
 These models consist of a MIST-encoder and task network finetuned on a single dataset used in the applications demonstrated in the manuscript.
+| Folder                                                                 | Encoder  | Dataset                                                     |
+| ---------------------------------------------------------------------- | :------: | ----------------------------------------------------------- |
+| [mist-26.9M-48kpooqf-odour](https://huggingface.co/mist-models/mist-26.9M-48kpooqf-odour)       | MIST-28M | Olfaction                                                   |
+| [mist-26.9M-6hk5coof-dn](https://huggingface.co/mist-models/mist-26.9M-6hk5coof-dn)             | MIST-28M | Donor Number                                                |
+| [mist-26.9M-0vxdbm36-kt](https://huggingface.co/mist-models/mist-26.9M-0vxdbm36-kt)             | MIST-28M | Kamlet-Taft Solvatochromic Parameters                       |
+| [mist-26.9M-b302p09x-bp](https://huggingface.co/mist-models/mist-26.9M-b302p09x-bp)             | MIST-28M | Boiling Point (Part of Characteristic Temperatures Dataset) |
+| [mist-26.9M-cyuo2xb6-fp](https://huggingface.co/mist-models/mist-26.9M-cyuo2xb6-fp)             | MIST-28M | Flash Point (Part of Characteristic Temperatures Dataset)   |
+| [mist-26.9M-y3ge5pf9-mp](https://huggingface.co/mist-models/mist-26.9M-y3ge5pf9-mp)             | MIST-28M | Melting Point (Part of Characteristic Temperatures Dataset) |
 ### Finetuned Multi-Task Models
 These are additional multi-target finetuned models consisting of a MIST encoder and task network.
+| Folder                                                                 | Encoder  | Dataset                               |
+| ---------------------------------------------------------------------- | :------: | ------------------------------------- |
+| [mist-26.9M-kkgx0omx-qm9](https://huggingface.co/mist-models/mist-26.9M-kkgx0omx-qm9)            | MIST-28M | QM9 Dataset with SMILES randomization |
+| [mist-28M-ttqcvt6fs-toxcast](https://huggingface.co/mist-models/mist-28M-ttqcvt6fs-toxcast)      | MIST-28M | ToxCast                               |
+| [mist-28M-yr1urd2c-muv](https://huggingface.co/mist-models/mist-28M-yr1urd2c-muv)                | MIST-28M | Maximum Unbiased Validation (MUV)     |
 ### Finetuned Mixture Models
 These models consist of a MIST-encoder and physics informed task network for mixture property prediction.
+| Folder                                                                 | Encoder  | Dataset                                         |
+| ---------------------------------------------------------------------- | :------: | ----------------------------------------------- |
+| [mist-conductivity-28M-2mpg8dcd](https://huggingface.co/mist-models/mist-conductivity-28M-2mpg8dcd) | MIST-28M | Ionic Conductivity                              |
+| [mist-mixtures-zffffbex](https://huggingface.co/mist-models/mist-mixtures-zffffbex)              | MIST-28M | Excess Density, Molar Volume and Molar Enthalpy |
 ## Citation