Update README.md
README.md
CHANGED
@@ -3,6 +3,9 @@ license: mit
 tags:
 - chemistry
 - smiles
+widget:
+- text: "^"
+  example_title: "Sample molecule | SMILES"
 ---
 
 # Model Card for Model hogru/MolReactGen-GuacaMol-Molecules
@@ -44,7 +47,6 @@ The main use of this model is to pass the master's examination of the author ;-)
 The model can be used in a Hugging Face text generation pipeline. For the intended use case a wrapper around the raw text generation pipeline is needed. This is the [`generate.py` from the repository](https://github.com/hogru/MolReactGen/blob/main/src/molreactgen/generate.py).
 The model has a default `GenerationConfig()` (`generation_config.json`) which can be overwritten. Depending on the number of molecules to be generated (`num_return_sequences` in the `JSON` file) this might take a while. The generation code above shows a progress bar during generation.
 
-
 ## Bias, Risks, and Limitations
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
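As a rough illustration of the raw pipeline mentioned in the hunk above (a minimal sketch, not the supported path — the repository's `generate.py` is the intended wrapper; the `"^"` prompt is the string from the widget example added in this change, and the sampling settings are assumptions):

```python
from transformers import pipeline

# Load the model and its bundled tokenizer straight from the Hub.
generator = pipeline("text-generation", model="hogru/MolReactGen-GuacaMol-Molecules")

# "^" is the prompt used by the card's widget; num_return_sequences mirrors
# the setting in generation_config.json, and large values can take a while.
outputs = generator("^", do_sample=True, num_return_sequences=5)
for out in outputs:
    print(out["generated_text"])
```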
@@ -63,11 +65,11 @@ The model generates molecules that are similar to the GuacaMol training data, wh
 
 The default Hugging Face `Trainer()` has been used, with an `EarlyStoppingCallback()`.
 
-###
+### Preprocessing
 
 The training data was pre-processed with a `PreTrainedTokenizerFast()` trained on the training data with a character level pre-tokenizer and Unigram as the sub-word tokenization algorithm with a vocabulary size of 88. Other tokenizers can be configured.
 
-###
+### Training Hyperparameters
 
 - **Batch size:** 64
 - **Gradient accumulation steps:** 4
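A minimal sketch of the tokenizer setup described in the hunk above, assuming the `tokenizers` library is used to build the `PreTrainedTokenizerFast()`; the special tokens and the three-molecule corpus are placeholders, not the repository's actual configuration:

```python
from tokenizers import Regex, Tokenizer, pre_tokenizers
from tokenizers.models import Unigram
from tokenizers.trainers import UnigramTrainer
from transformers import PreTrainedTokenizerFast

# Character-level pre-tokenizer: isolate every single character of the SMILES string.
tokenizer = Tokenizer(Unigram())
tokenizer.pre_tokenizer = pre_tokenizers.Split(Regex("."), behavior="isolated")

# Unigram sub-word training with the vocabulary size from the card.
trainer = UnigramTrainer(vocab_size=88, special_tokens=["^"])  # special tokens assumed
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]  # stand-in for the GuacaMol training data
tokenizer.train_from_iterator(smiles, trainer=trainer)

fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
```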
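And a hedged sketch of the training loop those hyperparameters plug into — the stock `Trainer()` with an `EarlyStoppingCallback()` as the card states, but with a tiny stand-in model and dataset so the snippet runs; the patience value and evaluation cadence are assumptions, and the argument names follow recent `transformers` releases:

```python
import torch
from transformers import (EarlyStoppingCallback, GPT2Config, GPT2LMHeadModel,
                          Trainer, TrainingArguments)

# Stand-ins: the real run uses the repository's model config and GuacaMol SMILES.
model = GPT2LMHeadModel(GPT2Config(vocab_size=88, n_layer=2, n_head=2, n_embd=64))
dummy = [{"input_ids": torch.tensor([1, 2, 3, 4]),
          "labels": torch.tensor([1, 2, 3, 4])}] * 64

args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=64,  # batch size 64, from the card
    gradient_accumulation_steps=4,   # gradient accumulation steps 4, from the card
    eval_strategy="epoch",           # early stopping needs periodic evaluation
    save_strategy="epoch",
    load_best_model_at_end=True,     # required by EarlyStoppingCallback
    metric_for_best_model="loss",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dummy,
    eval_dataset=dummy,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience assumed
)
trainer.train()
```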
@@ -86,7 +88,7 @@ More configuration (options) can be found in the [`conf`](https://github.com/hog
 
 Please see the slides / the poster mentioned above.
 
-###
+### Metrics
 
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 