Update README.md
README.md
CHANGED
@@ -3,6 +3,9 @@ license: mit
 tags:
 - chemistry
 - smiles
+widget:
+- text: "^"
+  example_title: "Sample molecule | SMILES"
 ---
 
 # Model Card for Model hogru/MolReactGen-GuacaMol-Molecules
@@ -44,7 +47,6 @@ The main use of this model is to pass the master's examination of the author ;-)
 The model can be used in a Hugging Face text generation pipeline. For the intended use case a wrapper around the raw text generation pipeline is needed. This is the [`generate.py` from the repository](https://github.com/hogru/MolReactGen/blob/main/src/molreactgen/generate.py).
 The model has a default `GenerationConfig()` (`generation_config.json`) which can be overwritten. Depending on the number of molecules to be generated (`num_return_sequences` in the `JSON` file) this might take a while. The generation code above shows a progress bar during generation.
 
-
 ## Bias, Risks, and Limitations
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
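As a rough illustration of the raw pipeline mentioned in the hunk above (a minimal sketch, not the supported path — the repository's `generate.py` is the intended wrapper; the `"^"` prompt is the string from the widget example added in this change, and the sampling settings are assumptions):

```python
from transformers import pipeline

# Load the model and its bundled tokenizer straight from the Hub.
generator = pipeline("text-generation", model="hogru/MolReactGen-GuacaMol-Molecules")

# "^" is the prompt used by the card's widget; num_return_sequences mirrors
# the setting in generation_config.json, and large values can take a while.
outputs = generator("^", do_sample=True, num_return_sequences=5)
for out in outputs:
    print(out["generated_text"])
```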
@@ -63,11 +65,11 @@ The model generates molecules that are similar to the GuacaMol training data, wh
 
 The default Hugging Face `Trainer()` has been used, with an `EarlyStoppingCallback()`.
 
-###
+### Preprocessing
 
 The training data was pre-processed with a `PreTrainedTokenizerFast()` trained on the training data with a character level pre-tokenizer and Unigram as the sub-word tokenization algorithm with a vocabulary size of 88. Other tokenizers can be configured.
 
-###
+### Training Hyperparameters
 
 - **Batch size:** 64
 - **Gradient accumulation steps:** 4
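A minimal sketch of the tokenizer setup described in the hunk above, assuming the `tokenizers` library is used to build the `PreTrainedTokenizerFast()`; the special tokens and the three-molecule corpus are placeholders, not the repository's actual configuration:

```python
from tokenizers import Regex, Tokenizer, pre_tokenizers
from tokenizers.models import Unigram
from tokenizers.trainers import UnigramTrainer
from transformers import PreTrainedTokenizerFast

# Character-level pre-tokenizer: isolate every single character of the SMILES string.
tokenizer = Tokenizer(Unigram())
tokenizer.pre_tokenizer = pre_tokenizers.Split(Regex("."), behavior="isolated")

# Unigram sub-word training with the vocabulary size from the card.
trainer = UnigramTrainer(vocab_size=88, special_tokens=["^"])  # special tokens assumed
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]  # stand-in for the GuacaMol training data
tokenizer.train_from_iterator(smiles, trainer=trainer)

fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
```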
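And a hedged sketch of the training loop those hyperparameters plug into — the stock `Trainer()` with an `EarlyStoppingCallback()` as the card states, but with a tiny stand-in model and dataset so the snippet runs; the patience value and evaluation cadence are assumptions, and the argument names follow recent `transformers` releases:

```python
import torch
from transformers import (EarlyStoppingCallback, GPT2Config, GPT2LMHeadModel,
                          Trainer, TrainingArguments)

# Stand-ins: the real run uses the repository's model config and GuacaMol SMILES.
model = GPT2LMHeadModel(GPT2Config(vocab_size=88, n_layer=2, n_head=2, n_embd=64))
dummy = [{"input_ids": torch.tensor([1, 2, 3, 4]),
          "labels": torch.tensor([1, 2, 3, 4])}] * 64

args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=64,  # batch size 64, from the card
    gradient_accumulation_steps=4,   # gradient accumulation steps 4, from the card
    eval_strategy="epoch",           # early stopping needs periodic evaluation
    save_strategy="epoch",
    load_best_model_at_end=True,     # required by EarlyStoppingCallback
    metric_for_best_model="loss",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dummy,
    eval_dataset=dummy,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience assumed
)
trainer.train()
```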
@@ -86,7 +88,7 @@ More configuration (options) can be found in the [`conf`](https://github.com/hog
 
 Please see the slides / the poster mentioned above.
 
-###
+### Metrics
 
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 