eacortes committed
Commit aa361a7 · verified · 1 Parent(s): dc33cfb

Upload 14 files

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
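The pooling config above enables only `pooling_mode_mean_tokens`, so each embedding is the average of the token vectors that are not padding. The following is a minimal pure-Python sketch of that idea; `mean_pool` is a hypothetical helper written for illustration and is not the sentence-transformers implementation.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask value 1)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding sequence to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in total]


tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]  # the third token is padding and is ignored
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Masking matters because batches are padded to a common length; without it, padding vectors would pull the average away from the real tokens.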
README.md CHANGED
@@ -1,3 +1,515 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ language:
3
+ - en
4
+ tags:
5
+ - sentence-transformers
6
+ - sentence-similarity
7
+ - feature-extraction
8
+ - dense
9
+ - generated_from_trainer
10
+ - dataset_size:19692766
11
+ - loss:Matryoshka2dLoss
12
+ - loss:MatryoshkaLoss
13
+ - loss:TanimotoSentLoss
14
+ base_model: Derify/ChemBERTa-druglike
15
+ widget:
16
+ - source_sentence: CC1CCc2c(N)nc(C3CCCC3)n2C1
17
+ sentences:
18
+ - CC1CCc2c(N)nc(OC3CC3)n2C1
19
+ - CN1CC[NH+](C[C@H](O)C2CC2)C2(CCCCC2)C1
20
+ - Cc1c(F)cc(CNCC2CCC(C3CCC(C)CO3)CO2)cc1F
21
+ - source_sentence: CC(CCCO)NC(=O)CNc1ccccc1
22
+ sentences:
23
+ - CC(CCCO)N[C@H]1CCCN(Nc2ccccc2)[C@H]1C
24
+ - Cc1ccc(OC2=NCCO2)nc1
25
+ - Cc1ccccc1C#Cc1ccccc1N(O)c1ccccc1
26
+ - source_sentence: CCCCCCCc1ccc(CC=N[NH+]=C(N)N)cc1
27
+ sentences:
28
+ - COCC1(N2CCN(C)CC2)CCC[NH+]1Cc1cnc(N(C)C)nc1
29
+ - Cc1ccc(N=C(c2ccccc2)c2ccc(-n3ccnn3)cc2)cc1
30
+ - CCCCCCCc1cncc(CC=N[NH+]=C(N)N)c1
31
+ - source_sentence: CC(=CCCS(=O)(=O)[O-])C(=O)OCCCS(=O)(=O)[O-]
32
+ sentences:
33
+ - CC(=CCCS(=O)(=O)[O-])C(=O)OCCCS(=O)(=O)[O-]
34
+ - CCCCCOc1ccc(NC(=S)NC=O)cc1
35
+ - CCC(=O)N1CCCC(NC(=O)c2ccc(S(=O)(=O)N(C)C)cc2)C1
36
+ - source_sentence: Clc1nccc(C#CCCc2nc3ccccc3o2)n1
37
+ sentences:
38
+ - O=Cc1nc2ccccc2o1
39
+ - O=C([O-])COc1ccc(CCCS(=O)(=O)c2ccc(Cl)cc2)cc1NC(=O)c1cccc(C=Cc2nc3ccccc3s2)c1
40
+ - O[C@H]1CN(C(Cc2ccccc2)c2ccccc2)C[C@@H]1Cc1cnc[nH]1
41
+ datasets:
42
+ - Derify/pubchem_10m_genmol_similarity
43
+ pipeline_tag: sentence-similarity
44
+ library_name: sentence-transformers
45
+ metrics:
46
+ - spearman
47
+ model-index:
48
+ - name: SentenceTransformer based on Derify/ChemBERTa-druglike
49
+ results:
50
+ - task:
51
+ type: semantic-similarity
52
+ name: Semantic Similarity
53
+ dataset:
54
+ name: pubchem 10m genmol similarity
55
+ type: pubchem_10m_genmol_similarity
56
+ metrics:
57
+ - type: spearman
58
+ value: 0.9932120589500998
59
+ name: Spearman
60
+ ---
61
+
62
+ # SentenceTransformer based on Derify/ChemBERTa-druglike
63
+
64
+ This is a [Chem-MRL](https://github.com/emapco/chem-mrl) ([sentence-transformers](https://www.SBERT.net)) model finetuned from [Derify/ChemBERTa-druglike](https://huggingface.co/Derify/ChemBERTa-druglike) on the [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity) dataset. It maps SMILES to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, database indexing, molecular classification, clustering, and more.
65
+
66
+ ## Model Details
67
+
68
+ ### Model Description
69
+ - **Model Type:** ChemMRL (Sentence Transformer)
70
+ - **Base model:** [Derify/ChemBERTa-druglike](https://huggingface.co/Derify/ChemBERTa-druglike) <!-- at revision 5e76559157fde4f1aead643d9e1d402289f522af -->
71
+ - **Maximum Sequence Length:** 128 tokens
72
+ - **Output Dimensionality:** 1024 dimensions
73
+ - **Similarity Function:** Tanimoto
74
+ - **Training Dataset:**
75
+ - [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity)
76
+ - **Language:** en
77
+ - **License:** [Apache-2.0](https://huggingface.co/Derify/ChemBERTa-druglike/blob/main/LICENSE)
78
+
79
+ ### Model Sources
80
+
81
+ - **Repository:** [Chem-MRL on GitHub](https://github.com/emapco/chem-mrl)
82
+ - **Demo App Repository:** [Chem-MRL-demo on GitHub](https://github.com/emapco/chem-mrl-demo)
83
+
84
+ ### Full Model Architecture
85
+
86
+ ```
87
+ SentenceTransformer(
88
+ (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'RobertaModel'})
89
+ (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
90
+ (2): Normalize()
91
+ )
92
+ ```
93
+
94
+ ## Usage
95
+
96
+ ### Direct Usage (Sentence Transformers)
97
+
98
+ First install the Chem-MRL library, which provides the `ChemMRL` class used below:
99
+
100
+ ```bash
101
+ pip install -U chem-mrl
102
+ ```
103
+
104
+ Then you can load this model and run inference.
105
+ ```python
106
+ from chem_mrl import ChemMRL
107
+
108
+ # Download from the 🤗 Hub
109
+ model = ChemMRL("Derify/ChemMRL-beta")
110
+ # Run inference
111
+ sentences = [
112
+ "Clc1nccc(C#CCCc2nc3ccccc3o2)n1",
113
+ "O=Cc1nc2ccccc2o1",
114
+ "O[C@H]1CN(C(Cc2ccccc2)c2ccccc2)C[C@@H]1Cc1cnc[nH]1",
115
+ ]
116
+ embeddings = model.encode(sentences)
117
+ print(embeddings.shape)
118
+ # [3, 1024]
119
+
120
+ # Get the similarity scores for the embeddings
121
+ similarities = model.similarity(embeddings, embeddings)
122
+ print(similarities)
123
+ # tensor([[1.0000, 0.4848, 0.2158],
124
+ # [0.4848, 1.0000, 0.1735],
125
+ # [0.2158, 0.1735, 1.0000]])
126
+ ```
127
+
128
+ ## Evaluation
129
+
130
+ ### Metrics
131
+
132
+ #### Semantic Similarity
133
+
134
+ * Dataset: `pubchem_10m_genmol_similarity`
135
+ * Evaluated with <code>chem_mrl.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator</code> with these parameters:
136
+ ```json
137
+ {
138
+ "precision": "float32"
139
+ }
140
+ ```
141
+
142
+ | Metric | Value |
143
+ | :----------- | :--------- |
144
+ | **spearman** | **0.9932** |
145
+
146
+ <!--
147
+ ## Bias, Risks and Limitations
148
+
149
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
150
+ -->
151
+
152
+ <!--
153
+ ### Recommendations
154
+
155
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
156
+ -->
157
+
158
+ ## Training Details
159
+
160
+ ### Training Dataset
161
+
162
+ #### pubchem_10m_genmol_similarity
163
+
164
+ * Dataset: [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity) at [f68d779](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity/tree/f68d779a6284578132a3922655f6b1f74c576642)
165
+ * Size: 19,692,766 training samples
166
+ * Columns: <code>smiles_a</code>, <code>smiles_b</code>, and <code>label</code>
167
+ * Approximate statistics based on the first 1000 samples:
168
+ | | smiles_a | smiles_b | label |
169
+ | :------ | :---------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------- | :-------------------------------------------------------------- |
170
+ | type | string | string | float |
171
+ | details | <ul><li>min: 17 tokens</li><li>mean: 39.66 tokens</li><li>max: 119 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 38.29 tokens</li><li>max: 115 tokens</li></ul> | <ul><li>min: 0.02</li><li>mean: 0.57</li><li>max: 1.0</li></ul> |
172
+ * Samples:
173
+ | smiles_a | smiles_b | label |
174
+ | :--------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------- | :------------------------------ |
175
+ | <code>COc1ccc(NC(=O)C2CC[NH+](C(C)C(=O)Nc3ccc(C(=O)Nc4ccc(F)c(F)c4)cc3C)CC2)cc1NC(=O)C1CCCCC1</code> | <code>Cc1cc(C(=O)Nc2ccc(F)c(F)c2)ccc1NC(=O)C(C)[NH+]1CCC(C(=O)Nc2cccc(NC(=O)C3CCCCC3)c2)CC1</code> | <code>0.8495575189590454</code> |
176
+ | <code>OCCN1CC[NH+](Cc2ccccc2OC2CC2)CC1</code> | <code>OCCN1CC[NH+](Cc2ccccc2On2cccn2)CC1</code> | <code>0.6615384817123413</code> |
177
+ | <code>CC1CN(C(=O)C2CC[NH+](Cc3cccc(C(N)=O)c3)CC2)CC(C)O1</code> | <code>CC1CN(C(=O)C2CC[NH+](Cc3ccccc3)CC2)CC(C)O1</code> | <code>0.7123287916183472</code> |
178
+ * Loss: [<code>Matryoshka2dLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshka2dloss) with these parameters:
179
+ ```json
180
+ {
181
+ "loss": "TanimotoSentLoss",
182
+ "n_layers_per_step": -1,
183
+ "last_layer_weight": 2.0,
184
+ "prior_layers_weight": 1.0,
185
+ "kl_div_weight": 0.5,
186
+ "kl_temperature": 0.3,
187
+ "matryoshka_dims": [
188
+ 1024,
189
+ 512,
190
+ 256,
191
+ 128,
192
+ 64,
193
+ 32,
194
+ 16,
195
+ 8
196
+ ],
197
+ "matryoshka_weights": [
198
+ 1,
199
+ 1,
200
+ 1,
201
+ 1,
202
+ 1,
203
+ 1,
204
+ 1,
205
+ 1
206
+ ],
207
+ "n_dims_per_step": -1
208
+ }
209
+ ```
210
+
211
+ ### Training Hyperparameters
212
+ #### Non-Default Hyperparameters
213
+
214
+ - `eval_strategy`: steps
215
+ - `per_device_train_batch_size`: 64
216
+ - `per_device_eval_batch_size`: 128
217
+ - `learning_rate`: 8e-06
218
+ - `weight_decay`: 6.505130550397454e-06
219
+ - `warmup_ratio`: 0.2
220
+ - `data_seed`: 42
221
+ - `fp16`: True
222
+ - `tf32`: True
223
+ - `load_best_model_at_end`: True
224
+ - `optim`: adamw_apex_fused
225
+ - `dataloader_pin_memory`: False
226
+
227
+ #### All Hyperparameters
228
+ <details><summary>Click to expand</summary>
229
+
230
+ - `overwrite_output_dir`: False
231
+ - `do_predict`: False
232
+ - `eval_strategy`: steps
233
+ - `prediction_loss_only`: True
234
+ - `per_device_train_batch_size`: 64
235
+ - `per_device_eval_batch_size`: 128
236
+ - `per_gpu_train_batch_size`: None
237
+ - `per_gpu_eval_batch_size`: None
238
+ - `gradient_accumulation_steps`: 1
239
+ - `eval_accumulation_steps`: None
240
+ - `torch_empty_cache_steps`: None
241
+ - `learning_rate`: 8e-06
242
+ - `weight_decay`: 6.505130550397454e-06
243
+ - `adam_beta1`: 0.9
244
+ - `adam_beta2`: 0.999
245
+ - `adam_epsilon`: 1e-08
246
+ - `max_grad_norm`: 1.0
247
+ - `num_train_epochs`: 3
248
+ - `max_steps`: -1
249
+ - `lr_scheduler_type`: linear
250
+ - `lr_scheduler_kwargs`: {}
251
+ - `warmup_ratio`: 0.2
252
+ - `warmup_steps`: 0
253
+ - `log_level`: passive
254
+ - `log_level_replica`: warning
255
+ - `log_on_each_node`: True
256
+ - `logging_nan_inf_filter`: True
257
+ - `save_safetensors`: True
258
+ - `save_on_each_node`: False
259
+ - `save_only_model`: False
260
+ - `restore_callback_states_from_checkpoint`: False
261
+ - `no_cuda`: False
262
+ - `use_cpu`: False
263
+ - `use_mps_device`: False
264
+ - `seed`: 42
265
+ - `data_seed`: 42
266
+ - `jit_mode_eval`: False
267
+ - `use_ipex`: False
268
+ - `bf16`: False
269
+ - `fp16`: True
270
+ - `fp16_opt_level`: O1
271
+ - `half_precision_backend`: auto
272
+ - `bf16_full_eval`: False
273
+ - `fp16_full_eval`: False
274
+ - `tf32`: True
275
+ - `local_rank`: 0
276
+ - `ddp_backend`: None
277
+ - `tpu_num_cores`: None
278
+ - `tpu_metrics_debug`: False
279
+ - `debug`: []
280
+ - `dataloader_drop_last`: False
281
+ - `dataloader_num_workers`: 0
282
+ - `dataloader_prefetch_factor`: None
283
+ - `past_index`: -1
284
+ - `disable_tqdm`: False
285
+ - `remove_unused_columns`: True
286
+ - `label_names`: None
287
+ - `load_best_model_at_end`: True
288
+ - `ignore_data_skip`: False
289
+ - `fsdp`: []
290
+ - `fsdp_min_num_params`: 0
291
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
292
+ - `fsdp_transformer_layer_cls_to_wrap`: None
293
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
294
+ - `deepspeed`: None
295
+ - `label_smoothing_factor`: 0.0
296
+ - `optim`: adamw_apex_fused
297
+ - `optim_args`: None
298
+ - `adafactor`: False
299
+ - `group_by_length`: False
300
+ - `length_column_name`: length
301
+ - `ddp_find_unused_parameters`: None
302
+ - `ddp_bucket_cap_mb`: None
303
+ - `ddp_broadcast_buffers`: False
304
+ - `dataloader_pin_memory`: False
305
+ - `dataloader_persistent_workers`: False
306
+ - `skip_memory_metrics`: True
307
+ - `use_legacy_prediction_loop`: False
308
+ - `push_to_hub`: False
309
+ - `hub_model_id`: None
310
+ - `hub_strategy`: every_save
311
+ - `hub_private_repo`: None
312
+ - `hub_always_push`: False
313
+ - `hub_revision`: None
314
+ - `gradient_checkpointing`: False
315
+ - `gradient_checkpointing_kwargs`: None
316
+ - `include_inputs_for_metrics`: False
317
+ - `include_for_metrics`: []
318
+ - `eval_do_concat_batches`: True
319
+ - `fp16_backend`: auto
320
+ - `push_to_hub_model_id`: None
321
+ - `push_to_hub_organization`: None
322
+ - `mp_parameters`:
323
+ - `auto_find_batch_size`: False
324
+ - `full_determinism`: False
325
+ - `torchdynamo`: None
326
+ - `ray_scope`: last
327
+ - `ddp_timeout`: 1800
328
+ - `torch_compile`: False
329
+ - `torch_compile_backend`: None
330
+ - `torch_compile_mode`: None
331
+ - `include_tokens_per_second`: False
332
+ - `include_num_input_tokens_seen`: False
333
+ - `neftune_noise_alpha`: None
334
+ - `optim_target_modules`: None
335
+ - `batch_eval_metrics`: False
336
+ - `eval_on_start`: False
337
+ - `use_liger_kernel`: False
338
+ - `liger_kernel_config`: None
339
+ - `eval_use_gather_object`: False
340
+ - `average_tokens_across_devices`: False
341
+ - `prompts`: None
342
+ - `batch_sampler`: batch_sampler
343
+ - `multi_dataset_batch_sampler`: proportional
344
+ - `router_mapping`: {}
345
+ - `learning_rate_mapping`: {}
346
+
347
+ </details>
348
+
349
+ ### Training Logs
350
+ <details><summary>Click to expand</summary>
351
+
352
+ | Epoch | Step | Training Loss | pubchem_10m_genmol_similarity_spearman |
353
+ | :----: | :----: | :-----------: | :------------------------------------: |
354
+ | 0.0796 | 24500 | 121.4633 | - |
355
+ | 0.08 | 24616 | - | 0.9739 |
356
+ | 0.1592 | 49000 | 118.6111 | - |
357
+ | 0.16 | 49232 | - | 0.9817 |
358
+ | 0.2389 | 73500 | 117.491 | - |
359
+ | 0.24 | 73848 | - | 0.9848 |
360
+ | 0.3185 | 98000 | 116.3786 | - |
361
+ | 0.32 | 98464 | - | 0.9865 |
362
+ | 0.3997 | 123000 | 115.9773 | - |
363
+ | 0.4 | 123080 | - | 0.9873 |
364
+ | 0.4794 | 147500 | 115.2441 | - |
365
+ | 0.48 | 147696 | - | 0.9885 |
366
+ | 0.5590 | 172000 | 114.8674 | - |
367
+ | 0.56 | 172312 | - | 0.9887 |
368
+ | 0.6386 | 196500 | 114.6483 | - |
369
+ | 0.64 | 196928 | - | 0.9892 |
370
+ | 0.7199 | 221500 | 114.0507 | - |
371
+ | 0.72 | 221544 | - | 0.9898 |
372
+ | 0.7995 | 246000 | 113.5606 | - |
373
+ | 0.8 | 246160 | - | 0.9902 |
374
+ | 0.8791 | 270500 | 113.2762 | - |
375
+ | 0.88 | 270776 | - | 0.9907 |
376
+ | 0.9587 | 295000 | 113.3295 | - |
377
+ | 0.96 | 295392 | - | 0.9908 |
378
+ | 1.0400 | 320000 | 112.9253 | - |
379
+ | 1.04 | 320008 | - | 0.9909 |
380
+ | 1.1196 | 344500 | 112.584 | - |
381
+ | 1.12 | 344624 | - | 0.9910 |
382
+ | 1.1992 | 369000 | 112.616 | - |
383
+ | 1.2 | 369240 | - | 0.9916 |
384
+ | 1.2788 | 393500 | 112.4692 | - |
385
+ | 1.28 | 393856 | - | 0.9914 |
386
+ | 1.3585 | 418000 | 112.2679 | - |
387
+ | 1.3600 | 418472 | - | 0.9917 |
388
+ | 1.4397 | 443000 | 112.1639 | - |
389
+ | 1.44 | 443088 | - | 0.9919 |
390
+ | 1.5193 | 467500 | 112.1139 | - |
391
+ | 1.52 | 467704 | - | 0.9921 |
392
+ | 1.5990 | 492000 | 111.8096 | - |
393
+ | 1.6 | 492320 | - | 0.9923 |
394
+ | 1.6786 | 516500 | 111.8252 | - |
395
+ | 1.6800 | 516936 | - | 0.9922 |
396
+ | 1.7598 | 541500 | 111.836 | - |
397
+ | 1.76 | 541552 | - | 0.9924 |
398
+ | 1.8395 | 566000 | 111.8471 | - |
399
+ | 1.8400 | 566168 | - | 0.9924 |
400
+ | 1.9191 | 590500 | 111.7778 | - |
401
+ | 1.92 | 590784 | - | 0.9925 |
402
+ | 1.9987 | 615000 | 111.4892 | - |
403
+ | 2.0 | 615400 | - | 0.9927 |
404
+ | 2.0799 | 640000 | 111.2659 | - |
405
+ | 2.08 | 640016 | - | 0.9928 |
406
+ | 2.1596 | 664500 | 111.3635 | - |
407
+ | 2.16 | 664632 | - | 0.9927 |
408
+ | 2.2392 | 689000 | 111.0114 | - |
409
+ | 2.24 | 689248 | - | 0.9928 |
410
+ | 2.3188 | 713500 | 111.0559 | - |
411
+ | 2.32 | 713864 | - | 0.9929 |
412
+ | 2.3984 | 738000 | 110.5276 | - |
413
+ | 2.4 | 738480 | - | 0.9929 |
414
+ | 2.4797 | 763000 | 110.9828 | - |
415
+ | 2.48 | 763096 | - | 0.9930 |
416
+ | 2.5593 | 787500 | 110.8404 | - |
417
+ | 2.56 | 787712 | - | 0.9930 |
418
+ | 2.6389 | 812000 | 111.1937 | - |
419
+ | 2.64 | 812328 | - | 0.9931 |
420
+ | 2.7186 | 836500 | 110.6662 | - |
421
+ | 2.7200 | 836944 | - | 0.9931 |
422
+ | 2.7998 | 861500 | 110.7714 | - |
423
+ | 2.8 | 861560 | - | 0.9932 |
424
+ | 2.8794 | 886000 | 110.7638 | - |
425
+ | 2.88 | 886176 | - | 0.9932 |
426
+ | 2.9591 | 910500 | 110.7021 | - |
427
+ | 2.96 | 910792 | - | 0.9932 |
428
+ | 2.9997 | 923000 | 110.6097 | - |
429
+ </details>
430
+
431
+ ### Training Hardware
432
+ - **On Cloud**: No
433
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
434
+ - **CPU Model**: AMD Ryzen 7 3700X 8-Core Processor
435
+ - **RAM Size**: 62.70 GB
436
+
437
+ ### Framework Versions
438
+ - Python: 3.12.11
439
+ - Sentence Transformers: 5.0.0
440
+ - Transformers: 4.53.3
441
+ - PyTorch: 2.7.1+cu126
442
+ - Accelerate: 1.9.0
443
+ - Datasets: 3.6.0
444
+ - Tokenizers: 0.21.2
445
+
446
+ ## Citation
447
+
448
+ ### BibTeX
449
+
450
+ #### Sentence Transformers
451
+ ```bibtex
452
+ @inproceedings{reimers-2019-sentence-bert,
453
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
454
+ author = "Reimers, Nils and Gurevych, Iryna",
455
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
456
+ month = "11",
457
+ year = "2019",
458
+ publisher = "Association for Computational Linguistics",
459
+ url = "https://arxiv.org/abs/1908.10084",
460
+ }
461
+ ```
462
+
463
+ #### Matryoshka2dLoss
464
+ ```bibtex
465
+ @misc{li20242d,
466
+ title={2D Matryoshka Sentence Embeddings},
467
+ author={Xianming Li and Zongxi Li and Jing Li and Haoran Xie and Qing Li},
468
+ year={2024},
469
+ eprint={2402.14776},
470
+ archivePrefix={arXiv},
471
+ primaryClass={cs.CL}
472
+ }
473
+ ```
474
+
475
+ #### MatryoshkaLoss
476
+ ```bibtex
477
+ @misc{kusupati2024matryoshka,
478
+ title={Matryoshka Representation Learning},
479
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
480
+ year={2024},
481
+ eprint={2205.13147},
482
+ archivePrefix={arXiv},
483
+ primaryClass={cs.LG}
484
+ }
485
+ ```
486
+
487
+ #### CoSENTLoss
488
+ ```bibtex
489
+ @online{kexuefm-8847,
490
+ title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
491
+ author={Su Jianlin},
492
+ year={2022},
493
+ month={Jan},
494
+ url={https://kexue.fm/archives/8847},
495
+ }
496
+ ```
497
+
498
+ #### TanimotoSentLoss
499
+ ```bibtex
500
+ @online{emapco-chem-mrl-tanimotosentloss,
501
+ title={TanimotoSentLoss: Tanimoto Loss for SMILES Embeddings},
502
+ author={Emmanuel Cortes},
503
+ year={2025},
504
+ month={Jan},
505
+ url={https://github.com/emapco/chem-mrl/blob/main/chem_mrl/losses/TanimotoLoss.py},
506
+ }
507
+ ```
508
+
509
+ ## Model Card Authors
510
+
511
+ [@eacortes](https://huggingface.co/eacortes)
512
+
513
+ ## Model Card Contact
514
+
515
+ Manny Cortes (manny@derifyai.com)
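The README lists Tanimoto as the similarity function. For real-valued vectors the Tanimoto coefficient is commonly defined as a·b / (‖a‖² + ‖b‖² − a·b). The sketch below implements that textbook form in pure Python; it is an assumption that `chem_mrl` computes exactly this expression, and `tanimoto_similarity` is a hypothetical name, not a library function.

```python
from typing import Sequence


def tanimoto_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    denominator = norm_a + norm_b - dot
    if denominator == 0.0:
        # Both vectors are all zeros; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return dot / denominator


print(tanimoto_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(tanimoto_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Because the model ends with a `Normalize` module, both norms equal 1 under this definition, and the score reduces to c / (2 − c), where c is the cosine similarity.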
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "architectures": [
+     "RobertaModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 130,
+   "model_type": "roberta",
+   "num_attention_heads": 8,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.53.3",
+   "type_vocab_size": 1,
+   "use_cache": false,
+   "vocab_size": 581
+ }
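The config sets `max_position_embeddings` to 130 although the README gives a maximum sequence length of 128 tokens. RoBERTa-style models in Transformers number positions starting at `pad_token_id + 1` and reserve the padding index itself for padding tokens, which accounts for the two extra slots. The sketch below mimics that numbering in pure Python; `roberta_position_ids` is a hypothetical helper for illustration only.

```python
from typing import List


def roberta_position_ids(input_ids: List[int], pad_token_id: int = 1) -> List[int]:
    """Assign positions pad_token_id + 1, pad_token_id + 2, ... to real tokens."""
    position_ids = []
    seen = 0  # number of non-padding tokens encountered so far
    for token_id in input_ids:
        if token_id == pad_token_id:
            # Padding tokens all share the padding position.
            position_ids.append(pad_token_id)
        else:
            seen += 1
            position_ids.append(pad_token_id + seen)
    return position_ids


print(roberta_position_ids([0, 6, 6, 9, 2, 1, 1]))  # [2, 3, 4, 5, 6, 1, 1]
print(max(roberta_position_ids([5] * 128)))         # 129
```

A full 128-token input therefore reaches position 129, so the position table needs 130 rows (indices 0 through 129).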
config_chem_mrl.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "__version__": "0.7.2",
+   "embedding_pooling": "mean",
+   "eval_metric": "spearman",
+   "eval_similarity_fct": "tanimoto",
+   "kl_div_weight": 0.5,
+   "kl_temperature": 0.3,
+   "last_layer_weight": 2.0,
+   "loss_func": "tanimotosentloss",
+   "model_name": "Derify/ChemBERTa-druglike",
+   "mrl_dimension_weights": [
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1
+   ],
+   "mrl_dimensions": [
+     1024,
+     512,
+     256,
+     128,
+     64,
+     32,
+     16,
+     8
+   ],
+   "n_dims_per_step": -1,
+   "n_layers_per_step": -1,
+   "prior_layers_weight": 1.0,
+   "tanimoto_similarity_loss_func": null,
+   "use_2d_matryoshka": true,
+   "use_query_tokenizer": false
+ }
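The `mrl_dimensions` list above names the nested sizes the Matryoshka loss was trained on. Embeddings trained this way are intended to stay useful when only a leading prefix of the vector is kept. The sketch below shows one way to do that by hand: slice the first `dim` values and rescale to unit length, since the model's pipeline ends with a normalization step. `truncate_embedding` is a hypothetical helper, and the rescaling is an assumption about how a shortened vector would be used downstream.

```python
import math
from typing import List, Sequence

# Sizes copied from "mrl_dimensions" in config_chem_mrl.json.
MRL_DIMENSIONS = (1024, 512, 256, 128, 64, 32, 16, 8)


def truncate_embedding(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` values and rescale the result to unit L2 norm."""
    if dim not in MRL_DIMENSIONS:
        raise ValueError(f"dim must be one of {MRL_DIMENSIONS}")
    if dim > len(embedding):
        raise ValueError("dim exceeds the embedding length")
    prefix = list(embedding[:dim])
    norm = math.sqrt(sum(value * value for value in prefix))
    if norm == 0.0:
        return prefix  # nothing to rescale
    return [value / norm for value in prefix]


full = [3.0, 4.0] + [0.0] * 6 + [1.0] * 1016  # stand-in for a 1024-d embedding
small = truncate_embedding(full, 8)
print(len(small))  # 8
```

Sentence Transformers also accepts a `truncate_dim` argument on `SentenceTransformer`; whether the `ChemMRL` wrapper forwards it is not confirmed by this commit.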
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.0.0",
+     "transformers": "4.53.3",
+     "pytorch": "2.7.1+cu126"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "tanimoto"
+ }
merges.txt ADDED
@@ -0,0 +1 @@
+ #version: 0.2
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:388917cb9c0d52f13ac2a3ac337e4d1992d99de10fc65cb7f3859e6f17369d33
+ size 611764232
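The size in the LFS pointer can be sanity-checked against `config.json`. The sketch below counts parameters for a standard `RobertaModel` layout (embeddings, 12 encoder layers, and a pooler head) using the values from that file. The layout, including the presence of the pooler, is an assumption and is not read from the checkpoint; `roberta_parameter_count` is a hypothetical helper.

```python
def roberta_parameter_count(
    vocab_size: int = 581,
    hidden_size: int = 1024,
    intermediate_size: int = 4096,
    num_layers: int = 12,
    max_positions: int = 130,
    type_vocab_size: int = 1,
    with_pooler: bool = True,
) -> int:
    """Estimate the parameter count of a RoBERTa-style encoder."""
    layer_norm = 2 * hidden_size  # weight and bias
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden_size + layer_norm
    # Query, key, value, and output projections, each with a bias, plus LayerNorm.
    attention = 4 * (hidden_size * hidden_size + hidden_size) + layer_norm
    # Two dense layers with biases, plus LayerNorm.
    feed_forward = (
        (hidden_size * intermediate_size + intermediate_size)
        + (intermediate_size * hidden_size + hidden_size)
        + layer_norm
    )
    pooler = hidden_size * hidden_size + hidden_size if with_pooler else 0
    return embeddings + num_layers * (attention + feed_forward) + pooler


params = roberta_parameter_count()
print(params)      # 152935424
print(params * 4)  # 611741696 bytes at 4 bytes per float32 value
```

Under these assumptions the weights take 611,741,696 bytes, which is 22,536 bytes less than the 611,764,232 listed in the pointer. A gap of that size would be consistent with the file's metadata header, although that is an inference rather than a measurement.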
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 128,
+   "do_lower_case": false
+ }
similarity_evaluation_pubchem_10m_genmol_similarity_float32_results.csv ADDED
@@ -0,0 +1,38 @@
+ epoch,steps,spearman
+ 0.08,24616,0.9738617225555145
+ 0.16,49232,0.981745903231021
+ 0.24,73848,0.9848138153161644
+ 0.32,98464,0.9865317179098163
+ 0.4,123080,0.9873270071133374
+ 0.48,147696,0.9884562662848129
+ 0.56,172312,0.9886743337938196
+ 0.64,196928,0.9891768507175762
+ 0.72,221544,0.9897869837572465
+ 0.8,246160,0.9901589904466065
+ 0.88,270776,0.9906949631377245
+ 0.96,295392,0.9908014916462329
+ 1.04,320008,0.9909399147553232
+ 1.12,344624,0.9910215920276151
+ 1.2,369240,0.9915920186173117
+ 1.28,393856,0.9914454622783068
+ 1.3599999999999999,418472,0.9917370160239519
+ 1.44,443088,0.9918566710914087
+ 1.52,467704,0.9921180116562484
+ 1.6,492320,0.992290711551611
+ 1.6800000000000002,516936,0.9921708494174164
+ 1.76,541552,0.9924133694566644
+ 1.8399999999999999,566168,0.9923654392219101
+ 1.92,590784,0.9925184941462679
+ 2.0,615400,0.9926690989243799
+ 2.08,640016,0.9927617759812062
+ 2.16,664632,0.9926963360759598
+ 2.24,689248,0.9928226049510629
+ 2.32,713864,0.9928830258172597
+ 2.4,738480,0.9929211735586345
+ 2.48,763096,0.9929776131958243
+ 2.56,787712,0.9930113241157583
+ 2.64,812328,0.9930781880900686
+ 2.7199999999999998,836944,0.9931109671330001
+ 2.8,861560,0.9931538502353688
+ 2.88,886176,0.9932005964928086
+ 2.96,910792,0.9932120589500998
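The results file above can be read with the standard library to find the best evaluation step. The sketch below parses the last four rows (copied verbatim from the file) and rounds the epoch column, since values such as `2.7199999999999998` are floating-point artifacts of 2.72. `best_checkpoint` is a hypothetical helper, not part of `chem_mrl`.

```python
import csv
import io

# Last four rows of the results file, copied verbatim.
RESULTS_CSV = """epoch,steps,spearman
2.7199999999999998,836944,0.9931109671330001
2.8,861560,0.9931538502353688
2.88,886176,0.9932005964928086
2.96,910792,0.9932120589500998
"""


def best_checkpoint(csv_text: str) -> dict:
    """Return the row with the highest Spearman correlation."""
    rows = csv.DictReader(io.StringIO(csv_text))
    parsed = [
        {
            "epoch": round(float(row["epoch"]), 2),  # tidy float artifacts
            "steps": int(row["steps"]),
            "spearman": float(row["spearman"]),
        }
        for row in rows
    ]
    return max(parsed, key=lambda row: row["spearman"])


best = best_checkpoint(RESULTS_CSV)
print(best["steps"], best["epoch"], round(best["spearman"], 4))  # 910792 2.96 0.9932
```

The selected row matches the 0.9932 Spearman value reported in the README's metrics table.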
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
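The special tokens map assigns `<s>` to both `bos_token` and `cls_token`, and `</s>` to both `eos_token` and `sep_token`. With the ids from `config.json` (`bos_token_id` 0, `pad_token_id` 1, `eos_token_id` 2), a RoBERTa-style single-sequence input is `<s> tokens </s>`, padded on the right to the longest sequence in the batch. The sketch below illustrates that wrapping and padding; `wrap_and_pad` is a hypothetical helper, and the inner token ids are arbitrary example values.

```python
from typing import List, Tuple

# Ids taken from config.json in this commit.
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2


def wrap_and_pad(batch: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Add <s> and </s> around each sequence, then right-pad to the longest one."""
    wrapped = [[BOS_ID, *ids, EOS_ID] for ids in batch]
    longest = max((len(ids) for ids in wrapped), default=0)
    input_ids, attention_mask = [], []
    for ids in wrapped:
        padding = longest - len(ids)
        input_ids.append(ids + [PAD_ID] * padding)
        # 1 marks real tokens, 0 marks padding.
        attention_mask.append([1] * len(ids) + [0] * padding)
    return input_ids, attention_mask


ids, mask = wrap_and_pad([[6, 6, 9], [6]])
print(ids)   # [[0, 6, 6, 9, 2], [0, 6, 2, 1, 1]]
print(mask)  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

The attention mask produced here is what a mean-pooling step would use to exclude padding positions from the average.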
tokenizer.json ADDED
@@ -0,0 +1,684 @@
1
+ {
2
+ "version": "1.0",
3
+ "truncation": {
4
+ "direction": "Right",
5
+ "max_length": 128,
6
+ "strategy": "LongestFirst",
7
+ "stride": 0
8
+ },
9
+ "padding": {
10
+ "strategy": "BatchLongest",
11
+ "direction": "Right",
12
+ "pad_to_multiple_of": null,
13
+ "pad_id": 1,
14
+ "pad_type_id": 0,
15
+ "pad_token": "<pad>"
16
+ },
17
+ "added_tokens": [
18
+ {
19
+ "id": 0,
20
+ "content": "<s>",
21
+ "single_word": false,
22
+ "lstrip": false,
23
+ "rstrip": false,
24
+ "normalized": true,
25
+ "special": true
26
+ },
27
+ {
28
+ "id": 1,
29
+ "content": "<pad>",
30
+ "single_word": false,
31
+ "lstrip": false,
32
+ "rstrip": false,
33
+ "normalized": true,
34
+ "special": true
35
+ },
36
+ {
37
+ "id": 2,
38
+ "content": "</s>",
39
+ "single_word": false,
40
+ "lstrip": false,
41
+ "rstrip": false,
42
+ "normalized": true,
43
+ "special": true
44
+ },
45
+ {
46
+ "id": 3,
47
+ "content": "<unk>",
48
+ "single_word": false,
49
+ "lstrip": false,
50
+ "rstrip": false,
51
+ "normalized": true,
52
+ "special": true
53
+ },
54
+ {
55
+ "id": 4,
56
+ "content": "<mask>",
57
+ "single_word": false,
58
+ "lstrip": true,
59
+ "rstrip": false,
60
+ "normalized": false,
61
+ "special": true
62
+ }
63
+ ],
64
+ "normalizer": null,
65
+ "pre_tokenizer": {
66
+ "type": "ByteLevel",
67
+ "add_prefix_space": false,
68
+ "trim_offsets": true,
69
+ "use_regex": true
70
+ },
71
+ "post_processor": {
72
+ "type": "RobertaProcessing",
73
+ "sep": [
74
+ "</s>",
75
+ 2
76
+ ],
77
+ "cls": [
78
+ "<s>",
79
+ 0
80
+ ],
81
+ "trim_offsets": true,
82
+ "add_prefix_space": false
83
+ },
84
+ "decoder": {
85
+ "type": "ByteLevel",
86
+ "add_prefix_space": true,
87
+ "trim_offsets": true,
88
+ "use_regex": true
89
+ },
90
+ "model": {
91
+ "type": "BPE",
92
+ "dropout": null,
93
+ "unk_token": null,
94
+ "continuing_subword_prefix": "",
95
+ "end_of_word_suffix": "",
96
+ "fuse_unk": false,
97
+ "byte_fallback": false,
98
+ "ignore_merges": false,
99
+ "vocab": {
100
+ "<s>": 0,
101
+ "<pad>": 1,
102
+ "</s>": 2,
103
+ "<unk>": 3,
104
+ "<mask>": 4,
105
+ "c": 5,
106
+ "C": 6,
107
+ "(": 7,
108
+ ")": 8,
109
+ "O": 9,
110
+ "1": 10,
111
+ "2": 11,
112
+ "=": 12,
113
+ "N": 13,
114
+ ".": 14,
115
+ "n": 15,
116
+ "3": 16,
117
+ "F": 17,
118
+ "Cl": 18,
119
+ ">>": 19,
120
+ "~": 20,
121
+ "-": 21,
122
+ "4": 22,
123
+ "[C@H]": 23,
124
+ "S": 24,
125
+ "[C@@H]": 25,
126
+ "[O-]": 26,
127
+ "Br": 27,
128
+ "#": 28,
129
+ "/": 29,
130
+ "[nH]": 30,
131
+ "[N+]": 31,
132
+ "s": 32,
133
+ "5": 33,
134
+ "o": 34,
135
+ "P": 35,
136
+ "[Na+]": 36,
137
+ "[Si]": 37,
138
+ "I": 38,
139
+ "[Na]": 39,
140
+ "[Pd]": 40,
141
+ "[K+]": 41,
142
+ "[K]": 42,
143
+ "[P]": 43,
144
+ "B": 44,
145
+ "[C@]": 45,
146
+ "[C@@]": 46,
147
+ "[Cl-]": 47,
148
+ "6": 48,
149
+ "[OH-]": 49,
150
+ "\\": 50,
151
+ "[N-]": 51,
152
+ "[Li]": 52,
153
+ "[H]": 53,
154
+ "[2H]": 54,
155
+ "[NH4+]": 55,
156
+ "[c-]": 56,
157
+ "[P-]": 57,
158
+ "[Cs+]": 58,
159
+ "[Li+]": 59,
160
+ "[Cs]": 60,
161
+ "[NaH]": 61,
162
+ "[H-]": 62,
163
+ "[O+]": 63,
164
+ "[BH4-]": 64,
165
+ "[Cu]": 65,
166
+ "7": 66,
167
+ "[Mg]": 67,
168
+ "[Fe+2]": 68,
169
+ "[n+]": 69,
170
+ "[Sn]": 70,
171
+ "[BH-]": 71,
172
+ "[Pd+2]": 72,
173
+ "[CH]": 73,
174
+ "[I-]": 74,
175
+ "[Br-]": 75,
176
+ "[C-]": 76,
177
+ "[Zn]": 77,
178
+ "[B-]": 78,
179
+ "[F-]": 79,
180
+ "[Al]": 80,
181
+ "[P+]": 81,
182
+ "[BH3-]": 82,
183
+ "[Fe]": 83,
184
+ "[C]": 84,
185
+ "[AlH4]": 85,
186
+ "[Ni]": 86,
187
+ "[SiH]": 87,
188
+ "8": 88,
189
+ "[Cu+2]": 89,
190
+ "[Mn]": 90,
191
+ "[AlH]": 91,
192
+ "[nH+]": 92,
193
+ "[AlH4-]": 93,
194
+ "[O-2]": 94,
195
+ "[Cr]": 95,
196
+ "[Mg+2]": 96,
197
+ "[NH3+]": 97,
198
+ "[S@]": 98,
199
+ "[Pt]": 99,
200
+ "[Al+3]": 100,
201
+ "[S@@]": 101,
202
+ "[S-]": 102,
203
+ "[Ti]": 103,
204
+ "[Zn+2]": 104,
205
+ "[PH]": 105,
206
+ "[NH2+]": 106,
207
+ "[Ru]": 107,
208
+ "[Ag+]": 108,
209
+ "[S+]": 109,
210
+ "[I+3]": 110,
211
+ "[NH+]": 111,
212
+ "[Ca+2]": 112,
213
+ "[Ag]": 113,
214
+ "9": 114,
215
+ "[Os]": 115,
216
+ "[Se]": 116,
217
+ "[SiH2]": 117,
218
+ "[Ca]": 118,
219
+ "[Ti+4]": 119,
220
+ "[Ac]": 120,
221
+ "[Cu+]": 121,
222
+ "[S]": 122,
223
+ "[Rh]": 123,
224
+ "[Cl+3]": 124,
225
+ "[cH-]": 125,
226
+ "[Zn+]": 126,
227
+ "[O]": 127,
228
+ "[Cl+]": 128,
229
+ "[SH]": 129,
230
+ "[H+]": 130,
231
+ "[Pd+]": 131,
232
+ "[se]": 132,
233
+ "[PH+]": 133,
234
+ "[I]": 134,
235
+ "[Pt+2]": 135,
236
+ "[C+]": 136,
237
+ "[Mg+]": 137,
238
+ "[Hg]": 138,
239
+ "[W]": 139,
240
+ "[SnH]": 140,
241
+ "[SiH3]": 141,
242
+ "[Fe+3]": 142,
243
+ "[NH]": 143,
244
+ "[Mo]": 144,
245
+ "[CH2+]": 145,
246
+ "%10": 146,
247
+ "[CH2-]": 147,
248
+ "[CH2]": 148,
249
+ "[n-]": 149,
250
+ "[Ce+4]": 150,
251
+ "[NH-]": 151,
252
+ "[Co]": 152,
253
+ "[I+]": 153,
254
+ "[PH2]": 154,
255
+ "[Pt+4]": 155,
256
+ "[Ce]": 156,
257
+ "[B]": 157,
258
+ "[Sn+2]": 158,
259
+ "[Ba+2]": 159,
260
+ "%11": 160,
261
+ "[Fe-3]": 161,
262
+ "[18F]": 162,
263
+ "[SH-]": 163,
264
+ "[Pb+2]": 164,
265
+ "[Os-2]": 165,
266
+ "[Zr+4]": 166,
267
+ "[N]": 167,
268
+ "[Ir]": 168,
269
+ "[Bi]": 169,
270
+ "[Ni+2]": 170,
271
+ "[P@]": 171,
272
+ "[Co+2]": 172,
273
+ "[s+]": 173,
274
+ "[As]": 174,
275
+ "[P+3]": 175,
276
+ "[Hg+2]": 176,
277
+ "[Yb+3]": 177,
278
+ "[CH-]": 178,
279
+ "[Zr+2]": 179,
280
+ "[Mn+2]": 180,
281
+ "[CH+]": 181,
282
+ "[In]": 182,
283
+ "[KH]": 183,
284
+ "[Ce+3]": 184,
285
+ "[Zr]": 185,
286
+ "[AlH2-]": 186,
287
+ "[OH2+]": 187,
288
+ "[Ti+3]": 188,
289
+ "[Rh+2]": 189,
290
+ "[Sb]": 190,
291
+ "[S-2]": 191,
292
+ "%12": 192,
293
+ "[P@@]": 193,
294
+ "[Si@H]": 194,
295
+ "[Mn+4]": 195,
296
+ "p": 196,
297
+ "[Ba]": 197,
298
+ "[NH2-]": 198,
299
+ "[Ge]": 199,
300
+ "[Pb+4]": 200,
301
+ "[Cr+3]": 201,
302
+ "[Au]": 202,
303
+ "[LiH]": 203,
304
+ "[Sc+3]": 204,
305
+ "[o+]": 205,
306
+ "[Rh-3]": 206,
307
+ "%13": 207,
308
+ "[Br]": 208,
309
+ "[Sb-]": 209,
310
+ "[S@+]": 210,
311
+ "[I+2]": 211,
312
+ "[Ar]": 212,
313
+ "[V]": 213,
314
+ "[Cu-]": 214,
315
+ "[Al-]": 215,
316
+ "[Te]": 216,
317
+ "[13c]": 217,
318
+ "[13C]": 218,
319
+ "[Cl]": 219,
320
+ "[PH4+]": 220,
321
+ "[SiH4]": 221,
322
+ "[te]": 222,
323
+ "[CH3-]": 223,
324
+ "[S@@+]": 224,
325
+ "[Rh+3]": 225,
326
+ "[SH+]": 226,
327
+ "[Bi+3]": 227,
328
+ "[Br+2]": 228,
329
+ "[La]": 229,
330
+ "[La+3]": 230,
331
+ "[Pt-2]": 231,
332
+ "[N@@]": 232,
333
+ "[PH3+]": 233,
334
+ "[N@]": 234,
335
+ "[Si+4]": 235,
336
+ "[Sr+2]": 236,
337
+ "[Al+]": 237,
338
+ "[Pb]": 238,
339
+ "[SeH]": 239,
340
+ "[Si-]": 240,
341
+ "[V+5]": 241,
342
+ "[Y+3]": 242,
343
+ "[Re]": 243,
344
+ "[Ru+]": 244,
345
+ "[Sm]": 245,
346
+ "*": 246,
347
+ "[3H]": 247,
348
+ "[NH2]": 248,
349
+ "[Ag-]": 249,
350
+ "[13CH3]": 250,
351
+ "[OH+]": 251,
352
+ "[Ru+3]": 252,
353
+ "[OH]": 253,
354
+ "[Gd+3]": 254,
355
+ "[13CH2]": 255,
356
+ "[In+3]": 256,
357
+ "[Si@@]": 257,
358
+ "[Si@]": 258,
359
+ "[Ti+2]": 259,
360
+ "[Sn+]": 260,
361
+ "[Cl+2]": 261,
362
+ "[AlH-]": 262,
363
+ "[Pd-2]": 263,
364
+ "[SnH3]": 264,
365
+ "[B+3]": 265,
366
+ "[Cu-2]": 266,
367
+ "[Nd+3]": 267,
368
+ "[Pb+3]": 268,
369
+ "[13cH]": 269,
370
+ "[Fe-4]": 270,
371
+ "[Ga]": 271,
372
+ "[Sn+4]": 272,
373
+ "[Hg+]": 273,
374
+ "[11CH3]": 274,
375
+ "[Hf]": 275,
376
+ "[Pr]": 276,
377
+ "[Y]": 277,
378
+ "[S+2]": 278,
379
+ "[Cd]": 279,
380
+ "[Cr+6]": 280,
381
+ "[Zr+3]": 281,
382
+ "[Rh+]": 282,
383
+ "[CH3]": 283,
384
+ "[N-3]": 284,
385
+ "[Hf+2]": 285,
386
+ "[Th]": 286,
387
+ "[Sb+3]": 287,
388
+ "%14": 288,
389
+ "[Cr+2]": 289,
390
+ "[Ru+2]": 290,
391
+ "[Hf+4]": 291,
392
+ "[14C]": 292,
393
+ "[Ta]": 293,
394
+ "[Tl+]": 294,
395
+ "[B+]": 295,
396
+ "[Os+4]": 296,
397
+ "[PdH2]": 297,
398
+ "[Pd-]": 298,
399
+ "[Cd+2]": 299,
400
+ "[Co+3]": 300,
401
+ "[S+4]": 301,
402
+ "[Nb+5]": 302,
403
+ "[123I]": 303,
404
+ "[c+]": 304,
405
+ "[Rb+]": 305,
406
+ "[V+2]": 306,
407
+ "[CH3+]": 307,
408
+ "[Ag+2]": 308,
409
+ "[cH+]": 309,
410
+ "[Mn+3]": 310,
411
+ "[Se-]": 311,
412
+ "[As-]": 312,
413
+ "[Eu+3]": 313,
414
+ "[SH2]": 314,
415
+ "[Sm+3]": 315,
416
+ "[IH+]": 316,
417
+ "%15": 317,
418
+ "[OH3+]": 318,
419
+ "[PH3]": 319,
420
+ "[IH2+]": 320,
421
+ "[SH2+]": 321,
422
+ "[Ir+3]": 322,
423
+ "[AlH3]": 323,
424
+ "[Sc]": 324,
425
+ "[Yb]": 325,
426
+ "[15NH2]": 326,
427
+ "[Lu]": 327,
428
+ "[sH+]": 328,
429
+ "[Gd]": 329,
430
+ "[18F-]": 330,
431
+ "[SH3+]": 331,
432
+ "[SnH4]": 332,
433
+ "[TeH]": 333,
434
+ "[Si@@H]": 334,
435
+ "[Ga+3]": 335,
436
+ "[CaH2]": 336,
437
+ "[Tl]": 337,
438
+ "[Ta+5]": 338,
439
+ "[GeH]": 339,
440
+ "[Br+]": 340,
441
+ "[Sr]": 341,
442
+ "[Tl+3]": 342,
443
+ "[Sm+2]": 343,
444
+ "[PH5]": 344,
445
+ "%16": 345,
446
+ "[N@@+]": 346,
447
+ "[Au+3]": 347,
448
+ "[C-4]": 348,
449
+ "[Nd]": 349,
450
+ "[Ti+]": 350,
451
+ "[IH]": 351,
452
+ "[N@+]": 352,
453
+ "[125I]": 353,
454
+ "[Eu]": 354,
455
+ "[Sn+3]": 355,
456
+ "[Nb]": 356,
457
+ "[Er+3]": 357,
458
+ "[123I-]": 358,
459
+ "[14c]": 359,
460
+ "%17": 360,
461
+ "[SnH2]": 361,
462
+ "[YH]": 362,
463
+ "[Sb+5]": 363,
464
+ "[Pr+3]": 364,
465
+ "[Ir+]": 365,
466
+ "[N+3]": 366,
467
+ "[AlH2]": 367,
468
+ "[19F]": 368,
469
+ "%18": 369,
470
+ "[Tb]": 370,
471
+ "[14CH]": 371,
472
+ "[Mo+4]": 372,
473
+ "[Si+]": 373,
474
+ "[BH]": 374,
475
+ "[Be]": 375,
476
+ "[Rb]": 376,
477
+ "[pH]": 377,
478
+ "%19": 378,
479
+ "%20": 379,
480
+ "[Xe]": 380,
481
+ "[Ir-]": 381,
482
+ "[Be+2]": 382,
483
+ "[C+4]": 383,
484
+ "[RuH2]": 384,
485
+ "[15NH]": 385,
486
+ "[U+2]": 386,
487
+ "[Au-]": 387,
488
+ "%21": 388,
489
+ "%22": 389,
490
+ "[Au+]": 390,
491
+ "[15n]": 391,
492
+ "[Al+2]": 392,
493
+ "[Tb+3]": 393,
494
+ "[15N]": 394,
495
+ "[V+3]": 395,
496
+ "[W+6]": 396,
497
+ "[14CH3]": 397,
498
+ "[Cr+4]": 398,
499
+ "[ClH+]": 399,
500
+ "b": 400,
501
+ "[Ti+6]": 401,
502
+ "[Nd+]": 402,
503
+ "[Zr+]": 403,
504
+ "[PH2+]": 404,
505
+ "[Fm]": 405,
506
+ "[N@H+]": 406,
507
+ "[RuH]": 407,
508
+ "[Dy+3]": 408,
509
+ "%23": 409,
510
+ "[Hf+3]": 410,
511
+ "[W+4]": 411,
512
+ "[11C]": 412,
513
+ "[13CH]": 413,
514
+ "[Er]": 414,
515
+ "[124I]": 415,
516
+ "[LaH]": 416,
517
+ "[F]": 417,
518
+ "[siH]": 418,
519
+ "[Ga+]": 419,
520
+ "[Cm]": 420,
521
+ "[GeH3]": 421,
522
+ "[IH-]": 422,
523
+ "[U+6]": 423,
524
+ "[SeH+]": 424,
525
+ "[32P]": 425,
526
+ "[SeH-]": 426,
527
+ "[Pt-]": 427,
528
+ "[Ir+2]": 428,
529
+ "[se+]": 429,
530
+ "[U]": 430,
531
+ "[F+]": 431,
532
+ "[BH2]": 432,
533
+ "[As+]": 433,
534
+ "[Cf]": 434,
535
+ "[ClH2+]": 435,
536
+ "[Ni+]": 436,
537
+ "[TeH3]": 437,
538
+ "[SbH2]": 438,
539
+ "[Ag+3]": 439,
540
+ "%24": 440,
541
+ "[18O]": 441,
542
+ "[PH4]": 442,
543
+ "[Os+2]": 443,
544
+ "[Na-]": 444,
545
+ "[Sb+2]": 445,
546
+ "[V+4]": 446,
547
+ "[Ho+3]": 447,
548
+ "[68Ga]": 448,
549
+ "[PH-]": 449,
550
+ "[Bi+2]": 450,
551
+ "[Ce+2]": 451,
552
+ "[Pd+3]": 452,
553
+ "[99Tc]": 453,
554
+ "[13C@@H]": 454,
555
+ "[Fe+6]": 455,
556
+ "[c]": 456,
557
+ "[GeH2]": 457,
558
+ "[10B]": 458,
559
+ "[Cu+3]": 459,
560
+ "[Mo+2]": 460,
561
+ "[Cr+]": 461,
562
+ "[Pd+4]": 462,
563
+ "[Dy]": 463,
564
+ "[AsH]": 464,
565
+ "[Ba+]": 465,
566
+ "[SeH2]": 466,
567
+ "[In+]": 467,
568
+ "[TeH2]": 468,
569
+ "[BrH+]": 469,
570
+ "[14cH]": 470,
571
+ "[W+]": 471,
572
+ "[13C@H]": 472,
573
+ "[AsH2]": 473,
574
+ "[In+2]": 474,
575
+ "[N+2]": 475,
576
+ "[N@@H+]": 476,
577
+ "[SbH]": 477,
578
+ "[60Co]": 478,
579
+ "[AsH4+]": 479,
580
+ "[AsH3]": 480,
581
+ "[18OH]": 481,
582
+ "[Ru-2]": 482,
583
+ "[Na-2]": 483,
584
+ "[CuH2]": 484,
585
+ "[31P]": 485,
586
+ "[Ti+5]": 486,
587
+ "[35S]": 487,
588
+ "[P@@H]": 488,
589
+ "[ArH]": 489,
590
+ "[Co+]": 490,
591
+ "[Zr-2]": 491,
592
+ "[BH2-]": 492,
593
+ "[131I]": 493,
594
+ "[SH5]": 494,
595
+ "[VH]": 495,
596
+ "[B+2]": 496,
597
+ "[Yb+2]": 497,
598
+ "[14C@H]": 498,
599
+ "[211At]": 499,
600
+ "[NH3+2]": 500,
601
+ "[IrH]": 501,
602
+ "[IrH2]": 502,
603
+ "[Rh-]": 503,
604
+ "[Cr-]": 504,
605
+ "[Sb+]": 505,
606
+ "[Ni+3]": 506,
607
+ "[TaH3]": 507,
608
+ "[Tl+2]": 508,
609
+ "[64Cu]": 509,
610
+ "[Tc]": 510,
611
+ "[Cd+]": 511,
612
+ "[1H]": 512,
613
+ "[15nH]": 513,
614
+ "[AlH2+]": 514,
615
+ "[FH+2]": 515,
616
+ "[BiH3]": 516,
617
+ "[Ru-]": 517,
618
+ "[Mo+6]": 518,
619
+ "[AsH+]": 519,
620
+ "[BaH2]": 520,
621
+ "[BaH]": 521,
622
+ "[Fe+4]": 522,
623
+ "[229Th]": 523,
624
+ "[Th+4]": 524,
625
+ "[As+3]": 525,
626
+ "[NH+3]": 526,
627
+ "[P@H]": 527,
628
+ "[Li-]": 528,
629
+ "[7NaH]": 529,
630
+ "[Bi+]": 530,
631
+ "[PtH+2]": 531,
632
+ "[p-]": 532,
633
+ "[Re+5]": 533,
634
+ "[NiH]": 534,
635
+ "[Ni-]": 535,
636
+ "[Xe+]": 536,
637
+ "[Ca+]": 537,
638
+ "[11c]": 538,
639
+ "[Rh+4]": 539,
640
+ "[AcH]": 540,
641
+ "[HeH]": 541,
642
+ "[Sc+2]": 542,
643
+ "[Mn+]": 543,
644
+ "[UH]": 544,
645
+ "[14CH2]": 545,
646
+ "[SiH4+]": 546,
647
+ "[18OH2]": 547,
648
+ "[Ac-]": 548,
649
+ "[Re+4]": 549,
650
+ "[118Sn]": 550,
651
+ "[153Sm]": 551,
652
+ "[P+2]": 552,
653
+ "[9CH]": 553,
654
+ "[9CH3]": 554,
655
+ "[Y-]": 555,
656
+ "[NiH2]": 556,
657
+ "[Si+2]": 557,
658
+ "[Mn+6]": 558,
659
+ "[ZrH2]": 559,
660
+ "[C-2]": 560,
661
+ "[Bi+5]": 561,
662
+ "[24NaH]": 562,
663
+ "[Fr]": 563,
664
+ "[15CH]": 564,
665
+ "[Se+]": 565,
666
+ "[At]": 566,
667
+ "[P-3]": 567,
668
+ "[124I-]": 568,
669
+ "[CuH2-]": 569,
670
+ "[Nb+4]": 570,
671
+ "[Nb+3]": 571,
672
+ "[MgH]": 572,
673
+ "[Ir+4]": 573,
674
+ "[67Ga+3]": 574,
675
+ "[67Ga]": 575,
676
+ "[13N]": 576,
677
+ "[15OH2]": 577,
678
+ "[2NH]": 578,
679
+ "[Ho]": 579,
680
+ "[Cn]": 580
681
+ },
682
+ "merges": []
683
+ }
684
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<s>",
6
+ "lstrip": false,
7
+ "normalized": true,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<pad>",
14
+ "lstrip": false,
15
+ "normalized": true,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "</s>",
22
+ "lstrip": false,
23
+ "normalized": true,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": true,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "<mask>",
38
+ "lstrip": true,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ }
44
+ },
45
+ "bos_token": "<s>",
46
+ "clean_up_tokenization_spaces": false,
47
+ "cls_token": "<s>",
48
+ "eos_token": "</s>",
49
+ "errors": "replace",
50
+ "extra_special_tokens": {},
51
+ "mask_token": "<mask>",
52
+ "max_len": 128,
53
+ "model_max_length": 128,
54
+ "pad_token": "<pad>",
55
+ "sep_token": "</s>",
56
+ "tokenizer_class": "RobertaTokenizer",
57
+ "trim_offsets": true,
58
+ "unk_token": "<unk>"
59
+ }
vocab.json ADDED
@@ -0,0 +1 @@
 
 
1
+ {"<s>":0,"<pad>":1,"</s>":2,"<unk>":3,"<mask>":4,"c":5,"C":6,"(":7,")":8,"O":9,"1":10,"2":11,"=":12,"N":13,".":14,"n":15,"3":16,"F":17,"Cl":18,">>":19,"~":20,"-":21,"4":22,"[C@H]":23,"S":24,"[C@@H]":25,"[O-]":26,"Br":27,"#":28,"/":29,"[nH]":30,"[N+]":31,"s":32,"5":33,"o":34,"P":35,"[Na+]":36,"[Si]":37,"I":38,"[Na]":39,"[Pd]":40,"[K+]":41,"[K]":42,"[P]":43,"B":44,"[C@]":45,"[C@@]":46,"[Cl-]":47,"6":48,"[OH-]":49,"\\":50,"[N-]":51,"[Li]":52,"[H]":53,"[2H]":54,"[NH4+]":55,"[c-]":56,"[P-]":57,"[Cs+]":58,"[Li+]":59,"[Cs]":60,"[NaH]":61,"[H-]":62,"[O+]":63,"[BH4-]":64,"[Cu]":65,"7":66,"[Mg]":67,"[Fe+2]":68,"[n+]":69,"[Sn]":70,"[BH-]":71,"[Pd+2]":72,"[CH]":73,"[I-]":74,"[Br-]":75,"[C-]":76,"[Zn]":77,"[B-]":78,"[F-]":79,"[Al]":80,"[P+]":81,"[BH3-]":82,"[Fe]":83,"[C]":84,"[AlH4]":85,"[Ni]":86,"[SiH]":87,"8":88,"[Cu+2]":89,"[Mn]":90,"[AlH]":91,"[nH+]":92,"[AlH4-]":93,"[O-2]":94,"[Cr]":95,"[Mg+2]":96,"[NH3+]":97,"[S@]":98,"[Pt]":99,"[Al+3]":100,"[S@@]":101,"[S-]":102,"[Ti]":103,"[Zn+2]":104,"[PH]":105,"[NH2+]":106,"[Ru]":107,"[Ag+]":108,"[S+]":109,"[I+3]":110,"[NH+]":111,"[Ca+2]":112,"[Ag]":113,"9":114,"[Os]":115,"[Se]":116,"[SiH2]":117,"[Ca]":118,"[Ti+4]":119,"[Ac]":120,"[Cu+]":121,"[S]":122,"[Rh]":123,"[Cl+3]":124,"[cH-]":125,"[Zn+]":126,"[O]":127,"[Cl+]":128,"[SH]":129,"[H+]":130,"[Pd+]":131,"[se]":132,"[PH+]":133,"[I]":134,"[Pt+2]":135,"[C+]":136,"[Mg+]":137,"[Hg]":138,"[W]":139,"[SnH]":140,"[SiH3]":141,"[Fe+3]":142,"[NH]":143,"[Mo]":144,"[CH2+]":145,"%10":146,"[CH2-]":147,"[CH2]":148,"[n-]":149,"[Ce+4]":150,"[NH-]":151,"[Co]":152,"[I+]":153,"[PH2]":154,"[Pt+4]":155,"[Ce]":156,"[B]":157,"[Sn+2]":158,"[Ba+2]":159,"%11":160,"[Fe-3]":161,"[18F]":162,"[SH-]":163,"[Pb+2]":164,"[Os-2]":165,"[Zr+4]":166,"[N]":167,"[Ir]":168,"[Bi]":169,"[Ni+2]":170,"[P@]":171,"[Co+2]":172,"[s+]":173,"[As]":174,"[P+3]":175,"[Hg+2]":176,"[Yb+3]":177,"[CH-]":178,"[Zr+2]":179,"[Mn+2]":180,"[CH+]":181,"[In]":182,"[KH]":183,"[Ce+3]":184,"[Zr]":185,"[AlH2-]":186,"[OH2+]":187,"[Ti+3]":188,"[Rh+2]":189,"
[Sb]":190,"[S-2]":191,"%12":192,"[P@@]":193,"[Si@H]":194,"[Mn+4]":195,"p":196,"[Ba]":197,"[NH2-]":198,"[Ge]":199,"[Pb+4]":200,"[Cr+3]":201,"[Au]":202,"[LiH]":203,"[Sc+3]":204,"[o+]":205,"[Rh-3]":206,"%13":207,"[Br]":208,"[Sb-]":209,"[S@+]":210,"[I+2]":211,"[Ar]":212,"[V]":213,"[Cu-]":214,"[Al-]":215,"[Te]":216,"[13c]":217,"[13C]":218,"[Cl]":219,"[PH4+]":220,"[SiH4]":221,"[te]":222,"[CH3-]":223,"[S@@+]":224,"[Rh+3]":225,"[SH+]":226,"[Bi+3]":227,"[Br+2]":228,"[La]":229,"[La+3]":230,"[Pt-2]":231,"[N@@]":232,"[PH3+]":233,"[N@]":234,"[Si+4]":235,"[Sr+2]":236,"[Al+]":237,"[Pb]":238,"[SeH]":239,"[Si-]":240,"[V+5]":241,"[Y+3]":242,"[Re]":243,"[Ru+]":244,"[Sm]":245,"*":246,"[3H]":247,"[NH2]":248,"[Ag-]":249,"[13CH3]":250,"[OH+]":251,"[Ru+3]":252,"[OH]":253,"[Gd+3]":254,"[13CH2]":255,"[In+3]":256,"[Si@@]":257,"[Si@]":258,"[Ti+2]":259,"[Sn+]":260,"[Cl+2]":261,"[AlH-]":262,"[Pd-2]":263,"[SnH3]":264,"[B+3]":265,"[Cu-2]":266,"[Nd+3]":267,"[Pb+3]":268,"[13cH]":269,"[Fe-4]":270,"[Ga]":271,"[Sn+4]":272,"[Hg+]":273,"[11CH3]":274,"[Hf]":275,"[Pr]":276,"[Y]":277,"[S+2]":278,"[Cd]":279,"[Cr+6]":280,"[Zr+3]":281,"[Rh+]":282,"[CH3]":283,"[N-3]":284,"[Hf+2]":285,"[Th]":286,"[Sb+3]":287,"%14":288,"[Cr+2]":289,"[Ru+2]":290,"[Hf+4]":291,"[14C]":292,"[Ta]":293,"[Tl+]":294,"[B+]":295,"[Os+4]":296,"[PdH2]":297,"[Pd-]":298,"[Cd+2]":299,"[Co+3]":300,"[S+4]":301,"[Nb+5]":302,"[123I]":303,"[c+]":304,"[Rb+]":305,"[V+2]":306,"[CH3+]":307,"[Ag+2]":308,"[cH+]":309,"[Mn+3]":310,"[Se-]":311,"[As-]":312,"[Eu+3]":313,"[SH2]":314,"[Sm+3]":315,"[IH+]":316,"%15":317,"[OH3+]":318,"[PH3]":319,"[IH2+]":320,"[SH2+]":321,"[Ir+3]":322,"[AlH3]":323,"[Sc]":324,"[Yb]":325,"[15NH2]":326,"[Lu]":327,"[sH+]":328,"[Gd]":329,"[18F-]":330,"[SH3+]":331,"[SnH4]":332,"[TeH]":333,"[Si@@H]":334,"[Ga+3]":335,"[CaH2]":336,"[Tl]":337,"[Ta+5]":338,"[GeH]":339,"[Br+]":340,"[Sr]":341,"[Tl+3]":342,"[Sm+2]":343,"[PH5]":344,"%16":345,"[N@@+]":346,"[Au+3]":347,"[C-4]":348,"[Nd]":349,"[Ti+]":350,"[IH]":351,"[N@+]":352,"[125I]":353,"[Eu]":354
,"[Sn+3]":355,"[Nb]":356,"[Er+3]":357,"[123I-]":358,"[14c]":359,"%17":360,"[SnH2]":361,"[YH]":362,"[Sb+5]":363,"[Pr+3]":364,"[Ir+]":365,"[N+3]":366,"[AlH2]":367,"[19F]":368,"%18":369,"[Tb]":370,"[14CH]":371,"[Mo+4]":372,"[Si+]":373,"[BH]":374,"[Be]":375,"[Rb]":376,"[pH]":377,"%19":378,"%20":379,"[Xe]":380,"[Ir-]":381,"[Be+2]":382,"[C+4]":383,"[RuH2]":384,"[15NH]":385,"[U+2]":386,"[Au-]":387,"%21":388,"%22":389,"[Au+]":390,"[15n]":391,"[Al+2]":392,"[Tb+3]":393,"[15N]":394,"[V+3]":395,"[W+6]":396,"[14CH3]":397,"[Cr+4]":398,"[ClH+]":399,"b":400,"[Ti+6]":401,"[Nd+]":402,"[Zr+]":403,"[PH2+]":404,"[Fm]":405,"[N@H+]":406,"[RuH]":407,"[Dy+3]":408,"%23":409,"[Hf+3]":410,"[W+4]":411,"[11C]":412,"[13CH]":413,"[Er]":414,"[124I]":415,"[LaH]":416,"[F]":417,"[siH]":418,"[Ga+]":419,"[Cm]":420,"[GeH3]":421,"[IH-]":422,"[U+6]":423,"[SeH+]":424,"[32P]":425,"[SeH-]":426,"[Pt-]":427,"[Ir+2]":428,"[se+]":429,"[U]":430,"[F+]":431,"[BH2]":432,"[As+]":433,"[Cf]":434,"[ClH2+]":435,"[Ni+]":436,"[TeH3]":437,"[SbH2]":438,"[Ag+3]":439,"%24":440,"[18O]":441,"[PH4]":442,"[Os+2]":443,"[Na-]":444,"[Sb+2]":445,"[V+4]":446,"[Ho+3]":447,"[68Ga]":448,"[PH-]":449,"[Bi+2]":450,"[Ce+2]":451,"[Pd+3]":452,"[99Tc]":453,"[13C@@H]":454,"[Fe+6]":455,"[c]":456,"[GeH2]":457,"[10B]":458,"[Cu+3]":459,"[Mo+2]":460,"[Cr+]":461,"[Pd+4]":462,"[Dy]":463,"[AsH]":464,"[Ba+]":465,"[SeH2]":466,"[In+]":467,"[TeH2]":468,"[BrH+]":469,"[14cH]":470,"[W+]":471,"[13C@H]":472,"[AsH2]":473,"[In+2]":474,"[N+2]":475,"[N@@H+]":476,"[SbH]":477,"[60Co]":478,"[AsH4+]":479,"[AsH3]":480,"[18OH]":481,"[Ru-2]":482,"[Na-2]":483,"[CuH2]":484,"[31P]":485,"[Ti+5]":486,"[35S]":487,"[P@@H]":488,"[ArH]":489,"[Co+]":490,"[Zr-2]":491,"[BH2-]":492,"[131I]":493,"[SH5]":494,"[VH]":495,"[B+2]":496,"[Yb+2]":497,"[14C@H]":498,"[211At]":499,"[NH3+2]":500,"[IrH]":501,"[IrH2]":502,"[Rh-]":503,"[Cr-]":504,"[Sb+]":505,"[Ni+3]":506,"[TaH3]":507,"[Tl+2]":508,"[64Cu]":509,"[Tc]":510,"[Cd+]":511,"[1H]":512,"[15nH]":513,"[AlH2+]":514,"[FH+2]":515,"[BiH3]":516,"[Ru-]":
517,"[Mo+6]":518,"[AsH+]":519,"[BaH2]":520,"[BaH]":521,"[Fe+4]":522,"[229Th]":523,"[Th+4]":524,"[As+3]":525,"[NH+3]":526,"[P@H]":527,"[Li-]":528,"[7NaH]":529,"[Bi+]":530,"[PtH+2]":531,"[p-]":532,"[Re+5]":533,"[NiH]":534,"[Ni-]":535,"[Xe+]":536,"[Ca+]":537,"[11c]":538,"[Rh+4]":539,"[AcH]":540,"[HeH]":541,"[Sc+2]":542,"[Mn+]":543,"[UH]":544,"[14CH2]":545,"[SiH4+]":546,"[18OH2]":547,"[Ac-]":548,"[Re+4]":549,"[118Sn]":550,"[153Sm]":551,"[P+2]":552,"[9CH]":553,"[9CH3]":554,"[Y-]":555,"[NiH2]":556,"[Si+2]":557,"[Mn+6]":558,"[ZrH2]":559,"[C-2]":560,"[Bi+5]":561,"[24NaH]":562,"[Fr]":563,"[15CH]":564,"[Se+]":565,"[At]":566,"[P-3]":567,"[124I-]":568,"[CuH2-]":569,"[Nb+4]":570,"[Nb+3]":571,"[MgH]":572,"[Ir+4]":573,"[67Ga+3]":574,"[67Ga]":575,"[13N]":576,"[15OH2]":577,"[2NH]":578,"[Ho]":579,"[Cn]":580}