eacortes committed on
Commit
f919ea5
·
verified ·
1 Parent(s): 3cd7c1b

Upload 14 files

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
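
The pooling settings above enable only `pooling_mode_mean_tokens`, so each SMILES embedding is the average of its token vectors with padded positions masked out. A minimal NumPy sketch of that operation on toy inputs; the function name `mean_pool` is illustrative and not part of sentence-transformers:

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against all-padding rows
    return summed / counts


# Toy batch: one sequence with two real tokens and one padding token.
tokens = np.array([[[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = mean_pool(tokens, mask)
```

The padding token's large values do not affect the result because its mask entry is zero.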
README.md CHANGED
@@ -1,3 +1,590 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - sentence-transformers
5
+ - modchembert
6
+ - cheminformatics
7
+ - smiles
8
+ - molecular-similarity
9
+ - feature-extraction
10
+ - dense
11
+ - generated_from_trainer
12
+ - dataset_size:19381001
13
+ - loss:Matryoshka2dLoss
14
+ - loss:MatryoshkaLoss
15
+ - loss:TanimotoSentLoss
16
+ base_model: Derify/ModChemBERT-IR-BASE
17
+ widget:
18
+ - source_sentence: COC(=O)c1sc(-c2ccc(C)cc2)c2c1NC(=O)C2(c1ccccc1)c1ccccc1
19
+ sentences:
20
+ - COC(=O)c1sc(Nc2ccc(Br)cn2)c2c1NC(=O)C2(c1ccccc1)c1ccccc1
21
+ - CC[NH+]1CCOC(C(NN)c2ccccc2Br)C1
22
+ - CC([NH2+]C(C)c1ccccc1)C(=O)P(C)C(C)(C)C
23
+ - source_sentence: O=C(C=Cc1ccccc1)CC(=O)c1ccccc1O
24
+ sentences:
25
+ - COCCN(NCc1c(C)n(C(C)=O)c2ccc(OC)cc12)c1nccs1
26
+ - CCN(CCC(N)=O)C(=O)c1ccc(=O)[nH]n1
27
+ - N=CCC(=Cc1ccccc1)C(=O)COc1ccccc1O
28
+ - source_sentence: COc1cccc(-c2sc3ccccc3c2C#N)c1
29
+ sentences:
30
+ - COCC(C)(C)c1cnnn1CCCI
31
+ - N#Cc1c(-c2cccc(CN)c2)sc2ccccc12
32
+ - COc1ccccc1NC(=O)c1cc(NCc2ccco2)cc[nH+]1
33
+ - source_sentence: Nc1nc(-c2ccccc2)c2nc(N)c(N)nc2n1
34
+ sentences:
35
+ - CC(C)CC1NC(=O)C(Cc2ccccc2)NC(=O)c2ccc(cc2)CN(C(=O)CC2CCOCC2)CCCCNC(=O)C(C)NC1=O
36
+ - O=Nc1cccc(OCCC(F)F)c1
37
+ - CCCCNCc1nc(N)nc2nc(N)c(N)nc12
38
+ - source_sentence: OCCCc1cc(F)cc(F)c1
39
+ sentences:
40
+ - CCC(C)C(=O)C1(C(NN)C(C)C)CCCC1
41
+ - Cc1[nH]c2c(C(N)=O)ccc(C(=O)N3CCCCC3)c2c1C
42
+ - Fc1cc(F)cc(-n2cc[o+]n2)c1
43
+ datasets:
44
+ - Derify/pubchem_10m_genmol_similarity
45
+ pipeline_tag: sentence-similarity
46
+ library_name: sentence-transformers
47
+ metrics:
48
+ - spearman
49
+ co2_eq_emissions:
50
+ emissions: 4039.5232961852894
51
+ energy_consumed: 19.679154905865374
52
+ source: codecarbon
53
+ training_type: fine-tuning
54
+ on_cloud: false
55
+ cpu_model: AMD Ryzen 7 3700X 8-Core Processor
56
+ ram_total_size: 62.69887161254883
57
+ hours_used: 74.966
58
+ hardware_used: 2 x NVIDIA GeForce RTX 3090
59
+ model-index:
60
+ - name: 'ChemMRL: SMILES Matryoshka Representation Learning Embedding Transformer'
61
+ results:
62
+ - task:
63
+ type: semantic-similarity
64
+ name: Semantic Similarity
65
+ dataset:
66
+ name: pubchem 10m genmol similarity (validation)
67
+ type: pubchem_10m_genmol_similarity_validation
68
+ metrics:
69
+ - type: spearman
70
+ value: 0.9881056976837288
71
+ name: Spearman
72
+ - task:
73
+ type: semantic-similarity
74
+ name: Semantic Similarity
75
+ dataset:
76
+ name: pubchem 10m genmol similarity (test)
77
+ type: pubchem_10m_genmol_similarity_test
78
+ metrics:
79
+ - type: spearman
80
+ value: 0.988127555600757
81
+ name: Spearman
82
+ new_version: Derify/ChemMRL
83
+ ---
84
+
85
+ # ChemMRL: SMILES Matryoshka Representation Learning Embedding Transformer
86
+
87
+ This is a [Chem-MRL](https://github.com/emapco/chem-mrl) ([sentence-transformers](https://www.SBERT.net)) model finetuned from [Derify/ModChemBERT-IR-BASE](https://huggingface.co/Derify/ModChemBERT-IR-BASE) on the [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity) dataset. It maps SMILES to a 1024-dimensional dense vector space and can be used for molecular similarity, semantic search, database indexing, molecular classification, clustering, and more.
88
+
89
+ ## Model Details
90
+
91
+ ### Model Description
92
+ - **Model Type:** ChemMRL (Sentence Transformer)
93
+ - **Base model:** [Derify/ModChemBERT-IR-BASE](https://huggingface.co/Derify/ModChemBERT-IR-BASE) <!-- at revision fde8c1ed2606783be3ff621be0a4fde825f12169 -->
94
+ - **Maximum Sequence Length:** 512 tokens
95
+ - **Output Dimensionality:** 1024 dimensions
96
+ - **Similarity Function:** Tanimoto
97
+ - **Training Dataset:**
98
+ - [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity)
99
+ - **License:** apache-2.0
100
+
101
+ ### Model Sources
102
+
103
+ - **Repository:** [Chem-MRL on GitHub](https://github.com/emapco/chem-mrl)
104
+ - **Demo App Repository:** [Chem-MRL-demo on GitHub](https://github.com/emapco/chem-mrl-demo)
105
+
106
+ ### Full Model Architecture
107
+
108
+ ```
109
+ SentenceTransformer(
110
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'ModChemBertModel'})
111
+ (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
112
+ (2): Normalize()
113
+ )
114
+ ```
115
+
116
+ ## Usage
117
+
118
+ ### Direct Usage (Chem-MRL)
119
+
120
+ First install the Chem-MRL library:
121
+
122
+ ```bash
123
+ pip install -U "chem-mrl>=0.7.3"
124
+ ```
125
+
126
+ Then you can load this model and run inference.
127
+ ```python
128
+ from chem_mrl import ChemMRL
129
+
130
+ # Download from the 🤗 Hub
131
+ model = ChemMRL(
132
+ "Derify/ChemMRL",
133
+ trust_remote_code=True,
134
+ model_kwargs={"torch_dtype": "bfloat16"},
135
+ )
136
+ # Run inference
137
+ sentences = [
138
+ 'OCCCc1cc(F)cc(F)c1',
139
+ 'Fc1cc(F)cc(-n2cc[o+]n2)c1',
140
+ 'CCC(C)C(=O)C1(C(NN)C(C)C)CCCC1',
141
+ ]
142
+ embeddings = model.backbone.encode(sentences)
143
+ print(embeddings.shape)
144
+ # (3, 1024)
145
+
146
+ # Get the similarity scores for the embeddings
147
+ similarities = model.backbone.similarity(embeddings, embeddings)
148
+ print(similarities)
149
+ # tensor([[1.0000, 0.4184, 0.0166],
150
+ # [0.4158, 1.0000, 0.0136],
151
+ # [0.0167, 0.0137, 1.0000]])
152
+ ```
153
+
154
+ ### Direct Usage (Sentence Transformers)
155
+
156
+ <details><summary>Click to see the direct usage in Sentence Transformers</summary>
157
+
158
+ First install the Sentence Transformers library:
159
+
160
+ ```bash
161
+ pip install -U sentence-transformers
162
+ ```
163
+
164
+ Then you can load this model and run inference.
165
+ ```python
166
+ from sentence_transformers import SentenceTransformer
167
+
168
+ # Download from the 🤗 Hub
169
+ model = SentenceTransformer(
170
+ "Derify/ChemMRL",
171
+ # SentenceTransformer doesn't support tanimoto similarity natively so we set a different similarity function here
172
+ similarity_fn_name="cosine",
173
+ trust_remote_code=True,
174
+ model_kwargs={"torch_dtype": "bfloat16"},
175
+ )
176
+ # Run inference
177
+ sentences = [
178
+ 'OCCCc1cc(F)cc(F)c1',
179
+ 'Fc1cc(F)cc(-n2cc[o+]n2)c1',
180
+ 'CCC(C)C(=O)C1(C(NN)C(C)C)CCCC1',
181
+ ]
182
+ embeddings = model.encode(sentences)
183
+ print(embeddings.shape)
184
+ # (3, 1024)
185
+
186
+ # Get the similarity scores for the embeddings
187
+ similarities = model.similarity(embeddings, embeddings)
188
+ print(similarities)
189
+ # tensor([[1.0000, 0.5887, 0.0327],
190
+ # [0.5887, 1.0000, 0.0269],
191
+ # [0.0327, 0.0269, 1.0000]])
192
+ ```
193
+
194
+ </details>
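
The two usage examples report different scores for the same pairs because the first uses Tanimoto similarity and the second uses cosine. Assuming the continuous Tanimoto definition T(a, b) = a·b / (‖a‖² + ‖b‖² - a·b), which the model card does not spell out, unit-length embeddings (the model ends with `Normalize()`) give T = c / (2 - c) for cosine c. A minimal NumPy sketch under that assumption; `tanimoto_similarity` is an illustrative helper, not the Chem-MRL API:

```python
import numpy as np


def tanimoto_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise continuous Tanimoto similarity between rows of a and rows of b."""
    dot = a @ b.T
    sq_a = (a * a).sum(axis=1)[:, None]
    sq_b = (b * b).sum(axis=1)[None, :]
    return dot / (sq_a + sq_b - dot)


# Two unit vectors whose cosine is 0.5887, the first off-diagonal cosine score above.
c = 0.5887
u = np.array([[1.0, 0.0]])
v = np.array([[c, np.sqrt(1.0 - c * c)]])
score = tanimoto_similarity(u, v)[0, 0]
print(round(float(score), 4))  # equals c / (2 - c) for unit vectors
```

The result, about 0.417, sits between the 0.4184 and 0.4158 reported in the Chem-MRL example; the small asymmetry there is plausibly a bfloat16 rounding effect.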
195
+
196
+ ## Evaluation
197
+
198
+ ### Metrics
199
+
200
+ #### Semantic Similarity
201
+
202
+ * Dataset: `pubchem_10m_genmol_similarity`
203
+ * Evaluated with <code>chem_mrl.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator</code> with these parameters:
204
+ ```json
205
+ {
206
+ "precision": "float32"
207
+ }
208
+ ```
209
+
210
+ | Split | Metric | Value |
211
+ | :------------- | :----------- | :---------- |
212
+ | **validation** | **spearman** | **0.98811** |
213
+ | **test** | **spearman** | **0.98813** |
214
+
215
+ ## Training Details
216
+
217
+ ### Training Dataset
218
+
219
+ #### pubchem_10m_genmol_similarity
220
+
221
+ * Dataset: [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity) at [9aec8fd](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity/tree/9aec8fd3ed70c21a0e39a3164830879a9929b052)
222
+ * Size: 19,381,001 training samples
223
+ * Columns: <code>smiles_a</code>, <code>smiles_b</code>, and <code>label</code>
224
+ * Approximate statistics based on the first 1000 samples:
225
+ | | smiles_a | smiles_b | label |
226
+ | :------ | :---------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------- | :-------------------------------------------------------------- |
227
+ | type | string | string | float |
228
+ | details | <ul><li>min: 17 tokens</li><li>mean: 42.36 tokens</li><li>max: 122 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 40.93 tokens</li><li>max: 122 tokens</li></ul> | <ul><li>min: 0.02</li><li>mean: 0.56</li><li>max: 1.0</li></ul> |
229
+ * Samples:
230
+ | smiles_a | smiles_b | label |
231
+ | :--------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------- | :------------------------------ |
232
+ | <code>COc1ccc(NC(=O)C2CC[NH+](C(C)C(=O)Nc3ccc(C(=O)Nc4ccc(F)c(F)c4)cc3C)CC2)cc1NC(=O)C1CCCCC1</code> | <code>Cc1cc(C(=O)Nc2ccc(F)c(F)c2)ccc1NC(=O)C(C)[NH+]1CCC(C(=O)Nc2cccc(NC(=O)C3CCCCC3)c2)CC1</code> | <code>0.8495575189590454</code> |
233
+ | <code>OCCN1CC[NH+](Cc2ccccc2OC2CC2)CC1</code> | <code>OCCN1CC[NH+](Cc2ccccc2On2cccn2)CC1</code> | <code>0.6615384817123413</code> |
234
+ | <code>CC1CN(C(=O)C2CC[NH+](Cc3cccc(C(N)=O)c3)CC2)CC(C)O1</code> | <code>CC1CN(C(=O)C2CC[NH+](Cc3ccccc3)CC2)CC(C)O1</code> | <code>0.7123287916183472</code> |
235
+ * Loss: [<code>Matryoshka2dLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshka2dloss) with these parameters:
236
+ ```json
237
+ {
238
+ "loss": "TanimotoSentLoss",
239
+ "n_layers_per_step": 11,
240
+ "last_layer_weight": 1.0,
241
+ "prior_layers_weight": 1.5,
242
+ "kl_div_weight": 0.5,
243
+ "kl_temperature": 0.3,
244
+ "matryoshka_dims": [
245
+ 1024,
246
+ 512,
247
+ 256,
248
+ 128,
249
+ 64,
250
+ 32,
251
+ 16,
252
+ 8
253
+ ],
254
+ "matryoshka_weights": [
255
+ 1,
256
+ 1,
257
+ 1,
258
+ 1,
259
+ 1,
260
+ 1,
261
+ 1,
262
+ 1
263
+ ],
264
+ "n_dims_per_step": 4
265
+ }
266
+ ```
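
Because training applies the loss at each of the `matryoshka_dims` listed above, a prefix of the embedding can stand in for the full 1024-dimensional vector. A minimal sketch of truncating and re-normalizing, using random stand-in vectors rather than real model output. `truncate_embeddings` is a hypothetical helper; sentence-transformers also accepts a `truncate_dim` argument on `SentenceTransformer`, which is likely the more convenient route:

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and rescale to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)
full = rng.normal(size=(3, 1024))  # stand-in for model.encode(...) output
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_embeddings(full, 256)  # 256 is one of the trained dimensions
```

Re-normalizing matters when the downstream similarity assumes unit-length vectors, since a truncated prefix of a unit vector is generally shorter than 1.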
267
+
268
+ ### Evaluation Dataset
269
+
270
+ #### pubchem_10m_genmol_similarity
271
+
272
+ * Dataset: [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity) at [9aec8fd](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity/tree/9aec8fd3ed70c21a0e39a3164830879a9929b052)
273
+ * Size: 1,080,394 evaluation samples
274
+ * Columns: <code>smiles_a</code>, <code>smiles_b</code>, and <code>label</code>
275
+ * Approximate statistics based on the first 1000 samples:
276
+ | | smiles_a | smiles_b | label |
277
+ | :------ | :---------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------- | :------------------------------------------------------------- |
278
+ | type | string | string | float |
279
+ | details | <ul><li>min: 16 tokens</li><li>mean: 42.05 tokens</li><li>max: 101 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 40.23 tokens</li><li>max: 104 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.57</li><li>max: 1.0</li></ul> |
280
+ * Samples:
281
+ | smiles_a | smiles_b | label |
282
+ | :------------------------------------- | :---------------------------------------- | :------------------------------ |
283
+ | <code>N#CCCN(Cc1cnc(N)cn1)C1CC1</code> | <code>N#CCCN(Cc1cnc(N)cn1)C1CCCC1</code> | <code>0.8600000143051147</code> |
284
+ | <code>N#CCCN(Cc1cnc(N)cn1)C1CC1</code> | <code>N#CCCN(Cc1cnc(N)cn1)C1CCOCC1</code> | <code>0.7962962985038757</code> |
285
+ | <code>N#CCCN(Cc1cnc(N)cn1)C1CC1</code> | <code>N#CCCN(Cc1cnc(N)cn1)CC(F)F</code> | <code>0.5517241358757019</code> |
286
+ * Loss: [<code>Matryoshka2dLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshka2dloss) with these parameters:
287
+ ```json
288
+ {
289
+ "loss": "TanimotoSentLoss",
290
+ "n_layers_per_step": 11,
291
+ "last_layer_weight": 1.0,
292
+ "prior_layers_weight": 1.5,
293
+ "kl_div_weight": 0.5,
294
+ "kl_temperature": 0.3,
295
+ "matryoshka_dims": [
296
+ 1024,
297
+ 512,
298
+ 256,
299
+ 128,
300
+ 64,
301
+ 32,
302
+ 16,
303
+ 8
304
+ ],
305
+ "matryoshka_weights": [
306
+ 1,
307
+ 1,
308
+ 1,
309
+ 1,
310
+ 1,
311
+ 1,
312
+ 1,
313
+ 1
314
+ ],
315
+ "n_dims_per_step": 4
316
+ }
317
+ ```
318
+
319
+ ### Training Hyperparameters
320
+ #### Non-Default Hyperparameters
321
+
322
+ - `eval_strategy`: steps
323
+ - `per_device_train_batch_size`: 192
324
+ - `per_device_eval_batch_size`: 512
325
+ - `learning_rate`: 8e-06
326
+ - `weight_decay`: 1e-05
327
+ - `max_grad_norm`: None
328
+ - `lr_scheduler_type`: warmup_stable_decay
329
+ - `lr_scheduler_kwargs`: {'num_decay_steps': 100943, 'warmup_type': 'linear', 'decay_type': '1-sqrt'}
330
+ - `warmup_steps`: 100943
331
+ - `data_seed`: 42
332
+ - `bf16`: True
333
+ - `bf16_full_eval`: True
334
+ - `tf32`: True
335
+ - `optim`: stable_adamw
336
+ - `optim_args`: decouple_lr=True,max_lr=8.0e-6
337
+ - `dataloader_pin_memory`: False
338
+ - `eval_on_start`: True
339
+
340
+ #### All Hyperparameters
341
+ <details><summary>Click to expand</summary>
342
+
343
+ - `overwrite_output_dir`: False
344
+ - `do_predict`: False
345
+ - `eval_strategy`: steps
346
+ - `prediction_loss_only`: True
347
+ - `per_device_train_batch_size`: 192
348
+ - `per_device_eval_batch_size`: 512
349
+ - `per_gpu_train_batch_size`: None
350
+ - `per_gpu_eval_batch_size`: None
351
+ - `gradient_accumulation_steps`: 1
352
+ - `eval_accumulation_steps`: None
353
+ - `torch_empty_cache_steps`: None
354
+ - `learning_rate`: 8e-06
355
+ - `weight_decay`: 1e-05
356
+ - `adam_beta1`: 0.9
357
+ - `adam_beta2`: 0.999
358
+ - `adam_epsilon`: 1e-08
359
+ - `max_grad_norm`: None
360
+ - `num_train_epochs`: 3
361
+ - `max_steps`: -1
362
+ - `lr_scheduler_type`: warmup_stable_decay
363
+ - `lr_scheduler_kwargs`: {'num_decay_steps': 100943, 'warmup_type': 'linear', 'decay_type': '1-sqrt'}
364
+ - `warmup_ratio`: 0.0
365
+ - `warmup_steps`: 100943
366
+ - `log_level`: passive
367
+ - `log_level_replica`: warning
368
+ - `log_on_each_node`: True
369
+ - `logging_nan_inf_filter`: True
370
+ - `save_safetensors`: True
371
+ - `save_on_each_node`: False
372
+ - `save_only_model`: False
373
+ - `restore_callback_states_from_checkpoint`: False
374
+ - `no_cuda`: False
375
+ - `use_cpu`: False
376
+ - `use_mps_device`: False
377
+ - `seed`: 42
378
+ - `data_seed`: 42
379
+ - `jit_mode_eval`: False
380
+ - `bf16`: True
381
+ - `fp16`: False
382
+ - `fp16_opt_level`: O1
383
+ - `half_precision_backend`: auto
384
+ - `bf16_full_eval`: True
385
+ - `fp16_full_eval`: False
386
+ - `tf32`: True
387
+ - `local_rank`: 0
388
+ - `ddp_backend`: None
389
+ - `tpu_num_cores`: None
390
+ - `tpu_metrics_debug`: False
391
+ - `debug`: []
392
+ - `dataloader_drop_last`: False
393
+ - `dataloader_num_workers`: 0
394
+ - `dataloader_prefetch_factor`: None
395
+ - `past_index`: -1
396
+ - `disable_tqdm`: False
397
+ - `remove_unused_columns`: True
398
+ - `label_names`: None
399
+ - `load_best_model_at_end`: False
400
+ - `ignore_data_skip`: False
401
+ - `fsdp`: []
402
+ - `fsdp_min_num_params`: 0
403
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
404
+ - `fsdp_transformer_layer_cls_to_wrap`: None
405
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
406
+ - `parallelism_config`: None
407
+ - `deepspeed`: None
408
+ - `label_smoothing_factor`: 0.0
409
+ - `optim`: stable_adamw
410
+ - `optim_args`: decouple_lr=True,max_lr=8.0e-6
411
+ - `adafactor`: False
412
+ - `group_by_length`: False
413
+ - `length_column_name`: length
414
+ - `project`: huggingface
415
+ - `trackio_space_id`: trackio
416
+ - `ddp_find_unused_parameters`: None
417
+ - `ddp_bucket_cap_mb`: None
418
+ - `ddp_broadcast_buffers`: False
419
+ - `dataloader_pin_memory`: False
420
+ - `dataloader_persistent_workers`: False
421
+ - `skip_memory_metrics`: True
422
+ - `use_legacy_prediction_loop`: False
423
+ - `push_to_hub`: False
424
+ - `resume_from_checkpoint`: None
425
+ - `hub_model_id`: None
426
+ - `hub_strategy`: every_save
427
+ - `hub_private_repo`: None
428
+ - `hub_always_push`: False
429
+ - `hub_revision`: None
430
+ - `gradient_checkpointing`: False
431
+ - `gradient_checkpointing_kwargs`: None
432
+ - `include_inputs_for_metrics`: False
433
+ - `include_for_metrics`: []
434
+ - `eval_do_concat_batches`: True
435
+ - `fp16_backend`: auto
436
+ - `push_to_hub_model_id`: None
437
+ - `push_to_hub_organization`: None
438
+ - `mp_parameters`:
439
+ - `auto_find_batch_size`: False
440
+ - `full_determinism`: False
441
+ - `torchdynamo`: None
442
+ - `ray_scope`: last
443
+ - `ddp_timeout`: 1800
444
+ - `torch_compile`: False
445
+ - `torch_compile_backend`: None
446
+ - `torch_compile_mode`: None
447
+ - `include_tokens_per_second`: False
448
+ - `include_num_input_tokens_seen`: no
449
+ - `neftune_noise_alpha`: None
450
+ - `optim_target_modules`: None
451
+ - `batch_eval_metrics`: False
452
+ - `eval_on_start`: True
453
+ - `use_liger_kernel`: False
454
+ - `liger_kernel_config`: None
455
+ - `eval_use_gather_object`: False
456
+ - `average_tokens_across_devices`: True
457
+ - `prompts`: None
458
+ - `batch_sampler`: batch_sampler
459
+ - `multi_dataset_batch_sampler`: proportional
460
+ - `router_mapping`: {}
461
+ - `learning_rate_mapping`: {}
462
+
463
+ </details>
464
+
465
+ ### Training Logs
466
+ <details><summary>Click to expand</summary>
467
+
468
+ | Epoch | Step | Training Loss | pubchem 10m genmol similarity loss | pubchem_10m_genmol_similarity_spearman |
469
+ | :----: | :----: | :-----------: | :--------------------------------: | :------------------------------------: |
470
+ | 0 | 0 | - | 85.7997 | 0.7261 |
471
+ | 0.0000 | 1 | 69.0605 | - | - |
472
+ | 0.2477 | 25000 | 47.1696 | - | - |
473
+ | 0.2500 | 25235 | - | 56.9634 | 0.8997 |
474
+ | 0.4978 | 50250 | 45.6212 | - | - |
475
+ | 0.5000 | 50470 | - | 55.4366 | 0.9599 |
476
+ | 0.7479 | 75500 | 45.1404 | - | - |
477
+ | 0.7500 | 75705 | - | 54.5667 | 0.9755 |
478
+ | 0.9981 | 100750 | 44.5023 | - | - |
479
+ | 1.0000 | 100940 | - | 54.1244 | 0.9810 |
480
+ | 1.2482 | 126000 | 43.7545 | - | - |
481
+ | 1.2500 | 126175 | - | 53.6974 | 0.9838 |
482
+ | 1.4984 | 151250 | 43.7865 | - | - |
483
+ | 1.5000 | 151410 | - | 53.4775 | 0.9855 |
484
+ | 1.7485 | 176500 | 43.3512 | - | - |
485
+ | 1.7499 | 176645 | - | 53.3775 | 0.9866 |
486
+ | 1.9987 | 201750 | 43.5808 | - | - |
487
+ | 1.9999 | 201880 | - | 53.3119 | 0.9874 |
488
+ | 2.2488 | 227000 | 43.281 | - | - |
489
+ | 2.2499 | 227115 | - | 53.1854 | 0.9879 |
490
+ | 2.4989 | 252250 | 43.3097 | - | - |
491
+ | 2.4999 | 252350 | - | 53.1972 | 0.9880 |
492
+ | 2.7491 | 277500 | 43.2376 | - | - |
493
+ | 2.7499 | 277585 | - | 53.1833 | 0.9881 |
494
+ | 2.9992 | 302750 | 43.2006 | - | - |
495
+ | 2.9999 | 302820 | - | 53.1241 | 0.9881 |
496
+ | 3.0000 | 302829 | - | - | 0.98811 |
497
+
498
+ </details>
499
+
500
+ ### Environmental Impact
501
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
502
+ - **Energy Consumed**: 19.679 kWh
503
+ - **Carbon Emitted**: 4.040 kg of CO2
504
+ - **Hours Used**: 74.966 hours
505
+
506
+ ### Training Hardware
507
+ - **On Cloud**: No
508
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
509
+ - **CPU Model**: AMD Ryzen 7 3700X 8-Core Processor
510
+ - **RAM Size**: 62.70 GB
511
+
512
+ ### Framework Versions
513
+ - Python: 3.13.7
514
+ - Sentence Transformers: 5.1.1
515
+ - Transformers: 4.57.1
516
+ - PyTorch: 2.8.0+cu128
517
+ - Accelerate: 1.10.1
518
+ - Datasets: 3.6.0
519
+ - Tokenizers: 0.22.1
520
+
521
+ ## Citation
522
+
523
+ ### BibTeX
524
+
525
+ #### Sentence Transformers
526
+ ```bibtex
527
+ @inproceedings{reimers-2019-sentence-bert,
528
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
529
+ author = "Reimers, Nils and Gurevych, Iryna",
530
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
531
+ month = "11",
532
+ year = "2019",
533
+ publisher = "Association for Computational Linguistics",
534
+ url = "https://arxiv.org/abs/1908.10084",
535
+ }
536
+ ```
537
+
538
+ #### Matryoshka2dLoss
539
+ ```bibtex
540
+ @misc{li20242d,
541
+ title={2D Matryoshka Sentence Embeddings},
542
+ author={Xianming Li and Zongxi Li and Jing Li and Haoran Xie and Qing Li},
543
+ year={2024},
544
+ eprint={2402.14776},
545
+ archivePrefix={arXiv},
546
+ primaryClass={cs.CL}
547
+ }
548
+ ```
549
+
550
+ #### MatryoshkaLoss
551
+ ```bibtex
552
+ @misc{kusupati2024matryoshka,
553
+ title={Matryoshka Representation Learning},
554
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
555
+ year={2024},
556
+ eprint={2205.13147},
557
+ archivePrefix={arXiv},
558
+ primaryClass={cs.LG}
559
+ }
560
+ ```
561
+
562
+ #### CoSENTLoss
563
+ ```bibtex
564
+ @online{kexuefm-8847,
565
+ title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
566
+ author={Su Jianlin},
567
+ year={2022},
568
+ month={Jan},
569
+ url={https://kexue.fm/archives/8847},
570
+ }
571
+ ```
572
+
573
+ #### TanimotoSentLoss
574
+ ```bibtex
575
+ @online{cortes-2025-tanimotosentloss,
576
+ title={TanimotoSentLoss: Tanimoto Loss for SMILES Embeddings},
577
+ author={Emmanuel Cortes},
578
+ year={2025},
579
+ month={Jan},
580
+ url={https://github.com/emapco/chem-mrl},
581
+ }
582
+ ```
583
+
584
+ ## Model Card Authors
585
+
586
+ [@eacortes](https://huggingface.co/eacortes)
587
+
588
+ ## Model Card Contact
589
+
590
+ Manny Cortes (manny@derifyai.com)
config.json ADDED
@@ -0,0 +1,53 @@
+ {
+   "architectures": [
+     "ModChemBertModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.1,
+   "auto_map": {
+     "AutoConfig": "configuration_modchembert.ModChemBertConfig",
+     "AutoModel": "modeling_modchembert.ModChemBertModel",
+     "AutoModelForMaskedLM": "modeling_modchembert.ModChemBertForMaskedLM",
+     "AutoModelForSequenceClassification": "modeling_modchembert.ModChemBertForSequenceClassification"
+   },
+   "bos_token_id": 0,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "max_seq_mha",
+   "classifier_pooling_attention_dropout": 0.1,
+   "classifier_pooling_last_k": 5,
+   "classifier_pooling_num_attention_heads": 4,
+   "cls_token_id": 0,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "dtype": "bfloat16",
+   "embedding_dropout": 0.1,
+   "eos_token_id": 1,
+   "global_attn_every_n_layers": 3,
+   "global_rope_theta": 160000.0,
+   "hidden_activation": "gelu",
+   "hidden_size": 1024,
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-05,
+   "local_attention": 8,
+   "local_rope_theta": 10000.0,
+   "max_position_embeddings": 512,
+   "mlp_bias": false,
+   "mlp_dropout": 0.1,
+   "model_type": "modchembert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 16,
+   "num_hidden_layers": 22,
+   "pad_token_id": 2,
+   "position_embedding_type": "absolute",
+   "repad_logits_with_grad": false,
+   "sep_token_id": 1,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "transformers_version": "4.57.1",
+   "vocab_size": 2362
+ }
config_chem_mrl.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "__version__": "0.7.4",
+   "embedding_pooling": "mean",
+   "eval_metric": "spearman",
+   "eval_similarity_fct": "tanimoto",
+   "kl_div_weight": 0.5,
+   "kl_temperature": 0.3,
+   "last_layer_weight": 1.0,
+   "loss_func": "tanimotosentloss",
+   "model_name": "Derify/ModChemBERT-IR-BASE",
+   "mrl_dimension_weights": [
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1,
+     1
+   ],
+   "mrl_dimensions": [
+     1024,
+     512,
+     256,
+     128,
+     64,
+     32,
+     16,
+     8
+   ],
+   "n_dims_per_step": 4,
+   "n_layers_per_step": 11,
+   "prior_layers_weight": 1.5,
+   "tanimoto_similarity_loss_func": null,
+   "use_2d_matryoshka": true,
+   "use_query_tokenizer": false
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.1.1",
+     "transformers": "4.57.1",
+     "pytorch": "2.8.0+cu128"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "tanimoto"
+ }
configuration_modchembert.py ADDED
@@ -0,0 +1,90 @@
+ # Copyright 2025 Emmanuel Cortes, All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ from typing import Literal
+
+ from transformers.models.modernbert.configuration_modernbert import ModernBertConfig
+
+
+ class ModChemBertConfig(ModernBertConfig):
+     """
+     Configuration class for ModChemBert models.
+
+     This configuration class extends ModernBertConfig with additional parameters specific to
+     chemical molecule modeling and custom pooling strategies for classification/regression tasks.
+     It accepts all arguments and keyword arguments from ModernBertConfig.
+
+     Args:
+         classifier_pooling (str, optional): Pooling strategy for sequence classification.
+             Available options:
+                 - "cls": Use CLS token representation
+                 - "mean": Attention-weighted average pooling
+                 - "sum_mean": Sum all hidden states across layers, then mean pool over sequence (ChemLM approach)
+                 - "sum_sum": Sum all hidden states across layers, then sum pool over sequence
+                 - "mean_mean": Mean all hidden states across layers, then mean pool over sequence
+                 - "mean_sum": Mean all hidden states across layers, then sum pool over sequence
+                 - "max_cls": Element-wise max pooling over last k hidden states, then take CLS token
+                 - "cls_mha": Multi-head attention with CLS token as query and full sequence as keys/values
+                 - "max_seq_mha": Max pooling over last k states + multi-head attention with CLS as query
+                 - "max_seq_mean": Max pooling over last k hidden states, then mean pooling over sequence
+             Defaults to "max_seq_mha".
+         classifier_pooling_num_attention_heads (int, optional): Number of attention heads for multi-head attention
+             pooling strategies (cls_mha, max_seq_mha). Defaults to 4.
+         classifier_pooling_attention_dropout (float, optional): Dropout probability for multi-head attention
+             pooling strategies (cls_mha, max_seq_mha). Defaults to 0.0.
+         classifier_pooling_last_k (int, optional): Number of last hidden layers to use for max pooling
+             strategies (max_cls, max_seq_mha, max_seq_mean). Defaults to 8.
+         *args: Variable length argument list passed to ModernBertConfig.
+         **kwargs: Arbitrary keyword arguments passed to ModernBertConfig.
+
+     Note:
+         This class inherits all configuration parameters from ModernBertConfig including
+         hidden_size, num_hidden_layers, num_attention_heads, intermediate_size, etc.
+     """
+
+     model_type = "modchembert"
+
+     def __init__(
+         self,
+         *args,
+         classifier_pooling: Literal[
+             "cls",
+             "mean",
+             "sum_mean",
+             "sum_sum",
+             "mean_mean",
+             "mean_sum",
+             "max_cls",
+             "cls_mha",
+             "max_seq_mha",
+             "max_seq_mean",
+         ] = "max_seq_mha",
+         classifier_pooling_num_attention_heads: int = 4,
+         classifier_pooling_attention_dropout: float = 0.0,
+         classifier_pooling_last_k: int = 8,
+         **kwargs,
+     ):
+         # Pass classifier_pooling="cls" to circumvent ValueError in ModernBertConfig init
+         super().__init__(*args, classifier_pooling="cls", **kwargs)
+         # Override with custom value
+         self.classifier_pooling = classifier_pooling
+         self.classifier_pooling_num_attention_heads = classifier_pooling_num_attention_heads
+         self.classifier_pooling_attention_dropout = classifier_pooling_attention_dropout
+         self.classifier_pooling_last_k = classifier_pooling_last_k
+         self.auto_map = {
+             "AutoConfig": "configuration_modchembert.ModChemBertConfig",
+             "AutoModel": "modeling_modchembert.ModChemBertModel",
+             "AutoModelForMaskedLM": "modeling_modchembert.ModChemBertForMaskedLM",
+             "AutoModelForSequenceClassification": "modeling_modchembert.ModChemBertForSequenceClassification",
+         }
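
One reading of the `max_seq_mean` option described in the docstring: take the element-wise maximum over the last k layers' hidden states, then mean-pool over the sequence. A NumPy sketch under that reading, with toy arrays. The function name and the mask handling are assumptions; the actual implementation lives in `modeling_modchembert.py` and may differ:

```python
import numpy as np


def max_seq_mean(
    hidden_states: list[np.ndarray], attention_mask: np.ndarray, last_k: int
) -> np.ndarray:
    """Element-wise max over the last `last_k` layers, then masked mean over tokens.

    hidden_states: one (batch, seq_len, dim) array per layer, ordered first to last.
    attention_mask: (batch, seq_len) of 0/1.
    """
    stacked = np.stack(hidden_states[-last_k:], axis=0)  # (k, batch, seq_len, dim)
    max_over_layers = stacked.max(axis=0)  # (batch, seq_len, dim)
    mask = attention_mask[..., None].astype(max_over_layers.dtype)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return (max_over_layers * mask).sum(axis=1) / counts


# Three toy "layers", a batch of one sequence, two tokens, two features.
layers = [
    np.array([[[9.0, 9.0], [9.0, 9.0]]]),  # ignored when last_k=2
    np.array([[[1.0, 4.0], [2.0, 0.0]]]),
    np.array([[[3.0, 2.0], [0.0, 6.0]]]),
]
pooled = max_seq_mean(layers, np.array([[1, 1]]), last_k=2)
```

With `last_k=2` the first layer is excluded, so the per-token maxima are `[3, 4]` and `[2, 6]`, and their mean is `[2.5, 5.0]`.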
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4bdfc920fcc3c65314ef0cf0f5129884443d23748f09e23467015d54d5338ce4
+ size 397110232
modeling_modchembert.py ADDED
@@ -0,0 +1,780 @@
1
+ # Copyright 2025 Emmanuel Cortes, All Rights Reserved.
2
+ #
3
+ # Copyright 2024 Answer.AI, LightOn, and contributors, and the HuggingFace Inc. team. All rights reserved.
4
+ #
5
+ #
6
+ # Licensed under the Apache License, Version 2.0 (the "License");
7
+ # you may not use this file except in compliance with the License.
8
+ # You may obtain a copy of the License at
9
+ #
10
+ # http://www.apache.org/licenses/LICENSE-2.0
11
+ #
12
+ # Unless required by applicable law or agreed to in writing, software
13
+ # distributed under the License is distributed on an "AS IS" BASIS,
14
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15
+ # See the License for the specific language governing permissions and
16
+ # limitations under the License.
17
+
18
+ # This file is adapted from the transformers library.
19
+ # Modifications include:
20
+ # - Additional classifier_pooling options for ModChemBertForSequenceClassification
21
+ # - sum_mean, sum_sum, mean_sum, mean_mean: from ChemLM (utilizes all hidden states)
22
+ # - max_cls, cls_mha, max_seq_mha: from MaxPoolBERT (utilizes last k hidden states)
23
+ # - max_seq_mean: a merge between sum_mean and max_cls (utilizes last k hidden states)
24
+ # - Addition of ModChemBertPoolingAttention for cls_mha and max_seq_mha pooling options
25
+
26
+ import copy
27
+ import math
28
+ import typing
29
+ from contextlib import nullcontext
30
+
31
+ import torch
32
+ import torch.nn as nn
33
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
34
+ from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask
35
+ from transformers.modeling_outputs import BaseModelOutput, MaskedLMOutput, SequenceClassifierOutput
36
+ from transformers.models.modernbert.modeling_modernbert import (
37
+ MODERNBERT_ATTENTION_FUNCTION,
38
+ ModernBertEmbeddings,
39
+ ModernBertEncoderLayer,
40
+ ModernBertModel,
41
+ ModernBertPredictionHead,
42
+ ModernBertPreTrainedModel,
43
+ ModernBertRotaryEmbedding,
44
+ _pad_modernbert_output,
45
+ _unpad_modernbert_input,
46
+ )
47
+ from transformers.utils import logging
48
+
49
+ from .configuration_modchembert import ModChemBertConfig
50
+
51
+ logger = logging.get_logger(__name__)
52
+
53
+
54
+ class InitWeightsMixin:
55
+ def _init_weights(self, module: nn.Module):
56
+ super()._init_weights(module) # type: ignore
57
+
58
+ cutoff_factor = self.config.initializer_cutoff_factor # type: ignore
59
+ if cutoff_factor is None:
60
+ cutoff_factor = 3
61
+
62
+ def init_weight(module: nn.Module, std: float):
63
+ if isinstance(module, nn.Linear):
64
+ nn.init.trunc_normal_(
65
+ module.weight,
66
+ mean=0.0,
67
+ std=std,
68
+ a=-cutoff_factor * std,
69
+ b=cutoff_factor * std,
70
+ )
71
+ if module.bias is not None:
72
+ nn.init.zeros_(module.bias)
73
+
74
+ stds = {
75
+ "in": self.config.initializer_range, # type: ignore
76
+ "out": self.config.initializer_range / math.sqrt(2.0 * self.config.num_hidden_layers), # type: ignore
77
+ "final_out": self.config.hidden_size**-0.5, # type: ignore
78
+ }
79
+
80
+ if isinstance(module, ModChemBertForMaskedLM):
81
+ init_weight(module.decoder, stds["out"])
82
+ elif isinstance(module, ModChemBertForSequenceClassification):
83
+ init_weight(module.classifier, stds["final_out"])
84
+ elif isinstance(module, ModChemBertPoolingAttention):
85
+ init_weight(module.Wq, stds["in"])
86
+ init_weight(module.Wk, stds["in"])
87
+ init_weight(module.Wv, stds["in"])
88
+ init_weight(module.Wo, stds["out"])
89
+
90
+
91
+ class ModChemBertPoolingAttention(nn.Module):
92
+ """Performs multi-headed attention for pooling: queries come from `q`, keys and values from `kv`."""
93
+
94
+ def __init__(self, config: ModChemBertConfig):
95
+ super().__init__()
96
+ self.config = copy.deepcopy(config)
97
+ # Override num_attention_heads to use classifier_pooling_num_attention_heads
98
+ self.config.num_attention_heads = config.classifier_pooling_num_attention_heads
99
+ # Override attention_dropout to use classifier_pooling_attention_dropout
100
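+ self.config.attention_dropout = config.classifier_pooling_attention_dropout
+ # Use the overridden copy below so the pooling-specific head count and dropout take effect
+ config = self.config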
101
+
102
+ if config.hidden_size % config.num_attention_heads != 0:
103
+ raise ValueError(
104
+ f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention heads "
105
+ f"({config.num_attention_heads})"
106
+ )
107
+
108
+ self.attention_dropout = config.attention_dropout
109
+ self.num_heads = config.num_attention_heads
110
+ self.head_dim = config.hidden_size // config.num_attention_heads
111
+ self.all_head_size = self.head_dim * self.num_heads
112
+ self.Wq = nn.Linear(config.hidden_size, self.all_head_size, bias=config.attention_bias)
113
+ self.Wk = nn.Linear(config.hidden_size, self.all_head_size, bias=config.attention_bias)
114
+ self.Wv = nn.Linear(config.hidden_size, self.all_head_size, bias=config.attention_bias)
115
+
116
+ # Use global attention
117
+ self.local_attention = (-1, -1)
118
+ rope_theta = config.global_rope_theta
119
+ # sdpa path from original ModernBert implementation
120
+ config_copy = copy.deepcopy(config)
121
+ config_copy.rope_theta = rope_theta
122
+ self.rotary_emb = ModernBertRotaryEmbedding(config=config_copy)
123
+
124
+ self.Wo = nn.Linear(config.hidden_size, config.hidden_size, bias=config.attention_bias)
125
+ self.out_drop = (
126
+ nn.Dropout(config.attention_dropout)
127
+ if config.attention_dropout > 0.0
128
+ else nn.Identity()
129
+ )
130
+ self.pruned_heads = set()
131
+
132
+ def forward(
133
+ self,
134
+ q: torch.Tensor,
135
+ kv: torch.Tensor,
136
+ attention_mask: torch.Tensor | None = None,
137
+ **kwargs,
138
+ ) -> torch.Tensor:
139
+ bs, seq_len = kv.shape[:2]
140
+ q_proj: torch.Tensor = self.Wq(q)
141
+ k_proj: torch.Tensor = self.Wk(kv)
142
+ v_proj: torch.Tensor = self.Wv(kv)
143
+ qkv = torch.stack(
144
+ (
145
+ q_proj.reshape(bs, seq_len, self.num_heads, self.head_dim),
146
+ k_proj.reshape(bs, seq_len, self.num_heads, self.head_dim),
147
+ v_proj.reshape(bs, seq_len, self.num_heads, self.head_dim),
148
+ ),
149
+ dim=2,
150
+ ) # (bs, seq_len, 3, num_heads, head_dim)
151
+
152
+ device = kv.device
153
+ if attention_mask is None:
154
+ attention_mask = torch.ones((bs, seq_len), device=device, dtype=torch.bool)
155
+ position_ids = torch.arange(seq_len, device=device).unsqueeze(0).long()
156
+
157
+ attn_outputs = MODERNBERT_ATTENTION_FUNCTION["sdpa"](
158
+ self,
159
+ qkv=qkv,
160
+ attention_mask=_prepare_4d_attention_mask(attention_mask, kv.dtype),
161
+ sliding_window_mask=None, # not needed when using global attention
162
+ position_ids=position_ids,
163
+ local_attention=self.local_attention,
164
+ bs=bs,
165
+ dim=self.all_head_size,
166
+ **kwargs,
167
+ )
168
+ hidden_states = attn_outputs[0]
169
+ hidden_states = self.out_drop(self.Wo(hidden_states))
170
+
171
+ return hidden_states
172
+
173
+
174
+ class ModChemBertModel(ModernBertPreTrainedModel):
175
+ config_class = ModChemBertConfig
176
+
177
+ def __init__(self, config: ModChemBertConfig):
178
+ super().__init__(config)
179
+ self.config = config
180
+ self.embeddings = ModernBertEmbeddings(config)
181
+ self.layers = nn.ModuleList(
182
+ [
183
+ ModernBertEncoderLayer(config, layer_id)
184
+ for layer_id in range(config.num_hidden_layers)
185
+ ]
186
+ )
187
+ self.final_norm = nn.LayerNorm(
188
+ config.hidden_size, eps=config.norm_eps, bias=config.norm_bias
189
+ )
190
+ self.gradient_checkpointing = False
191
+ self.post_init()
192
+
193
+ def get_input_embeddings(self):
194
+ return self.embeddings.tok_embeddings
195
+
196
+ def set_input_embeddings(self, value):
197
+ self.embeddings.tok_embeddings = value # type: ignore
198
+
199
+ def forward(
200
+ self,
201
+ input_ids: torch.LongTensor | None = None,
202
+ attention_mask: torch.Tensor | None = None,
203
+ sliding_window_mask: torch.Tensor | None = None,
204
+ position_ids: torch.LongTensor | None = None,
205
+ inputs_embeds: torch.Tensor | None = None,
206
+ indices: torch.Tensor | None = None,
207
+ cu_seqlens: torch.Tensor | None = None,
208
+ max_seqlen: int | None = None,
209
+ batch_size: int | None = None,
210
+ seq_len: int | None = None,
211
+ output_attentions: bool | None = None,
212
+ output_hidden_states: bool | None = None,
213
+ return_dict: bool | None = None,
214
+ ) -> tuple[torch.Tensor, ...] | BaseModelOutput:
215
+ r"""
216
+ sliding_window_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
217
+ Mask to avoid performing attention on padding or far-away tokens. In ModernBert, only every few layers
218
+ perform global attention, while the rest perform local attention. This mask is used to avoid attending to
219
+ far-away tokens in the local attention layers when not using Flash Attention.
220
+ indices (`torch.Tensor` of shape `(total_unpadded_tokens,)`, *optional*):
221
+ Indices of the non-padding tokens in the input sequence. Used for unpadding the output.
222
+ cu_seqlens (`torch.Tensor` of shape `(batch + 1,)`, *optional*):
223
+ Cumulative sequence lengths of the input sequences. Used to index the unpadded tensors.
224
+ max_seqlen (`int`, *optional*):
225
+ Maximum sequence length in the batch excluding padding tokens. Used to unpad input_ids and pad output tensors.
226
+ batch_size (`int`, *optional*):
227
+ Batch size of the input sequences. Used to pad the output tensors.
228
+ seq_len (`int`, *optional*):
229
+ Sequence length of the input sequences including padding tokens. Used to pad the output tensors.
230
+ """ # noqa: E501
231
+ output_attentions = (
232
+ output_attentions if output_attentions is not None else self.config.output_attentions
233
+ )
234
+ output_hidden_states = (
235
+ output_hidden_states
236
+ if output_hidden_states is not None
237
+ else self.config.output_hidden_states
238
+ )
239
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
240
+
241
+ if (input_ids is None) ^ (inputs_embeds is not None):
242
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
243
+
244
+ all_hidden_states = () if output_hidden_states else None
245
+ all_self_attentions = () if output_attentions else None
246
+
247
+ self._maybe_set_compile()
248
+
249
+ if input_ids is not None:
250
+ self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
251
+
252
+ if batch_size is None and seq_len is None:
253
+ if inputs_embeds is not None:
254
+ batch_size, seq_len = inputs_embeds.shape[:2]
255
+ else:
256
+ batch_size, seq_len = input_ids.shape[:2] # type: ignore
257
+ device = input_ids.device if input_ids is not None else inputs_embeds.device # type: ignore
258
+
259
+ if attention_mask is None:
260
+ attention_mask = torch.ones((batch_size, seq_len), device=device, dtype=torch.bool) # type: ignore
261
+
262
+ repad = False
263
+ if self.config._attn_implementation == "flash_attention_2":
264
+ if indices is None and cu_seqlens is None and max_seqlen is None:
265
+ repad = True
266
+ if inputs_embeds is None:
267
+ with torch.no_grad():
268
+ input_ids, indices, cu_seqlens, max_seqlen, *_ = _unpad_modernbert_input(
269
+ inputs=input_ids, # type: ignore
270
+ attention_mask=attention_mask, # type: ignore
271
+ )
272
+ else:
273
+ inputs_embeds, indices, cu_seqlens, max_seqlen, *_ = _unpad_modernbert_input(
274
+ inputs=inputs_embeds,
275
+ attention_mask=attention_mask, # type: ignore
276
+ )
277
+ else:
278
+ if position_ids is None:
279
+ position_ids = torch.arange(seq_len, device=device).unsqueeze(0) # type: ignore
280
+
281
+ attention_mask, sliding_window_mask = self._update_attention_mask(
282
+ attention_mask, # type: ignore
283
+ output_attentions=output_attentions, # type: ignore
284
+ )
285
+
286
+ hidden_states = self.embeddings(input_ids=input_ids, inputs_embeds=inputs_embeds)
287
+
288
+ for encoder_layer in self.layers:
289
+ if output_hidden_states:
290
+ all_hidden_states = all_hidden_states + (hidden_states,) # type: ignore
291
+
292
+ layer_outputs = encoder_layer(
293
+ hidden_states,
294
+ attention_mask=attention_mask,
295
+ sliding_window_mask=sliding_window_mask,
296
+ position_ids=position_ids,
297
+ cu_seqlens=cu_seqlens,
298
+ max_seqlen=max_seqlen,
299
+ output_attentions=output_attentions,
300
+ )
301
+ hidden_states = layer_outputs[0]
302
+ if output_attentions and len(layer_outputs) > 1:
303
+ all_self_attentions = all_self_attentions + (layer_outputs[1],) # type: ignore
304
+
305
+ if output_hidden_states:
306
+ all_hidden_states = all_hidden_states + (hidden_states,) # type: ignore
307
+
308
+ hidden_states = self.final_norm(hidden_states)
309
+
310
+ if repad:
311
+ hidden_states = _pad_modernbert_output(
312
+ inputs=hidden_states,
313
+ indices=indices, # type: ignore
314
+ batch=batch_size, # type: ignore
315
+ seqlen=seq_len, # type: ignore
316
+ )
317
+ if all_hidden_states is not None:
318
+ all_hidden_states = tuple(
319
+ _pad_modernbert_output(
320
+ inputs=hs, indices=indices, batch=batch_size, seqlen=seq_len
321
+ ) # type: ignore
322
+ for hs in all_hidden_states
323
+ )
324
+
325
+ if not return_dict:
326
+ return tuple(
327
+ v for v in [hidden_states, all_hidden_states, all_self_attentions] if v is not None
328
+ )
329
+ return BaseModelOutput(
330
+ last_hidden_state=hidden_states, # type: ignore
331
+ hidden_states=all_hidden_states, # type: ignore
332
+ attentions=all_self_attentions,
333
+ )
334
+
335
+ def _update_attention_mask(
336
+ self, attention_mask: torch.Tensor, output_attentions: bool
337
+ ) -> tuple[torch.Tensor, torch.Tensor]:
338
+ if output_attentions:
339
+ if self.config._attn_implementation == "sdpa":
340
+ logger.warning_once( # type: ignore
341
+ "Outputting attentions is only supported with the 'eager' attention implementation, "
342
+ 'not with "sdpa". Falling back to `attn_implementation="eager"`.'
343
+ )
344
+ self.config._attn_implementation = "eager"
345
+ elif self.config._attn_implementation != "eager":
346
+ logger.warning_once( # type: ignore
347
+ "Outputting attentions is only supported with the eager attention implementation, "
348
+ f'not with {self.config._attn_implementation}. Consider setting `attn_implementation="eager"`.'
349
+ " Setting `output_attentions=False`."
350
+ )
351
+
352
+ global_attention_mask = _prepare_4d_attention_mask(attention_mask, self.dtype)
353
+
354
+ # Create position indices
355
+ rows = torch.arange(global_attention_mask.shape[2]).unsqueeze(0)
356
+ # Calculate distance between positions
357
+ distance = torch.abs(rows - rows.T)
358
+
359
+ # Create sliding window mask (1 for positions within window, 0 outside)
360
+ window_mask = (
361
+ (distance <= self.config.local_attention // 2)
362
+ .unsqueeze(0)
363
+ .unsqueeze(0)
364
+ .to(attention_mask.device)
365
+ )
366
+ # Combine with existing mask
367
+ sliding_window_mask = global_attention_mask.masked_fill(
368
+ window_mask.logical_not(), torch.finfo(self.dtype).min
369
+ )
370
+
371
+ return global_attention_mask, sliding_window_mask # type: ignore
372
+
373
+
374
+ class ModChemBertForMaskedLM(InitWeightsMixin, ModernBertPreTrainedModel):
375
+ config_class = ModChemBertConfig
376
+ _tied_weights_keys = ["decoder.weight"]
377
+
378
+ def __init__(self, config: ModChemBertConfig):
379
+ super().__init__(config)
380
+ self.config = config
381
+ self.model = ModChemBertModel(config)
382
+ self.head = ModernBertPredictionHead(config)
383
+ self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=config.decoder_bias)
384
+
385
+ self.sparse_prediction = self.config.sparse_prediction
386
+ self.sparse_pred_ignore_index = self.config.sparse_pred_ignore_index
387
+
388
+ # Initialize weights and apply final processing
389
+ self.post_init()
390
+
391
+ def get_output_embeddings(self):
392
+ return self.decoder
393
+
394
+ def set_output_embeddings(self, new_embeddings: nn.Linear):
395
+ self.decoder = new_embeddings
396
+
397
+ @torch.compile(dynamic=True)
398
+ def compiled_head(self, output: torch.Tensor) -> torch.Tensor:
399
+ return self.decoder(self.head(output))
400
+
401
+ def forward(
402
+ self,
403
+ input_ids: torch.LongTensor | None = None,
404
+ attention_mask: torch.Tensor | None = None,
405
+ sliding_window_mask: torch.Tensor | None = None,
406
+ position_ids: torch.Tensor | None = None,
407
+ inputs_embeds: torch.Tensor | None = None,
408
+ labels: torch.Tensor | None = None,
409
+ indices: torch.Tensor | None = None,
410
+ cu_seqlens: torch.Tensor | None = None,
411
+ max_seqlen: int | None = None,
412
+ batch_size: int | None = None,
413
+ seq_len: int | None = None,
414
+ output_attentions: bool | None = None,
415
+ output_hidden_states: bool | None = None,
416
+ return_dict: bool | None = None,
417
+ **kwargs,
418
+ ) -> tuple[torch.Tensor] | tuple[torch.Tensor, typing.Any] | MaskedLMOutput:
419
+ r"""
420
+ sliding_window_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
421
+ Mask to avoid performing attention on padding or far-away tokens. In ModernBert, only every few layers
422
+ perform global attention, while the rest perform local attention. This mask is used to avoid attending to
423
+ far-away tokens in the local attention layers when not using Flash Attention.
424
+ indices (`torch.Tensor` of shape `(total_unpadded_tokens,)`, *optional*):
425
+ Indices of the non-padding tokens in the input sequence. Used for unpadding the output.
426
+ cu_seqlens (`torch.Tensor` of shape `(batch + 1,)`, *optional*):
427
+ Cumulative sequence lengths of the input sequences. Used to index the unpadded tensors.
428
+ max_seqlen (`int`, *optional*):
429
+ Maximum sequence length in the batch excluding padding tokens. Used to unpad input_ids & pad output tensors.
430
+ batch_size (`int`, *optional*):
431
+ Batch size of the input sequences. Used to pad the output tensors.
432
+ seq_len (`int`, *optional*):
433
+ Sequence length of the input sequences including padding tokens. Used to pad the output tensors.
434
+ """
435
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
436
+ self._maybe_set_compile()
437
+
438
+ if self.config._attn_implementation == "flash_attention_2": # noqa: SIM102
439
+ if indices is None and cu_seqlens is None and max_seqlen is None:
440
+ if batch_size is None and seq_len is None:
441
+ if inputs_embeds is not None:
442
+ batch_size, seq_len = inputs_embeds.shape[:2]
443
+ else:
444
+ batch_size, seq_len = input_ids.shape[:2] # type: ignore
445
+ device = input_ids.device if input_ids is not None else inputs_embeds.device # type: ignore
446
+
447
+ if attention_mask is None:
448
+ attention_mask = torch.ones(
449
+ (batch_size, seq_len), device=device, dtype=torch.bool
450
+ ) # type: ignore
451
+
452
+ if inputs_embeds is None:
453
+ with torch.no_grad():
454
+ input_ids, indices, cu_seqlens, max_seqlen, position_ids, labels = (
455
+ _unpad_modernbert_input(
456
+ inputs=input_ids, # type: ignore
457
+ attention_mask=attention_mask, # type: ignore
458
+ position_ids=position_ids,
459
+ labels=labels,
460
+ )
461
+ )
462
+ else:
463
+ inputs_embeds, indices, cu_seqlens, max_seqlen, position_ids, labels = (
464
+ _unpad_modernbert_input(
465
+ inputs=inputs_embeds,
466
+ attention_mask=attention_mask, # type: ignore
467
+ position_ids=position_ids,
468
+ labels=labels,
469
+ )
470
+ )
471
+
472
+ outputs = self.model(
473
+ input_ids=input_ids,
474
+ attention_mask=attention_mask,
475
+ sliding_window_mask=sliding_window_mask,
476
+ position_ids=position_ids,
477
+ inputs_embeds=inputs_embeds,
478
+ indices=indices,
479
+ cu_seqlens=cu_seqlens,
480
+ max_seqlen=max_seqlen,
481
+ batch_size=batch_size,
482
+ seq_len=seq_len,
483
+ output_attentions=output_attentions,
484
+ output_hidden_states=output_hidden_states,
485
+ return_dict=return_dict,
486
+ )
487
+ last_hidden_state = outputs[0]
488
+
489
+ if self.sparse_prediction and labels is not None:
490
+ # flatten labels and output first
491
+ labels = labels.view(-1)
492
+ last_hidden_state = last_hidden_state.view(labels.shape[0], -1)
493
+
494
+ # then filter out the non-masked tokens
495
+ mask_tokens = labels != self.sparse_pred_ignore_index
496
+ last_hidden_state = last_hidden_state[mask_tokens]
497
+ labels = labels[mask_tokens]
498
+
499
+ logits = (
500
+ self.compiled_head(last_hidden_state)
501
+ if self.config.reference_compile
502
+ else self.decoder(self.head(last_hidden_state))
503
+ )
504
+
505
+ loss = None
506
+ if labels is not None:
507
+ loss = self.loss_function(logits, labels, vocab_size=self.config.vocab_size, **kwargs)
508
+
509
+ if self.config._attn_implementation == "flash_attention_2":
510
+ with (
511
+ nullcontext()
512
+ if self.config.repad_logits_with_grad or labels is None
513
+ else torch.no_grad()
514
+ ):
515
+ logits = _pad_modernbert_output(
516
+ inputs=logits, indices=indices, batch=batch_size, seqlen=seq_len
517
+ ) # type: ignore
518
+
519
+ if not return_dict:
520
+ output = (logits,)
521
+ return ((loss,) + output) if loss is not None else output
522
+
523
+ return MaskedLMOutput(
524
+ loss=loss,
525
+ logits=typing.cast(torch.FloatTensor, logits),
526
+ hidden_states=outputs.hidden_states,
527
+ attentions=outputs.attentions,
528
+ )
529
+
530
+
531
+ class ModChemBertForSequenceClassification(InitWeightsMixin, ModernBertPreTrainedModel):
532
+ config_class = ModChemBertConfig
533
+
534
+ def __init__(self, config: ModChemBertConfig):
535
+ super().__init__(config)
536
+ self.num_labels = config.num_labels
537
+ self.config = config
538
+
539
+ self.model = ModernBertModel(config)
540
+ if self.config.classifier_pooling in {"cls_mha", "max_seq_mha"}:
541
+ self.pooling_attn = ModChemBertPoolingAttention(config=self.config)
542
+ else:
543
+ self.pooling_attn = None
544
+ self.head = ModernBertPredictionHead(config)
545
+ self.drop = torch.nn.Dropout(config.classifier_dropout)
546
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
547
+
548
+ # Initialize weights and apply final processing
549
+ self.post_init()
550
+
551
+ def forward(
552
+ self,
553
+ input_ids: torch.LongTensor | None = None,
554
+ attention_mask: torch.Tensor | None = None,
555
+ sliding_window_mask: torch.Tensor | None = None,
556
+ position_ids: torch.Tensor | None = None,
557
+ inputs_embeds: torch.Tensor | None = None,
558
+ labels: torch.Tensor | None = None,
559
+ indices: torch.Tensor | None = None,
560
+ cu_seqlens: torch.Tensor | None = None,
561
+ max_seqlen: int | None = None,
562
+ batch_size: int | None = None,
563
+ seq_len: int | None = None,
564
+ output_attentions: bool | None = None,
565
+ output_hidden_states: bool | None = None,
566
+ return_dict: bool | None = None,
567
+ **kwargs,
568
+ ) -> tuple[torch.Tensor] | tuple[torch.Tensor, typing.Any] | SequenceClassifierOutput:
569
+ r"""
570
+ sliding_window_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
571
+ Mask to avoid performing attention on padding or far-away tokens. In ModernBert, only every few layers
572
+ perform global attention, while the rest perform local attention. This mask is used to avoid attending to
573
+ far-away tokens in the local attention layers when not using Flash Attention.
574
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
575
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
576
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
577
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
578
+ indices (`torch.Tensor` of shape `(total_unpadded_tokens,)`, *optional*):
579
+ Indices of the non-padding tokens in the input sequence. Used for unpadding the output.
580
+ cu_seqlens (`torch.Tensor` of shape `(batch + 1,)`, *optional*):
581
+ Cumulative sequence lengths of the input sequences. Used to index the unpadded tensors.
582
+ max_seqlen (`int`, *optional*):
583
+ Maximum sequence length in the batch excluding padding tokens. Used to unpad input_ids & pad output tensors.
584
+ batch_size (`int`, *optional*):
585
+ Batch size of the input sequences. Used to pad the output tensors.
586
+ seq_len (`int`, *optional*):
587
+ Sequence length of the input sequences including padding tokens. Used to pad the output tensors.
588
+ """
589
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
590
+ self._maybe_set_compile()
591
+
592
+ if input_ids is not None:
593
+ self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
594
+
595
+ if batch_size is None and seq_len is None:
596
+ if inputs_embeds is not None:
597
+ batch_size, seq_len = inputs_embeds.shape[:2]
598
+ else:
599
+ batch_size, seq_len = input_ids.shape[:2] # type: ignore
600
+ device = input_ids.device if input_ids is not None else inputs_embeds.device # type: ignore
601
+
602
+ if attention_mask is None:
603
+ attention_mask = torch.ones((batch_size, seq_len), device=device, dtype=torch.bool) # type: ignore
604
+
605
+ # Ensure output_hidden_states is True in case pooling mode requires all hidden states
606
+ output_hidden_states = True
607
+
608
+ outputs = self.model(
609
+ input_ids=input_ids,
610
+ attention_mask=attention_mask,
611
+ sliding_window_mask=sliding_window_mask,
612
+ position_ids=position_ids,
613
+ inputs_embeds=inputs_embeds,
614
+ indices=indices,
615
+ cu_seqlens=cu_seqlens,
616
+ max_seqlen=max_seqlen,
617
+ batch_size=batch_size,
618
+ seq_len=seq_len,
619
+ output_attentions=output_attentions,
620
+ output_hidden_states=output_hidden_states,
621
+ return_dict=return_dict,
622
+ )
623
+ last_hidden_state = outputs[0]
624
+ hidden_states = outputs[1]
625
+
626
+ last_hidden_state = _pool_modchembert_output(
627
+ self,
628
+ last_hidden_state,
629
+ hidden_states,
630
+ typing.cast(torch.Tensor, attention_mask),
631
+ )
632
+ pooled_output = self.head(last_hidden_state)
633
+ pooled_output = self.drop(pooled_output)
634
+ logits = self.classifier(pooled_output)
635
+
636
+ loss = None
637
+ if labels is not None:
638
+ if self.config.problem_type is None:
639
+ if self.num_labels == 1:
640
+ self.config.problem_type = "regression"
641
+ elif self.num_labels > 1 and (
642
+ labels.dtype == torch.long or labels.dtype == torch.int
643
+ ):
644
+ self.config.problem_type = "single_label_classification"
645
+ else:
646
+ self.config.problem_type = "multi_label_classification"
647
+
648
+ if self.config.problem_type == "regression":
649
+ loss_fct = MSELoss()
650
+ if self.num_labels == 1:
651
+ loss = loss_fct(logits.squeeze(), labels.squeeze())
652
+ else:
653
+ loss = loss_fct(logits, labels)
654
+ elif self.config.problem_type == "single_label_classification":
655
+ loss_fct = CrossEntropyLoss()
656
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
657
+ elif self.config.problem_type == "multi_label_classification":
658
+ loss_fct = BCEWithLogitsLoss()
659
+ loss = loss_fct(logits, labels)
660
+
661
+ if not return_dict:
662
+ output = (logits,)
663
+ return ((loss,) + output) if loss is not None else output
664
+
665
+ return SequenceClassifierOutput(
666
+ loss=loss,
667
+ logits=logits,
668
+ hidden_states=outputs.hidden_states,
669
+ attentions=outputs.attentions,
670
+ )
671
+
672
+
673
+ def _pool_modchembert_output(
674
+ module: ModChemBertForSequenceClassification,
675
+ last_hidden_state: torch.Tensor,
676
+ hidden_states: list[torch.Tensor],
677
+ attention_mask: torch.Tensor,
678
+ ):
679
+ """
680
+ Apply pooling strategy to hidden states for sequence-level classification/regression tasks.
681
+
682
+ This function implements various pooling strategies to aggregate sequence representations
683
+ into a single vector for downstream classification or regression tasks. The pooling method
684
+ is determined by the `classifier_pooling` configuration parameter.
685
+
686
+ Available pooling strategies:
687
+ - cls: Use the CLS token ([CLS]) representation from the last hidden state
688
+ - mean: Average pooling over the tokens in the sequence, weighted by the attention mask so padding is excluded
689
+ - max_cls: Element-wise max pooling over the last k hidden states, then take CLS token
690
+ - cls_mha: Multi-head attention with CLS token as query and full sequence as keys/values
691
+ - max_seq_mha: Max pooling over last k states + multi-head attention with CLS as query
692
+ - max_seq_mean: Max pooling over last k hidden states, then mean pooling over sequence
693
+ - sum_mean: Sum all hidden states across layers, then mean pool over sequence
694
+ - sum_sum: Sum all hidden states across layers, then sum pool over sequence
695
+ - mean_sum: Mean all hidden states across layers, then sum pool over sequence
696
+ - mean_mean: Mean all hidden states across layers, then mean pool over sequence
697
+
698
+ Args:
699
+ module: The model instance containing configuration and pooling attention if needed
700
+         last_hidden_state: Final layer hidden states of shape (batch_size, seq_len, hidden_size)
+         hidden_states: List of hidden states from all layers, each of shape (batch_size, seq_len, hidden_size)
+         attention_mask: Attention mask of shape (batch_size, seq_len) indicating valid tokens
+
+     Returns:
+         torch.Tensor: Pooled representation of shape (batch_size, hidden_size)
+
+     Note:
+         Some pooling strategies (cls_mha, max_seq_mha) require the module to have a pooling_attn
+         attribute containing a ModChemBertPoolingAttention instance.
+     """
+     config = typing.cast(ModChemBertConfig, module.config)
+     if config.classifier_pooling == "cls":
+         last_hidden_state = last_hidden_state[:, 0]
+     elif config.classifier_pooling == "mean":
+         last_hidden_state = (last_hidden_state * attention_mask.unsqueeze(-1)).sum(
+             dim=1
+         ) / attention_mask.sum(dim=1, keepdim=True)
+     elif config.classifier_pooling == "max_cls":
+         k_hidden_states = hidden_states[-config.classifier_pooling_last_k :]
+         theta = torch.stack(k_hidden_states, dim=1)  # (batch, k, seq_len, hidden)
+         pooled_seq = torch.max(
+             theta, dim=1
+         ).values  # Element-wise max over k -> (batch, seq_len, hidden)
+         last_hidden_state = pooled_seq[:, 0, :]  # (batch, hidden)
+     elif config.classifier_pooling == "cls_mha":
+         # Similar to max_seq_mha but without the max pooling step
+         # Query is CLS token (position 0); Keys/Values are full sequence
+         q = last_hidden_state[:, 0, :].unsqueeze(1)  # (batch, 1, hidden)
+         q = q.expand(-1, last_hidden_state.shape[1], -1)  # (batch, seq_len, hidden)
+         attn_out: torch.Tensor = module.pooling_attn(  # type: ignore
+             q=q, kv=last_hidden_state, attention_mask=attention_mask
+         )  # (batch, seq_len, hidden)
+         last_hidden_state = torch.mean(attn_out, dim=1)
+     elif config.classifier_pooling == "max_seq_mha":
+         k_hidden_states = hidden_states[-config.classifier_pooling_last_k :]
+         theta = torch.stack(k_hidden_states, dim=1)  # (batch, k, seq_len, hidden)
+         pooled_seq = torch.max(
+             theta, dim=1
+         ).values  # Element-wise max over k -> (batch, seq_len, hidden)
+         # Query is pooled CLS token (position 0); Keys/Values are pooled sequence
+         q = pooled_seq[:, 0, :].unsqueeze(1)  # (batch, 1, hidden)
+         q = q.expand(-1, pooled_seq.shape[1], -1)  # (batch, seq_len, hidden)
+         attn_out: torch.Tensor = module.pooling_attn(  # type: ignore
+             q=q, kv=pooled_seq, attention_mask=attention_mask
+         )  # (batch, seq_len, hidden)
+         last_hidden_state = torch.mean(attn_out, dim=1)
+     elif config.classifier_pooling == "max_seq_mean":
+         k_hidden_states = hidden_states[-config.classifier_pooling_last_k :]
+         theta = torch.stack(k_hidden_states, dim=1)  # (batch, k, seq_len, hidden)
+         pooled_seq = torch.max(
+             theta, dim=1
+         ).values  # Element-wise max over k -> (batch, seq_len, hidden)
+         last_hidden_state = torch.mean(pooled_seq, dim=1)  # Mean over sequence length
+     elif config.classifier_pooling == "sum_mean":
+         # ChemLM uses the mean of all hidden states
+         # which outperforms using just the last layer mean or the cls embedding
+         # https://doi.org/10.1038/s42004-025-01484-4
+         # https://static-content.springer.com/esm/art%3A10.1038%2Fs42004-025-01484-4/MediaObjects/42004_2025_1484_MOESM2_ESM.pdf
+         all_hidden_states = torch.stack(hidden_states)
+         w = torch.sum(all_hidden_states, dim=0)
+         last_hidden_state = torch.mean(w, dim=1)
+     elif config.classifier_pooling == "sum_sum":
+         all_hidden_states = torch.stack(hidden_states)
+         w = torch.sum(all_hidden_states, dim=0)
+         last_hidden_state = torch.sum(w, dim=1)
+     elif config.classifier_pooling == "mean_sum":
+         all_hidden_states = torch.stack(hidden_states)
+         w = torch.mean(all_hidden_states, dim=0)
+         last_hidden_state = torch.sum(w, dim=1)
+     elif config.classifier_pooling == "mean_mean":
+         all_hidden_states = torch.stack(hidden_states)
+         w = torch.mean(all_hidden_states, dim=0)
+         last_hidden_state = torch.mean(w, dim=1)
+     return last_hidden_state
+
+
+ __all__ = [
+     "ModChemBertForMaskedLM",
+     "ModChemBertForSequenceClassification",
+ ]
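The `mean` branch in the pooling function above multiplies each token vector by its attention-mask value before summing, then divides by the number of valid tokens, so padding does not dilute the average. The following is a dependency-free sketch of that arithmetic using nested lists in place of tensors. The name `masked_mean_pool` and the zero-length guard are illustrative assumptions and are not part of the committed code.

```python
def masked_mean_pool(hidden, mask):
    """Average token vectors per sequence, skipping positions where mask is 0.

    hidden: list[batch] of list[seq_len] of list[hidden_size] floats
    mask:   list[batch] of list[seq_len] of 0/1 ints
    """
    pooled = []
    for seq_vectors, seq_mask in zip(hidden, mask):
        n_valid = sum(seq_mask)
        if n_valid == 0:
            # Guard added for this sketch; the diff divides without checking.
            raise ValueError("sequence has no valid tokens")
        hidden_size = len(seq_vectors[0])
        sums = [0.0] * hidden_size
        for vec, m in zip(seq_vectors, seq_mask):
            for j in range(hidden_size):
                sums[j] += vec[j] * m  # masked positions contribute 0
        pooled.append([s / n_valid for s in sums])
    return pooled


hidden = [
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]
pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

Note that, as written in the diff, the `max_seq_mean`, `sum_mean`, and `mean_mean` branches call `torch.mean(..., dim=1)` without applying the mask, so padded positions contribute to those averages.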
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
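modules.json lists the stages in the order they run: Transformer, then Pooling, then Normalize. The sketch below only parses that listing and imitates the final L2-normalization step in plain Python; it does not load the model. The helper names `l2_normalize` and `dot` are illustrative assumptions.

```python
import json
import math

# Copy of the module listing from modules.json, condensed onto fewer lines.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Normalize", "type": "sentence_transformers.models.Normalize"}
]
"""

modules = sorted(json.loads(MODULES_JSON), key=lambda m: m["idx"])
pipeline = [m["type"].rsplit(".", 1)[-1] for m in modules]
print(" -> ".join(pipeline))


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length, as a normalization stage would."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# After normalization, cosine similarity is just the dot product.
u = l2_normalize([3.0, 4.0])
v = l2_normalize([4.0, 3.0])
cosine = dot(u, v)
print(round(cosine, 4))
```

Because the last stage produces unit-length vectors, downstream similarity search can use a dot product in place of a full cosine computation.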
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
similarity_evaluation_pubchem_10m_genmol_similarity_float32_results.csv ADDED
@@ -0,0 +1,14 @@
+ epoch,steps,spearman
+ 0,0,0.7261446896400275
+ 0.2499925700642937,25235,0.899727524994741
+ 0.4999851401285874,50470,0.9599428082697957
+ 0.7499777101928812,75705,0.9755030703217896
+ 0.9999702802571748,100940,0.9809624466313892
+ 1.2499628503214686,126175,0.9838128954121899
+ 1.4999554203857621,151410,0.9854756886661312
+ 1.749947990450056,176645,0.9865980464822579
+ 1.9999405605143497,201880,0.9873943693937194
+ 2.2499331305786434,227115,0.9878659546563734
+ 2.499925700642937,252350,0.9879865870047979
+ 2.749918270707231,277585,0.9881075350289332
+ 2.9999108407715243,302820,0.9881056976837288
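The evaluation log above can be inspected with the standard library alone. This sketch embeds the rows inline rather than reading the file from disk, and picks out the checkpoint with the highest Spearman correlation; the variable names are illustrative.

```python
import csv
import io

# Rows copied from the similarity evaluation results CSV in this commit.
RESULTS_CSV = """epoch,steps,spearman
0,0,0.7261446896400275
0.2499925700642937,25235,0.899727524994741
0.4999851401285874,50470,0.9599428082697957
0.7499777101928812,75705,0.9755030703217896
0.9999702802571748,100940,0.9809624466313892
1.2499628503214686,126175,0.9838128954121899
1.4999554203857621,151410,0.9854756886661312
1.749947990450056,176645,0.9865980464822579
1.9999405605143497,201880,0.9873943693937194
2.2499331305786434,227115,0.9878659546563734
2.499925700642937,252350,0.9879865870047979
2.749918270707231,277585,0.9881075350289332
2.9999108407715243,302820,0.9881056976837288
"""

rows = [
    {
        "epoch": float(r["epoch"]),
        "steps": int(r["steps"]),
        "spearman": float(r["spearman"]),
    }
    for r in csv.DictReader(io.StringIO(RESULTS_CSV))
]

best = max(rows, key=lambda r: r["spearman"])
final = rows[-1]
print("best:", best["steps"], best["spearman"])
print("final:", final["steps"], final["spearman"])
```

In these rows the peak falls at step 277585, one evaluation before the end, and the final checkpoint is lower only in the sixth decimal place, so the curve has effectively plateaued by the third epoch.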
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,2554 @@
+ {
+   "version": "1.0",
+   "truncation": {
+     "direction": "Right",
+     "max_length": 512,
+     "strategy": "LongestFirst",
+     "stride": 0
+   },
+   "padding": {
+     "strategy": "BatchLongest",
+     "direction": "Right",
+     "pad_to_multiple_of": null,
+     "pad_id": 2,
+     "pad_type_id": 0,
+     "pad_token": "[PAD]"
+   },
+   "added_tokens": [
+     {
+       "id": 0,
+       "content": "[CLS]",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 1,
+       "content": "[SEP]",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 2,
+       "content": "[PAD]",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 3,
+       "content": "[MASK]",
+       "single_word": false,
+       "lstrip": true,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 2361,
+       "content": "[UNK]",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     }
+   ],
+   "normalizer": null,
+   "pre_tokenizer": {
+     "type": "ByteLevel",
+     "add_prefix_space": false,
+     "trim_offsets": true,
+     "use_regex": true
+   },
+   "post_processor": {
+     "type": "TemplateProcessing",
+     "single": [
+       {
+         "SpecialToken": {
+           "id": "[CLS]",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "A",
+           "type_id": 0
+         }
+       },
+       {
+         "SpecialToken": {
+           "id": "[SEP]",
+           "type_id": 0
+         }
+       }
+     ],
+     "pair": [
+       {
+         "SpecialToken": {
+           "id": "[CLS]",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "A",
+           "type_id": 0
+         }
+       },
+       {
+         "SpecialToken": {
+           "id": "[SEP]",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "B",
+           "type_id": 0
+         }
+       },
+       {
+         "SpecialToken": {
+           "id": "[SEP]",
+           "type_id": 0
+         }
+       }
+     ],
+     "special_tokens": {
+       "[CLS]": {
+         "id": "[CLS]",
+         "ids": [
+           0
+         ],
+         "tokens": [
+           "[CLS]"
+         ]
+       },
+       "[MASK]": {
+         "id": "[MASK]",
+         "ids": [
+           3
+         ],
+         "tokens": [
+           "[MASK]"
+         ]
+       },
+       "[PAD]": {
+         "id": "[PAD]",
+         "ids": [
+           2
+         ],
+         "tokens": [
+           "[PAD]"
+         ]
+       },
+       "[SEP]": {
+         "id": "[SEP]",
+         "ids": [
+           1
+         ],
+         "tokens": [
+           "[SEP]"
+         ]
+       },
+       "[UNK]": {
+         "id": "[UNK]",
+         "ids": [
+           2361
+         ],
+         "tokens": [
+           "[UNK]"
+         ]
+       }
+     }
+   },
+   "decoder": {
+     "type": "ByteLevel",
+     "add_prefix_space": false,
+     "trim_offsets": true,
+     "use_regex": true
+   },
+   "model": {
+     "type": "BPE",
+     "dropout": null,
+     "unk_token": "[UNK]",
+     "continuing_subword_prefix": null,
+     "end_of_word_suffix": null,
+     "fuse_unk": false,
+     "byte_fallback": false,
+     "ignore_merges": false,
+     "vocab": {
189
+ "[CLS]": 0,
190
+ "[SEP]": 1,
191
+ "[PAD]": 2,
192
+ "[MASK]": 3,
193
+ "C": 4,
194
+ "c": 5,
195
+ "(": 6,
196
+ ")": 7,
197
+ "1": 8,
198
+ "O": 9,
199
+ "N": 10,
200
+ "2": 11,
201
+ "=": 12,
202
+ "n": 13,
203
+ "3": 14,
204
+ "[C@H]": 15,
205
+ "[C@@H]": 16,
206
+ "F": 17,
207
+ "S": 18,
208
+ "4": 19,
209
+ "Cl": 20,
210
+ "-": 21,
211
+ "o": 22,
212
+ "s": 23,
213
+ "[nH]": 24,
214
+ "#": 25,
215
+ "/": 26,
216
+ "Br": 27,
217
+ "[C@]": 28,
218
+ "[C@@]": 29,
219
+ "[N+]": 30,
220
+ "[O-]": 31,
221
+ "5": 32,
222
+ "\\": 33,
223
+ ".": 34,
224
+ "I": 35,
225
+ "6": 36,
226
+ "[S@]": 37,
227
+ "[S@@]": 38,
228
+ "P": 39,
229
+ "[N-]": 40,
230
+ "[Si]": 41,
231
+ "7": 42,
232
+ "[n+]": 43,
233
+ "[2H]": 44,
234
+ "8": 45,
235
+ "[NH+]": 46,
236
+ "B": 47,
237
+ "9": 48,
238
+ "[C-]": 49,
239
+ "[Na+]": 50,
240
+ "[Cl-]": 51,
241
+ "[c-]": 52,
242
+ "[CH]": 53,
243
+ "%10": 54,
244
+ "[NH2+]": 55,
245
+ "[P+]": 56,
246
+ "[B]": 57,
247
+ "[I-]": 58,
248
+ "%11": 59,
249
+ "[CH2-]": 60,
250
+ "[O+]": 61,
251
+ "[NH3+]": 62,
252
+ "[C]": 63,
253
+ "[Br-]": 64,
254
+ "[IH2]": 65,
255
+ "[S-]": 66,
256
+ "[cH-]": 67,
257
+ "%12": 68,
258
+ "[nH+]": 69,
259
+ "[B-]": 70,
260
+ "[K+]": 71,
261
+ "[Sn]": 72,
262
+ "[Se]": 73,
263
+ "[CH-]": 74,
264
+ "[HH]": 75,
265
+ "[Y]": 76,
266
+ "[n-]": 77,
267
+ "[CH3-]": 78,
268
+ "[SiH]": 79,
269
+ "[S+]": 80,
270
+ "%13": 81,
271
+ "[SiH2]": 82,
272
+ "[Li+]": 83,
273
+ "[NH-]": 84,
274
+ "%14": 85,
275
+ "[Na]": 86,
276
+ "[CH2]": 87,
277
+ "[O-2]": 88,
278
+ "[U+2]": 89,
279
+ "[W]": 90,
280
+ "[Al]": 91,
281
+ "[P@]": 92,
282
+ "[Fe+2]": 93,
283
+ "[PH+]": 94,
284
+ "%15": 95,
285
+ "[Cl+3]": 96,
286
+ "[Zn+2]": 97,
287
+ "[Ir]": 98,
288
+ "[Mg+2]": 99,
289
+ "[Pt+2]": 100,
290
+ "[OH2+]": 101,
291
+ "[As]": 102,
292
+ "[Fe]": 103,
293
+ "[OH+]": 104,
294
+ "[Zr+2]": 105,
295
+ "[3H]": 106,
296
+ "[Ge]": 107,
297
+ "[SiH3]": 108,
298
+ "[OH-]": 109,
299
+ "[NH4+]": 110,
300
+ "[Cu+2]": 111,
301
+ "[P@@]": 112,
302
+ "p": 113,
303
+ "[Pt]": 114,
304
+ "%16": 115,
305
+ "[Ca+2]": 116,
306
+ "[Zr]": 117,
307
+ "[F-]": 118,
308
+ "[C+]": 119,
309
+ "[Ti]": 120,
310
+ "[P-]": 121,
311
+ "[V]": 122,
312
+ "[se]": 123,
313
+ "[U]": 124,
314
+ "[O]": 125,
315
+ "[Ni+2]": 126,
316
+ "[Zn]": 127,
317
+ "[Co]": 128,
318
+ "[Ni]": 129,
319
+ "[Pd+2]": 130,
320
+ "[Cu]": 131,
321
+ "%17": 132,
322
+ "[Cu+]": 133,
323
+ "[Te]": 134,
324
+ "[H+]": 135,
325
+ "[CH+]": 136,
326
+ "[Li]": 137,
327
+ "[Pd]": 138,
328
+ "[Mo]": 139,
329
+ "[Ru+2]": 140,
330
+ "[o+]": 141,
331
+ "[Re]": 142,
332
+ "[SH+]": 143,
333
+ "%18": 144,
334
+ "[Ac]": 145,
335
+ "[Cr]": 146,
336
+ "[NH2-]": 147,
337
+ "[K]": 148,
338
+ "[13CH2]": 149,
339
+ "[c]": 150,
340
+ "[Zr+4]": 151,
341
+ "[Tl]": 152,
342
+ "[13C]": 153,
343
+ "[Mn]": 154,
344
+ "[N@+]": 155,
345
+ "[Hg]": 156,
346
+ "[Rh]": 157,
347
+ "[Ti+4]": 158,
348
+ "[Sb]": 159,
349
+ "[Co+2]": 160,
350
+ "[Ag+]": 161,
351
+ "[Ru]": 162,
352
+ "%19": 163,
353
+ "[N@@+]": 164,
354
+ "[Ti+2]": 165,
355
+ "[Al+3]": 166,
356
+ "[Pb]": 167,
357
+ "[I+]": 168,
358
+ "[18F]": 169,
359
+ "[s+]": 170,
360
+ "[Rb+]": 171,
361
+ "[Ba+2]": 172,
362
+ "[H-]": 173,
363
+ "[Fe+3]": 174,
364
+ "[Ir+3]": 175,
365
+ "[13cH]": 176,
366
+ "%20": 177,
367
+ "[AlH2]": 178,
368
+ "[Au+]": 179,
369
+ "[13c]": 180,
370
+ "[SH2+]": 181,
371
+ "[Sn+2]": 182,
372
+ "[Mn+2]": 183,
373
+ "[Si-]": 184,
374
+ "[Ag]": 185,
375
+ "[N]": 186,
376
+ "[Bi]": 187,
377
+ "%21": 188,
378
+ "[In]": 189,
379
+ "[CH2+]": 190,
380
+ "[Y+3]": 191,
381
+ "[Ga]": 192,
382
+ "%22": 193,
383
+ "[Co+3]": 194,
384
+ "[Au]": 195,
385
+ "[13CH3]": 196,
386
+ "[Mg]": 197,
387
+ "[Cs+]": 198,
388
+ "[W+2]": 199,
389
+ "[Hf]": 200,
390
+ "[Zn+]": 201,
391
+ "[Se-]": 202,
392
+ "[S-2]": 203,
393
+ "[Ca]": 204,
394
+ "[pH]": 205,
395
+ "[ClH+]": 206,
396
+ "[Ti+3]": 207,
397
+ "%23": 208,
398
+ "[Ru+]": 209,
399
+ "[SH-]": 210,
400
+ "[13CH]": 211,
401
+ "[IH+]": 212,
402
+ "[Hf+4]": 213,
403
+ "[Rf]": 214,
404
+ "[OH3+]": 215,
405
+ "%24": 216,
406
+ "[Pt+4]": 217,
407
+ "[Zr+3]": 218,
408
+ "[PH3+]": 219,
409
+ "[Sr+2]": 220,
410
+ "[Cd+2]": 221,
411
+ "[Cd]": 222,
412
+ "%25": 223,
413
+ "[Os]": 224,
414
+ "[BH-]": 225,
415
+ "[Sn+4]": 226,
416
+ "[Cr+3]": 227,
417
+ "[Ru+3]": 228,
418
+ "[PH2+]": 229,
419
+ "[Rh+2]": 230,
420
+ "[V+2]": 231,
421
+ "%26": 232,
422
+ "[Gd+3]": 233,
423
+ "[Pb+2]": 234,
424
+ "[PH]": 235,
425
+ "[Hg+]": 236,
426
+ "[Mo+2]": 237,
427
+ "[AlH]": 238,
428
+ "[Sn+]": 239,
429
+ "%27": 240,
430
+ "[Pd+]": 241,
431
+ "b": 242,
432
+ "[Rh+3]": 243,
433
+ "[Hg+2]": 244,
434
+ "[15NH]": 245,
435
+ "[14C]": 246,
436
+ "%28": 247,
437
+ "[Mn+3]": 248,
438
+ "[Si+]": 249,
439
+ "[SeH]": 250,
440
+ "[13C@H]": 251,
441
+ "[NH]": 252,
442
+ "[Ga+3]": 253,
443
+ "[SiH-]": 254,
444
+ "[13C@@H]": 255,
445
+ "[Ce]": 256,
446
+ "[Au+3]": 257,
447
+ "[Bi+3]": 258,
448
+ "[15N]": 259,
449
+ "%29": 260,
450
+ "[BH3-]": 261,
451
+ "[14cH]": 262,
452
+ "[Ti+]": 263,
453
+ "[Gd]": 264,
454
+ "[cH+]": 265,
455
+ "[Cr+2]": 266,
456
+ "[Sb-]": 267,
457
+ "%30": 268,
458
+ "[Be+2]": 269,
459
+ "[Al+]": 270,
460
+ "[te]": 271,
461
+ "[11CH3]": 272,
462
+ "[Sm]": 273,
463
+ "[Pr]": 274,
464
+ "[La]": 275,
465
+ "%31": 276,
466
+ "[Al-]": 277,
467
+ "[Ta]": 278,
468
+ "[125I]": 279,
469
+ "[BH2-]": 280,
470
+ "[Nb]": 281,
471
+ "[Si@]": 282,
472
+ "%32": 283,
473
+ "[14c]": 284,
474
+ "[Sb+3]": 285,
475
+ "[Ba]": 286,
476
+ "%33": 287,
477
+ "[Os+2]": 288,
478
+ "[Si@@]": 289,
479
+ "[La+3]": 290,
480
+ "[15n]": 291,
481
+ "[15NH2]": 292,
482
+ "[Nd+3]": 293,
483
+ "%34": 294,
484
+ "[14CH2]": 295,
485
+ "[18O]": 296,
486
+ "[Nd]": 297,
487
+ "[GeH]": 298,
488
+ "[Ni+3]": 299,
489
+ "[Eu]": 300,
490
+ "[Dy+3]": 301,
491
+ "[Sc]": 302,
492
+ "%36": 303,
493
+ "[Se-2]": 304,
494
+ "[As+]": 305,
495
+ "%35": 306,
496
+ "[AsH]": 307,
497
+ "[Tb]": 308,
498
+ "[Sb+5]": 309,
499
+ "[Se+]": 310,
500
+ "[Ce+3]": 311,
501
+ "[c+]": 312,
502
+ "[In+3]": 313,
503
+ "[SnH]": 314,
504
+ "[Mo+4]": 315,
505
+ "%37": 316,
506
+ "[V+4]": 317,
507
+ "[Eu+3]": 318,
508
+ "[Hf+2]": 319,
509
+ "%38": 320,
510
+ "[Pt+]": 321,
511
+ "[p+]": 322,
512
+ "[123I]": 323,
513
+ "[Tl+]": 324,
514
+ "[Sm+3]": 325,
515
+ "%39": 326,
516
+ "[Yb+3]": 327,
517
+ "%40": 328,
518
+ "[Yb]": 329,
519
+ "[Os+]": 330,
520
+ "%41": 331,
521
+ "[10B]": 332,
522
+ "[Sc+3]": 333,
523
+ "[Al+2]": 334,
524
+ "%42": 335,
525
+ "[Sr]": 336,
526
+ "[Tb+3]": 337,
527
+ "[Po]": 338,
528
+ "[Tc]": 339,
529
+ "[PH-]": 340,
530
+ "[AlH3]": 341,
531
+ "[Ar]": 342,
532
+ "[U+4]": 343,
533
+ "[SnH2]": 344,
534
+ "[Cl+2]": 345,
535
+ "[si]": 346,
536
+ "[Fe+]": 347,
537
+ "[14CH3]": 348,
538
+ "[U+3]": 349,
539
+ "[Cl+]": 350,
540
+ "%43": 351,
541
+ "[GeH2]": 352,
542
+ "%44": 353,
543
+ "[Er+3]": 354,
544
+ "[Mo+3]": 355,
545
+ "[I+2]": 356,
546
+ "[Fe+4]": 357,
547
+ "[99Tc]": 358,
548
+ "%45": 359,
549
+ "[11C]": 360,
550
+ "%46": 361,
551
+ "[SnH3]": 362,
552
+ "[S]": 363,
553
+ "[Te+]": 364,
554
+ "[Er]": 365,
555
+ "[Lu+3]": 366,
556
+ "[11B]": 367,
557
+ "%47": 368,
558
+ "%48": 369,
559
+ "[P]": 370,
560
+ "[Tm]": 371,
561
+ "[Th]": 372,
562
+ "[Dy]": 373,
563
+ "[Pr+3]": 374,
564
+ "[Ta+5]": 375,
565
+ "[Nb+5]": 376,
566
+ "[Rb]": 377,
567
+ "[GeH3]": 378,
568
+ "[Br+2]": 379,
569
+ "%49": 380,
570
+ "[131I]": 381,
571
+ "[Fm]": 382,
572
+ "[Cs]": 383,
573
+ "[BH4-]": 384,
574
+ "[Lu]": 385,
575
+ "[15nH]": 386,
576
+ "%50": 387,
577
+ "[Ru+6]": 388,
578
+ "[b-]": 389,
579
+ "[Ho]": 390,
580
+ "[Th+4]": 391,
581
+ "[Ru+4]": 392,
582
+ "%52": 393,
583
+ "[14CH]": 394,
584
+ "%51": 395,
585
+ "[Cr+6]": 396,
586
+ "[18OH]": 397,
587
+ "[Ho+3]": 398,
588
+ "[Ce+4]": 399,
589
+ "[Bi+2]": 400,
590
+ "[Co+]": 401,
591
+ "%53": 402,
592
+ "[Yb+2]": 403,
593
+ "[Fe+6]": 404,
594
+ "[Be]": 405,
595
+ "%54": 406,
596
+ "[SH3+]": 407,
597
+ "[Np]": 408,
598
+ "[As-]": 409,
599
+ "%55": 410,
600
+ "[14C@@H]": 411,
601
+ "[Ir+2]": 412,
602
+ "[GaH3]": 413,
603
+ "[p-]": 414,
604
+ "[GeH4]": 415,
605
+ "[Sn+3]": 416,
606
+ "[Os+4]": 417,
607
+ "%56": 418,
608
+ "[14C@H]": 419,
609
+ "[sH+]": 420,
610
+ "[19F]": 421,
611
+ "[Eu+2]": 422,
612
+ "[TlH]": 423,
613
+ "%57": 424,
614
+ "[Cr+4]": 425,
615
+ "%58": 426,
616
+ "[B@@-]": 427,
617
+ "[SiH+]": 428,
618
+ "[At]": 429,
619
+ "[Am]": 430,
620
+ "[Fe+5]": 431,
621
+ "[AsH2]": 432,
622
+ "[Si+4]": 433,
623
+ "[B@-]": 434,
624
+ "[Pu]": 435,
625
+ "[SbH]": 436,
626
+ "[P-2]": 437,
627
+ "[Tm+3]": 438,
628
+ "*": 439,
629
+ "%59": 440,
630
+ "[se+]": 441,
631
+ "%60": 442,
632
+ "[oH+]": 443,
633
+ "[1H]": 444,
634
+ "[15N+]": 445,
635
+ "[124I]": 446,
636
+ "[S@@+]": 447,
637
+ "[P-3]": 448,
638
+ "[H]": 449,
639
+ "[IH2+]": 450,
640
+ "[TeH]": 451,
641
+ "[Xe]": 452,
642
+ "[PH4+]": 453,
643
+ "[Cr+]": 454,
644
+ "[Cm]": 455,
645
+ "[I+3]": 456,
646
+ "%61": 457,
647
+ "[Nb+2]": 458,
648
+ "[Ru+5]": 459,
649
+ "%62": 460,
650
+ "[Ta+2]": 461,
651
+ "[Tc+4]": 462,
652
+ "[CH3+]": 463,
653
+ "[Pm]": 464,
654
+ "[Si@H]": 465,
655
+ "[No]": 466,
656
+ "%63": 467,
657
+ "[Cr+5]": 468,
658
+ "[Th+2]": 469,
659
+ "[Zn-2]": 470,
660
+ "[13C@]": 471,
661
+ "[Lr]": 472,
662
+ "%64": 473,
663
+ "[99Tc+3]": 474,
664
+ "%65": 475,
665
+ "[13C@@]": 476,
666
+ "%66": 477,
667
+ "[Fe-]": 478,
668
+ "[17O]": 479,
669
+ "[siH]": 480,
670
+ "[Sb+]": 481,
671
+ "[OH]": 482,
672
+ "[IH]": 483,
673
+ "[11CH2]": 484,
674
+ "[Cf]": 485,
675
+ "[SiH2+]": 486,
676
+ "[Gd+2]": 487,
677
+ "[In+]": 488,
678
+ "[Si@@H]": 489,
679
+ "[Mn+]": 490,
680
+ "[99Tc+4]": 491,
681
+ "[Ga-]": 492,
682
+ "%67": 493,
683
+ "[S@+]": 494,
684
+ "[Ge+4]": 495,
685
+ "[Tl+3]": 496,
686
+ "[16OH]": 497,
687
+ "%68": 498,
688
+ "[2H-]": 499,
689
+ "[Ra]": 500,
690
+ "[si-]": 501,
691
+ "[NiH2]": 502,
692
+ "[P@@H]": 503,
693
+ "[Rh+]": 504,
694
+ "[12C]": 505,
695
+ "[35S]": 506,
696
+ "[32P]": 507,
697
+ "[SiH2-]": 508,
698
+ "[AlH2+]": 509,
699
+ "[16O]": 510,
700
+ "%69": 511,
701
+ "[BiH]": 512,
702
+ "[BiH2]": 513,
703
+ "[Zn-]": 514,
704
+ "[BH]": 515,
705
+ "[Tc+3]": 516,
706
+ "[Ir+]": 517,
707
+ "[Ni+]": 518,
708
+ "%70": 519,
709
+ "[InH2]": 520,
710
+ "[InH]": 521,
711
+ "[Nb+3]": 522,
712
+ "[PbH]": 523,
713
+ "[Bi+]": 524,
714
+ "%71": 525,
715
+ "[As+3]": 526,
716
+ "%72": 527,
717
+ "[18O-]": 528,
718
+ "[68Ga+3]": 529,
719
+ "%73": 530,
720
+ "[Pa]": 531,
721
+ "[76Br]": 532,
722
+ "[Tc+5]": 533,
723
+ "[pH+]": 534,
724
+ "[64Cu+2]": 535,
725
+ "[Ru+8]": 536,
726
+ "%74": 537,
727
+ "[PH2-]": 538,
728
+ "[Si+2]": 539,
729
+ "[17OH]": 540,
730
+ "[RuH]": 541,
731
+ "[111In+3]": 542,
732
+ "[AlH+]": 543,
733
+ "%75": 544,
734
+ "%76": 545,
735
+ "[W+]": 546,
736
+ "[SbH2]": 547,
737
+ "[PoH]": 548,
738
+ "[Ru-]": 549,
739
+ "[XeH]": 550,
740
+ "[Tc+2]": 551,
741
+ "[13C-]": 552,
742
+ "[Br+]": 553,
743
+ "[Pt-2]": 554,
744
+ "[Es]": 555,
745
+ "[Cu-]": 556,
746
+ "[Mg+]": 557,
747
+ "[3HH]": 558,
748
+ "[P@H]": 559,
749
+ "[ClH2+]": 560,
750
+ "%77": 561,
751
+ "[SH]": 562,
752
+ "[Au-]": 563,
753
+ "[2HH]": 564,
754
+ "%78": 565,
755
+ "[Sn-]": 566,
756
+ "[11CH]": 567,
757
+ "[PdH2]": 568,
758
+ "0": 569,
759
+ "[Os+6]": 570,
760
+ "%79": 571,
761
+ "[Mo+]": 572,
762
+ "%80": 573,
763
+ "[al]": 574,
764
+ "[PbH2]": 575,
765
+ "[64Cu]": 576,
766
+ "[Cl]": 577,
767
+ "[12CH3]": 578,
768
+ "%81": 579,
769
+ "[Tc+7]": 580,
770
+ "[11c]": 581,
771
+ "%82": 582,
772
+ "[Li-]": 583,
773
+ "[99Tc+5]": 584,
774
+ "[He]": 585,
775
+ "[12c]": 586,
776
+ "[Kr]": 587,
777
+ "[RuH+2]": 588,
778
+ "[35Cl]": 589,
779
+ "[Pd-2]": 590,
780
+ "[GaH2]": 591,
781
+ "[4H]": 592,
782
+ "[Sg]": 593,
783
+ "[Cu-2]": 594,
784
+ "[Br+3]": 595,
785
+ "%83": 596,
786
+ "[37Cl]": 597,
787
+ "[211At]": 598,
788
+ "[IrH+2]": 599,
789
+ "[Mt]": 600,
790
+ "[Ir-2]": 601,
791
+ "[In-]": 602,
792
+ "[12cH]": 603,
793
+ "[12CH2]": 604,
794
+ "[RuH2]": 605,
795
+ "[99Tc+7]": 606,
796
+ "%84": 607,
797
+ "[15n+]": 608,
798
+ "[ClH2+2]": 609,
799
+ "[16N]": 610,
800
+ "[111In]": 611,
801
+ "[Tc+]": 612,
802
+ "[Ru-2]": 613,
803
+ "[12CH]": 614,
804
+ "[si+]": 615,
805
+ "[Tc+6]": 616,
806
+ "%85": 617,
807
+ "%86": 618,
808
+ "[90Y]": 619,
809
+ "[Pd-]": 620,
810
+ "[188Re]": 621,
811
+ "[RuH+]": 622,
812
+ "[NiH]": 623,
813
+ "[SiH3-]": 624,
814
+ "[14n]": 625,
815
+ "[CH3]": 626,
816
+ "[14N]": 627,
817
+ "[10BH2]": 628,
818
+ "%88": 629,
819
+ "%89": 630,
820
+ "%90": 631,
821
+ "[34S]": 632,
822
+ "[77Br]": 633,
823
+ "[GaH]": 634,
824
+ "[Br]": 635,
825
+ "[Ge@]": 636,
826
+ "[B@@H-]": 637,
827
+ "[CuH]": 638,
828
+ "[SiH4]": 639,
829
+ "[3H-]": 640,
830
+ "%87": 641,
831
+ "%91": 642,
832
+ "%92": 643,
833
+ "[67Cu]": 644,
834
+ "[I]": 645,
835
+ "[177Lu]": 646,
836
+ "[ReH]": 647,
837
+ "[67Ga+3]": 648,
838
+ "[Db]": 649,
839
+ "[177Lu+3]": 650,
840
+ "[AlH2-]": 651,
841
+ "[Si+3]": 652,
842
+ "[Ti-2]": 653,
843
+ "[RuH+3]": 654,
844
+ "[al+]": 655,
845
+ "[68Ga]": 656,
846
+ "[2H+]": 657,
847
+ "[B@H-]": 658,
848
+ "[WH2]": 659,
849
+ "[OsH]": 660,
850
+ "[Ir-3]": 661,
851
+ "[AlH-]": 662,
852
+ "[Bk]": 663,
853
+ "[75Se]": 664,
854
+ "[14C@]": 665,
855
+ "[Pt-]": 666,
856
+ "[N@@H+]": 667,
857
+ "[Nb-]": 668,
858
+ "[13NH2]": 669,
859
+ "%93": 670,
860
+ "[186Re]": 671,
861
+ "[Tb+4]": 672,
862
+ "[PtH]": 673,
863
+ "[IrH2]": 674,
864
+ "[Hg-2]": 675,
865
+ "[AlH3-]": 676,
866
+ "[PdH+]": 677,
867
+ "[Md]": 678,
868
+ "[RhH+2]": 679,
869
+ "[11cH]": 680,
870
+ "[Co-2]": 681,
871
+ "[15N-]": 682,
872
+ "[ZrH2]": 683,
873
+ "%94": 684,
874
+ "[Hg-]": 685,
875
+ "[127I]": 686,
876
+ "[AsH2+]": 687,
877
+ "[MoH2]": 688,
878
+ "[Te+4]": 689,
879
+ "[14C@@]": 690,
880
+ "[As+5]": 691,
881
+ "[SnH+3]": 692,
882
+ "[Ge@@]": 693,
883
+ "[6Li+]": 694,
884
+ "[WH]": 695,
885
+ "[Ne]": 696,
886
+ "[14NH2]": 697,
887
+ "[14NH]": 698,
888
+ "[12C@@H]": 699,
889
+ "[Os+7]": 700,
890
+ "[RhH]": 701,
891
+ "[Al-3]": 702,
892
+ "[SnH+]": 703,
893
+ "[15NH3+]": 704,
894
+ "[Zr+]": 705,
895
+ "[197Hg+]": 706,
896
+ "%95": 707,
897
+ "%96": 708,
898
+ "[90Y+3]": 709,
899
+ "[Os-2]": 710,
900
+ "[98Tc+5]": 711,
901
+ "[15NH3]": 712,
902
+ "[bH-]": 713,
903
+ "[33P]": 714,
904
+ "[Zr-2]": 715,
905
+ "[15O]": 716,
906
+ "[Rh-]": 717,
907
+ "[PbH3]": 718,
908
+ "[PH2]": 719,
909
+ "[Ni-]": 720,
910
+ "[CuH+]": 721,
911
+ "%97": 722,
912
+ "%98": 723,
913
+ "%99": 724,
914
+ "[Os+5]": 725,
915
+ "[PtH+]": 726,
916
+ "[ReH4]": 727,
917
+ "[16NH]": 728,
918
+ "[82Br]": 729,
919
+ "[W-]": 730,
920
+ "[18F-]": 731,
921
+ "[15NH4+]": 732,
922
+ "[Se+4]": 733,
923
+ "[SeH-]": 734,
924
+ "[67Cu+2]": 735,
925
+ "[12C@H]": 736,
926
+ "[AsH3]": 737,
927
+ "[HgH]": 738,
928
+ "[10B-]": 739,
929
+ "[99Tc+6]": 740,
930
+ "[117Sn+4]": 741,
931
+ "[Te@]": 742,
932
+ "[P@+]": 743,
933
+ "[35SH]": 744,
934
+ "[SeH+]": 745,
935
+ "[Ni-2]": 746,
936
+ "[Al-2]": 747,
937
+ "[TeH2]": 748,
938
+ "[Bh]": 749,
939
+ "[99Tc+2]": 750,
940
+ "[Os+8]": 751,
941
+ "[PH-2]": 752,
942
+ "[7Li+]": 753,
943
+ "[14nH]": 754,
944
+ "[AlH+2]": 755,
945
+ "[18FH]": 756,
946
+ "[SnH4]": 757,
947
+ "[18O-2]": 758,
948
+ "[IrH]": 759,
949
+ "[13N]": 760,
950
+ "[Te@@]": 761,
951
+ "[Rh-3]": 762,
952
+ "[15NH+]": 763,
953
+ "[AsH3+]": 764,
954
+ "[SeH2]": 765,
955
+ "[AsH+]": 766,
956
+ "[CoH2]": 767,
957
+ "[16NH2]": 768,
958
+ "[AsH-]": 769,
959
+ "[203Hg+]": 770,
960
+ "[P@@+]": 771,
961
+ "[166Ho+3]": 772,
962
+ "[60Co+3]": 773,
963
+ "[13CH2-]": 774,
964
+ "[SeH2+]": 775,
965
+ "[75Br]": 776,
966
+ "[TlH2]": 777,
967
+ "[80Br]": 778,
968
+ "[siH+]": 779,
969
+ "[Ca+]": 780,
970
+ "[153Sm+3]": 781,
971
+ "[PdH]": 782,
972
+ "[225Ac]": 783,
973
+ "[13CH3-]": 784,
974
+ "[AlH4-]": 785,
975
+ "[FeH]": 786,
976
+ "[13CH-]": 787,
977
+ "[14C-]": 788,
978
+ "[11C-]": 789,
979
+ "[153Sm]": 790,
980
+ "[Re-]": 791,
981
+ "[te+]": 792,
982
+ "[13CH4]": 793,
983
+ "[ClH+2]": 794,
984
+ "[8CH2]": 795,
985
+ "[99Mo]": 796,
986
+ "[ClH3+3]": 797,
987
+ "[SbH3]": 798,
988
+ "[25Mg+2]": 799,
989
+ "[16N+]": 800,
990
+ "[SnH2+]": 801,
991
+ "[11C@H]": 802,
992
+ "[122I]": 803,
993
+ "[Re-2]": 804,
994
+ "[RuH2+2]": 805,
995
+ "[ZrH]": 806,
996
+ "[Bi-]": 807,
997
+ "[Pr+]": 808,
998
+ "[Rn]": 809,
999
+ "[Fr]": 810,
1000
+ "[36Cl]": 811,
1001
+ "[18o]": 812,
1002
+ "[YH]": 813,
1003
+ "[79Br]": 814,
1004
+ "[121I]": 815,
1005
+ "[113In+3]": 816,
1006
+ "[TaH]": 817,
1007
+ "[RhH2]": 818,
1008
+ "[Ta-]": 819,
1009
+ "[67Ga]": 820,
1010
+ "[ZnH+]": 821,
1011
+ "[SnH2-]": 822,
1012
+ "[OsH2]": 823,
1013
+ "[16F]": 824,
1014
+ "[FeH2]": 825,
1015
+ "[14O]": 826,
1016
+ "[PbH2+2]": 827,
1017
+ "[BH2]": 828,
1018
+ "[6H]": 829,
1019
+ "[125Te]": 830,
1020
+ "[197Hg]": 831,
1021
+ "[TaH2]": 832,
1022
+ "[TaH3]": 833,
1023
+ "[76As]": 834,
1024
+ "[Nb-2]": 835,
1025
+ "[14N+]": 836,
1026
+ "[125I-]": 837,
1027
+ "[33S]": 838,
1028
+ "[IH2+2]": 839,
1029
+ "[NH2]": 840,
1030
+ "[PtH2]": 841,
1031
+ "[MnH]": 842,
1032
+ "[19C]": 843,
1033
+ "[17F]": 844,
1034
+ "[1H-]": 845,
1035
+ "[SnH4+2]": 846,
1036
+ "[Mn-2]": 847,
1037
+ "[15NH2+]": 848,
1038
+ "[TiH2]": 849,
1039
+ "[ReH7]": 850,
1040
+ "[Cd-2]": 851,
1041
+ "[Fe-3]": 852,
1042
+ "[SH2]": 853,
1043
+ "[17O-]": 854,
1044
+ "[siH-]": 855,
1045
+ "[CoH+]": 856,
1046
+ "[VH]": 857,
1047
+ "[10BH]": 858,
1048
+ "[Ru-3]": 859,
1049
+ "[13O]": 860,
1050
+ "[5H]": 861,
1051
+ "[15n-]": 862,
1052
+ "[153Gd]": 863,
1053
+ "[12C@]": 864,
1054
+ "[11CH3-]": 865,
1055
+ "[IrH3]": 866,
1056
+ "[RuH3]": 867,
1057
+ "[74Se]": 868,
1058
+ "[Se@]": 869,
1059
+ "[Hf+]": 870,
1060
+ "[77Se]": 871,
1061
+ "[166Ho]": 872,
1062
+ "[59Fe+2]": 873,
1063
+ "[203Hg]": 874,
1064
+ "[18OH-]": 875,
1065
+ "[8CH]": 876,
1066
+ "[12C@@]": 877,
1067
+ "[11CH4]": 878,
1068
+ "[15C]": 879,
1069
+ "[249Cf]": 880,
1070
+ "[PbH4]": 881,
1071
+ "[64Zn]": 882,
1072
+ "[99Tc+]": 883,
1073
+ "[14c-]": 884,
1074
+ "[149Pm]": 885,
1075
+ "[IrH4]": 886,
1076
+ "[Se@@]": 887,
1077
+ "[13OH]": 888,
1078
+ "[14CH3-]": 889,
1079
+ "[28Si]": 890,
1080
+ "[Rh-2]": 891,
1081
+ "[Fe-2]": 892,
1082
+ "[131I-]": 893,
1083
+ "[51Cr]": 894,
1084
+ "[62Cu+2]": 895,
1085
+ "[81Br]": 896,
1086
+ "[121Sb]": 897,
1087
+ "[7Li]": 898,
1088
+ "[89Zr+4]": 899,
1089
+ "[SbH3+]": 900,
1090
+ "[11C@@H]": 901,
1091
+ "[98Tc]": 902,
1092
+ "[59Fe+3]": 903,
1093
+ "[BiH2+]": 904,
1094
+ "[SbH+]": 905,
1095
+ "[TiH]": 906,
1096
+ "[14NH3]": 907,
1097
+ "[15OH]": 908,
1098
+ "[119Sn]": 909,
1099
+ "[201Hg]": 910,
1100
+ "[MnH+]": 911,
1101
+ "[201Tl]": 912,
1102
+ "[51Cr+3]": 913,
1103
+ "[123I-]": 914,
1104
+ "[MoH]": 915,
1105
+ "[AlH6-3]": 916,
1106
+ "[MnH2]": 917,
1107
+ "[WH3]": 918,
1108
+ "[213Bi+3]": 919,
1109
+ "[SnH2+2]": 920,
1110
+ "[123IH]": 921,
1111
+ "[13CH+]": 922,
1112
+ "[Zr-]": 923,
1113
+ "[74As]": 924,
1114
+ "[13C+]": 925,
1115
+ "[32P+]": 926,
1116
+ "[KrH]": 927,
1117
+ "[SiH+2]": 928,
1118
+ "[ClH3+2]": 929,
1119
+ "[13NH]": 930,
1120
+ "[9CH2]": 931,
1121
+ "[ZrH2+2]": 932,
1122
+ "[87Sr+2]": 933,
1123
+ "[35s]": 934,
1124
+ "[239Pu]": 935,
1125
+ "[198Au]": 936,
1126
+ "[241Am]": 937,
1127
+ "[203Hg+2]": 938,
1128
+ "[V+]": 939,
1129
+ "[YH2]": 940,
1130
+ "[195Pt]": 941,
1131
+ "[203Pb]": 942,
1132
+ "[RuH4]": 943,
1133
+ "[ThH2]": 944,
1134
+ "[AuH]": 945,
1135
+ "[66Ga+3]": 946,
1136
+ "[11B-]": 947,
1137
+ "[F]": 948,
1138
+ "[24Na+]": 949,
1139
+ "[85Sr+2]": 950,
1140
+ "[201Tl+]": 951,
1141
+ "[14CH4]": 952,
1142
+ "[32S]": 953,
1143
+ "[TeH2+]": 954,
1144
+ "[ClH2+3]": 955,
1145
+ "[AgH]": 956,
1146
+ "[Ge@H]": 957,
1147
+ "[44Ca+2]": 958,
1148
+ "[Os-]": 959,
1149
+ "[31P]": 960,
1150
+ "[15nH+]": 961,
1151
+ "[SbH4]": 962,
1152
+ "[TiH+]": 963,
1153
+ "[Ba+]": 964,
1154
+ "[57Co+2]": 965,
1155
+ "[Ta+]": 966,
1156
+ "[125IH]": 967,
1157
+ "[77As]": 968,
1158
+ "[129I]": 969,
1159
+ "[Fe-4]": 970,
1160
+ "[Ta-2]": 971,
1161
+ "[19O]": 972,
1162
+ "[12O]": 973,
1163
+ "[BiH3]": 974,
1164
+ "[237Np]": 975,
1165
+ "[252Cf]": 976,
1166
+ "[86Y]": 977,
1167
+ "[Cr-2]": 978,
1168
+ "[89Y]": 979,
1169
+ "[195Pt+2]": 980,
1170
+ "[si+2]": 981,
1171
+ "[58Fe+2]": 982,
1172
+ "[Hs]": 983,
1173
+ "[S@@H]": 984,
1174
+ "[8CH4]": 985,
1175
+ "[164Dy+3]": 986,
1176
+ "[47Ca+2]": 987,
1177
+ "[57Co]": 988,
1178
+ "[NbH2]": 989,
1179
+ "[ReH2]": 990,
1180
+ "[ZnH2]": 991,
1181
+ "[CrH2]": 992,
1182
+ "[17NH]": 993,
1183
+ "[ZrH3]": 994,
1184
+ "[RhH3]": 995,
1185
+ "[12C-]": 996,
1186
+ "[18O+]": 997,
1187
+ "[Bi-2]": 998,
1188
+ "[ClH4+3]": 999,
1189
+ "[Ni-3]": 1000,
1190
+ "[Ag-]": 1001,
1191
+ "[111In-]": 1002,
1192
+ "[Mo-2]": 1003,
1193
+ "[55Fe+3]": 1004,
1194
+ "[204Hg+]": 1005,
1195
+ "[35Cl-]": 1006,
1196
+ "[211Pb]": 1007,
1197
+ "[75Ge]": 1008,
1198
+ "[8B]": 1009,
1199
+ "[TeH3]": 1010,
1200
+ "[SnH3+]": 1011,
1201
+ "[Zr-3]": 1012,
1202
+ "[28F]": 1013,
1203
+ "[249Bk]": 1014,
1204
+ "[169Yb]": 1015,
1205
+ "[34SH]": 1016,
1206
+ "[6Li]": 1017,
1207
+ "[94Tc]": 1018,
1208
+ "[197Au]": 1019,
1209
+ "[195Pt+4]": 1020,
1210
+ "[169Yb+3]": 1021,
1211
+ "[32Cl]": 1022,
1212
+ "[82Se]": 1023,
1213
+ "[159Gd+3]": 1024,
1214
+ "[213Bi]": 1025,
1215
+ "[CoH+2]": 1026,
1216
+ "[36S]": 1027,
1217
+ "[35P]": 1028,
1218
+ "[Ru-4]": 1029,
1219
+ "[Cr-3]": 1030,
1220
+ "[60Co]": 1031,
1221
+ "[1H+]": 1032,
1222
+ "[18CH2]": 1033,
1223
+ "[Cd-]": 1034,
1224
+ "[152Sm+3]": 1035,
1225
+ "[106Ru]": 1036,
1226
+ "[238Pu]": 1037,
1227
+ "[220Rn]": 1038,
1228
+ "[45Ca+2]": 1039,
1229
+ "[89Sr+2]": 1040,
1230
+ "[239Np]": 1041,
1231
+ "[90Sr+2]": 1042,
1232
+ "[137Cs+]": 1043,
1233
+ "[165Dy]": 1044,
1234
+ "[68GaH3]": 1045,
1235
+ "[65Zn+2]": 1046,
1236
+ "[89Zr]": 1047,
1237
+ "[BiH2+2]": 1048,
1238
+ "[62Cu]": 1049,
1239
+ "[165Dy+3]": 1050,
1240
+ "[238U]": 1051,
1241
+ "[105Rh+3]": 1052,
1242
+ "[70Zn]": 1053,
1243
+ "[12B]": 1054,
1244
+ "[12OH]": 1055,
1245
+ "[18CH]": 1056,
1246
+ "[17CH]": 1057,
1247
+ "[42K]": 1058,
1248
+ "[76Br-]": 1059,
1249
+ "[71As]": 1060,
1250
+ "[NbH3]": 1061,
1251
+ "[ReH3]": 1062,
1252
+ "[OsH-]": 1063,
1253
+ "[WH4]": 1064,
1254
+ "[MoH3]": 1065,
1255
+ "[OsH4]": 1066,
1256
+ "[RuH6]": 1067,
1257
+ "[PtH3]": 1068,
1258
+ "[CuH2]": 1069,
1259
+ "[CoH3]": 1070,
1260
+ "[TiH4]": 1071,
1261
+ "[64Zn+2]": 1072,
1262
+ "[Si-2]": 1073,
1263
+ "[79BrH]": 1074,
1264
+ "[14CH2-]": 1075,
1265
+ "[PtH2+2]": 1076,
1266
+ "[Os-3]": 1077,
1267
+ "[29Si]": 1078,
1268
+ "[Ti-]": 1079,
1269
+ "[Se+6]": 1080,
1270
+ "[22Na+]": 1081,
1271
+ "[42K+]": 1082,
1272
+ "[131Cs+]": 1083,
1273
+ "[86Rb+]": 1084,
1274
+ "[134Cs+]": 1085,
1275
+ "[209Po]": 1086,
1276
+ "[208Po]": 1087,
1277
+ "[81Rb+]": 1088,
1278
+ "[203Tl+]": 1089,
1279
+ "[Zr-4]": 1090,
1280
+ "[148Sm]": 1091,
1281
+ "[147Sm]": 1092,
1282
+ "[37Cl-]": 1093,
1283
+ "[12CH4]": 1094,
1284
+ "[Ge@@H]": 1095,
1285
+ "[63Cu]": 1096,
1286
+ "[13CH2+]": 1097,
1287
+ "[AsH2-]": 1098,
1288
+ "[CeH]": 1099,
1289
+ "[SnH-]": 1100,
1290
+ "[UH]": 1101,
1291
+ "[9c]": 1102,
1292
+ "[21CH3]": 1103,
1293
+ "[TeH+]": 1104,
1294
+ "[57Co+3]": 1105,
1295
+ "[8BH2]": 1106,
1296
+ "[12BH2]": 1107,
1297
+ "[19BH2]": 1108,
1298
+ "[9BH2]": 1109,
1299
+ "[YbH2]": 1110,
1300
+ "[CrH+2]": 1111,
1301
+ "[208Bi]": 1112,
1302
+ "[152Gd]": 1113,
1303
+ "[61Cu]": 1114,
1304
+ "[115In]": 1115,
1305
+ "[60Co+2]": 1116,
1306
+ "[13NH2-]": 1117,
1307
+ "[120I]": 1118,
1308
+ "[18OH2]": 1119,
1309
+ "[75SeH]": 1120,
1310
+ "[SbH2+]": 1121,
1311
+ "[144Ce]": 1122,
1312
+ "[16n]": 1123,
1313
+ "[113In]": 1124,
1314
+ "[22nH]": 1125,
1315
+ "[129I-]": 1126,
1316
+ "[InH3]": 1127,
1317
+ "[32PH3]": 1128,
1318
+ "[234U]": 1129,
1319
+ "[235U]": 1130,
1320
+ "[59Fe]": 1131,
1321
+ "[82Rb+]": 1132,
1322
+ "[65Zn]": 1133,
1323
+ "[244Cm]": 1134,
1324
+ "[147Pm]": 1135,
1325
+ "[91Y]": 1136,
1326
+ "[237Pu]": 1137,
1327
+ "[231Pa]": 1138,
1328
+ "[253Cf]": 1139,
1329
+ "[127Te]": 1140,
1330
+ "[187Re]": 1141,
1331
+ "[236Np]": 1142,
1332
+ "[235Np]": 1143,
1333
+ "[72Zn]": 1144,
1334
+ "[253Es]": 1145,
1335
+ "[159Dy]": 1146,
1336
+ "[62Zn]": 1147,
1337
+ "[101Tc]": 1148,
1338
+ "[149Tb]": 1149,
1339
+ "[124I-]": 1150,
1340
+ "[SeH3+]": 1151,
1341
+ "[210Pb]": 1152,
1342
+ "[40K]": 1153,
1343
+ "[210Po]": 1154,
1344
+ "[214Pb]": 1155,
1345
+ "[218Po]": 1156,
1346
+ "[214Po]": 1157,
1347
+ "[7Be]": 1158,
1348
+ "[212Pb]": 1159,
1349
+ "[205Pb]": 1160,
1350
+ "[209Pb]": 1161,
1351
+ "[123Te]": 1162,
1352
+ "[202Pb]": 1163,
1353
+ "[72As]": 1164,
1354
+ "[201Pb]": 1165,
1355
+ "[70As]": 1166,
1356
+ "[73Ge]": 1167,
1357
+ "[200Pb]": 1168,
1358
+ "[198Pb]": 1169,
1359
+ "[66Ga]": 1170,
1360
+ "[73Se]": 1171,
1361
+ "[195Pb]": 1172,
1362
+ "[199Pb]": 1173,
1363
+ "[144Ce+3]": 1174,
1364
+ "[235U+2]": 1175,
1365
+ "[90Tc]": 1176,
1366
+ "[114In+3]": 1177,
1367
+ "[128I]": 1178,
1368
+ "[100Tc+]": 1179,
1369
+ "[82Br-]": 1180,
1370
+ "[191Pt+2]": 1181,
1371
+ "[191Pt+4]": 1182,
1372
+ "[193Pt+4]": 1183,
1373
+ "[31PH3]": 1184,
1374
+ "[125I+2]": 1185,
1375
+ "[131I+2]": 1186,
1376
+ "[125Te+4]": 1187,
1377
+ "[82Sr+2]": 1188,
1378
+ "[149Sm]": 1189,
1379
+ "[81BrH]": 1190,
1380
+ "[129Xe]": 1191,
1381
+ "[193Pt+2]": 1192,
1382
+ "[123I+2]": 1193,
1383
+ "[Cr-]": 1194,
1384
+ "[Co-]": 1195,
1385
+ "[227Th+4]": 1196,
1386
+ "[249Cf+3]": 1197,
1387
+ "[252Cf+3]": 1198,
1388
+ "[187Os]": 1199,
1389
+ "[16O-]": 1200,
1390
+ "[17O+]": 1201,
1391
+ "[16OH-]": 1202,
1392
+ "[98Tc+7]": 1203,
1393
+ "[58Co+2]": 1204,
1394
+ "[69Ga+3]": 1205,
1395
+ "[57Fe+2]": 1206,
1396
+ "[43K+]": 1207,
1397
+ "[16C]": 1208,
1398
+ "[52Fe+3]": 1209,
1399
+ "[SeH5]": 1210,
1400
+ "[194Pb]": 1211,
1401
+ "[196Pb]": 1212,
1402
+ "[197Pb]": 1213,
1403
+ "[213Pb]": 1214,
1404
+ "[9B]": 1215,
1405
+ "[19B]": 1216,
1406
+ "[11CH-]": 1217,
1407
+ "[9CH]": 1218,
1408
+ "[20OH]": 1219,
1409
+ "[25OH]": 1220,
1410
+ "[8cH]": 1221,
1411
+ "[TiH+3]": 1222,
1412
+ "[SnH6+3]": 1223,
1413
+ "[N@H+]": 1224,
1414
+ "[52Mn+2]": 1225,
1415
+ "[64Ga]": 1226,
1416
+ "[13B]": 1227,
1417
+ "[216Bi]": 1228,
1418
+ "[117Sn+2]": 1229,
1419
+ "[232Th]": 1230,
1420
+ "[SnH+2]": 1231,
1421
+ "[BiH5]": 1232,
1422
+ "[77Kr]": 1233,
1423
+ "[103Cd]": 1234,
1424
+ "[62Ni]": 1235,
1425
+ "[LaH3]": 1236,
1426
+ "[SmH3]": 1237,
1427
+ "[EuH3]": 1238,
1428
+ "[MoH5]": 1239,
1429
+ "[64Ni]": 1240,
1430
+ "[66Zn]": 1241,
1431
+ "[68Zn]": 1242,
1432
+ "[186W]": 1243,
1433
+ "[FeH4]": 1244,
1434
+ "[MoH4]": 1245,
1435
+ "[HgH2]": 1246,
1436
+ "[15NH2-]": 1247,
1437
+ "[UH2]": 1248,
1438
+ "[204Hg]": 1249,
1439
+ "[GaH4-]": 1250,
1440
+ "[ThH4]": 1251,
1441
+ "[WH6]": 1252,
1442
+ "[PtH4]": 1253,
1443
+ "[VH2]": 1254,
1444
+ "[UH3]": 1255,
1445
+ "[FeH3]": 1256,
1446
+ "[RuH5]": 1257,
1447
+ "[BiH4]": 1258,
1448
+ "[80Br-]": 1259,
1449
+ "[CeH3]": 1260,
1450
+ "[37ClH]": 1261,
1451
+ "[157Gd+3]": 1262,
1452
+ "[205Tl]": 1263,
1453
+ "[203Tl]": 1264,
1454
+ "[62Cu+]": 1265,
1455
+ "[64Cu+]": 1266,
1456
+ "[61Cu+]": 1267,
1457
+ "[37SH2]": 1268,
1458
+ "[30Si]": 1269,
1459
+ "[28Al]": 1270,
1460
+ "[19OH2]": 1271,
1461
+ "[8He]": 1272,
1462
+ "[6He]": 1273,
1463
+ "[153Pm]": 1274,
1464
+ "[209Bi]": 1275,
1465
+ "[66Zn+2]": 1276,
1466
+ "[10CH4]": 1277,
1467
+ "[191Ir]": 1278,
1468
+ "[66Cu]": 1279,
1469
+ "[16O+]": 1280,
1470
+ "[25O]": 1281,
1471
+ "[10c]": 1282,
1472
+ "[Co-3]": 1283,
1473
+ "[Sn@@]": 1284,
1474
+ "[17OH-]": 1285,
1475
+ "[206Po]": 1286,
1476
+ "[204Po]": 1287,
1477
+ "[202Po]": 1288,
1478
+ "[201Po]": 1289,
1479
+ "[200Po]": 1290,
1480
+ "[199Po]": 1291,
1481
+ "[198Po]": 1292,
1482
+ "[197Po]": 1293,
1483
+ "[196Po]": 1294,
1484
+ "[195Po]": 1295,
1485
+ "[194Po]": 1296,
1486
+ "[193Po]": 1297,
1487
+ "[192Po]": 1298,
1488
+ "[191Po]": 1299,
1489
+ "[190Po]": 1300,
1490
+ "[217Po]": 1301,
1491
+ "[BiH4-]": 1302,
1492
+ "[TeH4]": 1303,
1493
+ "[222Ra]": 1304,
1494
+ "[62Ga]": 1305,
1495
+ "[39Ar]": 1306,
1496
+ "[144Sm]": 1307,
1497
+ "[58Fe]": 1308,
1498
+ "[153Eu]": 1309,
1499
+ "[85Rb]": 1310,
1500
+ "[171Yb]": 1311,
1501
+ "[172Yb]": 1312,
1502
+ "[114Cd]": 1313,
1503
+ "[51Fe]": 1314,
1504
+ "[142Ce]": 1315,
1505
+ "[207Tl]": 1316,
1506
+ "[92Mo]": 1317,
1507
+ "[115Sn]": 1318,
1508
+ "[140Ce]": 1319,
1509
+ "[202Hg]": 1320,
1510
+ "[180W]": 1321,
1511
+ "[182W]": 1322,
1512
+ "[183W]": 1323,
1513
+ "[184W]": 1324,
1514
+ "[96Mo]": 1325,
1515
+ "[47Ti]": 1326,
1516
+ "[111Cd]": 1327,
1517
+ "[143Nd]": 1328,
1518
+ "[145Nd]": 1329,
1519
+ "[126Te]": 1330,
1520
+ "[128Te]": 1331,
1521
+ "[130Te]": 1332,
1522
+ "[185Re]": 1333,
1523
+ "[97Mo]": 1334,
1524
+ "[98Mo]": 1335,
1525
+ "[183Re]": 1336,
1526
+ "[52V]": 1337,
1527
+ "[80Se]": 1338,
1528
+ "[87Kr]": 1339,
1529
+ "[137Xe]": 1340,
1530
+ "[196Au]": 1341,
1531
+ "[146Ce]": 1342,
1532
+ "[88Kr]": 1343,
1533
+ "[51Ti]": 1344,
1534
+ "[138Xe]": 1345,
1535
+ "[112Cd]": 1346,
1536
+ "[116Sn]": 1347,
1537
+ "[120Sn]": 1348,
1538
+ "[28SiH3]": 1349,
1539
+ "[35S-]": 1350,
1540
+ "[15NH-]": 1351,
1541
+ "[13CH3+]": 1352,
1542
+ "[34S+]": 1353,
1543
+ "[34s]": 1354,
1544
+ "[SiH4-]": 1355,
1545
+ "[100Tc+5]": 1356,
1546
+ "[NiH2+2]": 1357,
1547
+ "[239Th]": 1358,
1548
+ "[186Lu]": 1359,
1549
+ "[AuH3]": 1360,
1550
+ "[I@@-]": 1361,
1551
+ "[XeH2]": 1362,
1552
+ "[B+]": 1363,
1553
+ "[16CH2]": 1364,
1554
+ "[8C]": 1365,
1555
+ "[TaH5]": 1366,
1556
+ "[FeH4-]": 1367,
1557
+ "[19C@H]": 1368,
1558
+ "[10NH]": 1369,
1559
+ "[FeH6-3]": 1370,
1560
+ "[22CH]": 1371,
1561
+ "[25N]": 1372,
1562
+ "[25N+]": 1373,
1563
+ "[25N-]": 1374,
1564
+ "[21CH2]": 1375,
1565
+ "[18cH]": 1376,
1566
+ "[113I]": 1377,
1567
+ "[ScH3]": 1378,
1568
+ "[30PH3]": 1379,
1569
+ "[43Ca+2]": 1380,
1570
+ "[41Ca+2]": 1381,
1571
+ "[106Cd]": 1382,
1572
+ "[122Sn]": 1383,
1573
+ "[18CH3]": 1384,
1574
+ "[58Co+3]": 1385,
1575
+ "[98Tc+4]": 1386,
1576
+ "[70Ge]": 1387,
1577
+ "[76Ge]": 1388,
1578
+ "[108Cd]": 1389,
1579
+ "[116Cd]": 1390,
1580
+ "[130Xe]": 1391,
1581
+ "[94Mo]": 1392,
1582
+ "[124Sn]": 1393,
1583
+ "[186Os]": 1394,
1584
+ "[188Os]": 1395,
1585
+ "[190Os]": 1396,
1586
+ "[192Os]": 1397,
1587
+ "[106Pd]": 1398,
1588
+ "[110Pd]": 1399,
1589
+ "[120Te]": 1400,
1590
+ "[132Ba]": 1401,
1591
+ "[134Ba]": 1402,
1592
+ "[136Ba]": 1403,
1593
+ "[136Ce]": 1404,
1594
+ "[138Ce]": 1405,
1595
+ "[156Dy]": 1406,
1596
+ "[158Dy]": 1407,
1597
+ "[160Dy]": 1408,
1598
+ "[163Dy]": 1409,
1599
+ "[162Er]": 1410,
1600
+ "[164Er]": 1411,
1601
+ "[167Er]": 1412,
1602
+ "[176Hf]": 1413,
1603
+ "[26Mg]": 1414,
1604
+ "[144Nd]": 1415,
1605
+ "[150Nd]": 1416,
1606
+ "[41K]": 1417,
1607
+ "[46Ti]": 1418,
1608
+ "[48Ti]": 1419,
1609
+ "[49Ti]": 1420,
1610
+ "[50Ti]": 1421,
1611
+ "[170Yb]": 1422,
1612
+ "[173Yb]": 1423,
1613
+ "[91Zr]": 1424,
1614
+ "[92Zr]": 1425,
1615
+ "[96Zr]": 1426,
1616
+ "[34S-]": 1427,
1617
+ "[CuH2-]": 1428,
1618
+ "[38Cl]": 1429,
1619
+ "[25Mg]": 1430,
1620
+ "[51V]": 1431,
1621
+ "[93Nb]": 1432,
1622
+ "[95Mo]": 1433,
1623
+ "[45Sc]": 1434,
1624
+ "[123Sb]": 1435,
1625
+ "[139La]": 1436,
1626
+ "[9Be]": 1437,
1627
+ "[99Y+3]": 1438,
1628
+ "[99Y]": 1439,
1629
+ "[156Ho]": 1440,
1630
+ "[67Zn]": 1441,
1631
+ "[144Ce+4]": 1442,
1632
+ "[210Tl]": 1443,
1633
+ "[42Ca]": 1444,
1634
+ "[54Fe]": 1445,
1635
+ "[193Ir]": 1446,
1636
+ "[92Nb]": 1447,
1637
+ "[141Cs]": 1448,
1638
+ "[52Cr]": 1449,
1639
+ "[35ClH]": 1450,
1640
+ "[46Ca]": 1451,
1641
+ "[139Cs]": 1452,
1642
+ "[65Cu]": 1453,
1643
+ "[71Ga]": 1454,
1644
+ "[60Ni]": 1455,
1645
+ "[16NH3]": 1456,
1646
+ "[148Nd]": 1457,
1647
+ "[72Ge]": 1458,
1648
+ "[161Dy]": 1459,
1649
+ "[49Ca]": 1460,
1650
+ "[43Ca]": 1461,
1651
+ "[8Be]": 1462,
1652
+ "[48Ca]": 1463,
1653
+ "[44Ca]": 1464,
1654
+ "[120Xe]": 1465,
1655
+ "[80Rb]": 1466,
1656
+ "[215At]": 1467,
1657
+ "[180Re]": 1468,
1658
+ "[146Sm]": 1469,
1659
+ "[19Ne]": 1470,
1660
+ "[74Kr]": 1471,
1661
+ "[134La]": 1472,
1662
+ "[76Kr]": 1473,
1663
+ "[219Fr]": 1474,
1664
+ "[121Xe]": 1475,
1665
+ "[220Fr]": 1476,
1666
+ "[216At]": 1477,
1667
+ "[223Ac]": 1478,
1668
+ "[218At]": 1479,
1669
+ "[37Ar]": 1480,
1670
+ "[135I]": 1481,
1671
+ "[110Cd]": 1482,
1672
+ "[94Tc+7]": 1483,
1673
+ "[86Y+3]": 1484,
1674
+ "[135I-]": 1485,
1675
+ "[15O-2]": 1486,
1676
+ "[151Eu+3]": 1487,
1677
+ "[161Tb+3]": 1488,
1678
+ "[197Hg+2]": 1489,
1679
+ "[109Cd+2]": 1490,
1680
+ "[191Os+4]": 1491,
1681
+ "[170Tm+3]": 1492,
1682
+ "[205Bi+3]": 1493,
1683
+ "[233U+4]": 1494,
1684
+ "[126Sb+3]": 1495,
1685
+ "[127Sb+3]": 1496,
1686
+ "[132Cs+]": 1497,
1687
+ "[136Eu+3]": 1498,
1688
+ "[136Eu]": 1499,
1689
+ "[125Sn+4]": 1500,
1690
+ "[175Yb+3]": 1501,
1691
+ "[100Mo]": 1502,
1692
+ "[22Ne]": 1503,
1693
+ "[13c-]": 1504,
1694
+ "[13NH4+]": 1505,
1695
+ "[17C]": 1506,
1696
+ "[9C]": 1507,
1697
+ "[31S]": 1508,
1698
+ "[31SH]": 1509,
1699
+ "[133I]": 1510,
1700
+ "[126I]": 1511,
1701
+ "[36SH]": 1512,
1702
+ "[30S]": 1513,
1703
+ "[32SH]": 1514,
1704
+ "[19CH2]": 1515,
1705
+ "[19c]": 1516,
1706
+ "[18c]": 1517,
1707
+ "[15F]": 1518,
1708
+ "[10C]": 1519,
1709
+ "[RuH-]": 1520,
1710
+ "[62Zn+2]": 1521,
1711
+ "[32ClH]": 1522,
1712
+ "[33ClH]": 1523,
1713
+ "[78BrH]": 1524,
1714
+ "[12Li+]": 1525,
1715
+ "[12Li]": 1526,
1716
+ "[233Ra]": 1527,
1717
+ "[68Ge+4]": 1528,
1718
+ "[44Sc+3]": 1529,
1719
+ "[91Y+3]": 1530,
1720
+ "[106Ru+3]": 1531,
1721
+ "[PoH2]": 1532,
1722
+ "[AtH]": 1533,
1723
+ "[55Fe]": 1534,
1724
+ "[233U]": 1535,
1725
+ "[210PoH2]": 1536,
1726
+ "[230Th]": 1537,
1727
+ "[228Th]": 1538,
1728
+ "[222Rn]": 1539,
1729
+ "[35SH2]": 1540,
1730
+ "[227Th]": 1541,
1731
+ "[192Ir]": 1542,
1732
+ "[133Xe]": 1543,
1733
+ "[81Kr]": 1544,
1734
+ "[95Zr]": 1545,
1735
+ "[240Pu]": 1546,
1736
+ "[54Mn]": 1547,
1737
+ "[103Ru]": 1548,
1738
+ "[95Nb]": 1549,
1739
+ "[109Cd]": 1550,
1740
+ "[141Ce]": 1551,
1741
+ "[85Kr]": 1552,
1742
+ "[110Ag]": 1553,
1743
+ "[58Co]": 1554,
1744
+ "[241Pu]": 1555,
1745
+ "[234Th]": 1556,
1746
+ "[140La]": 1557,
1747
+ "[63Ni]": 1558,
1748
+ "[152Eu]": 1559,
1749
+ "[132IH]": 1560,
1750
+ "[226Rn]": 1561,
1751
+ "[154Eu]": 1562,
1752
+ "[36ClH]": 1563,
1753
+ "[228Ac]": 1564,
1754
+ "[155Eu]": 1565,
1755
+ "[106Rh]": 1566,
1756
+ "[243Am]": 1567,
1757
+ "[227Ac]": 1568,
1758
+ "[243Cm]": 1569,
1759
+ "[236U]": 1570,
1760
+ "[144Pr]": 1571,
1761
+ "[232U]": 1572,
1762
+ "[32SH2]": 1573,
1763
+ "[88Y]": 1574,
1764
+ "[82BrH]": 1575,
1765
+ "[135IH]": 1576,
1766
+ "[242Cm]": 1577,
1767
+ "[115Cd]": 1578,
1768
+ "[242Pu]": 1579,
1769
+ "[46Sc]": 1580,
1770
+ "[56Mn]": 1581,
1771
+ "[234Pa]": 1582,
1772
+ "[41Ar]": 1583,
1773
+ "[147Nd]": 1584,
1774
+ "[187W]": 1585,
1775
+ "[151Sm]": 1586,
1776
+ "[59Ni]": 1587,
1777
+ "[233Pa]": 1588,
1778
+ "[52Mn]": 1589,
1779
+ "[94Nb]": 1590,
1780
+ "[219Rn]": 1591,
1781
+ "[236Pu]": 1592,
1782
+ "[13NH3]": 1593,
1783
+ "[93Zr]": 1594,
1784
+ "[51Cr+6]": 1595,
1785
+ "[TlH3]": 1596,
1786
+ "[123Xe]": 1597,
1787
+ "[160Tb]": 1598,
1788
+ "[170Tm]": 1599,
1789
+ "[182Ta]": 1600,
1790
+ "[175Yb]": 1601,
1791
+ "[93Mo]": 1602,
1792
+ "[143Ce]": 1603,
1793
+ "[191Os]": 1604,
1794
+ "[126IH]": 1605,
1795
+ "[48V]": 1606,
1796
+ "[113Cd]": 1607,
1797
+ "[47Sc]": 1608,
1798
+ "[181Hf]": 1609,
1799
+ "[185W]": 1610,
1800
+ "[143Pr]": 1611,
1801
+ "[191Pt]": 1612,
1802
+ "[181W]": 1613,
1803
+ "[33PH3]": 1614,
1804
+ "[97Ru]": 1615,
1805
+ "[97Tc]": 1616,
1806
+ "[111Ag]": 1617,
1807
+ "[169Er]": 1618,
1808
+ "[107Pd]": 1619,
1809
+ "[103Ru+2]": 1620,
1810
+ "[34SH2]": 1621,
1811
+ "[137Ce]": 1622,
1812
+ "[242Am]": 1623,
1813
+ "[117SnH2]": 1624,
1814
+ "[57Ni]": 1625,
1815
+ "[239U]": 1626,
1816
+ "[60Cu]": 1627,
1817
+ "[250Cf]": 1628,
1818
+ "[193Au]": 1629,
1819
+ "[69Zn]": 1630,
1820
+ "[55Co]": 1631,
1821
+ "[139Ce]": 1632,
1822
+ "[127Xe]": 1633,
1823
+ "[159Gd]": 1634,
1824
+ "[56Co]": 1635,
1825
+ "[177Hf]": 1636,
1826
+ "[244Pu]": 1637,
1827
+ "[38ClH]": 1638,
1828
+ "[142Pr]": 1639,
1829
+ "[199Hg]": 1640,
1830
+ "[179Hf]": 1641,
1831
+ "[178Hf]": 1642,
1832
+ "[237U]": 1643,
1833
+ "[156Eu]": 1644,
1834
+ "[157Eu]": 1645,
1835
+ "[105Ru]": 1646,
1836
+ "[171Tm]": 1647,
1837
+ "[199Au]": 1648,
1838
+ "[155Sm]": 1649,
1839
+ "[80BrH]": 1650,
1840
+ "[108Ag]": 1651,
1841
+ "[128IH]": 1652,
1842
+ "[48Sc]": 1653,
1843
+ "[45Ti]": 1654,
1844
+ "[176Lu]": 1655,
1845
+ "[121SnH2]": 1656,
1846
+ "[148Pm]": 1657,
1847
+ "[57Fe]": 1658,
1848
+ "[10BH3]": 1659,
1849
+ "[96Tc]": 1660,
1850
+ "[133IH]": 1661,
1851
+ "[143Pm]": 1662,
1852
+ "[105Rh]": 1663,
1853
+ "[130IH]": 1664,
1854
+ "[134IH]": 1665,
1855
+ "[131IH]": 1666,
1856
+ "[71Zn]": 1667,
1857
+ "[105Ag]": 1668,
1858
+ "[97Zr]": 1669,
1859
+ "[235Pu]": 1670,
1860
+ "[231Th]": 1671,
1861
+ "[109Pd]": 1672,
1862
+ "[93Y]": 1673,
1863
+ "[190Ir]": 1674,
1864
+ "[135Xe]": 1675,
1865
+ "[53Mn]": 1676,
1866
+ "[134Ce]": 1677,
1867
+ "[234Np]": 1678,
1868
+ "[240Am]": 1679,
1869
+ "[246Cf]": 1680,
1870
+ "[240Cm]": 1681,
1871
+ "[241Cm]": 1682,
1872
+ "[226Th]": 1683,
1873
+ "[39ClH]": 1684,
1874
+ "[229Th]": 1685,
1875
+ "[245Cm]": 1686,
1876
+ "[240U]": 1687,
1877
+ "[240Np]": 1688,
1878
+ "[249Cm]": 1689,
1879
+ "[243Pu]": 1690,
1880
+ "[145Pm]": 1691,
1881
+ "[199Pt]": 1692,
1882
+ "[246Bk]": 1693,
1883
+ "[193Pt]": 1694,
1884
+ "[230U]": 1695,
1885
+ "[250Cm]": 1696,
1886
+ "[44Ti]": 1697,
1887
+ "[175Hf]": 1698,
1888
+ "[254Fm]": 1699,
1889
+ "[255Fm]": 1700,
1890
+ "[257Fm]": 1701,
1891
+ "[92Y]": 1702,
1892
+ "[188Ir]": 1703,
1893
+ "[171Lu]": 1704,
1894
+ "[257Md]": 1705,
1895
+ "[247Bk]": 1706,
1896
+ "[121IH]": 1707,
1897
+ "[250Bk]": 1708,
1898
+ "[179Lu]": 1709,
1899
+ "[224Ac]": 1710,
1900
+ "[195Hg]": 1711,
1901
+ "[244Am]": 1712,
1902
+ "[246Pu]": 1713,
1903
+ "[194Au]": 1714,
1904
+ "[252Fm]": 1715,
1905
+ "[173Hf]": 1716,
1906
+ "[246Cm]": 1717,
1907
+ "[135Ce]": 1718,
1908
+ "[49Cr]": 1719,
1909
+ "[248Cf]": 1720,
1910
+ "[247Cm]": 1721,
1911
+ "[248Cm]": 1722,
1912
+ "[174Ta]": 1723,
1913
+ "[176Ta]": 1724,
1914
+ "[154Tb]": 1725,
1915
+ "[172Ta]": 1726,
1916
+ "[177Ta]": 1727,
1917
+ "[175Ta]": 1728,
1918
+ "[180Ta]": 1729,
1919
+ "[158Tb]": 1730,
1920
+ "[115Ag]": 1731,
1921
+ "[189Os]": 1732,
1922
+ "[251Cf]": 1733,
1923
+ "[145Pr]": 1734,
1924
+ "[147Pr]": 1735,
1925
+ "[76BrH]": 1736,
1926
+ "[102Rh]": 1737,
1927
+ "[238Np]": 1738,
1928
+ "[185Os]": 1739,
1929
+ "[246Am]": 1740,
1930
+ "[233Np]": 1741,
1931
+ "[166Dy]": 1742,
1932
+ "[254Es]": 1743,
1933
+ "[244Cf]": 1744,
1934
+ "[193Os]": 1745,
1935
+ "[245Am]": 1746,
1936
+ "[245Bk]": 1747,
1937
+ "[239Am]": 1748,
1938
+ "[238Am]": 1749,
1939
+ "[97Nb]": 1750,
1940
+ "[245Pu]": 1751,
1941
+ "[254Cf]": 1752,
1942
+ "[188W]": 1753,
1943
+ "[250Es]": 1754,
1944
+ "[251Es]": 1755,
1945
+ "[237Am]": 1756,
1946
+ "[182Hf]": 1757,
1947
+ "[258Md]": 1758,
1948
+ "[232Np]": 1759,
1949
+ "[238Cm]": 1760,
1950
+ "[60Fe]": 1761,
1951
+ "[109Pd+2]": 1762,
1952
+ "[234Pu]": 1763,
1953
+ "[141Ce+3]": 1764,
1954
+ "[136Nd]": 1765,
1955
+ "[136Pr]": 1766,
1956
+ "[173Ta]": 1767,
1957
+ "[110Ru]": 1768,
1958
+ "[147Tb]": 1769,
1959
+ "[253Fm]": 1770,
1960
+ "[139Nd]": 1771,
1961
+ "[178Re]": 1772,
1962
+ "[177Re]": 1773,
1963
+ "[200Au]": 1774,
1964
+ "[182Re]": 1775,
1965
+ "[156Tb]": 1776,
1966
+ "[155Tb]": 1777,
1967
+ "[157Tb]": 1778,
1968
+ "[161Tb]": 1779,
1969
+ "[161Ho]": 1780,
1970
+ "[167Tm]": 1781,
1971
+ "[173Lu]": 1782,
1972
+ "[179Ta]": 1783,
1973
+ "[171Er]": 1784,
1974
+ "[44Sc]": 1785,
1975
+ "[49Sc]": 1786,
1976
+ "[49V]": 1787,
1977
+ "[51Mn]": 1788,
1978
+ "[90Nb]": 1789,
1979
+ "[88Nb]": 1790,
1980
+ "[88Zr]": 1791,
1981
+ "[36SH2]": 1792,
1982
+ "[174Yb]": 1793,
1983
+ "[178Lu]": 1794,
1984
+ "[179W]": 1795,
1985
+ "[83BrH]": 1796,
1986
+ "[107Cd]": 1797,
1987
+ "[75BrH]": 1798,
1988
+ "[62Co]": 1799,
1989
+ "[48Cr]": 1800,
1990
+ "[63Zn]": 1801,
1991
+ "[102Ag]": 1802,
1992
+ "[154Sm]": 1803,
1993
+ "[168Er]": 1804,
1994
+ "[65Ni]": 1805,
1995
+ "[137La]": 1806,
1996
+ "[187Ir]": 1807,
1997
+ "[144Pm]": 1808,
1998
+ "[146Pm]": 1809,
1999
+ "[160Gd]": 1810,
2000
+ "[166Yb]": 1811,
2001
+ "[162Dy]": 1812,
2002
+ "[47V]": 1813,
2003
+ "[141Nd]": 1814,
2004
+ "[141Sm]": 1815,
2005
+ "[166Er]": 1816,
2006
+ "[150Sm]": 1817,
2007
+ "[146Eu]": 1818,
2008
+ "[149Eu]": 1819,
2009
+ "[174Lu]": 1820,
2010
+ "[17NH3]": 1821,
2011
+ "[102Ru]": 1822,
2012
+ "[170Hf]": 1823,
2013
+ "[188Pt]": 1824,
2014
+ "[61Ni]": 1825,
2015
+ "[56Ni]": 1826,
2016
+ "[149Gd]": 1827,
2017
+ "[151Gd]": 1828,
2018
+ "[141Pm]": 1829,
2019
+ "[147Gd]": 1830,
2020
+ "[146Gd]": 1831,
2021
+ "[161Er]": 1832,
2022
+ "[103Ag]": 1833,
2023
+ "[145Eu]": 1834,
2024
+ "[153Tb]": 1835,
2025
+ "[155Dy]": 1836,
2026
+ "[184Re]": 1837,
2027
+ "[180Os]": 1838,
2028
+ "[182Os]": 1839,
2029
+ "[186Pt]": 1840,
2030
+ "[181Os]": 1841,
2031
+ "[181Re]": 1842,
2032
+ "[151Tb]": 1843,
2033
+ "[178Ta]": 1844,
2034
+ "[178W]": 1845,
2035
+ "[189Pt]": 1846,
2036
+ "[194Hg]": 1847,
2037
+ "[145Sm]": 1848,
2038
+ "[150Tb]": 1849,
2039
+ "[132La]": 1850,
2040
+ "[158Gd]": 1851,
2041
+ "[104Ag]": 1852,
2042
+ "[193Hg]": 1853,
2043
+ "[94Ru]": 1854,
2044
+ "[137Pr]": 1855,
2045
+ "[155Ho]": 1856,
2046
+ "[117Cd]": 1857,
2047
+ "[99Ru]": 1858,
2048
+ "[146Nd]": 1859,
2049
+ "[218Rn]": 1860,
2050
+ "[95Y]": 1861,
2051
+ "[79Kr]": 1862,
2052
+ "[120IH]": 1863,
2053
+ "[138Pr]": 1864,
2054
+ "[100Pd]": 1865,
2055
+ "[166Tm]": 1866,
2056
+ "[90Mo]": 1867,
2057
+ "[151Nd]": 1868,
2058
+ "[231U]": 1869,
2059
+ "[138Nd]": 1870,
2060
+ "[89Nb]": 1871,
2061
+ "[98Nb]": 1872,
2062
+ "[162Ho]": 1873,
2063
+ "[142Sm]": 1874,
2064
+ "[186Ta]": 1875,
2065
+ "[104Tc]": 1876,
2066
+ "[184Ta]": 1877,
2067
+ "[185Ta]": 1878,
2068
+ "[170Er]": 1879,
2069
+ "[107Rh]": 1880,
2070
+ "[131La]": 1881,
2071
+ "[169Lu]": 1882,
2072
+ "[74BrH]": 1883,
2073
+ "[150Pm]": 1884,
2074
+ "[172Tm]": 1885,
2075
+ "[197Pt]": 1886,
2076
+ "[230Pu]": 1887,
2077
+ "[170Lu]": 1888,
2078
+ "[86Zr]": 1889,
2079
+ "[176W]": 1890,
2080
+ "[177W]": 1891,
2081
+ "[101Pd]": 1892,
2082
+ "[105Pd]": 1893,
2083
+ "[108Pd]": 1894,
2084
+ "[149Nd]": 1895,
2085
+ "[164Ho]": 1896,
2086
+ "[159Ho]": 1897,
2087
+ "[167Ho]": 1898,
2088
+ "[176Yb]": 1899,
2089
+ "[156Sm]": 1900,
2090
+ "[77BrH]": 1901,
2091
+ "[189Re]": 1902,
2092
+ "[99Rh]": 1903,
2093
+ "[100Rh]": 1904,
2094
+ "[151Pm]": 1905,
2095
+ "[232Pa]": 1906,
2096
+ "[228Pa]": 1907,
2097
+ "[230Pa]": 1908,
2098
+ "[66Ni]": 1909,
2099
+ "[194Os]": 1910,
2100
+ "[135La]": 1911,
2101
+ "[138La]": 1912,
2102
+ "[141La]": 1913,
2103
+ "[142La]": 1914,
2104
+ "[195Ir]": 1915,
2105
+ "[96Nb]": 1916,
2106
+ "[157Ho]": 1917,
2107
+ "[183Hf]": 1918,
2108
+ "[162Tm]": 1919,
2109
+ "[172Er]": 1920,
2110
+ "[148Eu]": 1921,
2111
+ "[150Eu]": 1922,
2112
+ "[15CH4]": 1923,
2113
+ "[89Kr]": 1924,
2114
+ "[143La]": 1925,
2115
+ "[58Ni]": 1926,
2116
+ "[61Co]": 1927,
2117
+ "[158Eu]": 1928,
2118
+ "[165Er]": 1929,
2119
+ "[167Yb]": 1930,
2120
+ "[173Tm]": 1931,
2121
+ "[175Tm]": 1932,
2122
+ "[172Hf]": 1933,
2123
+ "[172Lu]": 1934,
2124
+ "[93Tc]": 1935,
2125
+ "[177Yb]": 1936,
2126
+ "[124IH]": 1937,
2127
+ "[194Ir]": 1938,
2128
+ "[147Eu]": 1939,
2129
+ "[101Mo]": 1940,
2130
+ "[180Hf]": 1941,
2131
+ "[189Ir]": 1942,
2132
+ "[87Y]": 1943,
2133
+ "[43Sc]": 1944,
2134
+ "[195Au]": 1945,
2135
+ "[112Ag]": 1946,
2136
+ "[84BrH]": 1947,
2137
+ "[106Ag]": 1948,
2138
+ "[109Ag]": 1949,
2139
+ "[101Rh]": 1950,
2140
+ "[162Yb]": 1951,
2141
+ "[228Rn]": 1952,
2142
+ "[139Pr]": 1953,
2143
+ "[94Y]": 1954,
2144
+ "[201Au]": 1955,
2145
+ "[40PH3]": 1956,
2146
+ "[110Ag+]": 1957,
2147
+ "[104Cd]": 1958,
2148
+ "[133Ba+2]": 1959,
2149
+ "[226Ac]": 1960,
2150
+ "[145Gd]": 1961,
2151
+ "[186Ir]": 1962,
2152
+ "[184Ir]": 1963,
2153
+ "[224Rn]": 1964,
2154
+ "[185Ir]": 1965,
2155
+ "[182Ir]": 1966,
2156
+ "[184Hf]": 1967,
2157
+ "[200Pt]": 1968,
2158
+ "[227Pa]": 1969,
2159
+ "[178Yb]": 1970,
2160
+ "[72Br-]": 1971,
2161
+ "[72BrH]": 1972,
2162
+ "[248Am]": 1973,
2163
+ "[238Th]": 1974,
2164
+ "[161Gd]": 1975,
2165
+ "[35S-2]": 1976,
2166
+ "[107Ag]": 1977,
2167
+ "[FeH6-4]": 1978,
2168
+ "[89Sr]": 1979,
2169
+ "[SnH3-]": 1980,
2170
+ "[SeH3]": 1981,
2171
+ "[TeH3+]": 1982,
2172
+ "[SbH4+]": 1983,
2173
+ "[AsH4+]": 1984,
2174
+ "[4He]": 1985,
2175
+ "[AsH3-]": 1986,
2176
+ "[1HH]": 1987,
2177
+ "[3H+]": 1988,
2178
+ "[82Rb]": 1989,
2179
+ "[85Sr]": 1990,
2180
+ "[90Sr]": 1991,
2181
+ "[137Cs]": 1992,
2182
+ "[133Ba]": 1993,
2183
+ "[131Cs]": 1994,
2184
+ "[SbH5]": 1995,
2185
+ "[224Ra]": 1996,
2186
+ "[22Na]": 1997,
2187
+ "[210Bi]": 1998,
2188
+ "[214Bi]": 1999,
2189
+ "[228Ra]": 2000,
2190
+ "[127Sb]": 2001,
2191
+ "[136Cs]": 2002,
2192
+ "[125Sb]": 2003,
2193
+ "[134Cs]": 2004,
2194
+ "[140Ba]": 2005,
2195
+ "[45Ca]": 2006,
2196
+ "[206Pb]": 2007,
2197
+ "[207Pb]": 2008,
2198
+ "[24Na]": 2009,
2199
+ "[86Rb]": 2010,
2200
+ "[212Bi]": 2011,
2201
+ "[208Pb]": 2012,
2202
+ "[124Sb]": 2013,
2203
+ "[204Pb]": 2014,
2204
+ "[44K]": 2015,
2205
+ "[129Te]": 2016,
2206
+ "[113Sn]": 2017,
2207
+ "[204Tl]": 2018,
2208
+ "[87Sr]": 2019,
2209
+ "[208Tl]": 2020,
2210
+ "[87Rb]": 2021,
2211
+ "[47Ca]": 2022,
2212
+ "[135Cs]": 2023,
2213
+ "[216Po]": 2024,
2214
+ "[137Ba]": 2025,
2215
+ "[207Bi]": 2026,
2216
+ "[212Po]": 2027,
2217
+ "[79Se]": 2028,
2218
+ "[223Ra]": 2029,
2219
+ "[86Sr]": 2030,
2220
+ "[122Sb]": 2031,
2221
+ "[26Al]": 2032,
2222
+ "[32Si]": 2033,
2223
+ "[126Sn]": 2034,
2224
+ "[225Ra]": 2035,
2225
+ "[114In]": 2036,
2226
+ "[72Ga]": 2037,
2227
+ "[132Te]": 2038,
2228
+ "[10Be]": 2039,
2229
+ "[125Sn]": 2040,
2230
+ "[73As]": 2041,
2231
+ "[206Bi]": 2042,
2232
+ "[117Sn]": 2043,
2233
+ "[40Ca]": 2044,
2234
+ "[41Ca]": 2045,
2235
+ "[89Rb]": 2046,
2236
+ "[116In]": 2047,
2237
+ "[129Sb]": 2048,
2238
+ "[91Sr]": 2049,
2239
+ "[71Ge]": 2050,
2240
+ "[139Ba]": 2051,
2241
+ "[69Ga]": 2052,
2242
+ "[120Sb]": 2053,
2243
+ "[121Sn]": 2054,
2244
+ "[123Sn]": 2055,
2245
+ "[131Te]": 2056,
2246
+ "[77Ge]": 2057,
2247
+ "[135Ba]": 2058,
2248
+ "[82Sr]": 2059,
2249
+ "[43K]": 2060,
2250
+ "[131Ba]": 2061,
2251
+ "[92Sr]": 2062,
2252
+ "[88Rb]": 2063,
2253
+ "[129Cs]": 2064,
2254
+ "[144Cs]": 2065,
2255
+ "[127Cs]": 2066,
2256
+ "[200Tl]": 2067,
2257
+ "[202Tl]": 2068,
2258
+ "[141Ba]": 2069,
2259
+ "[117Sb]": 2070,
2260
+ "[116Sb]": 2071,
2261
+ "[78As]": 2072,
2262
+ "[131Sb]": 2073,
2263
+ "[126Sb]": 2074,
2264
+ "[128Sb]": 2075,
2265
+ "[130Sb]": 2076,
2266
+ "[67Ge]": 2077,
2267
+ "[68Ge]": 2078,
2268
+ "[78Ge]": 2079,
2269
+ "[66Ge]": 2080,
2270
+ "[223Fr]": 2081,
2271
+ "[132Cs]": 2082,
2272
+ "[125Cs]": 2083,
2273
+ "[138Cs]": 2084,
2274
+ "[133Te]": 2085,
2275
+ "[84Rb]": 2086,
2276
+ "[83Rb]": 2087,
2277
+ "[81Rb]": 2088,
2278
+ "[142Ba]": 2089,
2279
+ "[200Bi]": 2090,
2280
+ "[115Sb]": 2091,
2281
+ "[194Tl]": 2092,
2282
+ "[70Se]": 2093,
2283
+ "[112In]": 2094,
2284
+ "[118Sb]": 2095,
2285
+ "[70Ga]": 2096,
2286
+ "[27Mg]": 2097,
2287
+ "[202Bi]": 2098,
2288
+ "[83Se]": 2099,
2289
+ "[9Li]": 2100,
2290
+ "[69As]": 2101,
2291
+ "[79Rb]": 2102,
2292
+ "[81Sr]": 2103,
2293
+ "[83Sr]": 2104,
2294
+ "[78Se]": 2105,
2295
+ "[109In]": 2106,
2296
+ "[29Al]": 2107,
2297
+ "[118Sn]": 2108,
2298
+ "[117In]": 2109,
2299
+ "[119Sb]": 2110,
2300
+ "[114Sn]": 2111,
2301
+ "[138Ba]": 2112,
2302
+ "[69Ge]": 2113,
2303
+ "[73Ga]": 2114,
2304
+ "[74Ge]": 2115,
2305
+ "[206Tl]": 2116,
2306
+ "[199Tl]": 2117,
2307
+ "[130Cs]": 2118,
2308
+ "[28Mg]": 2119,
2309
+ "[116Te]": 2120,
2310
+ "[112Sn]": 2121,
2311
+ "[126Ba]": 2122,
2312
+ "[211Bi]": 2123,
2313
+ "[81Se]": 2124,
2314
+ "[127Sn]": 2125,
2315
+ "[143Cs]": 2126,
2316
+ "[134Te]": 2127,
2317
+ "[80Sr]": 2128,
2318
+ "[45K]": 2129,
2319
+ "[215Po]": 2130,
2320
+ "[207Po]": 2131,
2321
+ "[111Sn]": 2132,
2322
+ "[211Po]": 2133,
2323
+ "[128Ba]": 2134,
2324
+ "[198Tl]": 2135,
2325
+ "[227Ra]": 2136,
2326
+ "[213Po]": 2137,
2327
+ "[220Ra]": 2138,
2328
+ "[128Sn]": 2139,
2329
+ "[203Po]": 2140,
2330
+ "[205Po]": 2141,
2331
+ "[65Ga]": 2142,
2332
+ "[197Tl]": 2143,
2333
+ "[88Sr]": 2144,
2334
+ "[110In]": 2145,
2335
+ "[31Si]": 2146,
2336
+ "[201Bi]": 2147,
2337
+ "[121Te]": 2148,
2338
+ "[205Bi]": 2149,
2339
+ "[203Bi]": 2150,
2340
+ "[195Tl]": 2151,
2341
+ "[209Tl]": 2152,
2342
+ "[110Sn]": 2153,
2343
+ "[222Fr]": 2154,
2344
+ "[207At]": 2155,
2345
+ "[119In]": 2156,
2346
+ "[As@]": 2157,
2347
+ "[129IH]": 2158,
2348
+ "[157Dy]": 2159,
2349
+ "[111IH]": 2160,
2350
+ "[230Ra]": 2161,
2351
+ "[144Pr+3]": 2162,
2352
+ "[SiH3+]": 2163,
2353
+ "[3He]": 2164,
2354
+ "[AsH5]": 2165,
2355
+ "[72Se]": 2166,
2356
+ "[95Tc]": 2167,
2357
+ "[103Pd]": 2168,
2358
+ "[121Sn+2]": 2169,
2359
+ "[211Rn]": 2170,
2360
+ "[38SH2]": 2171,
2361
+ "[127IH]": 2172,
2362
+ "[74Br-]": 2173,
2363
+ "[133I-]": 2174,
2364
+ "[100Tc+4]": 2175,
2365
+ "[100Tc]": 2176,
2366
+ "[36Cl-]": 2177,
2367
+ "[89Y+3]": 2178,
2368
+ "[104Rh]": 2179,
2369
+ "[152Sm]": 2180,
2370
+ "[226Ra]": 2181,
2371
+ "[19FH]": 2182,
2372
+ "[104Pd]": 2183,
2373
+ "[148Gd]": 2184,
2374
+ "[157Lu]": 2185,
2375
+ "[33SH2]": 2186,
2376
+ "[121I-]": 2187,
2377
+ "[17FH]": 2188,
2378
+ "[71Se]": 2189,
2379
+ "[157Sm]": 2190,
2380
+ "[148Tb]": 2191,
2381
+ "[164Dy]": 2192,
2382
+ "[15OH2]": 2193,
2383
+ "[15O+]": 2194,
2384
+ "[39K]": 2195,
2385
+ "[40Ar]": 2196,
2386
+ "[50Cr+3]": 2197,
2387
+ "[50Cr]": 2198,
2388
+ "[52Ti]": 2199,
2389
+ "[103Pd+2]": 2200,
2390
+ "[130Ba]": 2201,
2391
+ "[142Pm]": 2202,
2392
+ "[153Gd+3]": 2203,
2393
+ "[151Eu]": 2204,
2394
+ "[103Rh]": 2205,
2395
+ "[124Xe]": 2206,
2396
+ "[152Tb]": 2207,
2397
+ "[17OH2]": 2208,
2398
+ "[20Ne]": 2209,
2399
+ "[52Fe]": 2210,
2400
+ "[94Zr+4]": 2211,
2401
+ "[94Zr]": 2212,
2402
+ "[149Pr]": 2213,
2403
+ "[16OH2]": 2214,
2404
+ "[53Cr+6]": 2215,
2405
+ "[53Cr]": 2216,
2406
+ "[81Br-]": 2217,
2407
+ "[112Pd]": 2218,
2408
+ "[125Xe]": 2219,
2409
+ "[155Gd]": 2220,
2410
+ "[157Gd]": 2221,
2411
+ "[168Yb]": 2222,
2412
+ "[184Os]": 2223,
2413
+ "[166Tb]": 2224,
2414
+ "[221Fr]": 2225,
2415
+ "[212Ra]": 2226,
2416
+ "[75Br-]": 2227,
2417
+ "[79Br-]": 2228,
2418
+ "[113Ag]": 2229,
2419
+ "[23Na]": 2230,
2420
+ "[34Cl-]": 2231,
2421
+ "[34ClH]": 2232,
2422
+ "[38Cl-]": 2233,
2423
+ "[56Fe]": 2234,
2424
+ "[68Cu]": 2235,
2425
+ "[77Br-]": 2236,
2426
+ "[90Zr+4]": 2237,
2427
+ "[90Zr]": 2238,
2428
+ "[102Pd]": 2239,
2429
+ "[154Eu+3]": 2240,
2430
+ "[57Mn]": 2241,
2431
+ "[165Tm]": 2242,
2432
+ "[152Dy]": 2243,
2433
+ "[217At]": 2244,
2434
+ "[77se]": 2245,
2435
+ "[13cH-]": 2246,
2436
+ "[122Te]": 2247,
2437
+ "[156Gd]": 2248,
2438
+ "[124Te]": 2249,
2439
+ "[53Ni]": 2250,
2440
+ "[131Xe]": 2251,
2441
+ "[174Hf+4]": 2252,
2442
+ "[174Hf]": 2253,
2443
+ "[76Se]": 2254,
2444
+ "[168Tm]": 2255,
2445
+ "[167Dy]": 2256,
2446
+ "[154Gd]": 2257,
2447
+ "[95Ru]": 2258,
2448
+ "[210At]": 2259,
2449
+ "[85Br]": 2260,
2450
+ "[59Co]": 2261,
2451
+ "[122Xe]": 2262,
2452
+ "[27Al]": 2263,
2453
+ "[54Cr]": 2264,
2454
+ "[198Hg]": 2265,
+ "[85Rb+]": 2266,
+ "[214Tl]": 2267,
+ "[229Rn]": 2268,
+ "[218Pb]": 2269,
+ "[218Bi]": 2270,
+ "[167Tm+3]": 2271,
+ "[18o+]": 2272,
+ "[P@@H+]": 2273,
+ "[P@H+]": 2274,
+ "[13N+]": 2275,
+ "[212Pb+2]": 2276,
+ "[217Bi]": 2277,
+ "[249Cf+2]": 2278,
+ "[18OH3+]": 2279,
+ "[90Sr-]": 2280,
+ "[Cf+3]": 2281,
+ "[200Hg]": 2282,
+ "[86Tc]": 2283,
+ "[141Pr+3]": 2284,
+ "[141Pr]": 2285,
+ "[16nH]": 2286,
+ "[14NH4+]": 2287,
+ "[132Xe]": 2288,
+ "[83Kr]": 2289,
+ "[70Zn+2]": 2290,
+ "[137Ba+2]": 2291,
+ "[36Ar]": 2292,
+ "[38Ar]": 2293,
+ "[21Ne]": 2294,
+ "[126Xe]": 2295,
+ "[136Xe]": 2296,
+ "[128Xe]": 2297,
+ "[134Xe]": 2298,
+ "[84Kr]": 2299,
+ "[86Kr]": 2300,
+ "[78Kr]": 2301,
+ "[80Kr]": 2302,
+ "[82Kr]": 2303,
+ "[67Zn+2]": 2304,
+ "[65Cu+2]": 2305,
+ "[110Te]": 2306,
+ "[58Fe+3]": 2307,
+ "[142Nd]": 2308,
+ "[38K]": 2309,
+ "[198Au+3]": 2310,
+ "[122IH]": 2311,
+ "[38PH3]": 2312,
+ "[130I-]": 2313,
+ "[40K+]": 2314,
+ "[38K+]": 2315,
+ "[28Mg+2]": 2316,
+ "[208Tl+]": 2317,
+ "[13OH2]": 2318,
+ "[198Bi]": 2319,
+ "[192Bi]": 2320,
+ "[194Bi]": 2321,
+ "[196Bi]": 2322,
+ "[132I-]": 2323,
+ "[83Sr+2]": 2324,
+ "[169Er+3]": 2325,
+ "[122I-]": 2326,
+ "[120I-]": 2327,
+ "[92Sr+2]": 2328,
+ "[126I-]": 2329,
+ "[24Mg]": 2330,
+ "[84Sr]": 2331,
+ "[118Pd+2]": 2332,
+ "[118Pd]": 2333,
+ "[AsH4]": 2334,
+ "[127I-]": 2335,
+ "[9C-]": 2336,
+ "[11CH3+]": 2337,
+ "[17B]": 2338,
+ "[7B]": 2339,
+ "[4HH]": 2340,
+ "[18C-]": 2341,
+ "[22CH3-]": 2342,
+ "[22CH4]": 2343,
+ "[17C-]": 2344,
+ "[15CH3]": 2345,
+ "[16CH3]": 2346,
+ "[11NH3]": 2347,
+ "[21NH3]": 2348,
+ "[11N-]": 2349,
+ "[11NH]": 2350,
+ "[16CH]": 2351,
+ "[17CH2]": 2352,
+ "[99Ru+2]": 2353,
+ "[181Ta+2]": 2354,
+ "[181Ta]": 2355,
+ "[20CH]": 2356,
+ "[32PH2]": 2357,
+ "[55Fe+2]": 2358,
+ "[SH3]": 2359,
+ "[S@H]": 2360,
+ "[UNK]": 2361
+ },
+ "merges": []
+ }
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2361": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 512,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "tokenizer_class": "PreTrainedTokenizerFast",
+ "unk_token": "[UNK]"
+ }