Minor changes and README.md fix
Browse files- README.md +3 -3
- config.json +1 -1
- modeling_m5_encoder.py +1 -0
README.md
CHANGED
|
@@ -61,7 +61,7 @@ rel_pos = torch.tensor(pos_encod).unsqueeze(0) # (1, seq_len, seq_len)
|
|
| 61 |
|
| 62 |
outputs = model(input_ids=input_ids, attention_mask=attn_mask, relative_position=rel_pos)
|
| 63 |
hidden = outputs.last_hidden_state # (1, seq_len, 512)
|
| 64 |
-
```
|
| 65 |
|
| 66 |
A function ``model.collate_for_dataset`` is also available to perform collation for use in PyTorch's DataLoader. The function gets a list of tuples, each of which is composed of:
|
| 67 |
- the first element is a dictionary with keys ``"input_ids"`` (``np.ndarray``, shape ``(L,)``) and ``"attention_mask"`` (``np.ndarray``, shape ``(L,)``), as produced by a tokenizer
|
|
@@ -79,8 +79,8 @@ A function ``model.collate_for_dataset`` is also available to perform collation
|
|
| 79 |
| `num_heads` | 12 |
|
| 80 |
| `vocab_size` | 1 032 |
|
| 81 |
| `feed_forward_proj` | gated-gelu |
|
| 82 |
-
| `relative_attention_num_buckets` |
|
| 83 |
-
| `relative_attention_max_distance` |
|
| 84 |
|
| 85 |
Position biases are replaced by molecular-graph distances computed
|
| 86 |
with RDKit and binned with a modified T5 logarithmic binning algorithm, giving the model awareness of molecular topology without being too strict on precise distances.
|
|
|
|
| 61 |
|
| 62 |
outputs = model(input_ids=input_ids, attention_mask=attn_mask, relative_position=rel_pos)
|
| 63 |
hidden = outputs.last_hidden_state # (1, seq_len, 512)
|
| 64 |
+
```
|
| 65 |
|
| 66 |
A function ``model.collate_for_dataset`` is also available to perform collation for use in PyTorch's DataLoader. The function gets a list of tuples, each of which is composed of:
|
| 67 |
- the first element is a dictionary with keys ``"input_ids"`` (``np.ndarray``, shape ``(L,)``) and ``"attention_mask"`` (``np.ndarray``, shape ``(L,)``), as produced by a tokenizer
|
|
|
|
| 79 |
| `num_heads` | 12 |
|
| 80 |
| `vocab_size` | 1 032 |
|
| 81 |
| `feed_forward_proj` | gated-gelu |
|
| 82 |
+
| `relative_attention_num_buckets` | 32 |
|
| 83 |
+
| `relative_attention_max_distance` | 96 |
|
| 84 |
|
| 85 |
Position biases are replaced by molecular-graph distances computed
|
| 86 |
with RDKit and binned with a modified T5 logarithmic binning algorithm, giving the model awareness of molecular topology without being too strict on precise distances.
|
config.json
CHANGED
|
@@ -148,7 +148,7 @@
|
|
| 148 |
},
|
| 149 |
"layer_norm_epsilon": 1e-06,
|
| 150 |
"model_type": "m5_model",
|
| 151 |
-
"num_decoder_layers":
|
| 152 |
"num_heads": 12,
|
| 153 |
"num_layers": 24,
|
| 154 |
"pad_token_id": 2,
|
|
|
|
| 148 |
},
|
| 149 |
"layer_norm_epsilon": 1e-06,
|
| 150 |
"model_type": "m5_model",
|
| 151 |
+
"num_decoder_layers": 0,
|
| 152 |
"num_heads": 12,
|
| 153 |
"num_layers": 24,
|
| 154 |
"pad_token_id": 2,
|
modeling_m5_encoder.py
CHANGED
|
@@ -50,6 +50,7 @@ class M5EncoderConfig(T5Config):
|
|
| 50 |
relative_attention_max_distance=relative_attention_max_distance,
|
| 51 |
relative_attention_num_buckets=relative_attention_num_buckets,
|
| 52 |
vocab_size=vocab_size,
|
|
|
|
| 53 |
**kwargs)
|
| 54 |
|
| 55 |
class M5Encoder(PreTrainedModel):
|
|
|
|
| 50 |
relative_attention_max_distance=relative_attention_max_distance,
|
| 51 |
relative_attention_num_buckets=relative_attention_num_buckets,
|
| 52 |
vocab_size=vocab_size,
|
| 53 |
+
num_decoder_layers=0,
|
| 54 |
**kwargs)
|
| 55 |
|
| 56 |
class M5Encoder(PreTrainedModel):
|