amrisaurus/pretrained-m-bert

@@ -11,9 +11,11 @@ probably proofread and complete it, then remove this comment. -->
 # pretrained-m-bert
-This model is a fine-tuned version of [amrisaurus/pretrained-bert](https://huggingface.co/amrisaurus/pretrained-bert) on an unknown dataset.
 It achieves the following results on the evaluation set:
 ## Model description
@@ -32,11 +34,23 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- optimizer: None
 - training_precision: float32
 ### Training results
 ### Framework versions

 # pretrained-m-bert
+This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Train Loss: 6.0589
+- Validation Loss: 12.2793
+- Epoch: 9
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- optimizer: {'name': 'Adam', 'learning_rate': 1e-04, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}
 - training_precision: float32
 ### Training results
+| Train Loss | Validation Loss | Epoch |
+|:----------:|:---------------:|:-----:|
+| 10.2772    | 11.0091         | 0     |
+| 7.9077     | 11.0096         | 1     |
+| 6.8422     | 11.0426         | 2     |
+| 6.6196     | 11.1006         | 3     |
+| 6.4596     | 11.5412         | 4     |
+| 6.9657     | 11.7570         | 5     |
+| 6.3738     | 11.7909         | 6     |
+| 6.1480     | 12.0058         | 7     |
+| 6.2503     | 11.9410         | 8     |
+| 6.0589     | 12.2793         | 9     |
 ### Framework versions
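
The training results table added to the README can be checked programmatically. The following is a minimal sketch, assuming the table rows are copied verbatim into a string; `TABLE` and `parse_results` are illustrative names and not part of the repository.

```python
# Rows copied from the "Training results" table in the updated README.
TABLE = """\
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 10.2772    | 11.0091         | 0     |
| 7.9077     | 11.0096         | 1     |
| 6.8422     | 11.0426         | 2     |
| 6.6196     | 11.1006         | 3     |
| 6.4596     | 11.5412         | 4     |
| 6.9657     | 11.7570         | 5     |
| 6.3738     | 11.7909         | 6     |
| 6.1480     | 12.0058         | 7     |
| 6.2503     | 11.9410         | 8     |
| 6.0589     | 12.2793         | 9     |
"""


def parse_results(table):
    """Return (epoch, train_loss, val_loss) tuples from a markdown table."""
    rows = []
    # Skip the header row and the alignment row.
    for line in table.strip().splitlines()[2:]:
        train, val, epoch = [c.strip() for c in line.strip().strip("|").split("|")]
        rows.append((int(epoch), float(train), float(val)))
    return rows


results = parse_results(TABLE)

# Epoch with the lowest validation loss, and the final epoch for comparison.
best_epoch, _, best_val = min(results, key=lambda r: r[2])
final_epoch, final_train, final_val = results[-1]

print(f"lowest validation loss {best_val} at epoch {best_epoch}")
print(f"final epoch {final_epoch}: train {final_train}, validation {final_val}")
```

On these numbers the validation loss is lowest at epoch 0 and drifts upward while the training loss falls. That pattern is often a sign of overfitting, but the card does not name the dataset, so the cause cannot be confirmed from the diff alone.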

config.json CHANGED

@@ -1,11 +1,10 @@
 {
-  "_name_or_path": "amrisaurus/pretrained-bert",
   "architectures": [
-    "BertForSequenceClassification"
   ],
   "attention_probs_dropout_prob": 0.1,
   "classifier_dropout": null,
-  "gradient_checkpointing": false,
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.1,
   "hidden_size": 768,
@@ -17,9 +16,14 @@
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
   "pad_token_id": 0,
   "position_embedding_type": "absolute",
   "transformers_version": "4.27.0.dev0",
   "type_vocab_size": 2,
   "use_cache": true,
-  "vocab_size": 28996
 }

 {
   "architectures": [
+    "BertForPreTraining"
   ],
   "attention_probs_dropout_prob": 0.1,
   "classifier_dropout": null,
+  "directionality": "bidi",
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.1,
   "hidden_size": 768,
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
   "pad_token_id": 0,
+  "pooler_fc_size": 768,
+  "pooler_num_attention_heads": 12,
+  "pooler_num_fc_layers": 3,
+  "pooler_size_per_head": 128,
+  "pooler_type": "first_token_transform",
   "position_embedding_type": "absolute",
   "transformers_version": "4.27.0.dev0",
   "type_vocab_size": 2,
   "use_cache": true,
+  "vocab_size": 119547
 }
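
The config.json change can be summarised as added, removed, and changed keys. The sketch below assumes only the keys that differ between the two versions shown above are of interest, so unchanged keys such as `hidden_size` are omitted from both dictionaries. `diff_config` is a hypothetical helper, not a `transformers` function.

```python
# Keys from the old config.json that differ from the new version.
old_config = {
    "_name_or_path": "amrisaurus/pretrained-bert",
    "architectures": ["BertForSequenceClassification"],
    "gradient_checkpointing": False,
    "vocab_size": 28996,
}

# Keys from the new config.json that differ from the old version.
new_config = {
    "architectures": ["BertForPreTraining"],
    "directionality": "bidi",
    "pooler_fc_size": 768,
    "pooler_num_attention_heads": 12,
    "pooler_num_fc_layers": 3,
    "pooler_size_per_head": 128,
    "pooler_type": "first_token_transform",
    "vocab_size": 119547,
}


def diff_config(old, new):
    """Return sorted lists of added, removed, and changed keys."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


added, removed, changed = diff_config(old_config, new_config)
print("added:  ", added)
print("removed:", removed)
print("changed:", changed)
```

The two changed keys are the ones that matter most for loading the checkpoint: the architecture moves from `BertForSequenceClassification` to `BertForPreTraining`, and `vocab_size` grows from 28996 to 119547.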

tf_model.h5 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e90af7ce7c9c09806d77bfb6d0ce6f7d12b7f4993cb3e7f8f460d9b5b90e06af
-size 433535320

 version https://git-lfs.github.com/spec/v1
+oid sha256:ee16c899da3b4047597557aae1b300660d935ffb668141fa31c010622e24a158
+size 1083389236
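
The tf_model.h5 entries are Git LFS pointer files, which store one `key value` pair per line. A minimal sketch of reading the two pointers and comparing their sizes follows; `parse_lfs_pointer` is an illustrative helper and not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict, with size as an int."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", split on the first space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


old = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:e90af7ce7c9c09806d77bfb6d0ce6f7d12b7f4993cb3e7f8f460d9b5b90e06af
size 433535320
""")

new = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:ee16c899da3b4047597557aae1b300660d935ffb668141fa31c010622e24a158
size 1083389236
""")

growth = new["size"] - old["size"]
print(f"{old['size']} -> {new['size']} bytes (+{growth})")
```

The file grows by 649853916 bytes. The larger vocabulary and the switch to a pretraining architecture in config.json would both add weights, though the diff itself does not break the increase down.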