mekjr1/guilbert-base-uncased

@@ -5,52 +5,73 @@ tags:
 model-index:
 - name: mekjr1/guilbert-base-uncased
   results: []
 ---
-<!-- This model card has been generated automatically according to the information Keras had access to. You should
-probably proofread and complete it, then remove this comment. -->
 # mekjr1/guilbert-base-uncased
-This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Train Loss: 1.9616
-- Validation Loss: 1.8529
-- Epoch: 8
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 2e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 7167, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
 - training_precision: mixed_float16
 ### Training results
 | Train Loss | Validation Loss | Epoch |
 |:----------:|:---------------:|:-----:|
 | 1.9626     | 1.9024          | 5     |
 | 1.9574     | 1.8421          | 6     |
 | 1.9594     | 1.8632          | 7     |
 | 1.9616     | 1.8529          | 8     |
 ### Framework versions
 - Transformers 4.26.1
 - TensorFlow 2.11.0
 - Datasets 2.10.1
-- Tokenizers 0.13.2

 model-index:
 - name: mekjr1/guilbert-base-uncased
   results: []
+datasets:
+- mekjr1/guilbert_lm
+language:
+- en
 ---
 # mekjr1/guilbert-base-uncased
+This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the guilbert dataset (`mekjr1/guilbert_lm`). It is a masked language model that predicts missing (masked) tokens in a sentence.
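To make "predicts missing tokens" concrete, the sketch below shows how a masked-language-modeling training example is typically built. It assumes the standard BERT recipe (select about 15% of non-special tokens; replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged), which is also the default behavior of Hugging Face's `DataCollatorForLanguageModeling`. The card does not state which masking settings were actually used, and `mask_tokens` is a hypothetical helper, not part of any library.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Token ids from the bert-base-uncased vocabulary.
CLS_ID, SEP_ID, MASK_ID = 101, 102, 103
VOCAB_SIZE = 30522
IGNORE_INDEX = -100  # label value that Hugging Face losses skip


def mask_tokens(
    token_ids: Sequence[int],
    mlm_probability: float = 0.15,
    special_ids: frozenset = frozenset({CLS_ID, SEP_ID}),
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: build (inputs, labels) for one masked-LM example."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Skip special tokens and positions not selected for prediction.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model is trained to recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID  # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # Remaining 10%: keep the original token in the input.
    return inputs, labels


# Usage with illustrative token ids (not taken from the training data).
example = [CLS_ID, 2023, 2003, 1037, 7099, 6251, SEP_ID]
inputs, labels = mask_tokens(example, rng=random.Random(42))
```

Only positions whose label is not `-100` contribute to the loss, which is why the validation loss reported below measures how well the model recovers hidden tokens.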
 ## Model description
+The model is based on the `bert-base-uncased` architecture, which has 12 layers, a hidden size of 768, and 12 attention heads. It was fine-tuned with the masked-language-modeling objective on text from the Vent dataset, where each sample is labeled as guilt or non-guilt. Training used a maximum sequence length of 128 tokens, a batch size of 32, and the AdamW optimizer (`AdamWeightDecay`) with a peak learning rate of 2e-5, a weight decay rate of 0.01, a linear warmup over the first 1,000 steps, and a linear decay to 0 afterwards. At the final epoch (epoch 8 in the zero-indexed results table), the validation loss was 1.8529.
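As a rough check on the architecture described above, the parameter count of a BERT-base masked language model can be worked out from its configuration. This sketch assumes standard `bert-base-uncased` configuration values that the card does not list (a vocabulary of 30,522 tokens, 512 positions, 2 token types, and a feed-forward size of 3,072) and a masked-LM head whose output projection is tied to the word embeddings, so those weights are counted once. `bert_mlm_param_count` is a hypothetical helper written for illustration.

```python
def bert_mlm_param_count(
    vocab_size: int = 30522,
    hidden: int = 768,
    layers: int = 12,
    heads: int = 12,
    intermediate: int = 3072,
    max_positions: int = 512,
    type_vocab: int = 2,
) -> int:
    """Count trainable parameters of a BERT encoder with a tied masked-LM head."""
    assert hidden % heads == 0, "hidden size must split evenly across heads"

    def dense(n_in: int, n_out: int) -> int:
        return n_in * n_out + n_out  # weight matrix + bias

    layer_norm = 2 * hidden  # scale (gamma) and shift (beta)

    # Word, position, and token-type embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    per_layer = (
        3 * dense(hidden, hidden)                    # query, key, value projections
        + dense(hidden, hidden) + layer_norm         # attention output + LayerNorm
        + dense(hidden, intermediate)                # feed-forward expansion
        + dense(intermediate, hidden) + layer_norm   # feed-forward output + LayerNorm
    )
    # MLM head: transform dense + LayerNorm + output bias (weights tied to embeddings).
    mlm_head = dense(hidden, hidden) + layer_norm + vocab_size
    return embeddings + layers * per_layer + mlm_head


total = bert_mlm_param_count()
head_dim = 768 // 12  # each of the 12 heads works on a 64-dimensional slice
```

Under these assumptions the total comes to about 110 million parameters, which matches the size usually quoted for BERT-base; fine-tuning changes the weights, not the count.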
 ## Intended uses & limitations
+This model can be used to predict masked tokens in text, particularly in the context of detecting the guilt emotion in documents (for example, as a starting point for fine-tuning a guilt classifier) or in other related applications.
+However, its accuracy may be limited by the quality and representativeness of the training data, and it may inherit biases present in the pre-trained `bert-base-uncased` model.
 ## Training and evaluation data
+The model was trained on the guilbert dataset (`mekjr1/guilbert_lm`), which consists of samples extracted from Vent and labeled as guilt or non-guilt.
 ## Training procedure
+The model was trained with TensorFlow Keras in mixed precision (`mixed_float16`), using a batch size of 32 and a maximum sequence length of 128 tokens. The optimizer was AdamW (`AdamWeightDecay`) with a peak learning rate of 2e-5, a weight decay rate of 0.01, and a linear warmup over 1,000 steps. Training ran for 9 epochs (numbered 0 to 8 in the results table), with early stopping based on the validation loss. The lowest validation loss, 1.8421, was reached at epoch 6, and the final epoch ended at 1.8529.
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- Optimizer: `AdamWeightDecay` with the learning-rate schedule `WarmUp(initial_learning_rate=2e-05, decay_schedule_fn=PolynomialDecay(initial_learning_rate=2e-05, decay_steps=7167, end_learning_rate=0.0, power=1.0, cycle=False), warmup_steps=1000, power=1.0)`
+- Weight decay rate: 0.01
+- Batch size: 32
+- Maximum sequence length: 128
+- Number of warmup steps: 1,000
+- Number of training steps: 1,761
 - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 2e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 7167, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
 - training_precision: mixed_float16
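The `WarmUp` plus `PolynomialDecay` configuration above can be read as a piecewise schedule. The pure-Python sketch below mirrors that logic for the listed settings (warmup to 2e-5 over 1,000 steps, then decay with power 1.0 to 0 over 7,167 steps). It is an illustration, not the Transformers `WarmUp` class itself, and it assumes, as that class does, that the decay schedule is evaluated at `step - warmup_steps` once warmup ends.

```python
def learning_rate(
    step: int,
    initial_lr: float = 2e-5,
    warmup_steps: int = 1000,
    decay_steps: int = 7167,
    end_lr: float = 0.0,
    power: float = 1.0,
) -> float:
    """Learning rate at a given optimizer step: warmup, then polynomial decay."""
    if step < warmup_steps:
        # Warmup: ramp from 0 to initial_lr (linear when power == 1.0).
        return initial_lr * (step / warmup_steps) ** power
    # Decay: PolynomialDecay with cycle=False clamps the step at decay_steps.
    decay_step = min(step - warmup_steps, decay_steps)
    return (initial_lr - end_lr) * (1 - decay_step / decay_steps) ** power + end_lr


for s in (0, 500, 1000, 4583, 8167):
    print(f"step {s:>5}: lr = {learning_rate(s):.2e}")
```

With these settings the rate peaks at step 1,000 and reaches zero at step 8,167 (1,000 warmup steps plus 7,167 decay steps), after which it stays at zero.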
 ### Training results
+The following table shows the training and validation loss for each epoch (epochs are numbered from 0):
 | Train Loss | Validation Loss | Epoch |
 |:----------:|:---------------:|:-----:|
+| 2.0976     | 1.8593          | 0     |
+| 1.9643     | 1.8547          | 1     |
+| 1.9651     | 1.9003          | 2     |
+| 1.9608     | 1.8617          | 3     |
+| 1.9646     | 1.8756          | 4     |
 | 1.9626     | 1.9024          | 5     |
 | 1.9574     | 1.8421          | 6     |
 | 1.9594     | 1.8632          | 7     |
 | 1.9616     | 1.8529          | 8     |
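The table can also be summarized programmatically. The sketch below copies the per-epoch losses from the table, finds the epoch with the lowest validation loss, and converts losses to perplexity with `exp(loss)`. Treating the validation loss as the mean cross-entropy over masked tokens, so that its exponential is a masked-token perplexity, is an assumption; the card reports only the loss values.

```python
import math

# (epoch, train_loss, val_loss) copied from the training-results table.
HISTORY = [
    (0, 2.0976, 1.8593),
    (1, 1.9643, 1.8547),
    (2, 1.9651, 1.9003),
    (3, 1.9608, 1.8617),
    (4, 1.9646, 1.8756),
    (5, 1.9626, 1.9024),
    (6, 1.9574, 1.8421),
    (7, 1.9594, 1.8632),
    (8, 1.9616, 1.8529),
]

# Pick the row with the smallest validation loss, and the last row.
best_epoch, _, best_val = min(HISTORY, key=lambda row: row[2])
final_epoch, _, final_val = HISTORY[-1]

print(f"epochs run: {len(HISTORY)}")
print(f"best validation loss: {best_val} at epoch {best_epoch} "
      f"(perplexity ~ {math.exp(best_val):.2f})")
print(f"final validation loss: {final_val} at epoch {final_epoch} "
      f"(perplexity ~ {math.exp(final_val):.2f})")
```

Both the best and the final validation losses correspond to a perplexity of roughly 6.3 under this reading, and the training loss stays nearly flat after epoch 1.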
 ### Framework versions
 - Transformers 4.26.1
 - TensorFlow 2.11.0
 - Datasets 2.10.1
+- Tokenizers 0.13.2