guyhadad01 committed (verified)
Commit eae2bc9 · 1 Parent(s): 4664ca2

Training in progress, step 200, checkpoint

last-checkpoint/1_Pooling/config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "word_embedding_dimension": 768,
+  "word_embedding_dimension": 384,
   "pooling_mode_cls_token": false,
   "pooling_mode_mean_tokens": true,
   "pooling_mode_max_tokens": false,
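The pooling config above enables mean pooling over token embeddings (now 384-dimensional). A minimal NumPy sketch of what that module computes, using hypothetical tensors — the real `Pooling` module applies the attention mask the same way:

```python
import numpy as np

# Hypothetical batch: 2 sentences, 6 token positions, 384-dim token embeddings
token_embeddings = np.random.rand(2, 6, 384).astype(np.float32)
# 1 = real token, 0 = padding
attention_mask = np.array([[1, 1, 1, 1, 0, 0],
                           [1, 1, 1, 1, 1, 1]], dtype=np.float32)

# pooling_mode_mean_tokens: average only over non-padding positions
mask = attention_mask[:, :, None]               # (2, 6, 1)
summed = (token_embeddings * mask).sum(axis=1)  # (2, 384)
counts = mask.sum(axis=1)                       # (2, 1) -> 4 and 6
sentence_embeddings = summed / counts

print(sentence_embeddings.shape)  # (2, 384)
```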
last-checkpoint/README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - generated_from_trainer
 - dataset_size:552482
 - loss:MultipleNegativesRankingLoss
-base_model: google/embeddinggemma-300m
+base_model: sentence-transformers/all-MiniLM-L6-v2
 widget:
 - source_sentence: "title: \nCreatto Flashy Fish Silly Swimmers LightUp 3D Puzzle\
   \ Kit Includes Creatto Puzzle Pieces to Make Illuminated Craft Creations Sting\
@@ -356,17 +356,17 @@ pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
 
-# SentenceTransformer based on google/embeddinggemma-300m
+# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) on the [amazon_2023_items_processed_filtered](https://huggingface.co/datasets/guyhadad01/Amazon_2023_items_processed_filtered) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on the [amazon_2023_items_processed_filtered](https://huggingface.co/datasets/guyhadad01/Amazon_2023_items_processed_filtered) dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) <!-- at revision 57c266a740f537b4dc058e1b0cda161fd15afa75 -->
+- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
 - **Maximum Sequence Length:** 512 tokens
-- **Output Dimensionality:** 768 dimensions
+- **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
 - **Training Dataset:**
     - [amazon_2023_items_processed_filtered](https://huggingface.co/datasets/guyhadad01/Amazon_2023_items_processed_filtered)
@@ -383,11 +383,9 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [g
 
 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
-  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
-  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
-  (4): Normalize()
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
+  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
 )
 ```
@@ -408,23 +406,21 @@ from sentence_transformers import SentenceTransformer
 # Download from the 🤗 Hub
 model = SentenceTransformer("guyhadad01/EncodeRec_Toys")
 # Run inference
-queries = [
-    "title: \nDH top bottom main gear for 910108 or 905308 or 905008",
-]
-documents = [
+sentences = [
+    'title: \nDH top bottom main gear for 910108 or 905308 or 905008',
     'description\ntop bottom main gear for 910108 or 905308 or 9050081 set top bottom main geartop bottom main gear for 910108 or 905308 or 905008',
     'description\nFrom the Manufacturer\nRound 2s Polar Lights brings back another popular movie car the Time Machine made famous in Back to the Future The kit features every detail to be expected in a Polar Lights kit from the flux capacitor to its gullwing doors A lightning rod hook power assembly is also included as an option The car body features a new authentic looking brushed metal look that captures the brushed aluminum of the real car The tires roll and the snap assembly means the kit can be assembled in no time with great results With this model kit youre sure to hit 88 miles an hour and be outta timeOfficially licensed from the classic film\nFeatures every detail including the Flux Capacitor\nAuthentic brushedmetal look\nFeatures rolling tires\nSnap assembly',
-    'description\nLord of the Fries is the followup to Give Me the Brain another game in the Fast Food Restaurant of the Damned The mechanic is simple combine the ingredients in your hand to build combo meals with dishes like the Cowabunga the Meat Munch and of course the Lord of the Fries This is the fourth edition of Lord of the Fries It was introduced in 1998 and has been updated and improved in every edition The game returns to Cheapass Games after a brief vacation at Steve Jackson Games This time the game is designed to expand The core box contains the original restaurant menu 55 cards plus a completely new Coffee Shop menu with allnew art and ingredients also 55 cards Each deck can support up to 6 players and there is a third menu using ingredients from both decks which supports up to 8 players Also arriving on the scene will be four standalone restaurant decks Mexican Chinese Italian and Irish Each deck plays alone for up to 6 players fits into the core game box and can combine with the core deck for up to 8 playersLightweight funny card game make combo meals from random ingredients\nZombies All new card art by Brian Snoddy\nIncludes a complete Coffee Shop expansion\nDesigned by awardwinning game designer James Ernest\nAll new card art by Brian Snoddy',
 ]
-query_embeddings = model.encode_query(queries)
-document_embeddings = model.encode_document(documents)
-print(query_embeddings.shape, document_embeddings.shape)
-# [1, 768] [3, 768]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# [3, 384]
 
 # Get the similarity scores for the embeddings
-similarities = model.similarity(query_embeddings, document_embeddings)
+similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[ 0.8773, 0.0462, -0.0253]])
+# tensor([[1.0000, 0.8211, 0.0629],
+#         [0.8211, 1.0000, 0.1016],
+#         [0.0629, 0.1016, 1.0000]])
 ```
 
 <!--
@@ -473,10 +469,10 @@ You can finetune this model on your own dataset.
 * Size: 552,482 training samples
 * Columns: <code>title</code> and <code>description</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | title | description |
-  |:--------|:------|:------------|
-  | type    | string | string |
-  | details | <ul><li>min: 8 tokens</li><li>mean: 25.9 tokens</li><li>max: 78 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 204.13 tokens</li><li>max: 512 tokens</li></ul> |
+  |         | title | description |
+  |:--------|:------|:------------|
+  | type    | string | string |
+  | details | <ul><li>min: 6 tokens</li><li>mean: 21.46 tokens</li><li>max: 61 tokens</li></ul> | <ul><li>min: 13 tokens</li><li>mean: 193.18 tokens</li><li>max: 512 tokens</li></ul> |
 * Samples:
   | title | description |
   |:------|:------------|
@@ -495,8 +491,8 @@ You can finetune this model on your own dataset.
 ### Training Hyperparameters
 #### Non-Default Hyperparameters
 
-- `per_device_train_batch_size`: 32
-- `num_train_epochs`: 2
+- `per_device_train_batch_size`: 512
+- `num_train_epochs`: 1
 - `warmup_ratio`: 0.1
 - `fp16`: True
 - `push_to_hub`: True
@@ -510,7 +506,7 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: no
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`: 32
+- `per_device_train_batch_size`: 512
 - `per_device_eval_batch_size`: 8
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
@@ -523,7 +519,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1.0
-- `num_train_epochs`: 2
+- `num_train_epochs`: 1
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -629,14 +625,10 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch  | Step | Training Loss |
 |:------:|:----:|:-------------:|
-| 0.0029 | 50   | 0.696         |
-| 0.0058 | 100  | 0.1107        |
-| 0.0087 | 150  | 0.0709        |
-| 0.0116 | 200  | 0.0679        |
-| 0.0145 | 250  | 0.0597        |
-| 0.0174 | 300  | 0.0598        |
-| 0.0203 | 350  | 0.0764        |
-| 0.0232 | 400  | 0.0506        |
+| 0.0463 | 50   | 0.6644        |
+| 0.0926 | 100  | 0.3737        |
+| 0.1389 | 150  | 0.3271        |
+| 0.1852 | 200  | 0.309         |
 
 
 ### Framework Versions
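Because the module stack ends in `Normalize()`, the cosine scores that `model.similarity` reports reduce to plain dot products of unit vectors. A dependency-free sketch of that equivalence, using illustrative random vectors rather than the model's actual embeddings:

```python
import numpy as np

# Illustrative 384-dim vectors standing in for sentence embeddings
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(3, 384))

# The Normalize() step: project each vector onto the unit sphere
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# For unit vectors, cosine similarity is just the Gram matrix
similarities = embeddings @ embeddings.T

# Diagonal is exactly 1: each embedding is maximally similar to itself
print(np.round(similarities, 4))
```

This is why the 3×3 matrix in the usage snippet has ones on its diagonal and is symmetric.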
last-checkpoint/config.json CHANGED
@@ -1,61 +1,25 @@
 {
-  "_sliding_window_pattern": 6,
   "architectures": [
-    "Gemma3TextModel"
+    "BertModel"
   ],
-  "attention_bias": false,
-  "attention_dropout": 0.0,
-  "attn_logit_softcapping": null,
-  "bos_token_id": 2,
-  "dtype": "float32",
-  "eos_token_id": 1,
-  "final_logit_softcapping": null,
-  "head_dim": 256,
-  "hidden_activation": "gelu_pytorch_tanh",
-  "hidden_size": 768,
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
   "initializer_range": 0.02,
-  "intermediate_size": 1152,
-  "layer_types": [
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "full_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "full_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "full_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "sliding_attention",
-    "full_attention"
-  ],
-  "max_position_embeddings": 2048,
-  "model_type": "gemma3_text",
-  "num_attention_heads": 3,
-  "num_hidden_layers": 24,
-  "num_key_value_heads": 1,
+  "intermediate_size": 1536,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 6,
   "pad_token_id": 0,
-  "query_pre_attn_scalar": 256,
-  "rms_norm_eps": 1e-06,
-  "rope_local_base_freq": 10000.0,
-  "rope_scaling": null,
-  "rope_theta": 1000000.0,
-  "sliding_window": 512,
+  "position_embedding_type": "absolute",
   "torch_dtype": "float32",
   "transformers_version": "4.55.2",
-  "use_bidirectional_attention": true,
+  "type_vocab_size": 2,
   "use_cache": true,
-  "vocab_size": 262144
+  "vocab_size": 30522
 }
last-checkpoint/config_sentence_transformers.json CHANGED
@@ -1,25 +1,13 @@
 {
-  "model_type": "SentenceTransformer",
   "__version__": {
     "sentence_transformers": "5.1.0",
     "transformers": "4.55.2",
    "pytorch": "2.7.1+cu126"
   },
+  "model_type": "SentenceTransformer",
   "prompts": {
-    "query": "task: search result | query: ",
-    "document": "title: none | text: ",
-    "BitextMining": "task: search result | query: ",
-    "Clustering": "task: clustering | query: ",
-    "Classification": "task: classification | query: ",
-    "InstructionRetrieval": "task: code retrieval | query: ",
-    "MultilabelClassification": "task: classification | query: ",
-    "PairClassification": "task: sentence similarity | query: ",
-    "Reranking": "task: search result | query: ",
-    "Retrieval": "task: search result | query: ",
-    "Retrieval-query": "task: search result | query: ",
-    "Retrieval-document": "title: none | text: ",
-    "STS": "task: sentence similarity | query: ",
-    "Summarization": "task: summarization | query: "
+    "query": "",
+    "document": ""
   },
   "default_prompt_name": null,
   "similarity_fn_name": "cosine"
last-checkpoint/model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6582519cc43472715ff09bc8df56e970c36c8b09bfc714e3e192aee757eaee08
-size 1211486072
+oid sha256:ee6d5fe7579a8728e8ce1b5a3d16cc6d66c5719392898bc44d8ff0f2fa3b0a9f
+size 90864192
last-checkpoint/modules.json CHANGED
@@ -14,19 +14,7 @@
   {
     "idx": 2,
     "name": "2",
-    "path": "2_Dense",
-    "type": "sentence_transformers.models.Dense"
-  },
-  {
-    "idx": 3,
-    "name": "3",
-    "path": "3_Dense",
-    "type": "sentence_transformers.models.Dense"
-  },
-  {
-    "idx": 4,
-    "name": "4",
-    "path": "4_Normalize",
+    "path": "2_Normalize",
     "type": "sentence_transformers.models.Normalize"
   }
 ]
last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5f2ae5f738d4f327522544fdad61a200aa371a47e470a3064bbdc08ee183ccfb
-size 2460919051
+oid sha256:30bfc8d92d7169e34af54cf060e06b10ee4e5ac9ddc2b191b9c37e4b367c1665
+size 180608203
last-checkpoint/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0a476cabe07caf4d88ceb96f89abdd17c804eea0bd9d134458869c5219b65220
+oid sha256:9cb75720d80bc56d0ec34834b2514caa83f7e0d893f31f4afcbf3ca5aa7e264b
 size 14645
last-checkpoint/scaler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1a4e9c48a349d0a234c258982447a8a00d011d07b1124aeae1ab41f62980a473
+oid sha256:124625e167eb28acbfc793cfcb3e8a08b32e7fea06501462bc9e420a5e1beb2a
 size 1383
last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f73d856eaa2ecf465e98ae99429ae8f821bc70f284dc9f5844d0faffe2f796d2
+oid sha256:541a8d7a44578a73d37559badc34c3a1afe0d9b1d50ab26f1598001f3a3c1618
 size 1465
last-checkpoint/special_tokens_map.json CHANGED
@@ -1,30 +1,34 @@
 {
-  "boi_token": "<start_of_image>",
-  "bos_token": {
-    "content": "<bos>",
+  "cls_token": {
+    "content": "[CLS]",
     "lstrip": false,
     "normalized": false,
     "rstrip": false,
     "single_word": false
   },
-  "eoi_token": "<end_of_image>",
-  "eos_token": {
-    "content": "<eos>",
+  "mask_token": {
+    "content": "[MASK]",
     "lstrip": false,
     "normalized": false,
     "rstrip": false,
     "single_word": false
   },
-  "image_token": "<image_soft_token>",
   "pad_token": {
-    "content": "<pad>",
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
     "lstrip": false,
     "normalized": false,
     "rstrip": false,
     "single_word": false
   },
   "unk_token": {
-    "content": "<unk>",
+    "content": "[UNK]",
     "lstrip": false,
     "normalized": false,
     "rstrip": false,
last-checkpoint/tokenizer.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c79a190be01275b078b3574d02188abc5784e5651a101b20d826371ba8e897dc
-size 33385261
+oid sha256:91f1def9b9391fdabe028cd3f3fcc4efd34e5d1f08c3bf2de513ebb5911a1854
+size 711649
last-checkpoint/tokenizer_config.json CHANGED
The diff for this file is too large to render. See raw diff
 
last-checkpoint/trainer_state.json CHANGED
@@ -2,74 +2,46 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.023166917641607783,
+  "epoch": 0.18518518518518517,
   "eval_steps": 500,
-  "global_step": 400,
+  "global_step": 200,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
   "log_history": [
     {
-      "epoch": 0.002895864705200973,
-      "grad_norm": 15.905227661132812,
-      "learning_rate": 6.514186450492183e-07,
-      "loss": 0.696,
+      "epoch": 0.046296296296296294,
+      "grad_norm": 1.7154957056045532,
+      "learning_rate": 2.2685185185185187e-05,
+      "loss": 0.6644,
       "step": 50
     },
     {
-      "epoch": 0.005791729410401946,
-      "grad_norm": 27.821880340576172,
-      "learning_rate": 1.3752171395483499e-06,
-      "loss": 0.1107,
+      "epoch": 0.09259259259259259,
+      "grad_norm": 1.6062076091766357,
+      "learning_rate": 4.5833333333333334e-05,
+      "loss": 0.3737,
       "step": 100
     },
     {
-      "epoch": 0.00868759411560292,
-      "grad_norm": 7.949689865112305,
-      "learning_rate": 2.099015634047481e-06,
-      "loss": 0.0709,
+      "epoch": 0.1388888888888889,
+      "grad_norm": 1.5934187173843384,
+      "learning_rate": 4.7890946502057616e-05,
+      "loss": 0.3271,
       "step": 150
     },
     {
-      "epoch": 0.011583458820803891,
-      "grad_norm": 28.785518646240234,
-      "learning_rate": 2.8228141285466127e-06,
-      "loss": 0.0679,
+      "epoch": 0.18518518518518517,
+      "grad_norm": 1.4217034578323364,
+      "learning_rate": 4.531893004115226e-05,
+      "loss": 0.309,
       "step": 200
-    },
-    {
-      "epoch": 0.014479323526004865,
-      "grad_norm": 4.010167598724365,
-      "learning_rate": 3.546612623045744e-06,
-      "loss": 0.0597,
-      "step": 250
-    },
-    {
-      "epoch": 0.01737518823120584,
-      "grad_norm": 10.415971755981445,
-      "learning_rate": 4.270411117544876e-06,
-      "loss": 0.0598,
-      "step": 300
-    },
-    {
-      "epoch": 0.02027105293640681,
-      "grad_norm": 2.172306537628174,
-      "learning_rate": 4.994209612044008e-06,
-      "loss": 0.0764,
-      "step": 350
-    },
-    {
-      "epoch": 0.023166917641607783,
-      "grad_norm": 15.422524452209473,
-      "learning_rate": 5.718008106543138e-06,
-      "loss": 0.0506,
-      "step": 400
     }
   ],
   "logging_steps": 50,
-  "max_steps": 34532,
+  "max_steps": 1080,
   "num_input_tokens_seen": 0,
-  "num_train_epochs": 2,
+  "num_train_epochs": 1,
   "save_steps": 200,
   "stateful_callbacks": {
     "TrainerControl": {
@@ -84,7 +56,7 @@
     }
   },
   "total_flos": 0.0,
-  "train_batch_size": 32,
+  "train_batch_size": 512,
   "trial_name": null,
   "trial_params": null
 }
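The updated trainer state is internally consistent: with 552,482 samples, a batch size of 512, and one epoch, the step budget and the epoch fraction at this checkpoint follow directly. A quick arithmetic check, assuming no gradient accumulation:

```python
import math

dataset_size = 552_482   # from the dataset card
batch_size = 512         # per_device_train_batch_size / train_batch_size
num_train_epochs = 1

# Steps per epoch = ceil(samples / batch size); total = max_steps
steps_per_epoch = math.ceil(dataset_size / batch_size)
max_steps = steps_per_epoch * num_train_epochs
print(max_steps)         # 1080, matching "max_steps" above

# This checkpoint was saved at global_step 200
epoch = 200 / steps_per_epoch
print(epoch)             # 0.18518518518518517, matching "epoch" above
```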
last-checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:41bbb7d97dbf5fc36f4da6eb62b2c1b3fde18fa18ee31c63764adb9fcb45d686
+oid sha256:d38868219e5f23127c67539cb1aea90b6a40b71d9518f7f8f35a1f4dc71c50de
 size 6097
last-checkpoint/vocab.txt ADDED
The diff for this file is too large to render. See raw diff