model_5M_large_ds_masking_0.1_predicted_hparamas

Files changed (4)

README.md +16 -23
config.json +4 -4
model.safetensors +2 -2
training_args.bin +1 -1

README.md CHANGED

@@ -16,8 +16,8 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0177
-- Accuracy: 0.9938
+- Loss: 0.3592
+- Accuracy: 0.8820
 ## Model description
@@ -36,10 +36,12 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.005776
-- train_batch_size: 256
-- eval_batch_size: 256
+- learning_rate: 0.032227
+- train_batch_size: 512
+- eval_batch_size: 512
 - seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 4096
 - optimizer: Use OptimizerNames.SCHEDULE_FREE_ADAMW with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: constant
 - lr_scheduler_warmup_steps: 1000
@@ -48,24 +50,15 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
-|:-------------:|:------:|:-----:|:---------------:|:--------:|
-| No log        | 0      | 0     | 4.4232          | 0.0023   |
-| 0.1014        | 0.2190 | 1953  | 0.0735          | 0.9756   |
-| 0.0586        | 0.4379 | 3906  | 0.0483          | 0.9839   |
-| 0.0453        | 0.6569 | 5859  | 0.0375          | 0.9873   |
-| 0.0375        | 0.8759 | 7812  | 0.0323          | 0.9890   |
-| 0.0327        | 1.0949 | 9765  | 0.0286          | 0.9902   |
-| 0.0308        | 1.3138 | 11718 | 0.0269          | 0.9908   |
-| 0.0287        | 1.5328 | 13671 | 0.0271          | 0.9907   |
-| 0.0265        | 1.7518 | 15624 | 0.0241          | 0.9916   |
-| 0.0248        | 1.9707 | 17577 | 0.0217          | 0.9925   |
-| 0.0235        | 2.1897 | 19530 | 0.0207          | 0.9928   |
-| 0.0226        | 2.4087 | 21483 | 0.0196          | 0.9932   |
-| 0.0213        | 2.6276 | 23436 | 0.0196          | 0.9931   |
-| 0.0206        | 2.8466 | 25389 | 0.0182          | 0.9936   |
-| 0.0198        | 3.0656 | 27342 | 0.0178          | 0.9937   |
-| 0.0192        | 3.2846 | 29295 | 0.0196          | 0.9932   |
+| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
+|:-------------:|:------:|:----:|:---------------:|:--------:|
+| No log        | 0      | 0    | 4.5828          | 0.0102   |
+| No log        | 0.0044 | 122  | 0.6880          | 0.7866   |
+| No log        | 0.0087 | 244  | 0.4368          | 0.8569   |
+| No log        | 0.0131 | 366  | 0.4019          | 0.8682   |
+| No log        | 0.0175 | 488  | 0.3571          | 0.8823   |
+| 6.242         | 0.0218 | 610  | 0.3568          | 0.8831   |
+| 6.242         | 0.0262 | 732  | 0.3879          | 0.8729   |
 ### Framework versions
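The new card adds `gradient_accumulation_steps: 8` and `total_train_batch_size: 4096`. As a minimal sketch, assuming training ran on a single device (the card lists no device count), the examples consumed per optimizer step are the per-step batch times the accumulation steps, which is 16 times the 256 examples per optimizer step of the previous run. The helper name `effective_batch_size` is hypothetical.

```python
def effective_batch_size(train_batch_size: int,
                         gradient_accumulation_steps: int = 1,
                         num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (hypothetical helper).

    Composes per-step batch size x accumulation steps x device count,
    which is how total_train_batch_size is usually derived.
    """
    return train_batch_size * gradient_accumulation_steps * num_devices

# Previous run: no accumulation listed, so one micro-batch per optimizer step.
old_total = effective_batch_size(256)
# New run: 512 per micro-batch, 8 micro-batches accumulated (single device assumed).
new_total = effective_batch_size(512, gradient_accumulation_steps=8)

print(old_total, new_total, new_total // old_total)  # 256 4096 16
```

If the run had used more than one device, `num_devices` would multiply the total further, and the card would be expected to report it.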

config.json CHANGED

@@ -17,10 +17,10 @@
   "global_attn_every_n_layers": 1,
   "global_rope_theta": 160000.0,
   "hidden_activation": "gelu",
-  "hidden_size": 384,
+  "hidden_size": 256,
   "initializer_cutoff_factor": 2.0,
   "initializer_range": 0.02,
-  "intermediate_size": 576,
+  "intermediate_size": 384,
   "local_attention": 128,
   "local_rope_theta": 10000.0,
   "max_position_embeddings": 502,
@@ -29,8 +29,8 @@
   "model_type": "modernbert",
   "norm_bias": false,
   "norm_eps": 1e-05,
-  "num_attention_heads": 6,
-  "num_hidden_layers": 12,
+  "num_attention_heads": 4,
+  "num_hidden_layers": 8,
   "pad_token_id": 1,
   "repad_logits_with_grad": false,
   "sep_token_id": 3,
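The config change shrinks the model's width (`hidden_size` from 384 to 256) and depth (`num_hidden_layers` from 12 to 8) while keeping the per-head dimension and the MLP expansion ratio unchanged. A minimal check using only the keys shown in the diff (the `shape_summary` helper is hypothetical):

```python
# Values copied from the two sides of the config.json diff (changed keys only).
old = {"hidden_size": 384, "intermediate_size": 576,
       "num_attention_heads": 6, "num_hidden_layers": 12}
new = {"hidden_size": 256, "intermediate_size": 384,
       "num_attention_heads": 4, "num_hidden_layers": 8}

def shape_summary(cfg: dict) -> dict:
    # The number of attention heads must evenly divide the hidden size.
    assert cfg["hidden_size"] % cfg["num_attention_heads"] == 0
    return {
        "head_dim": cfg["hidden_size"] // cfg["num_attention_heads"],
        "mlp_ratio": cfg["intermediate_size"] / cfg["hidden_size"],
        "layers": cfg["num_hidden_layers"],
    }

print(shape_summary(old))  # {'head_dim': 64, 'mlp_ratio': 1.5, 'layers': 12}
print(shape_summary(new))  # {'head_dim': 64, 'mlp_ratio': 1.5, 'layers': 8}
```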

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c42e7df2d3a9cf53a5eb965dfadf595f85169b6e06b9b099ee8cfa677eff227c
-size 60925776
+oid sha256:ea1e850c5c7a32e410c0889f182bc7748967c48f7e09c8367cc4176ad7dd679c
+size 18195880
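The LFS pointer sizes give a rough parameter count. A safetensors file is an 8-byte header-length prefix, a JSON header, then raw tensor data, so subtracting the prefix and dividing by the dtype width bounds the element count. This sketch assumes float32 weights (4 bytes each), which the commit does not state, and ignores the JSON header whose length is not in the pointer, so the results are slight overestimates. Under those assumptions the new checkpoint holds roughly 4.5M parameters, in line with the "5M" in the commit title. The `approx_param_count` helper is hypothetical.

```python
def approx_param_count(file_size_bytes: int, bytes_per_param: int = 4,
                       header_bytes: int = 0) -> int:
    """Rough parameter count from a safetensors file size (hypothetical helper).

    Subtracts the 8-byte header-length prefix and, if known, the JSON header,
    then divides by the dtype width. bytes_per_param=4 assumes float32.
    """
    data_bytes = file_size_bytes - 8 - header_bytes
    return data_bytes // bytes_per_param

old_size = 60925776  # LFS pointer "size" before the commit
new_size = 18195880  # LFS pointer "size" after the commit

for label, size in (("old", old_size), ("new", new_size)):
    # header_bytes is left at 0, so these are slight overestimates.
    print(label, f"~{approx_param_count(size) / 1e6:.2f}M params")
# old ~15.23M params
# new ~4.55M params
```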

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b3e64dc216100c5e190e7cac2d9057d62128bc81cdbb8e3bef681abcbcb9e3f5
+oid sha256:b84d5daf60035789cf715d153a5bee4499ce6f2dd288bf595a618494c24931bc
 size 5905