fabikru committed on
Commit 389c5e0 · verified · 1 Parent(s): 8495306

model_5M_large_ds_masking_0.1_predicted_hparamas

Files changed (4)
  1. README.md +16 -23
  2. config.json +4 -4
  3. model.safetensors +2 -2
  4. training_args.bin +1 -1
README.md CHANGED
@@ -16,8 +16,8 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0177
-- Accuracy: 0.9938
+- Loss: 0.3592
+- Accuracy: 0.8820
 
 ## Model description
 
@@ -36,10 +36,12 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.005776
-- train_batch_size: 256
-- eval_batch_size: 256
+- learning_rate: 0.032227
+- train_batch_size: 512
+- eval_batch_size: 512
 - seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 4096
 - optimizer: Use OptimizerNames.SCHEDULE_FREE_ADAMW with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: constant
 - lr_scheduler_warmup_steps: 1000
@@ -48,24 +50,15 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
-|:-------------:|:------:|:-----:|:---------------:|:--------:|
-| No log        | 0      | 0     | 4.4232          | 0.0023   |
-| 0.1014        | 0.2190 | 1953  | 0.0735          | 0.9756   |
-| 0.0586        | 0.4379 | 3906  | 0.0483          | 0.9839   |
-| 0.0453        | 0.6569 | 5859  | 0.0375          | 0.9873   |
-| 0.0375        | 0.8759 | 7812  | 0.0323          | 0.9890   |
-| 0.0327        | 1.0949 | 9765  | 0.0286          | 0.9902   |
-| 0.0308        | 1.3138 | 11718 | 0.0269          | 0.9908   |
-| 0.0287        | 1.5328 | 13671 | 0.0271          | 0.9907   |
-| 0.0265        | 1.7518 | 15624 | 0.0241          | 0.9916   |
-| 0.0248        | 1.9707 | 17577 | 0.0217          | 0.9925   |
-| 0.0235        | 2.1897 | 19530 | 0.0207          | 0.9928   |
-| 0.0226        | 2.4087 | 21483 | 0.0196          | 0.9932   |
-| 0.0213        | 2.6276 | 23436 | 0.0196          | 0.9931   |
-| 0.0206        | 2.8466 | 25389 | 0.0182          | 0.9936   |
-| 0.0198        | 3.0656 | 27342 | 0.0178          | 0.9937   |
-| 0.0192        | 3.2846 | 29295 | 0.0196          | 0.9932   |
+| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
+|:-------------:|:------:|:----:|:---------------:|:--------:|
+| No log        | 0      | 0    | 4.5828          | 0.0102   |
+| No log        | 0.0044 | 122  | 0.6880          | 0.7866   |
+| No log        | 0.0087 | 244  | 0.4368          | 0.8569   |
+| No log        | 0.0131 | 366  | 0.4019          | 0.8682   |
+| No log        | 0.0175 | 488  | 0.3571          | 0.8823   |
+| 6.242         | 0.0218 | 610  | 0.3568          | 0.8831   |
+| 6.242         | 0.0262 | 732  | 0.3879          | 0.8729   |
 
 
 ### Framework versions
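As a sanity check on the updated hyperparameters, the reported total_train_batch_size follows from the per-device batch size and gradient accumulation (a minimal sketch, assuming single-device training; values are copied from the diff above):

```python
# Effective batch size implied by the new hyperparameters in the README diff.
# Assumes one device; the Trainer multiplies by device count otherwise.
train_batch_size = 512
gradient_accumulation_steps = 8

total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 4096, matching the README's total_train_batch_size
```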
config.json CHANGED
@@ -17,10 +17,10 @@
 "global_attn_every_n_layers": 1,
 "global_rope_theta": 160000.0,
 "hidden_activation": "gelu",
-"hidden_size": 384,
+"hidden_size": 256,
 "initializer_cutoff_factor": 2.0,
 "initializer_range": 0.02,
-"intermediate_size": 576,
+"intermediate_size": 384,
 "local_attention": 128,
 "local_rope_theta": 10000.0,
 "max_position_embeddings": 502,
@@ -29,8 +29,8 @@
 "model_type": "modernbert",
 "norm_bias": false,
 "norm_eps": 1e-05,
-"num_attention_heads": 6,
-"num_hidden_layers": 12,
+"num_attention_heads": 4,
+"num_hidden_layers": 8,
 "pad_token_id": 1,
 "repad_logits_with_grad": false,
 "sep_token_id": 3,
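One thing worth noting about this config change: both the old and new settings keep the per-head dimension at 64, so the update shrinks the model's width and depth rather than its head size. A minimal check, with the values copied from the diff above:

```python
# Per-head dimension = hidden_size / num_attention_heads for both configs.
old_cfg = {"hidden_size": 384, "num_attention_heads": 6, "num_hidden_layers": 12}
new_cfg = {"hidden_size": 256, "num_attention_heads": 4, "num_hidden_layers": 8}

head_dims = []
for cfg in (old_cfg, new_cfg):
    # hidden_size must divide evenly across attention heads
    assert cfg["hidden_size"] % cfg["num_attention_heads"] == 0
    head_dims.append(cfg["hidden_size"] // cfg["num_attention_heads"])

print(head_dims)  # [64, 64] -- head size is unchanged
```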
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c42e7df2d3a9cf53a5eb965dfadf595f85169b6e06b9b099ee8cfa677eff227c
-size 60925776
+oid sha256:ea1e850c5c7a32e410c0889f182bc7748967c48f7e09c8367cc4176ad7dd679c
+size 18195880
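The checkpoint shrinks from about 61 MB to about 18 MB. Dividing the LFS pointer sizes by 4 bytes per weight gives a rough parameter count, assuming float32 storage and ignoring the small safetensors header (if the weights were fp16, the counts would double):

```python
# Rough parameter estimates from the safetensors file sizes in the diff above,
# assuming 4-byte float32 weights; header overhead is ignored.
old_size_bytes = 60_925_776
new_size_bytes = 18_195_880

old_params = old_size_bytes // 4  # ~15.2M parameters before
new_params = new_size_bytes // 4  # ~4.5M parameters after
print(old_params, new_params)
```

The ~4.5M estimate for the new checkpoint is consistent with the "model_5M" prefix in the commit message.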
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b3e64dc216100c5e190e7cac2d9057d62128bc81cdbb8e3bef681abcbcb9e3f5
+oid sha256:b84d5daf60035789cf715d153a5bee4499ce6f2dd288bf595a618494c24931bc
 size 5905