sengi committed
Commit 28a0712 · verified · 1 Parent(s): 0f85252

End of training

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -1,6 +1,7 @@
  ---
  library_name: transformers
- base_model: maple-research-lab/LLaDOU-v0-Math
+ license: mit
+ base_model: GSAI-ML/LLaDA-8B-Instruct
  tags:
  - generated_from_trainer
  model-index:
@@ -13,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->

  # LLaDA-planner_balanced

- This model is a fine-tuned version of [maple-research-lab/LLaDOU-v0-Math](https://huggingface.co/maple-research-lab/LLaDOU-v0-Math) on an unknown dataset.
+ This model is a fine-tuned version of [GSAI-ML/LLaDA-8B-Instruct](https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct) on an unknown dataset.

  ## Model description

@@ -32,14 +33,14 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 1e-06
+ - learning_rate: 1e-05
  - train_batch_size: 4
  - eval_batch_size: 16
  - seed: 42
  - distributed_type: multi-GPU
  - optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine_warmup_with_min_lr
- - training_steps: 22000
+ - training_steps: 100000

  ### Training results

@@ -49,5 +50,5 @@ The following hyperparameters were used during training:

  - Transformers 4.56.1
  - Pytorch 2.8.0+cu128
- - Datasets 4.0.0
+ - Datasets 4.4.1
  - Tokenizers 0.22.0
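
The hyperparameter list in the updated README maps fairly directly onto `transformers.TrainingArguments`. Below is a minimal sketch of that configuration, assuming the standard Hugging Face `Trainer` was used. The `output_dir`, the `bf16` flag, and the scheduler's `min_lr_rate` are assumptions not recorded in the commit, and the logged scheduler name `cosine_warmup_with_min_lr` is not a built-in transformers type, so it is approximated here with the built-in `cosine_with_min_lr`:

```python
# Sketch: TrainingArguments mirroring the hyperparameters in the updated README.
# Only the values listed in the README come from the commit; everything marked
# as a placeholder or assumption below is not recorded there.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="LLaDA-planner_balanced",        # assumption: named after the model
    learning_rate=1e-5,                         # raised from 1e-6 in this commit
    per_device_train_batch_size=4,              # train_batch_size: 4
    per_device_eval_batch_size=16,              # eval_batch_size: 16
    seed=42,
    max_steps=100_000,                          # raised from 22_000 in this commit
    optim="adamw_torch_fused",                  # betas=(0.9, 0.999) and eps=1e-8 are the defaults
    lr_scheduler_type="cosine_with_min_lr",     # README logs "cosine_warmup_with_min_lr" (likely custom)
    lr_scheduler_kwargs={"min_lr_rate": 0.1},   # placeholder: actual min-lr value not recorded
    bf16=True,                                  # assumption: typical for an 8B model on recent GPUs
)
```

The `distributed_type: multi-GPU` entry only indicates a multi-GPU launch (e.g. via `torchrun` or `accelerate launch`); the launcher and world size are not recorded in the README.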
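For completeness, a sketch of loading the resulting checkpoint for inference. The repo id `sengi/LLaDA-planner_balanced` is inferred from the commit author and model name on this page and may not be the actual path; LLaDA checkpoints ship custom modeling code, so `trust_remote_code=True` is required, as with the GSAI-ML base model:

```python
# Sketch: loading the fine-tuned checkpoint for inference.
# Assumption: the repo id below is inferred from this commit page, not confirmed.
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "sengi/LLaDA-planner_balanced"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo_id,
    trust_remote_code=True,   # LLaDA uses custom modeling code from the Hub
    torch_dtype=torch.bfloat16,
).eval()
```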