sophiargh committed on
Commit 042893c · verified · 1 Parent(s): a0ac842

End of training

  1. README.md +9 -18
README.md CHANGED
@@ -4,8 +4,6 @@ license: apache-2.0
  base_model: Qwen/Qwen3-0.6B-Base
  tags:
  - generated_from_trainer
- metrics:
- - accuracy
  model-index:
  - name: MNLP_M3_mcqa_model_3
  results: []
@@ -18,8 +16,7 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.2762
- - Accuracy: 0.8986

  ## Model description

@@ -45,25 +42,19 @@ The following hyperparameters were used during training:
  - gradient_accumulation_steps: 4
  - total_train_batch_size: 8
  - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
  - lr_scheduler_warmup_ratio: 0.01
  - num_epochs: 4

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Accuracy |
- |:-------------:|:------:|:-----:|:---------------:|:--------:|
- | 0.2684 | 0.2278 | 1000 | 0.2629 | 0.8904 |
- | 0.2572 | 0.4555 | 2000 | 0.2585 | 0.8927 |
- | 0.2475 | 0.6833 | 3000 | 0.2496 | 0.8963 |
- | 0.2508 | 0.9111 | 4000 | 0.2467 | 0.8970 |
- | 0.199 | 1.1387 | 5000 | 0.2594 | 0.8981 |
- | 0.1927 | 1.3665 | 6000 | 0.2618 | 0.8987 |
- | 0.2051 | 1.5942 | 7000 | 0.2683 | 0.8995 |
- | 0.1988 | 1.8220 | 8000 | 0.2651 | 0.8989 |
- | 0.1797 | 2.0497 | 9000 | 0.2833 | 0.9 |
- | 0.1738 | 2.2774 | 10000 | 0.2861 | 0.8995 |
- | 0.1781 | 2.5052 | 11000 | 0.2762 | 0.8986 |


  ### Framework versions
 
  base_model: Qwen/Qwen3-0.6B-Base
  tags:
  - generated_from_trainer
  model-index:
  - name: MNLP_M3_mcqa_model_3
  results: []
 

  This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) on an unknown dataset.
  It achieves the following results on the evaluation set:
+ - Loss: 0.2545

  ## Model description

 
  - gradient_accumulation_steps: 4
  - total_train_batch_size: 8
  - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.01
  - num_epochs: 4
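
This commit changes `lr_scheduler_type` from `linear` to `cosine` while keeping `lr_scheduler_warmup_ratio: 0.01`. The sketch below compares the learning-rate multiplier of the two shapes, assuming the common formulation: linear warmup followed by either linear decay or a half-cycle cosine decay to zero. The `TOTAL_STEPS` value is hypothetical (the diff does not state the total number of optimizer steps), and the function names are illustrative, not part of any library.

```python
import math


def linear_multiplier(step: int, total_steps: int, warmup_steps: int) -> float:
    """LR multiplier for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def cosine_multiplier(step: int, total_steps: int, warmup_steps: int) -> float:
    """LR multiplier for linear warmup followed by half-cycle cosine decay."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical total step count; only the 0.01 warmup ratio comes from the diff.
TOTAL_STEPS = 15_400
WARMUP_STEPS = math.ceil(0.01 * TOTAL_STEPS)

for step in (0, WARMUP_STEPS, 5_000, 10_000, TOTAL_STEPS):
    lin = linear_multiplier(step, TOTAL_STEPS, WARMUP_STEPS)
    cos = cosine_multiplier(step, TOTAL_STEPS, WARMUP_STEPS)
    print(f"step={step:>6}  linear={lin:.4f}  cosine={cos:.4f}")
```

Under these assumptions the cosine schedule keeps the learning rate higher than the linear one during the first half of the decay and lower during the second half; both reach the peak at the end of warmup and zero at the final step.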

  ### Training results

+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 0.2526 | 0.2597 | 1000 | 0.2546 |
+ | 0.2401 | 0.5194 | 2000 | 0.2429 |
+ | 0.237 | 0.7791 | 3000 | 0.2330 |
+ | 0.2227 | 1.0387 | 4000 | 0.2550 |
+ | 0.1778 | 1.2984 | 5000 | 0.2545 |
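
The Epoch and Step columns allow a rough estimate of how many optimizer steps make up one epoch, and from that the approximate number of training examples. The sketch below is a back-of-the-envelope calculation, assuming the logged epoch is simply `step / steps_per_epoch` and that `total_train_batch_size: 8` applies to both runs (it appears as an unchanged line in the diff). The helper names are illustrative, and the results are estimates, not values reported by the model card.

```python
def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one (step, epoch) log row."""
    return step / epoch


def approx_train_examples(step: int, epoch: float, total_batch_size: int) -> float:
    """Estimate the training-set size implied by a (step, epoch) log row."""
    return steps_per_epoch(step, epoch) * total_batch_size


TOTAL_TRAIN_BATCH_SIZE = 8  # from the hyperparameter list in the diff

# First row of the table removed by this commit: step 1000 at epoch 0.2278.
old_run = approx_train_examples(1000, 0.2278, TOTAL_TRAIN_BATCH_SIZE)
# First row of the table added by this commit: step 1000 at epoch 0.2597.
new_run = approx_train_examples(1000, 0.2597, TOTAL_TRAIN_BATCH_SIZE)

print(f"removed table: ~{old_run:,.0f} examples per epoch")
print(f"added table:   ~{new_run:,.0f} examples per epoch")
```

If these assumptions hold, the two tables imply different training-set sizes, so the validation losses before and after this commit may not be directly comparable.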
 
 
 
 
 
 
  ### Framework versions