Sky ι
committed on
Commit 0c71773 · 1 Parent(s): dcbc29a
myupdate
- stage_0_training_ck/result.md +14 -0
- stage_1_training_ck/results.md +131 -0
stage_0_training_ck/result.md
ADDED
@@ -0,0 +1,14 @@
+c=0
+CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 2 run.py args/gsm_coconut_eval.yaml
+
+**ck6**
+Accuracy on validation set: 520 / 1320 = 0.3939393939393939
+CoT match on validation set: 0 / 1320 = 0.0
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|end-latent|>3+4=7>>
+<<16-7=9>>
+<<9*2=18>>
+### 18'
+Extracted Output: '18'
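In the log above, `Extracted Output` matches the text that follows the final `###` marker in `Full output`. A minimal sketch of that extraction, assuming this is the rule being applied (the function name `extract_answer` is hypothetical and is not taken from `run.py`, whose actual logic may differ):

```python
def extract_answer(full_output: str) -> str:
    """Return the text after the last '###' marker, or '' if there is none.

    Assumption: the evaluation derives 'Extracted Output' from the final
    '###' marker; this is inferred from the log, not from run.py itself.
    """
    # rpartition yields ('', '', full_output) when the marker is absent.
    _, marker, tail = full_output.rpartition("###")
    if not marker:
        return ""
    # Drop thousands separators and surrounding whitespace.
    return tail.replace(",", "").strip()


sample = "<<16-7=9>>\n<<9*2=18>>\n### 18"
print(extract_answer(sample))  # prints 18
```

Comparing this string with the reference answer (`'18'` for Question 0) would give the per-question correctness that the accuracy line aggregates.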
stage_1_training_ck/results.md
ADDED
@@ -0,0 +1,131 @@
+CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nnodes 1 --nproc_per_node 4 run.py args/gsm_coconut_eval.yaml
+
+**ck4**
+Accuracy on validation set: 389 / 1320 = 0.2946969696969697
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|end-latent|><<16-9=7>>
+<<7*2=14>>
+### 14'
+Extracted Output: '14'
+
+
+**ck5**
+Accuracy on validation set: 464 / 1320 = 0.3515151515151515
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|end-latent|><<16-7=9>>
+<<9*2=18>>
+### 18'
+Extracted Output: '18'
+
+**ck6**
+Accuracy on validation set: 457 / 1320 = 0.3462121212121212
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|end-latent|><<16-12=4>>
+<<4*2=8>>
+### 8'
+Extracted Output: '8'
+
+──────────────────────────────────────────────────────────────────────────
+
+**ck7**
+Accuracy on validation set: 407 / 1320 = 0.30833333333333335
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|latent|><|latent|><|end-latent|><<13*2=26>>
+### 26'
+Extracted Output: '26'
+
+**ck8**
+Accuracy on validation set: 449 / 1320 = 0.34015151515151515
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|latent|><|latent|><|end-latent|><<11*2=22>>
+### 22'
+Extracted Output: '22'
+
+**ck9**
+Accuracy on validation set: 454 / 1320 = 0.34393939393939393
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|latent|><|latent|><|end-latent|><<8*2=16>>
+### 16'
+Extracted Output: '16'
+
+
+──────────────────────────────────────────────────────────────────────────
+**ck10**
+Accuracy on validation set: 426 / 1320 = 0.32272727272727275
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|end-latent|>### 18'
+Extracted Output: '18'
+
+**ck11**
+Accuracy on validation set: 447 / 1320 = 0.3386363636363636
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|end-latent|>### 18'
+Extracted Output: '18'
+
+**ck12**
+Accuracy on validation set: 433 / 1320 = 0.328030303030303
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|end-latent|>### 32'
+Extracted Output: '32'
+
+──────────────────────────────────────────────────────────────────────────
+
+**ck13**
+Accuracy on validation set: 375 / 1320 = 0.2840909090909091
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|end-latent|>### 18'
+Extracted Output: '18'
+
+**ck14**
+Accuracy on validation set: 396 / 1320 = 0.3
+CoT match on validation set: 0 / 1320 = 0.0
+
+Question 0: Answer = '18' CoT = '<<16-3-4=9>>
+<<9*2=18>>'
+Full output: 'Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
+<|start-latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|latent|><|end-latent|>### 18'
+Extracted Output: '18'
+
+
+
+
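The per-checkpoint accuracy lines above lend themselves to a quick comparison. A minimal sketch of how such a results file could be summarized, assuming each `**ckN**` heading is followed by its own `Accuracy on validation set: correct / total` line (the names `parse_accuracies`, `CK_RE` and `ACC_RE` are hypothetical helpers, not part of the repository):

```python
import re

# Patterns match with or without a leading diff '+' marker on the line.
CK_RE = re.compile(r"\*\*(ck\d+)\*\*")
ACC_RE = re.compile(r"Accuracy on validation set: (\d+) / (\d+)")


def parse_accuracies(text: str) -> dict[str, float]:
    """Map each checkpoint label to correct/total, in order of appearance."""
    results: dict[str, float] = {}
    current = None
    for line in text.splitlines():
        ck = CK_RE.search(line)
        if ck:
            current = ck.group(1)
            continue
        acc = ACC_RE.search(line)
        if acc and current is not None:
            correct, total = int(acc.group(1)), int(acc.group(2))
            results[current] = correct / total
    return results


# First three stage 1 entries, copied from the log.
log = """\
**ck4**
Accuracy on validation set: 389 / 1320 = 0.2946969696969697
**ck5**
Accuracy on validation set: 464 / 1320 = 0.3515151515151515
**ck6**
Accuracy on validation set: 457 / 1320 = 0.3462121212121212
"""
scores = parse_accuracies(log)
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))  # prints: ck5 0.3515
```

The ratio is recomputed from the integer counts rather than read from the printed float, so the summary does not depend on how the log rounded the value.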