Commit cb24a9d (verified) by davidkim205
1 parent: aeb09c2

Update README.md

Files changed (1): README.md (+14 -14)
README.md CHANGED
@@ -100,25 +100,25 @@ The score is calculated by:
  2. Assigning full points for a difference of 0, and half a point for a difference of 1.
  3. The total score is the sum of all points divided by the number of data points.
 
- | | file | wrong | score | length | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
- |---:|:--------------------|:---------|:--------|---------:|:-----------|:-----------|:-----------|:---------|:---------|:---------|:---------|:---------|----:|:---------|:---------|-----:|:---------|
- | 0 | kgrammar-2-9b.jsonl | 0 (0.0%) | 77.5% | 80 | 52 (65.0%) | 20 (25.0%) | 5 (6.2%) | 1 (1.2%) | 1 (1.2%) | 0 | 1 (1.2%) | 0 | 0 | 0 | 0 | 0 | 0 |
- | 1 | kgrammar-2-3b.jsonl | 0 (0.0%) | 74.4% | 80 | 51 (63.7%) | 17 (21.2%) | 8 (10.0%) | 1 (1.2%) | 1 (1.2%) | 0 | 1 (1.2%) | 1 (1.2%) | 0 | 0 | 0 | 0 | 0 |
- | 2 | kgrammar-2-1b.jsonl | 1 (1.2%) | 67.5% | 80 | 44 (55.0%) | 20 (25.0%) | 8 (10.0%) | 2 (2.5%) | 2 (2.5%) | 1 (1.2%) | 0 | 2 (2.5%) | 0 | 0 | 0 | 0 | 0 |
- | 3 | gpt-4o.jsonl | 1 (1.2%) | 56.9% | 80 | 34 (42.5%) | 23 (28.7%) | 14 (17.5%) | 3 (3.8%) | 2 (2.5%) | 2 (2.5%) | 0 | 0 | 0 | 0 | 0 | 0 | 1 (1.2%) |
- | 4 | gpt-4o-mini.jsonl | 0 (0.0%) | 44.4% | 80 | 19 (23.8%) | 33 (41.2%) | 18 (22.5%) | 3 (3.8%) | 1 (1.2%) | 3 (3.8%) | 0 | 1 (1.2%) | 0 | 1 (1.2%) | 1 (1.2%) | 0 | 0 |
 
  ### Accuracy
 
  The `score` column represents the ratio of correctly predicted labels to the total number of data points. The `wrong` column shows the count and percentage of incorrectly formatted answers. The columns labeled "0" through "10" represent the number and percentage of correct predictions for each label, based on how well the model predicted each specific label.
 
- | | file | wrong | score | length | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
- |---:|:--------------------|:---------|:--------|---------:|:-----------|:----------|:----------|:----------|:----------|:-----------|:----------|:-----------|----:|----:|-----:|-----:|-----:|
- | 0 | kgrammar-2-9b.jsonl | 0 (0.0%) | 65.0% | 80 | 35 (97.2%) | 5 (71.4%) | 7 (50.0%) | 3 (37.5%) | 2 (40.0%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 1 | kgrammar-2-3b.jsonl | 0 (0.0%) | 63.7% | 80 | 35 (97.2%) | 2 (28.6%) | 8 (57.1%) | 3 (37.5%) | 2 (40.0%) | 1 (50.0%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 2 | kgrammar-2-1b.jsonl | 1 (1.2%) | 55.0% | 80 | 34 (94.4%) | 3 (42.9%) | 4 (28.6%) | 2 (25.0%) | 0 | 0 | 1 (50.0%) | 0 | 0 | 0 | 0 | 0 | 0 |
- | 3 | gpt-4o.jsonl | 1 (1.2%) | 42.5% | 80 | 9 (25.0%) | 6 (85.7%) | 8 (57.1%) | 7 (87.5%) | 1 (20.0%) | 2 (100.0%) | 0 | 1 (100.0%) | 0 | 0 | 0 | 0 | 0 |
- | 4 | gpt-4o-mini.jsonl | 0 (0.0%) | 23.8% | 80 | 1 (2.8%) | 5 (71.4%) | 8 (57.1%) | 5 (62.5%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
 
  ### Error Detection Accuracy
 
 
  2. Assigning full points for a difference of 0, and half a point for a difference of 1.
  3. The total score is the sum of all points divided by the number of data points.
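
The scoring steps above can be sketched in a few lines of Python. This is only an illustration, not the project's evaluation script: `tolerance_score` and `rows` are hypothetical names, and the sketch assumes that the table columns `0` to `12` count how many samples had each prediction difference and that `length` is the number of data points. Under those assumptions the sketch reproduces the `score` values in the table (for example 77.5% for kgrammar-2-9b and 56.9% for gpt-4o).

```python
def tolerance_score(diff_counts: dict[int, int], total: int) -> float:
    """Full point for a difference of 0, half a point for a difference of 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Differences of 2 or more, and malformed answers, earn no points.
    points = 1.0 * diff_counts.get(0, 0) + 0.5 * diff_counts.get(1, 0)
    return points / total


# Difference histograms copied from two table rows (zero-count columns omitted).
rows = {
    "kgrammar-2-9b": {0: 52, 1: 20, 2: 5, 3: 1, 4: 1, 6: 1},
    "gpt-4o": {0: 34, 1: 23, 2: 14, 3: 3, 4: 2, 5: 2, 12: 1},
}

for name, counts in rows.items():
    print(name, f"{tolerance_score(counts, total=80):.1%}")
```

Dividing by `length` rather than by the number of well-formed answers means a malformed answer (the `wrong` column) simply scores zero, which matches the gpt-4o row: 79 counted differences plus 1 wrong answer, scored out of 80.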
 
+ | | model | wrong | score | length | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
+ |---:|:--------------|:---------|:--------|---------:|:-----------|:-----------|:-----------|:---------|:---------|:---------|:---------|:---------|----:|:---------|:---------|-----:|:---------|
+ | 0 | kgrammar-2-9b | 0 (0.0%) | 77.5% | 80 | 52 (65.0%) | 20 (25.0%) | 5 (6.2%) | 1 (1.2%) | 1 (1.2%) | 0 | 1 (1.2%) | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 1 | kgrammar-2-3b | 0 (0.0%) | 74.4% | 80 | 51 (63.7%) | 17 (21.2%) | 8 (10.0%) | 1 (1.2%) | 1 (1.2%) | 0 | 1 (1.2%) | 1 (1.2%) | 0 | 0 | 0 | 0 | 0 |
+ | 2 | kgrammar-2-1b | 1 (1.2%) | 67.5% | 80 | 44 (55.0%) | 20 (25.0%) | 8 (10.0%) | 2 (2.5%) | 2 (2.5%) | 1 (1.2%) | 0 | 2 (2.5%) | 0 | 0 | 0 | 0 | 0 |
+ | 3 | gpt-4o | 1 (1.2%) | 56.9% | 80 | 34 (42.5%) | 23 (28.7%) | 14 (17.5%) | 3 (3.8%) | 2 (2.5%) | 2 (2.5%) | 0 | 0 | 0 | 0 | 0 | 0 | 1 (1.2%) |
+ | 4 | gpt-4o-mini | 0 (0.0%) | 44.4% | 80 | 19 (23.8%) | 33 (41.2%) | 18 (22.5%) | 3 (3.8%) | 1 (1.2%) | 3 (3.8%) | 0 | 1 (1.2%) | 0 | 1 (1.2%) | 1 (1.2%) | 0 | 0 |
 
  ### Accuracy
 
  The `score` column represents the ratio of correctly predicted labels to the total number of data points. The `wrong` column shows the count and percentage of incorrectly formatted answers. The columns labeled "0" through "10" represent the number and percentage of correct predictions for each label, based on how well the model predicted each specific label.
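
A minimal sketch of how such a report could be computed from (predicted, actual) label pairs follows. It is an illustration under stated assumptions, not the evaluation code behind the table: `accuracy_report` is a hypothetical helper, a malformed answer is represented as `None`, and the per-label percentage is assumed to be relative to the number of samples carrying that label. That last point is an inference from the table, where `5 (71.4%)` and `2 (28.6%)` in column `1` are both consistent with 7 samples of label 1.

```python
from collections import Counter


def accuracy_report(pairs):
    """pairs: (predicted, actual) labels; predicted is None for a malformed answer."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no data points")
    seen = Counter(actual for _, actual in pairs)
    correct = Counter(actual for pred, actual in pairs if pred == actual)
    return {
        # Overall ratio of correctly predicted labels to all data points.
        "score": sum(correct.values()) / len(pairs),
        # Number of answers that could not be parsed into a label.
        "wrong": sum(pred is None for pred, _ in pairs),
        # Per label: (correct count, share of that label's samples).
        "per_label": {label: (correct[label], correct[label] / n)
                      for label, n in sorted(seen.items())},
    }


# Invented toy data for illustration only, not taken from the benchmark.
toy = [(0, 0), (0, 0), (1, 0), (1, 1), (None, 2), (2, 2)]
print(accuracy_report(toy))

# Cross-check with the kgrammar-2-9b row: per-label correct counts over length.
print(sum([35, 5, 7, 3, 2]) / 80)
```

The cross-check on the last line sums the per-label correct counts of the kgrammar-2-9b row (52) and divides by `length` (80), which agrees with that row's 65.0% `score`.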
 
+ | | model | wrong | score | length | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
+ |---:|:--------------|:---------|:--------|---------:|:-----------|:----------|:----------|:----------|:----------|:-----------|:----------|:-----------|----:|----:|-----:|-----:|-----:|
+ | 0 | kgrammar-2-9b | 0 (0.0%) | 65.0% | 80 | 35 (97.2%) | 5 (71.4%) | 7 (50.0%) | 3 (37.5%) | 2 (40.0%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 1 | kgrammar-2-3b | 0 (0.0%) | 63.7% | 80 | 35 (97.2%) | 2 (28.6%) | 8 (57.1%) | 3 (37.5%) | 2 (40.0%) | 1 (50.0%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 2 | kgrammar-2-1b | 1 (1.2%) | 55.0% | 80 | 34 (94.4%) | 3 (42.9%) | 4 (28.6%) | 2 (25.0%) | 0 | 0 | 1 (50.0%) | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 3 | gpt-4o | 1 (1.2%) | 42.5% | 80 | 9 (25.0%) | 6 (85.7%) | 8 (57.1%) | 7 (87.5%) | 1 (20.0%) | 2 (100.0%) | 0 | 1 (100.0%) | 0 | 0 | 0 | 0 | 0 |
+ | 4 | gpt-4o-mini | 0 (0.0%) | 23.8% | 80 | 1 (2.8%) | 5 (71.4%) | 8 (57.1%) | 5 (62.5%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
 
  ### Error Detection Accuracy