OzTianlu committed · Commit 17e6d5c (verified) · 1 Parent(s): 327a471

Upload 2 files

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +16 -0
  3. benchmark_comparison.png +3 -0
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ benchmark_comparison.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -72,6 +72,22 @@ Reasoning samples are wrapped with `<think>…</think>` tags and upsampled 10×
 
  Results from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):
 
+ ### Comparison with Peer Models
+
+ ![Benchmark Comparison](benchmark_comparison.png)
+
+ > `< 10%` entries are displayed as `<10%` in the chart.
+
+ | Benchmark | Arcade-3B | Gemma-2-2B | Llama-2-7B | Qwen1.5-1.8B | OpenLLaMA-v2-3B |
+ |-----------|-----------|------------|------------|--------------|-----------------|
+ | MMLU | **52.9%** | 52.4% | 45.3% | 46.8% | 41.0% |
+ | GSM8K | **62.9%** | 50.9% | 14.6% | 37.8% | < 10% |
+ | HumanEval | **41.5%** | 32.3% | 12.8% | 27.4% | < 10% |
+ | ARC-Challenge | 52.6% | **53.1%** | 46.2% | 41.2% | 34.2% |
+ | ARC-Easy | 74.4% | **75.9%** | 75.3% | 66.8% | 68.1% |
+
+ ### Arcade-3B Detailed Scores
+
  | Benchmark | Few-shot | Metric | Score | ± |
  |-----------|----------|--------|-------|---|
  | GSM8K | 5 | flexible-extract / exact_match | **0.6293** | 0.0133 |
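
The detailed GSM8K row reports a fraction (0.6293) with a ± value, while the comparison table lists the same result as a percentage (62.9%). The following is a minimal sketch of that conversion. It assumes the ± column is a standard error and uses a normal approximation for a rough 95% interval; neither assumption is stated in the README, and `to_percent` and `approx_ci95` are hypothetical helper names.

```python
def to_percent(score: float, digits: int = 1) -> str:
    """Format a fractional score (0..1) as a percentage string."""
    return f"{score * 100:.{digits}f}%"


def approx_ci95(score: float, stderr: float) -> tuple[float, float]:
    """Rough 95% interval, assuming the ± value is a standard error
    and the sampling distribution is approximately normal."""
    half_width = 1.96 * stderr
    return (score - half_width, score + half_width)


# Values taken from the GSM8K row of the detailed table.
score, stderr = 0.6293, 0.0133

print(to_percent(score))  # 62.9%
low, high = approx_ci95(score, stderr)
print(f"{to_percent(low)} to {to_percent(high)}")  # 60.3% to 65.5%
```

Under these assumptions the interval is roughly 60.3% to 65.5%.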
benchmark_comparison.png ADDED

Git LFS Details

  • SHA256: 6e35a9aacc7072f561a59613f4691039d9f711a124ea8cf899b2d39edb858e37
  • Pointer size: 131 Bytes
  • Size of remote file: 110 kB
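
The 131-byte pointer size is consistent with the standard Git LFS pointer layout: a `version` line, an `oid sha256:` line, and a `size` line. The sketch below rebuilds such a pointer to show where the 131 bytes come from. The exact byte count of the remote file is not given on the page (only "110 kB"), so `assumed_size` is a placeholder six-digit value, and `build_lfs_pointer` is a hypothetical helper, not part of any library.

```python
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def build_lfs_pointer(sha256_hex: str, size_bytes: int) -> str:
    """Return the text of a Git LFS pointer file (each line ends with a newline)."""
    return (
        f"version {LFS_SPEC_URL}\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA256 as shown in the Git LFS details for benchmark_comparison.png.
sha = "6e35a9aacc7072f561a59613f4691039d9f711a124ea8cf899b2d39edb858e37"

# Placeholder: the page only says "110 kB"; any six-digit size gives the same length.
assumed_size = 110_000

pointer = build_lfs_pointer(sha, assumed_size)
print(len(pointer.encode("utf-8")))  # 131
```

The version line contributes 43 bytes, the oid line 76, and the size line 12 for a six-digit size, which sums to 131.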