TitleOS committed · Commit 4ccba9e · verified · 1 Parent(s): df70c71

Update README.md

Files changed (1): README.md (+12 -2)

README.md CHANGED
@@ -76,8 +76,18 @@ Benchmarking is ongoing, with a number of evaluation runs. So far, the following
  1. LiveCodeBench (Code Generation Lite - Release v2)
  Pass@1 (Quantization Q8_0): 26.22% (Passed 134 out of 511 problems)

- 2. CyberSecEval 4 (CyberSOCEval - Meta's Purple LLaMA)
- Score (Quantization Q8_0):
+ | Comparable Model | Parameter Size / Tier | Approximate Pass@1 |
+ | :--- | :--- | :--- |
+ | Llama-3-70B-Instruct | 70B | ~28.3% |
+ | GPT-4o-mini (2024-07) | Small Proprietary | ~27.7% |
+ | Claude 3 Sonnet (Original) | Large Proprietary | ~26.9% |
+ | Mixtral-8x22B-Instruct | 141B (MoE) | ~26.4% |
+ | **Eve-4B (Q8_0)** | 4B (Quantized) | 26.22% |
+ | Mistral-Large | Large Proprietary | ~26.0% |
+ | GPT-3.5-Turbo-0125 | Mid Proprietary | ~24.6% |
+ | Claude 3 Haiku | Small Proprietary | ~24.5% |
+ | Codestral-Latest | 22B | ~23.8% |
+ | Llama-3-8B-Instruct | 8B | ~15.3% |

  ## Limitations & Warning
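As a sanity check on the score in the diff, here is a minimal sketch of how a single-sample Pass@1 figure like this one is derived. The general `pass_at_k` estimator follows the standard unbiased formula popularized by HumanEval; with one completion per problem, Pass@1 reduces to passed problems over total problems. The function name and the aggregation step are illustrative, not taken from this repository.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn (without replacement) from n generated samples passes,
    given that c of the n samples passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single sample per problem (n=1, k=1), the aggregate Pass@1 is
# simply the pass rate over the benchmark:
passed, total = 134, 511  # figures from the README diff above
pass_at_1 = passed / total
print(f"Pass@1: {pass_at_1:.2%}")  # Pass@1: 26.22%
```

This reproduces the 26.22% reported for the Q8_0 quantization on LiveCodeBench.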