OpenNLPLab committed
Commit 67b67aa · 1 Parent(s): fc689fe

Update README.md

Files changed (1): README.md +2 -2
README.md CHANGED
@@ -26,7 +26,7 @@ tags:
 - [Released Weights](#released-weights)
 - [Benchmark Results](#benchmark-results)
 - [General Domain](#general-domain)
-- [7B Model Results](#7b-model-results)
+- [Model Results](#model-results)
 - [Inference and Deployment](#inference-and-deployment)
 - [Dependency Installation](#dependency-installation)
 - [Notice](#notice)
@@ -86,7 +86,7 @@ In the general domain, we conducted 5-shot tests on the following datasets:
 - [CMMLU](https://github.com/haonan-li/CMMLU) is a comprehensive Chinese evaluation benchmark covering 67 topics, specifically designed to assess language models' knowledge and reasoning capabilities in a Chinese context. We adopted its [official](https://github.com/haonan-li/CMMLU) evaluation approach.
 
 
-### 7B Model Results
+### Model Results
 **Performance Comparison on Commonsense Reasoning and Aggregated Benchmarks.** For a fair comparison, we report competing methods' results reproduced by us using their released models. PS: parameter size (billion). T: tokens (trillion). HS: HellaSwag. WG: WinoGrande.
 
 | Model | PS | T | BoolQ | PIQA | HS | WG | ARC-e | ARC-c | OBQA | MMLU | CMMLU | C-Eval |
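
For context on the evaluation setup the README describes: below is a minimal, hypothetical sketch of how a 5-shot prompt is typically assembled for benchmarks such as MMLU or CMMLU. The function name and record layout are illustrative assumptions, not code from this repository; the reported numbers come from the official harnesses linked in the README.

```python
# Hypothetical sketch (not from this repo): assemble a k-shot prompt by
# prepending k solved dev-set examples to the unanswered test question.
def build_few_shot_prompt(dev_examples, test_question, k=5):
    """Return a prompt with k worked examples followed by the test item."""
    parts = [
        f"Question: {ex['question']}\nAnswer: {ex['answer']}"
        for ex in dev_examples[:k]
    ]
    # The test question is left open so the model completes the answer.
    parts.append(f"Question: {test_question}\nAnswer:")
    return "\n\n".join(parts)

if __name__ == "__main__":
    dev = [
        {"question": "2 + 2 = ?", "answer": "4"},
        {"question": "3 * 3 = ?", "answer": "9"},
    ]
    print(build_few_shot_prompt(dev, "5 - 1 = ?", k=2))
```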