---
title: LLM IC Leaderboard
emoji: 👍
colorFrom: purple
colorTo: indigo
sdk: static
pinned: false
app_build_command: npm install && npm run build
app_file: dist/index.html
---

# Usage

1. Configure parameters and dictionary mappings in `src/composables/useLeaderboardData.js`:

```js
DEFAULT_HIDDEN: new Set(['seq_len', 'uniform_entropy', 'entropy_gain', 'information_capacity', 'data_name']) // columns hidden by default

// Model type mapping: each key is a model type, each value is the array of model_series it contains
const modelTypeMapping = {
  'Qwen1.5': ['Qwen1.5'],
  'Qwen2.5': ['Qwen2.5'],
  'Gemma-3': ['gemma-3'],
  'Qwen2': ['Qwen2'],
  'Hunyuan': ['Hunyuan'],
  'Qwen3': ['Qwen3'],
  'InternLM2.5': ['internlm2.5'],
  'Llama-3': ['Llama-3.1', 'Llama-3.2'],
  'DeepSeek-V2': ['DeepSeek-V2'],
  'DeepSeek-V3.1': ['DeepSeek-V3.1-Base'],
  'GLM-4': ['glm-4', 'GLM-4'],
  'GLM-4.5': ['GLM-4.5-Air-Base', 'GLM-4.5-Base'],
  'Llama-4': ['Llama-4'],
  'Seed-OSS': ['Seed-OSS'],
}

const autoShowSeries = ['Qwen3', 'Llama-3', 'InternLM2.5', 'GLM-4', 'Seed-OSS', 'Gemma-3', 'Hunyuan', 'DeepSeek-V3.1', 'DeepSeek-V2', 'GLM-4.5']

// Header display names (raw header -> display name); add or edit entries here
const headerDisplayMap = reactive({
  'model_name': 'MODEL NAME',
  'model_series': 'MODEL SERIES',
  'model_size (B)': 'MODEL SIZE (B)',
  'BF16_TFLOPs': 'BF16 TFLOPs',
  'ic': 'IC',
})

// Dataset display names (raw data_name -> display name)
const dataNameDisplayMap = reactive({
  'data_part_0000': 'Data Part 0000',
  'NextCoderDataset_v1_lo': 'NextCoder v1',
  'IndustryCorpus_batch_aa_long': 'Industry Corpus AA',
  'CC-MAIN-2013-20_train-00000-of-00014_long': 'CC-MAIN (2013-20)',
  'NextCoderDataset_v1_long': 'NextCoder v1',
})

// Model series that are treated as MoE models
const MoEModelSeries = ['Qwen2', 'Qwen1.5']
```
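
To illustrate how a mapping of this shape can be consumed, the sketch below inverts it so that a raw `model_series` value resolves to its model type. This is only an assumption about usage, not the composable's actual code: `buildSeriesIndex` and `resolveModelType` are hypothetical helpers, the mapping is a shortened copy, and exact string matching is assumed.

```js
// Shortened, illustrative copy of the mapping (the full one lives in useLeaderboardData.js).
const modelTypeMapping = {
  'Llama-3': ['Llama-3.1', 'Llama-3.2'],
  'GLM-4': ['glm-4', 'GLM-4'],
  'Qwen3': ['Qwen3'],
}

// Hypothetical helper: build a reverse index once (raw model_series -> model type).
function buildSeriesIndex(mapping) {
  const index = new Map()
  for (const [type, seriesList] of Object.entries(mapping)) {
    for (const series of seriesList) index.set(series, type)
  }
  return index
}

// Hypothetical helper: exact-match lookup; unknown series resolve to null.
function resolveModelType(index, series) {
  return index.get(series) ?? null
}

const seriesIndex = buildSeriesIndex(modelTypeMapping)
console.log(resolveModelType(seriesIndex, 'Llama-3.2')) // Llama-3
console.log(resolveModelType(seriesIndex, 'TinyLlama')) // null
```

Building the index once keeps each lookup constant-time, which may matter if the table has many rows.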

# Installation

## 1. Merge CSVs (script)

The repository includes a small script, `scripts/merge_csv.js`, that merges multiple CSV files into one (a simple merge that keeps columns in order of first appearance).

Example:

```sh
# Use a directory
npm run merge-csv -- --inputDir ./data --out ./merged.csv

# Or pass files directly
npm run merge-csv -- file1.csv file2.csv --out merged.csv
```
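
Note: the arguments after `npm run merge-csv` are passed through to the script (mind the `--`). The script merges the headers of all CSVs into the final output header and concatenates the rows of each file. This is a lightweight implementation: it handles ordinary CSVs with quoted fields, but is not guaranteed to handle every complex case involving embedded line breaks.
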
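The header-union behavior described for the merge script can be sketched as follows. This is not the code of `scripts/merge_csv.js`: the function names are hypothetical, the inputs are assumed to be already parsed, and filling missing columns with an empty string is an assumption.

```js
// Each input table is assumed to be already parsed: { headers: string[], rows: object[] }.
function mergeTables(tables) {
  const headers = []
  const seen = new Set()
  const rows = []
  for (const table of tables) {
    // Union of headers, keeping the order in which each column first appears.
    for (const h of table.headers) {
      if (!seen.has(h)) {
        seen.add(h)
        headers.push(h)
      }
    }
    rows.push(...table.rows)
  }
  // Assumption: a column missing from a row becomes an empty string.
  return { headers, rows: rows.map(row => headers.map(h => row[h] ?? '')) }
}

// Quote a field only when it contains a comma, a quote, or a line break.
function toCsvField(value) {
  const s = String(value)
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s
}

function toCsv({ headers, rows }) {
  return [headers, ...rows].map(r => r.map(toCsvField).join(',')).join('\n')
}

const merged = mergeTables([
  { headers: ['model_name', 'ic'], rows: [{ model_name: 'A', ic: '0.1' }] },
  { headers: ['model_name', 'model_series'], rows: [{ model_name: 'B', model_series: 'Qwen3' }] },
])
console.log(toCsv(merged))
// model_name,ic,model_series
// A,0.1,
// B,,Qwen3
```
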
## 2. Count categories

```sh
node .\scripts\count_column.js --file .\data\entropy_calc4_part3.csv --column model_series --out .\p3_model_series.csv
```

After counting, merge the categories manually, e.g. merge Llama-3.2 and Llama-3.1 into Llama-3.

Remove TinyLlama.

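The counting step and the manual consolidation that follows it can be pictured with the sketch below. It is only an illustration under stated assumptions: `countColumn`, `consolidate`, and the two rule tables are hypothetical names, and the rows are made-up sample data rather than values from the real CSV.

```js
// Count how often each value occurs in one column of already-parsed rows.
function countColumn(rows, column) {
  const counts = new Map()
  for (const row of rows) {
    const value = row[column]
    counts.set(value, (counts.get(value) ?? 0) + 1)
  }
  return counts
}

// Hypothetical encoding of the manual rules from the text above.
const mergeInto = new Map([['Llama-3.1', 'Llama-3'], ['Llama-3.2', 'Llama-3']])
const dropped = new Set(['TinyLlama'])

// Apply the merge and drop rules to the raw counts.
function consolidate(counts) {
  const out = new Map()
  for (const [series, n] of counts) {
    if (dropped.has(series)) continue
    const key = mergeInto.get(series) ?? series
    out.set(key, (out.get(key) ?? 0) + n)
  }
  return out
}

// Made-up sample rows, for illustration only.
const sampleRows = [
  { model_series: 'Llama-3.1' },
  { model_series: 'Llama-3.2' },
  { model_series: 'Llama-3.2' },
  { model_series: 'TinyLlama' },
  { model_series: 'Qwen3' },
]
const consolidated = consolidate(countColumn(sampleRows, 'model_series'))
for (const [series, n] of consolidated) console.log(`${series}: ${n}`)
// Llama-3: 3
// Qwen3: 1
```
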
## 3. Filtering

Because TinyLlama was removed in step `2`, run this script to extract the final table from merged.csv.

Using the default paths (merged.csv is in outdata):

`node .\scripts\filter_by_series.js`

Specifying the input file and the series list, and writing to a custom output file:

`node .\scripts\filter_by_series.js --file .\outdata\merged.csv --series .\outdata\p3_model_series.csv --out .\outdata\filtered.csv`

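
As a rough picture of what filtering by series amounts to, the sketch below keeps only rows whose `model_series` appears in a list of retained series. It assumes exact matching on that column and already-parsed rows. `filterBySeries` is a hypothetical name and the data is made up, so this should not be read as the actual logic of `scripts/filter_by_series.js`.

```js
// Keep only rows whose series value is in the retained list (exact match assumed).
function filterBySeries(rows, keptSeries, column = 'model_series') {
  const keep = new Set(keptSeries)
  return rows.filter(row => keep.has(row[column]))
}

// Made-up rows standing in for merged.csv.
const mergedRows = [
  { model_name: 'a', model_series: 'Qwen3' },
  { model_name: 'b', model_series: 'TinyLlama' },
  { model_name: 'c', model_series: 'Hunyuan' },
]

// Made-up series list standing in for the edited output of step 2.
const finalRows = filterBySeries(mergedRows, ['Qwen3', 'Hunyuan'])
console.log(finalRows.map(r => r.model_name).join(', ')) // a, c
```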