File size: 3,159 Bytes
d7041cd
 
 
23048e7
 
d7041cd
53ea372
521215e
d7041cd
521215e
d7041cd
23048e7
d7041cd
521215e
 
d7041cd
521215e
 
53ea372
23048e7
53ea372
d7041cd
 
 
53ea372
23048e7
d7041cd
521215e
d7041cd
521215e
d7041cd
521215e
d7041cd
23048e7
d7041cd
 
 
23048e7
d7041cd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
Model	Size	Accuracy/std	Precision_Unsafe/std	Recall_Unsafe/std	Precision_Safe/std	Recall_Safe/std
DeepSeek-LLM-67B-Chat	>65B	68.08/0.35	94.80/0.83	38.40/0.43	61.27/0.26	97.88/0.36
Qwen1.5-72B-Chat	>65B	63.67/0.46	58.27/0.32	96.84/0.13	90.51/0.57	30.34/0.80
Qwen2.5-72B-Instruct	>65B	63.27/0.52	66.00/0.60	55.09/0.82	61.31/0.46	71.49/0.25
Qwen2-72B-Instruct	>65B	60.70/0.49	57.90/0.42	79.03/0.63	66.75/0.77	42.28/0.43
Opt-66B	>65B	59.93/0.41	56.52/0.37	86.87/0.59	71.36/0.78	32.86/0.74
DeepSeek-R1-Distill-Llama-70B	>65B	47.68/0.64	45.77/1.21	23.85/0.67	48.35/0.46	71.62/0.60
Llama-3.1-70B-Instruct	>65B	43.68/0.41	36.45/0.84	16.66/0.34	45.83/0.30	70.82/0.48
Llama3-ChatQA-1.5-70B	>65B	40.41/0.29	33.86/0.75	19.84/0.75	43.13/0.25	61.08/0.37
Llama-3.3-70B-Instruct	>65B	36.84/0.82	32.02/1.29	23.19/1.13	39.58/0.63	50.55/0.69
Yi-1.5-34B-Chat	~30B	66.02/0.22	80.13/0.55	42.82/0.25	60.86/0.16	89.33/0.41
Qwen2.5-32B-Instruct	~30B	64.33/0.46	62.46/0.44	72.24/0.71	66.91/0.53	56.38/0.18
Opt-30B	~30B	53.82/0.03	54.42/0.21	48.32/0.20	53.34/0.11	59.34/0.27
QwQ-32B-Preview	~30B	51.82/0.06	51.04/0.10	94.83/0.28	62.38/0.26	8.61/0.39
Phi-3-medium-4k-instruct	10B~20B	71.04/0.31	69.74/0.29	74.56/0.97	72.54/0.59	67.49/0.89
Baichuan2-13B-Chat	10B~20B	70.43/0.39	65.81/0.38	85.34/0.63	79.02/0.63	55.46/0.47
Phi-3-medium-128k-instruct	10B~20B	68.87/0.81	68.08/0.51	71.32/1.44	69.75/1.17	66.41/0.57
Mistral-Nemo-Instruct-2407	10B~20B	66.88/0.46	62.56/0.28	84.42/0.90	75.89/1.13	49.26/0.24
phi-4	10B~20B	62.62/0.32	63.73/0.41	58.98/0.20	61.66/0.31	66.28/0.78
Qwen1.5-14B-Chat	10B~20B	61.29/0.40	57.02/0.32	92.43/0.55	79.80/1.05	30.02/0.47
Mistral-Small-24B-Instruct-2501	10B~20B	59.20/0.46	58.32/0.42	65.16/1.08	60.33/0.56	53.22/0.20
Ziya2-13B-Chat	10B~20B	55.25/0.26	59.24/0.37	34.30/0.11	53.61/0.26	76.29/0.39
InternLM2-Chat-20B	10B~20B	53.67/0.16	79.00/0.66	10.30/0.60	51.90/0.11	97.25/0.26
Opt-13B	10B~20B	49.31/0.31	37.77/3.57	1.76/0.16	49.59/0.23	97.08/0.29
Moonlight-16B-A3B-Instruct	10B~20B	48.92/0.16	3.46/0.57	0.07/0.01	49.40/0.15	98.00/0.08
Gemma-1.1-7B-it	5B~10B	64.32/0.68	59.98/0.58	86.60/0.35	75.70/0.80	41.95/0.93
Qwen1.5-7B-Chat	5B~10B	62.48/0.54	59.06/0.48	81.92/0.50	70.28/0.65	42.96/0.81
Phi-3-small-128k-instruct	5B~10B	61.76/0.27	60.47/0.16	68.45/0.61	63.46/0.50	55.05/0.61
Yi-1.5-9B-Chat	5B~10B	60.35/0.52	79.47/1.37	28.16/0.33	56.22/0.39	92.69/0.59
Phi-3-small-8k-instruct	5B~10B	59.47/0.39	56.25/0.30	86.06/0.40	70.05/0.85	32.75/0.49
DeepSeek-LLM-7B-Chat	5B~10B	56.79/0.19	84.83/1.23	16.77/0.09	53.70/0.15	96.99/0.27
Ministral-8B-Instruct-2410	5B~10B	56.28/0.51	55.10/0.51	68.83/0.58	58.24/0.51	43.66/0.54
GPT-J-6B	5B~10B	55.98/0.42	80.27/1.42	16.11/0.86	53.26/0.23	96.03/0.20
Baichuan2-7B-Chat	5B~10B	53.99/0.51	62.89/1.57	19.96/0.88	52.31/0.30	88.18/0.23
GLM-4-9B-Chat	5B~10B	50.03/0.15	50.07/0.13	99.31/0.22	44.12/9.01	0.52/0.04
InternLM2-Chat-7B	5B~10B	49.49/0.11	42.16/1.58	2.15/0.31	49.68/0.13	97.06/0.25
Opt-6.7B	5B~10B	48.54/0.43	49.24/0.31	86.62/1.03	43.40/1.18	10.30/0.55
Mistral-7B-Instruct-v0.3	5B~10B	42.99/0.06	39.54/0.47	26.01/0.69	44.69/0.11	60.05/0.50
Llama3-ChatQA-1.5-8B	5B~10B	42.11/0.29	37.46/0.85	23.20/0.89	44.20/0.09	61.11/0.57