Usage

Install vLLM (for example, pip install vllm), then start the OpenAI-compatible API server:

python -m vllm.entrypoints.openai.api_server --model cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b",
        "prompt": "San Francisco is a"
    }'
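
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the server started above is reachable at http://localhost:8000. The helper names build_completion_payload and complete are illustrative, and max_tokens is an optional field of the OpenAI-style completions API that the curl example does not set.

```python
import json
import urllib.request

# Assumption: the vLLM server from the command above runs locally on port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b"


def build_completion_payload(prompt, max_tokens=32):
    """Build the JSON body for the /v1/completions endpoint."""
    return {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}


def complete(prompt, max_tokens=32, timeout=60):
    """Send a completion request and return the generated text."""
    body = json.dumps(build_completion_payload(prompt, max_tokens)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    # OpenAI-style responses carry the generated text under choices[0].text.
    return result["choices"][0]["text"]
```

With the server running, complete("San Francisco is a") returns the model's continuation as a string.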

Evaluations

English	Mistral-7B-Instruct-v0.3	Mistral-7B-Instruct-v0.3-GPTQ-8b	Mistral-7B-Instruct-v0.3-GPTQ-4b
Avg.	67.65	67.72	66.95
ARC	64.2	64.1	62.1
Hellaswag	75.6	75.6	76.0
MMLU	63.16	63.47	62.75

French	Mistral-7B-Instruct-v0.3	Mistral-7B-Instruct-v0.3-GPTQ-8b	Mistral-7B-Instruct-v0.3-GPTQ-4b
Avg.	56.4	56.17	54.77
ARC_fr	51.9	51.4	50.0
Hellaswag_fr	65.8	65.8	63.8
MMLU_fr	51.5	51.3	50.5

German	Mistral-7B-Instruct-v0.3	Mistral-7B-Instruct-v0.3-GPTQ-8b	Mistral-7B-Instruct-v0.3-GPTQ-4b
Avg.	51.83	51.73	51.7
ARC_de	47.6	47.5	47.3
Hellaswag_de	58.9	59.0	57.3
MMLU_de	49.0	48.7	50.5

Italian	Mistral-7B-Instruct-v0.3	Mistral-7B-Instruct-v0.3-GPTQ-8b	Mistral-7B-Instruct-v0.3-GPTQ-4b
Avg.	54.93	54.8	52.83
ARC_it	51.6	51.6	49.3
Hellaswag_it	63.5	63.8	61.0
MMLU_it	49.7	49.0	48.2

Safety	Mistral-7B-Instruct-v0.3	Mistral-7B-Instruct-v0.3-GPTQ-8b	Mistral-7B-Instruct-v0.3-GPTQ-4b
Avg.	60.32	60.54	64.8
RealToxicityPrompts	89.7	90.0	90.7
TruthfulQA	59.71	59.48	58.32
CrowS	31.54	32.14	45.38

Spanish	Mistral-7B-Instruct-v0.3	Mistral-7B-Instruct-v0.3-GPTQ-8b	Mistral-7B-Instruct-v0.3-GPTQ-4b
Avg.	57.9	57.97	56.1
ARC_es	53.5	53.5	51.0
Hellaswag_es	68.5	68.5	66.2
MMLU_es	51.7	51.9	51.1

We did not check for data contamination. Evaluations were run with Eval. Harness using limit=1000, i.e. on at most 1,000 examples per task.
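
The Avg. rows in the tables above are consistent with an unweighted mean of the three benchmark scores in each table. As a quick check, the sketch below recomputes the English averages from the reported scores, along with the difference of each model from the unquantized one. The dictionary layout and the function name are illustrative only.

```python
# Scores copied from the English table: (ARC, Hellaswag, MMLU).
ENGLISH_SCORES = {
    "Mistral-7B-Instruct-v0.3": (64.2, 75.6, 63.16),
    "Mistral-7B-Instruct-v0.3-GPTQ-8b": (64.1, 75.6, 63.47),
    "Mistral-7B-Instruct-v0.3-GPTQ-4b": (62.1, 76.0, 62.75),
}


def average(scores):
    """Unweighted mean, rounded to two decimals like the Avg. rows."""
    return round(sum(scores) / len(scores), 2)


averages = {model: average(scores) for model, scores in ENGLISH_SCORES.items()}
baseline = averages["Mistral-7B-Instruct-v0.3"]

for model, avg in averages.items():
    # Difference in average score versus the unquantized model, in points.
    print(f"{model}: avg={avg:.2f}, delta={avg - baseline:+.2f}")
```

The same check can be repeated for the other tables by swapping in their scores.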

Performance

GPU	requests/s	tokens/s
NVIDIA L4x1	3.75	1867.13
NVIDIA L4x2	5.03	2503.83
NVIDIA L4x4	5.86	2916.3
Performance measured on cortecs inference.
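
Two derived figures can help when reading this table: the average number of tokens per request (tokens/s divided by requests/s) and how throughput scales with the number of GPUs. The sketch below computes both from the reported numbers. It assumes that "L4xN" denotes N NVIDIA L4 GPUs, and the variable names are illustrative.

```python
# Reported measurements: GPU count -> (requests/s, tokens/s).
# Assumption: "NVIDIA L4xN" in the table means N NVIDIA L4 GPUs.
MEASUREMENTS = {
    1: (3.75, 1867.13),
    2: (5.03, 2503.83),
    4: (5.86, 2916.3),
}

base_requests_per_s = MEASUREMENTS[1][0]

summary = {}
for gpus, (requests_per_s, tokens_per_s) in MEASUREMENTS.items():
    tokens_per_request = tokens_per_s / requests_per_s
    # Throughput relative to the single-GPU measurement.
    speedup = requests_per_s / base_requests_per_s
    # Fraction of ideal linear scaling that is actually achieved.
    efficiency = speedup / gpus
    summary[gpus] = (tokens_per_request, speedup, efficiency)
    print(f"{gpus} GPU(s): {tokens_per_request:.0f} tokens/request, "
          f"{speedup:.2f}x speedup, {efficiency:.0%} scaling efficiency")
```

On these numbers each request accounts for roughly 498 tokens, and throughput rises by about 1.34x with two GPUs and 1.56x with four, so adding GPUs increases throughput sublinearly.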
