
Libraries

How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8")
model = AutoModelForCausalLM.from_pretrained("DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
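
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_chat_request` and `send_chat_request` are illustrative rather than part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(prompt, base_url="http://localhost:8000", timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers place the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request("What is the capital of France?")` should return the reply text. Passing `base_url="http://localhost:30000"` would target a server on port 30000 instead.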

Use Docker

docker model run hf.co/DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8

SGLang

How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with Docker Model Runner:

```
docker model run hf.co/DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8
```

Benchmarks (benchmarks.md)

DataSnake, commit d752c92 ("Upload 2 files"), verified, about 2 months ago

Column headers abbreviate the model names by dropping the common prefix Mistral-Nemo-Instruct-2407- (for example, NVFP4-FP8 is Mistral-Nemo-Instruct-2407-NVFP4-FP8).

| Task | Metric | NVFP4 Value | NVFP4 Stderr | NVFP4-4over6 Value | NVFP4-4over6 Stderr | NVFP4-FP8-RTN Value | NVFP4-FP8-RTN Stderr | NVFP4-FP8 Value | NVFP4-FP8 Stderr |
|---|---|---|---|---|---|---|---|---|---|
| coqa | em | 0.5392 | 0.0199 | 0.5498 | 0.0202 | 0.5683 | 0.0196 | 0.5733 | 0.0195 |
| | f1 | 0.7182 | 0.0151 | 0.7212 | 0.0154 | 0.7401 | 0.0142 | 0.7347 | 0.0150 |
| hellaswag | acc | 0.6186 | 0.0048 | 0.6194 | 0.0048 | 0.6238 | 0.0048 | 0.6240 | 0.0048 |
| | acc_norm | 0.8084 | 0.0039 | 0.8132 | 0.0039 | 0.8140 | 0.0039 | 0.8125 | 0.0039 |
| ifeval | inst_level_loose_acc | 0.5456 | N/A | 0.5683 | N/A | 0.5564 | N/A | 0.5767 | N/A |
| | inst_level_strict_acc | 0.4712 | N/A | 0.5096 | N/A | 0.5012 | N/A | 0.5108 | N/A |
| | prompt_level_loose_acc | 0.4603 | 0.0214 | 0.4824 | 0.0215 | 0.4621 | 0.0215 | 0.4917 | 0.0215 |
| | prompt_level_strict_acc | 0.3808 | 0.0209 | 0.4122 | 0.0212 | 0.3993 | 0.0211 | 0.4196 | 0.0212 |
| lambada_openai | acc | 0.7584 | 0.0060 | 0.7687 | 0.0059 | 0.7619 | 0.0059 | 0.7726 | 0.0058 |
| | perplexity | 3.0229 | 0.0563 | 2.9546 | 0.0541 | 2.9591 | 0.0556 | 2.9233 | 0.0542 |
| lambada_openai_cloze | acc | 0.3122 | 0.0065 | 0.2983 | 0.0064 | 0.3315 | 0.0066 | 0.3317 | 0.0066 |
| | perplexity | 29.8427 | 0.7625 | 30.0355 | 0.7780 | 26.6970 | 0.6838 | 26.6948 | 0.6858 |
| lambada_standard | acc | 0.6885 | 0.0065 | 0.6907 | 0.0064 | 0.6971 | 0.0064 | 0.6926 | 0.0064 |
| | perplexity | 3.6401 | 0.0766 | 3.6600 | 0.0756 | 3.4930 | 0.0721 | 3.5514 | 0.0734 |
| lambada_standard_cloze | acc | 0.2259 | 0.0058 | 0.2467 | 0.0060 | 0.2583 | 0.0061 | 0.2837 | 0.0063 |
| | perplexity | 44.8440 | 1.1469 | 40.9925 | 1.0271 | 37.4110 | 0.9371 | 35.5615 | 0.8741 |
| commonsense_qa | acc | 0.5774 | 0.0141 | 0.5921 | 0.0141 | 0.6061 | 0.0140 | 0.6208 | 0.0139 |
| mmlu | acc | 0.6325 | 0.0038 | 0.6364 | 0.0038 | 0.6434 | 0.0038 | 0.6454 | 0.0038 |
| | acc | 0.5673 | 0.0067 | 0.5779 | 0.0067 | 0.5819 | 0.0067 | 0.5864 | 0.0067 |
| | acc | 0.7123 | 0.0078 | 0.7110 | 0.0078 | 0.7210 | 0.0078 | 0.7277 | 0.0077 |
| | acc | 0.7491 | 0.0076 | 0.7504 | 0.0076 | 0.7563 | 0.0076 | 0.7563 | 0.0076 |
| | acc | 0.5373 | 0.0085 | 0.5392 | 0.0085 | 0.5487 | 0.0084 | 0.5442 | 0.0085 |
| openbookqa | acc | 0.3680 | 0.0216 | 0.3920 | 0.0219 | 0.4040 | 0.0220 | 0.4040 | 0.0220 |
| | acc_norm | 0.4700 | 0.0223 | 0.4720 | 0.0223 | 0.4780 | 0.0224 | 0.4880 | 0.0224 |
| winogrande | acc | 0.7672 | 0.0119 | 0.7553 | 0.0121 | 0.7514 | 0.0121 | 0.7545 | 0.0121 |
| triviaqa | exact_match | 0.5953 | 0.0037 | 0.6011 | 0.0037 | 0.6105 | 0.0036 | 0.6184 | 0.0036 |
| truthfulqa_mc1 | acc | 0.3782 | 0.0170 | 0.3831 | 0.0170 | 0.3892 | 0.0171 | 0.3917 | 0.0171 |
| truthfulqa_mc2 | acc | 0.5284 | 0.0150 | 0.5367 | 0.0149 | 0.5390 | 0.0151 | 0.5475 | 0.0150 |
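
One rough way to read these numbers is to compare a score difference against the reported standard errors. The sketch below treats the two runs as independent, which is an assumption: both variants were scored on the same benchmark items, so this is only a heuristic and not a proper paired test. It uses the commonsense_qa and winogrande rows for the NVFP4 and NVFP4-FP8 variants as examples, and `z_score` is a hypothetical helper name.

```python
import math


def z_score(value_a, stderr_a, value_b, stderr_b):
    """Difference (b - a) in units of the combined standard error."""
    combined = math.sqrt(stderr_a ** 2 + stderr_b ** 2)
    return (value_b - value_a) / combined


# (value, stderr) pairs copied from the table above:
# first pair is the NVFP4 variant, second pair is the NVFP4-FP8 variant.
rows = {
    "commonsense_qa acc": ((0.5774, 0.0141), (0.6208, 0.0139)),
    "winogrande acc": ((0.7672, 0.0119), (0.7545, 0.0121)),
}

for name, ((a, se_a), (b, se_b)) in rows.items():
    print(f"{name}: diff={b - a:+.4f}, z={z_score(a, se_a, b, se_b):+.2f}")
```

Under this heuristic the commonsense_qa gap comes out at about 2.2 combined standard errors, while the winogrande gap is about -0.7, which is within what the reported noise alone could plausibly produce.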