Text Generation
Transformers
Safetensors
mistral
nvfp4
conversational
text-generation-inference
8-bit precision
compressed-tensors
Instructions for using DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8")
model = AutoModelForCausalLM.from_pretrained("DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker
docker model run hf.co/DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8
- SGLang
How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with Docker Model Runner:
docker model run hf.co/DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8
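The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and it assumes a server started as described above is listening on `localhost` (port 8000 for vLLM, 30000 for SGLang) and returns the usual OpenAI-style chat completion JSON.

```python
import json
import urllib.request

MODEL_ID = "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route.

    Hypothetical helper: mirrors the JSON body used in the curl examples.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000", timeout=60):
    """Send the prompt to a running server and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a vLLM server running, `ask("What is the capital of France?")` should return the reply text; for SGLang, pass `base_url="http://localhost:30000"`.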
| | | Mistral-Nemo-Instruct-2407-NVFP4 | | Mistral-Nemo-Instruct-2407-NVFP4-4over6 | | Mistral-Nemo-Instruct-2407-NVFP4-FP8-RTN | | Mistral-Nemo-Instruct-2407-NVFP4-FP8 | |
|---|---|---|---|---|---|---|---|---|---|
| Task | Metric | Value | Stderr | Value | Stderr | Value | Stderr | Value | Stderr |
| coqa | em | 0.5392 | 0.0199 | 0.5498 | 0.0202 | 0.5683 | 0.0196 | 0.5733 | 0.0195 |
| | f1 | 0.7182 | 0.0151 | 0.7212 | 0.0154 | 0.7401 | 0.0142 | 0.7347 | 0.0150 |
| hellaswag | acc | 0.6186 | 0.0048 | 0.6194 | 0.0048 | 0.6238 | 0.0048 | 0.6240 | 0.0048 |
| | acc_norm | 0.8084 | 0.0039 | 0.8132 | 0.0039 | 0.8140 | 0.0039 | 0.8125 | 0.0039 |
| ifeval | inst_level_loose_acc | 0.5456 | N/A | 0.5683 | N/A | 0.5564 | N/A | 0.5767 | N/A |
| | inst_level_strict_acc | 0.4712 | N/A | 0.5096 | N/A | 0.5012 | N/A | 0.5108 | N/A |
| | prompt_level_loose_acc | 0.4603 | 0.0214 | 0.4824 | 0.0215 | 0.4621 | 0.0215 | 0.4917 | 0.0215 |
| | prompt_level_strict_acc | 0.3808 | 0.0209 | 0.4122 | 0.0212 | 0.3993 | 0.0211 | 0.4196 | 0.0212 |
| lambada_openai | acc | 0.7584 | 0.0060 | 0.7687 | 0.0059 | 0.7619 | 0.0059 | 0.7726 | 0.0058 |
| | perplexity | 3.0229 | 0.0563 | 2.9546 | 0.0541 | 2.9591 | 0.0556 | 2.9233 | 0.0542 |
| lambada_openai_cloze | acc | 0.3122 | 0.0065 | 0.2983 | 0.0064 | 0.3315 | 0.0066 | 0.3317 | 0.0066 |
| | perplexity | 29.8427 | 0.7625 | 30.0355 | 0.7780 | 26.6970 | 0.6838 | 26.6948 | 0.6858 |
| lambada_standard | acc | 0.6885 | 0.0065 | 0.6907 | 0.0064 | 0.6971 | 0.0064 | 0.6926 | 0.0064 |
| | perplexity | 3.6401 | 0.0766 | 3.6600 | 0.0756 | 3.4930 | 0.0721 | 3.5514 | 0.0734 |
| lambada_standard_cloze | acc | 0.2259 | 0.0058 | 0.2467 | 0.0060 | 0.2583 | 0.0061 | 0.2837 | 0.0063 |
| | perplexity | 44.8440 | 1.1469 | 40.9925 | 1.0271 | 37.4110 | 0.9371 | 35.5615 | 0.8741 |
| commonsense_qa | acc | 0.5774 | 0.0141 | 0.5921 | 0.0141 | 0.6061 | 0.0140 | 0.6208 | 0.0139 |
| mmlu | acc | 0.6325 | 0.0038 | 0.6364 | 0.0038 | 0.6434 | 0.0038 | 0.6454 | 0.0038 |
| | acc | 0.5673 | 0.0067 | 0.5779 | 0.0067 | 0.5819 | 0.0067 | 0.5864 | 0.0067 |
| | acc | 0.7123 | 0.0078 | 0.7110 | 0.0078 | 0.7210 | 0.0078 | 0.7277 | 0.0077 |
| | acc | 0.7491 | 0.0076 | 0.7504 | 0.0076 | 0.7563 | 0.0076 | 0.7563 | 0.0076 |
| | acc | 0.5373 | 0.0085 | 0.5392 | 0.0085 | 0.5487 | 0.0084 | 0.5442 | 0.0085 |
| openbookqa | acc | 0.3680 | 0.0216 | 0.3920 | 0.0219 | 0.4040 | 0.0220 | 0.4040 | 0.0220 |
| | acc_norm | 0.4700 | 0.0223 | 0.4720 | 0.0223 | 0.4780 | 0.0224 | 0.4880 | 0.0224 |
| winogrande | acc | 0.7672 | 0.0119 | 0.7553 | 0.0121 | 0.7514 | 0.0121 | 0.7545 | 0.0121 |
| triviaqa | exact_match | 0.5953 | 0.0037 | 0.6011 | 0.0037 | 0.6105 | 0.0036 | 0.6184 | 0.0036 |
| truthfulqa_mc1 | acc | 0.3782 | 0.0170 | 0.3831 | 0.0170 | 0.3892 | 0.0171 | 0.3917 | 0.0171 |
| truthfulqa_mc2 | acc | 0.5284 | 0.0150 | 0.5367 | 0.0149 | 0.5390 | 0.0151 | 0.5475 | 0.0150 |
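
Many of the differences in the table above are small relative to the reported standard errors. One rough way to read them is a z statistic on the difference between two columns. The sketch below is an illustration rather than part of the original evaluation: the helper `z_score` is hypothetical, and it treats the two estimates as independent, which is only an approximation because all variants were scored on the same benchmark items. The values are copied from the first (NVFP4) and last (NVFP4-FP8) model columns.

```python
import math


def z_score(value_a, stderr_a, value_b, stderr_b):
    """z statistic for the difference b - a, assuming independent estimates."""
    return (value_b - value_a) / math.sqrt(stderr_a**2 + stderr_b**2)


# (value, stderr) pairs copied from the table: NVFP4 first, NVFP4-FP8 second.
rows = {
    "commonsense_qa acc": ((0.5774, 0.0141), (0.6208, 0.0139)),
    "hellaswag acc": ((0.6186, 0.0048), (0.6240, 0.0048)),
    "winogrande acc": ((0.7672, 0.0119), (0.7545, 0.0121)),
}

for name, ((va, sa), (vb, sb)) in rows.items():
    z = z_score(va, sa, vb, sb)
    print(f"{name}: diff={vb - va:+.4f}, z={z:+.2f}")
```

Under this approximation, a magnitude of roughly 2 or more is the conventional threshold for a difference that is unlikely to be noise; smaller values are better read as ties.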