Text Generation
Transformers
Safetensors
mistral
nvfp4
conversational
text-generation-inference
8-bit precision
compressed-tensors
Instructions for using DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8")
model = AutoModelForCausalLM.from_pretrained("DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8
```
- SGLang
How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8 with Docker Model Runner:
```shell
docker model run hf.co/DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8
```
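
The vLLM and SGLang servers shown earlier are both called through an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl requests can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `chat` are hypothetical (they are not part of vLLM or SGLang), and the response layout (`choices[0].message.content`) is assumed to follow the usual OpenAI chat-completions format.

```python
import json
import urllib.request

MODEL_ID = "DataSnake/Mistral-Nemo-Instruct-2407-NVFP4-FP8"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build the same POST request as the curl examples (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: OpenAI-style responses place the reply under
    # choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server already running, `chat("http://localhost:8000", "What is the capital of France?")` would target the vLLM example and `chat("http://localhost:30000", ...)` the SGLang one; the ports are the ones used in the commands above.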
| Task | Metric | Mistral-Nemo-Instruct-2407-NVFP4 Value | Stderr | Mistral-Nemo-Instruct-2407-NVFP4-4over6 Value | Stderr | Mistral-Nemo-Instruct-2407-NVFP4-FP8-RTN Value | Stderr | Mistral-Nemo-Instruct-2407-NVFP4-FP8 Value | Stderr |
|---|---|---|---|---|---|---|---|---|---|
| coqa | em | 0.5392 | 0.0199 | 0.5498 | 0.0202 | 0.5683 | 0.0196 | 0.5733 | 0.0195 |
| | f1 | 0.7182 | 0.0151 | 0.7212 | 0.0154 | 0.7401 | 0.0142 | 0.7347 | 0.0150 |
| hellaswag | acc | 0.6186 | 0.0048 | 0.6194 | 0.0048 | 0.6238 | 0.0048 | 0.6240 | 0.0048 |
| | acc_norm | 0.8084 | 0.0039 | 0.8132 | 0.0039 | 0.8140 | 0.0039 | 0.8125 | 0.0039 |
| ifeval | inst_level_loose_acc | 0.5456 | N/A | 0.5683 | N/A | 0.5564 | N/A | 0.5767 | N/A |
| | inst_level_strict_acc | 0.4712 | N/A | 0.5096 | N/A | 0.5012 | N/A | 0.5108 | N/A |
| | prompt_level_loose_acc | 0.4603 | 0.0214 | 0.4824 | 0.0215 | 0.4621 | 0.0215 | 0.4917 | 0.0215 |
| | prompt_level_strict_acc | 0.3808 | 0.0209 | 0.4122 | 0.0212 | 0.3993 | 0.0211 | 0.4196 | 0.0212 |
| lambada_openai | acc | 0.7584 | 0.0060 | 0.7687 | 0.0059 | 0.7619 | 0.0059 | 0.7726 | 0.0058 |
| | perplexity | 3.0229 | 0.0563 | 2.9546 | 0.0541 | 2.9591 | 0.0556 | 2.9233 | 0.0542 |
| lambada_openai_cloze | acc | 0.3122 | 0.0065 | 0.2983 | 0.0064 | 0.3315 | 0.0066 | 0.3317 | 0.0066 |
| | perplexity | 29.8427 | 0.7625 | 30.0355 | 0.7780 | 26.6970 | 0.6838 | 26.6948 | 0.6858 |
| lambada_standard | acc | 0.6885 | 0.0065 | 0.6907 | 0.0064 | 0.6971 | 0.0064 | 0.6926 | 0.0064 |
| | perplexity | 3.6401 | 0.0766 | 3.6600 | 0.0756 | 3.4930 | 0.0721 | 3.5514 | 0.0734 |
| lambada_standard_cloze | acc | 0.2259 | 0.0058 | 0.2467 | 0.0060 | 0.2583 | 0.0061 | 0.2837 | 0.0063 |
| | perplexity | 44.8440 | 1.1469 | 40.9925 | 1.0271 | 37.4110 | 0.9371 | 35.5615 | 0.8741 |
| commonsense_qa | acc | 0.5774 | 0.0141 | 0.5921 | 0.0141 | 0.6061 | 0.0140 | 0.6208 | 0.0139 |
| mmlu | acc | 0.6325 | 0.0038 | 0.6364 | 0.0038 | 0.6434 | 0.0038 | 0.6454 | 0.0038 |
| | acc | 0.5673 | 0.0067 | 0.5779 | 0.0067 | 0.5819 | 0.0067 | 0.5864 | 0.0067 |
| | acc | 0.7123 | 0.0078 | 0.7110 | 0.0078 | 0.7210 | 0.0078 | 0.7277 | 0.0077 |
| | acc | 0.7491 | 0.0076 | 0.7504 | 0.0076 | 0.7563 | 0.0076 | 0.7563 | 0.0076 |
| | acc | 0.5373 | 0.0085 | 0.5392 | 0.0085 | 0.5487 | 0.0084 | 0.5442 | 0.0085 |
| openbookqa | acc | 0.3680 | 0.0216 | 0.3920 | 0.0219 | 0.4040 | 0.0220 | 0.4040 | 0.0220 |
| | acc_norm | 0.4700 | 0.0223 | 0.4720 | 0.0223 | 0.4780 | 0.0224 | 0.4880 | 0.0224 |
| winogrande | acc | 0.7672 | 0.0119 | 0.7553 | 0.0121 | 0.7514 | 0.0121 | 0.7545 | 0.0121 |
| triviaqa | exact_match | 0.5953 | 0.0037 | 0.6011 | 0.0037 | 0.6105 | 0.0036 | 0.6184 | 0.0036 |
| truthfulqa_mc1 | acc | 0.3782 | 0.0170 | 0.3831 | 0.0170 | 0.3892 | 0.0171 | 0.3917 | 0.0171 |
| truthfulqa_mc2 | acc | 0.5284 | 0.0150 | 0.5367 | 0.0149 | 0.5390 | 0.0151 | 0.5475 | 0.0150 |
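
Since each value comes with a standard error, one rough way to read the table is to compare the gap between two models against their combined standard error. The sketch below does this for three rows, comparing Mistral-Nemo-Instruct-2407-NVFP4 with Mistral-Nemo-Instruct-2407-NVFP4-FP8. It treats the two errors as independent, which is a simplifying assumption (all models are scored on the same benchmark items, so a paired test would be more appropriate), and the `z_score` helper and the threshold of 2 are illustrative choices rather than anything taken from the evaluation itself.

```python
import math

# (value, stderr) pairs copied from the table:
# first Mistral-Nemo-Instruct-2407-NVFP4, then Mistral-Nemo-Instruct-2407-NVFP4-FP8.
RESULTS = {
    "lambada_standard_cloze acc": ((0.2259, 0.0058), (0.2837, 0.0063)),
    "commonsense_qa acc": ((0.5774, 0.0141), (0.6208, 0.0139)),
    "winogrande acc": ((0.7672, 0.0119), (0.7545, 0.0121)),
}


def z_score(baseline, candidate):
    """Difference in units of combined stderr (assumes independent errors)."""
    (base_value, base_err), (cand_value, cand_err) = baseline, candidate
    return (cand_value - base_value) / math.sqrt(base_err**2 + cand_err**2)


THRESHOLD = 2.0  # illustrative cutoff, roughly a 95% two-sided level

for task, (baseline, candidate) in RESULTS.items():
    z = z_score(baseline, candidate)
    verdict = "outside noise" if abs(z) > THRESHOLD else "within noise"
    print(f"{task}: z = {z:+.2f} ({verdict})")
```

Under these assumptions the lambada_standard_cloze gain is several combined standard errors wide, while the winogrande difference is smaller than one and is best read as noise.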