Instructions to use CreitinGameplays/Mistral-Nemo-12B-R1-v0.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use CreitinGameplays/Mistral-Nemo-12B-R1-v0.2 with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="CreitinGameplays/Mistral-Nemo-12B-R1-v0.2")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("CreitinGameplays/Mistral-Nemo-12B-R1-v0.2")
    model = AutoModelForCausalLM.from_pretrained("CreitinGameplays/Mistral-Nemo-12B-R1-v0.2")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    # Decode only the newly generated tokens, skipping the prompt
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use CreitinGameplays/Mistral-Nemo-12B-R1-v0.2 with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "CreitinGameplays/Mistral-Nemo-12B-R1-v0.2"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "CreitinGameplays/Mistral-Nemo-12B-R1-v0.2",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/CreitinGameplays/Mistral-Nemo-12B-R1-v0.2
- SGLang
How to use CreitinGameplays/Mistral-Nemo-12B-R1-v0.2 with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "CreitinGameplays/Mistral-Nemo-12B-R1-v0.2" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "CreitinGameplays/Mistral-Nemo-12B-R1-v0.2",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "CreitinGameplays/Mistral-Nemo-12B-R1-v0.2" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "CreitinGameplays/Mistral-Nemo-12B-R1-v0.2",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use CreitinGameplays/Mistral-Nemo-12B-R1-v0.2 with Docker Model Runner:
docker model run hf.co/CreitinGameplays/Mistral-Nemo-12B-R1-v0.2
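The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` field. The default port matches the vLLM example; pass `base_url="http://localhost:30000"` for the SGLang server.

```python
import json
import urllib.request

MODEL_ID = "CreitinGameplays/Mistral-Nemo-12B-R1-v0.2"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request equivalent to the curl example."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant text.

    Assumes an OpenAI-compatible response body; requires a running server.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (needs a server started as shown above):
# request = build_chat_request("What is the capital of France?")
# print(send_chat_request(request))
```

Building the request separately from sending it makes it possible to inspect the payload without a server running.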
#28, opened by CreitinGameplays
- tokenizer_config.json +1 -1
tokenizer_config.json CHANGED
@@ -8005,7 +8005,7 @@
 }
 },
 "bos_token": "<s>",
-
"chat_template": "{%- set system_message = \"A user will ask you to solve a task. You should first draft your thinking process (inner monologue) until you have derived the final answer. Afterwards, write a self-contained summary of your thoughts (i.e. your summary should be succinct but contain all the critical steps you needed to reach the conclusion). You should use Markdown and Latex to format your response. Write both your thoughts and summary in the same language as the task posed by the user.\\n\\nYour thinking process must follow the template below:\\n<think>\\nYour thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate a correct answer.\\n</think>\\nHere, provide a concise summary that reflects your reasoning and presents a clear final answer to the user.\\
+
"chat_template": "{%- set system_message = \"A user will ask you to solve a task. You should first draft your thinking process (inner monologue) until you have derived the final answer. Afterwards, write a self-contained summary of your thoughts (i.e. your summary should be succinct but contain all the critical steps you needed to reach the conclusion). You should use Markdown and Latex to format your response. Write both your thoughts and summary in the same language as the task posed by the user.\\n\\nYour thinking process must follow the template below:\\n<think>\\nYour thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate a correct answer.\\n</think>\\nHere, provide a concise summary that reflects your reasoning and presents a clear final answer to the user.\\n\" %}\n{%- set system_message_two = \"\\nA user will ask you to solve a task. You should first draft your thinking process (inner monologue) until you have derived the final answer. Afterwards, write a self-contained summary of your thoughts (i.e. your summary should be succinct but contain all the critical steps you needed to reach the conclusion). You should use Markdown and Latex to format your response. Write both your thoughts and summary in the same language as the task posed by the user.\\n\\nYour thinking process must follow the template below:\\n<think>\\nYour thoughts or/and draft, like working through an exercise on scratch paper. 
Be as casual and as long as you want until you are confident to generate a correct answer.\\n</think>\\nHere, provide a concise summary that reflects your reasoning and presents a clear final answer to the user.\" %}\n{# Removed external system-prompt exception: ignore any custom system prompts #}\n{%- if messages[0][\"role\"] == \"system\" %} \n {%- set system_message = messages[0][\"content\"] + system_message_two + \"\\n\" %} \n {%- set loop_messages = messages[1:] %} \n{%- else %} \n {%- set loop_messages = messages %} \n{%- endif %} \n{%- if not tools is defined %} \n {%- set tools = none %} \n{%- endif %}\n{%- set user_messages = loop_messages | selectattr(\"role\", \"equalto\", \"user\") | list %} \n{#- This block checks for alternating user/assistant messages, skipping tool calling messages #} \n{%- set ns = namespace() %} \n{%- set ns.index = 0 %} \n{%- for message in loop_messages %} \n {%- if not (message.role == \"tool\" or message.role == \"tool_results\" or (message.tool_calls is defined and message.tool_calls is not none)) %} \n {%- if (message[\"role\"] == \"user\") != (ns.index % 2 == 0) %} \n {{- raise_exception(\"After the optional system message, conversation roles must alternate user/assistant/user/assistant...\") }} \n {%- endif %} \n {%- set ns.index = ns.index + 1 %} \n {%- endif %} \n{%- endfor %}\n{{- bos_token }} \n{%- for message in loop_messages %} \n {%- if message[\"role\"] == \"user\" %} \n {%- if tools is not none and (message == user_messages[-1]) %} \n {{- \"[AVAILABLE_TOOLS] [\" }} \n {%- for tool in tools %} \n {%- set tool = tool.function %} \n {{- '{\"type\": \"function\", \"function\": {' }} \n {%- for key, val in tool.items() if key != \"return\" %} \n {%- if val is string %} \n {{- '\"' + key + '\": \"' + val + '\"' }} \n {%- else %} \n {{- '\"' + key + '\": ' + val|tojson }} \n {%- endif %} \n {%- if not loop.last %} \n {{- \", \" }} \n {%- endif %} \n {%- endfor %} \n {{- \"}}\" }} \n {%- if not loop.last %} \n {{- \", \" 
}} \n {%- else %} \n {{- \"]\" }} \n {%- endif %} \n {%- endfor %} \n {{- \"[/AVAILABLE_TOOLS]\" }} \n {%- endif %} \n {%- if system_message is defined %} \n {{- \"[INST]\" + system_message + \"\\n\" + message[\"content\"] + \"[/INST]\\n\" }} \n {%- else %} \n {{- \"[INST]\" + message[\"content\"] + \"[/INST]\\n\" }} \n {%- endif %} \n {%- elif message.tool_calls is defined and message.tool_calls is not none %} \n {{- \"[TOOL_CALLS] [\" }} \n {%- for tool_call in message.tool_calls %} \n {%- set out = tool_call.function|tojson %} \n {{- out[:-1] }} \n {%- if not tool_call.id is defined or tool_call.id|length != 9 %} \n {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }} \n {%- endif %} \n {{- ', \"id\": \"' + tool_call.id + '\"}' }} \n {%- if not loop.last %} \n {{- \", \" }} \n {%- else %} \n {{- \"]\" + eos_token }} \n {%- endif %} \n {%- endfor %} \n {%- elif message[\"role\"] == \"assistant\" %} \n {{- \"\" + message[\"content\"]|trim + eos_token + \"\\n\"}} \n {%- elif message[\"role\"] == \"tool_results\" or message[\"role\"] == \"tool\" %} \n {%- if message.content is defined and message.content.content is defined %} \n {%- set content = message.content.content %} \n {%- else %} \n {%- set content = message.content %} \n {%- endif %} \n {{- '[TOOL_RESULTS] {\"content\": ' + content|string + \", 'call_id': '\" + message.tool_call_id + \"'}[/TOOL_RESULTS]\" }} \n {%- else %} \n {{- raise_exception(\"Only user and assistant roles are supported, with the exception of an initial optional system message!\") }} \n {%- endif %} \n{%- endfor %}",
 "clean_up_tokenization_spaces": false,
 "eos_token": "</s>",
 "extra_special_tokens": {},
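The effect of the added template on system prompts can be sketched in plain Python. This is a simplified re-implementation for illustration only, not the Jinja template itself: it handles user and assistant turns, omits all tool handling, and uses a stricter role check than the template. `DEFAULT_INSTRUCTIONS` is a placeholder for the long reasoning prompt, and the function names `resolve_system_message` and `render_prompt` are hypothetical.

```python
# Placeholder for the long "A user will ask you to solve a task..." text.
DEFAULT_INSTRUCTIONS = "<default reasoning instructions>"


def resolve_system_message(messages, default=DEFAULT_INSTRUCTIONS):
    """Return (system_message, remaining_turns) as the template computes them.

    A leading system message is kept and the default instructions are
    appended to it; without one, the default instructions are used alone.
    """
    if messages and messages[0]["role"] == "system":
        # Mirrors: messages[0].content + system_message_two + "\n",
        # where system_message_two starts with a newline.
        return messages[0]["content"] + "\n" + default + "\n", messages[1:]
    return default + "\n", messages


def render_prompt(messages, bos_token="<s>", eos_token="</s>"):
    """Render user/assistant turns in the template's [INST] format."""
    system_message, turns = resolve_system_message(messages)
    parts = [bos_token]
    for index, message in enumerate(turns):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("conversation roles must alternate user/assistant")
        if message["role"] == "user":
            # The system message is prepended to every user turn.
            parts.append(
                "[INST]" + system_message + "\n" + message["content"] + "[/INST]\n"
            )
        else:
            parts.append(message["content"].strip() + eos_token + "\n")
    return "".join(parts)


example = render_prompt([
    {"role": "system", "content": "Answer in French."},
    {"role": "user", "content": "Who are you?"},
])
print(example)
```

Note that under this reading a custom system prompt is combined with the built-in reasoning instructions rather than replacing them, and the combined text is repeated in each user turn.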