Nemotron 3 Custom Tokenizers
A byte-identical mirror of the nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 tokenizer, hosted under the geodesic-research namespace for stable referencing in our reasoning / thinking SFT pipelines. No modifications.
The upstream NVIDIA tokenizer ships a chat template that supports `<think>...</think>` reasoning traces; this is the right tool for any model trained with reasoning data. We host an unmodified copy so that:
- our pipelines can reference a `geodesic-research/*` path that won't shift if NVIDIA re-tags the upstream repo.
- it pairs cleanly with `geodesic-research/nemotron-instruct-tokenizer`: one is the reasoning variant, the other strips think-tag injection. Both share the same encoder.
- the naming (`nemotron-think-*` vs `nemotron-instruct-*`) makes it explicit at the config level which behavior a training run expects.

| File | sha256 | Source |
|---|---|---|
| `tokenizer.json` | `623c34567aebb18582765289fbe23d901c62704d6518d71866e0e58db892b5b7` | upstream Super 120B BF16, verbatim |
| `tokenizer_config.json` | matches upstream | upstream Super 120B BF16, verbatim |
| `special_tokens_map.json` | matches upstream | upstream Super 120B BF16, verbatim |
| `chat_template.jinja` | `575fb74f54ed264df9047d0ecce3c98938aae953fb4f50356675706264cbb68a` (10771 B) | upstream Super 120B BF16, verbatim |
The `tokenizer.json` blob is also byte-identical to `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`, `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16`, and `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-Base-BF16`: the entire Nemotron 3 family shares one encoder.
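To re-check the byte-identity claim yourself, here is a minimal sketch (it assumes `huggingface_hub` is installed and that you can download `tokenizer.json` from both repos):

```python
import hashlib

from huggingface_hub import hf_hub_download


def sha256_of(repo_id: str, filename: str = "tokenizer.json") -> str:
    """Download a file from the Hub and return its sha256 hex digest."""
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


mirror = sha256_of("geodesic-research/nemotron-think-tokenizer")
upstream = sha256_of("nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16")

# Both should equal the hash listed in the table above
assert mirror == upstream == (
    "623c34567aebb18582765289fbe23d901c62704d6518d71866e0e58db892b5b7"
)
```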
This is the upstream Nemotron 3 reasoning template. Default behavior:
- `enable_thinking` defaults to `True`. The generation prompt ends at `<|im_start|>assistant\n<think>\n` to elicit a reasoning trace.
- `<think></think>` is auto-prepended to assistant messages whose content lacks think tags, so `{"role": "assistant", "content": "42"}` renders as `<|im_start|>assistant\n<think></think>42<|im_end|>`.
- A `reasoning_content` field is supported. A message like `{"role": "assistant", "reasoning_content": "let me check", "content": "42"}` renders as `<|im_start|>assistant\n<think>\nlet me check\n</think>\n42<|im_end|>`.
- `truncate_history_thinking=True` by default. Older assistant turns have their reasoning traces stripped and replaced with `<think></think>` stubs, keeping only the final answer in context.
- `low_effort=False` by default. When set to `True`, appends `\n\n{reasoning effort: low}` to the last user message as a hint to the model to produce shorter chains of thought.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("geodesic-research/nemotron-think-tokenizer")
msgs = [
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "4."},  # auto-prepended with <think></think>
    {"role": "user", "content": "And 3+3?"},
]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
# Ends with: ...<|im_start|>assistant\n<think>\n
```
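As a companion sketch, the `reasoning_content` path described in the list above renders like this; the expected output is the string quoted in that bullet, and the user prompt here is only illustrative:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("geodesic-research/nemotron-think-tokenizer")

# Supply the reasoning trace explicitly via reasoning_content
msgs = [
    {"role": "user", "content": "What's 6*7?"},
    {"role": "assistant", "reasoning_content": "let me check", "content": "42"},
]
print(tok.apply_chat_template(msgs, tokenize=False))
# Final turn renders as: <|im_start|>assistant\n<think>\nlet me check\n</think>\n42<|im_end|>
```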
| Use case | Use this tokenizer? |
|---|---|
| Reasoning / thinking SFT (training data has `<think>...</think>` traces) | ✅ Yes |
| Distillation from a reasoning teacher model | ✅ Yes |
| Instruct SFT with no reasoning | ❌ Use `geodesic-research/nemotron-instruct-tokenizer` instead; avoids stray `</think>` echoes at inference |
| Continued pretraining (CPT) on raw text | Either works; the chat template is irrelevant for .bin/.idx data |
| Evaluating a reasoning-trained model with vLLM | ✅ Yes |
| Evaluating an instruct (non-reasoning) model | ❌ Use the instruct variant; this template emits `<think>\n` on the generation prompt, which mismatches an instruct model's training distribution |
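If a pipeline needs to choose between the two variants programmatically, the table above reduces to a single branch; a hypothetical sketch (the helper name is ours, not part of either repo):

```python
def pick_tokenizer_repo(has_reasoning_traces: bool) -> str:
    """Pick the tokenizer repo based on whether the SFT data carries <think> traces."""
    if has_reasoning_traces:
        return "geodesic-research/nemotron-think-tokenizer"
    return "geodesic-research/nemotron-instruct-tokenizer"
```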
- `tokenizer_class: PreTrainedTokenizerFast`, no backend/is_local keys, no custom Python files. Compatible with transformers 4.57.x and 5.x.
- `generate()` works; `<|im_end|>` is registered as the `eos_token`.
- Upstream source: `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16` (revision `49ad1f46ee9df444a0a3b8b63520faa1ca66324a`).
- Sibling repo: `geodesic-research/nemotron-instruct-tokenizer` (same encoder, chat template stripped of `<think>` injection).
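A minimal smoke test for those compatibility claims, assuming network access to the Hub:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("geodesic-research/nemotron-think-tokenizer")

# Fast tokenizer, with <|im_end|> registered as the end-of-sequence token
assert tok.is_fast
assert tok.eos_token == "<|im_end|>"

# Encode/decode round trip on a short string
ids = tok("Hello, Nemotron!").input_ids
print(tok.decode(ids, skip_special_tokens=True))
```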