Nemotron Think Tokenizer

A byte-identical mirror of the nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 tokenizer, hosted under the geodesic-research namespace for stable referencing in our reasoning / thinking SFT pipelines. No modifications.

Why mirror?

The upstream NVIDIA tokenizer ships a chat template that supports <think>...</think> reasoning traces, which is the right tool for any model trained on reasoning data. We host an unmodified copy so that:

  1. Our training configs can reference a stable geodesic-research/* path that won't shift if NVIDIA re-tags the upstream repo.
  2. It pairs cleanly with geodesic-research/nemotron-instruct-tokenizer: one is the reasoning variant, the other strips think-tag injection. Both share the same encoder.
  3. A single naming convention (nemotron-think-* vs nemotron-instruct-*) makes it explicit at the config level which behavior a training run expects.

Contents

| File | sha256 | Source |
|---|---|---|
| tokenizer.json | 623c34567aebb18582765289fbe23d901c62704d6518d71866e0e58db892b5b7 | upstream Super 120B BF16, verbatim |
| tokenizer_config.json | matches upstream | upstream Super 120B BF16, verbatim |
| special_tokens_map.json | matches upstream | upstream Super 120B BF16, verbatim |
| chat_template.jinja | 575fb74f54ed264df9047d0ecce3c98938aae953fb4f50356675706264cbb68a (10771 B) | upstream Super 120B BF16, verbatim |

The tokenizer.json blob is also byte-identical to nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16, nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16, and nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-Base-BF16: the entire Nemotron 3 family shares one encoder.
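
To check the byte-identical claim locally, here is a minimal verification sketch; it assumes huggingface_hub is installed and that both repos can be downloaded without extra gating:

import hashlib

from huggingface_hub import hf_hub_download

EXPECTED = "623c34567aebb18582765289fbe23d901c62704d6518d71866e0e58db892b5b7"

def tokenizer_sha256(repo_id: str) -> str:
    # Download (or reuse the cached copy of) tokenizer.json and hash its raw bytes.
    path = hf_hub_download(repo_id=repo_id, filename="tokenizer.json")
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

for repo in (
    "geodesic-research/nemotron-think-tokenizer",
    "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
):
    print(repo, tokenizer_sha256(repo) == EXPECTED)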

Chat template behavior

This is the upstream Nemotron 3 reasoning template. Default behavior:

  • enable_thinking defaults to True. The generation prompt ends with <|im_start|>assistant\n<think>\n to elicit a reasoning trace.
  • <think></think> is auto-prepended to assistant messages whose content lacks think tags, so {"role": "assistant", "content": "42"} renders as <|im_start|>assistant\n<think></think>42<|im_end|>.
  • reasoning_content field is supported. A message like {"role": "assistant", "reasoning_content": "let me check", "content": "42"} renders as <|im_start|>assistant\n<think>\nlet me check\n</think>\n42<|im_end|>.
  • truncate_history_thinking=True by default. Older assistant turns have their reasoning traces stripped and replaced with <think></think> stubs, keeping only the final answer in context.
  • low_effort=False by default. When set to True, appends \n\n{reasoning effort: low} to the last user message as a hint to the model to produce shorter chains of thought.
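
For example, rendering a short conversation shows both the auto-prepended <think></think> stub and the open <think> at the end of the generation prompt:
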
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("geodesic-research/nemotron-think-tokenizer")

msgs = [
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "4."},  # auto-prepended with <think></think>
    {"role": "user", "content": "And 3+3?"},
]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
# Ends with: ...<|im_start|>assistant\n<think>\n
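
A second sketch exercises the reasoning_content field and the template options listed above. Passing truncate_history_thinking and low_effort as extra keyword arguments relies on the usual transformers convention of forwarding unknown apply_chat_template kwargs to the template, so treat the exact spelling as something to verify against the upstream template:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("geodesic-research/nemotron-think-tokenizer")

msgs = [
    {"role": "user", "content": "What's 2+2?"},
    {
        "role": "assistant",
        "reasoning_content": "2 and 2 make 4",  # rendered as <think>\n...\n</think>\n
        "content": "4.",
    },
    {"role": "user", "content": "And 3+3?"},
]

print(tok.apply_chat_template(
    msgs,
    tokenize=False,
    add_generation_prompt=True,
    truncate_history_thinking=False,  # keep the earlier trace instead of a <think></think> stub
    low_effort=True,                  # appends "\n\n{reasoning effort: low}" to the last user message
))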

When to use this tokenizer

| Use case | Use this tokenizer? |
|---|---|
| Reasoning / thinking SFT (training data has <think>...</think> traces) | ✅ Yes |
| Distillation from a reasoning teacher model | ✅ Yes |
| Instruct SFT with no reasoning | ❌ Use geodesic-research/nemotron-instruct-tokenizer instead; avoids stray </think> echoes at inference |
| Continued pretraining (CPT) on raw text | Either works; the chat template is irrelevant for .bin/.idx data |
| Evaluating a reasoning-trained model with vLLM | ✅ Yes |
| Evaluating an instruct (non-reasoning) model | ❌ Use the instruct variant; this template emits <think>\n on the generation prompt, which mismatches an instruct model's training distribution |

Compatibility

  • vLLM: works out of the box (see the sketch after this list). tokenizer_class is PreTrainedTokenizerFast, with no backend/is_local keys and no custom Python files. Compatible with transformers 4.57.x and 5.x.
  • HuggingFace generation: standard generate() works; <|im_end|> is registered as the eos_token.
  • Existing Nemotron checkpoints: vocab, merges, special tokens, and added-token IDs all match the entire Nemotron 3 family. Drop-in replacement at the encoder level.
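
A minimal vLLM sketch, assuming the offline LLM API and the standard tokenizer override; the checkpoint name is illustrative, and any Nemotron 3 reasoning-trained checkpoint that shares this encoder should behave the same way:

from vllm import LLM, SamplingParams

# Illustrative checkpoint; substitute your own reasoning-trained Nemotron 3 model.
llm = LLM(
    model="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
    tokenizer="geodesic-research/nemotron-think-tokenizer",
)

out = llm.chat(
    [{"role": "user", "content": "What's 2+2?"}],
    SamplingParams(max_tokens=256),
)
print(out[0].outputs[0].text)  # should open with a <think> reasoning trace

The equivalent CLI form is vllm serve <model> --tokenizer geodesic-research/nemotron-think-tokenizer.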

Provenance

  • Source: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 (revision 49ad1f46ee9df444a0a3b8b63520faa1ca66324a)
  • Modifications: none
  • License: NVIDIA Open Model License (inherited from upstream)
  • Sibling: geodesic-research/nemotron-instruct-tokenizer (same encoder, chat template stripped of <think> injection)
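
For fully reproducible training configs, the mirror can additionally be pinned to a specific commit of this repo; the revision value below is a placeholder, not a real commit hash:

from transformers import AutoTokenizer

# Placeholder revision; substitute the commit SHA of this repo that your run pinned.
tok = AutoTokenizer.from_pretrained(
    "geodesic-research/nemotron-think-tokenizer",
    revision="<commit-sha>",
)
assert tok.eos_token == "<|im_end|>"  # matches the Compatibility notes above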