Nemotron-Cascade-14B-Thinking (Modified Chat Template)

This is a modified version of nvidia/Nemotron-Cascade-14B-Thinking with a fixed chat template for RL training compatibility.

Changes

The original Nemotron chat template strips <think> sections from assistant messages when rendering the conversation history. This violates the increasing-context requirement for multi-turn RL training, which expects each turn's rendered prompt to be a token-level extension of the previous turn's context (see the verifiers documentation).

This version uses a simplified chat template (based on willcb/Qwen3-14B) that preserves thinking tokens in the conversation history, making it compatible with multi-turn RL frameworks such as verifiers.
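The problem can be illustrated with a toy rendering function (a hypothetical stand-in for the actual Jinja chat template, not the template itself): when earlier <think> sections are stripped from the history, the context the model saw while generating turn 1 is no longer a prefix of the turn-2 prompt, which breaks the prefix property that multi-turn RL training relies on.

```python
import re

def render(messages, strip_think=False):
    """Render a conversation to a prompt string (toy stand-in for a chat template)."""
    parts = []
    for msg in messages:
        content = msg["content"]
        if strip_think and msg["role"] == "assistant":
            # Original-style behavior: drop <think>...</think> from the history
            content = re.sub(r"<think>.*?</think>\s*", "", content, flags=re.DOTALL)
        parts.append(f"<|{msg['role']}|>{content}")
    return "".join(parts)

turn1 = [
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "<think>simple arithmetic</think>4"},
]
turn2 = turn1 + [{"role": "user", "content": "and 3+3?"}]

# Preserving thinking tokens: the turn-1 context (which the model actually
# generated, think tokens included) is a prefix of the turn-2 prompt.
assert render(turn2).startswith(render(turn1))

# Stripping thinking tokens: the assistant turn is rewritten in the history,
# so the turn-1 context is no longer a prefix of the turn-2 prompt.
assert not render(turn2, strip_think=True).startswith(render(turn1))
```

The modified template corresponds to the preserving case, so rollouts collected turn by turn remain consistent prefixes of each other.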

Model Details

  • Base Model: nvidia/Nemotron-Cascade-14B-Thinking
  • Architecture: Qwen3ForCausalLM
  • Model Type: qwen3
  • Parameters: 14B

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Both the weights and the modified chat template are loaded from this repo
model = AutoModelForCausalLM.from_pretrained("bdsaglam/Nemotron-Cascade-14B-Thinking")
tokenizer = AutoTokenizer.from_pretrained("bdsaglam/Nemotron-Cascade-14B-Thinking")

License

This model inherits the NVIDIA Open Model License from the base model.
