Nemotron-Cascade-14B-Thinking (Modified Chat Template)

This is a modified version of nvidia/Nemotron-Cascade-14B-Thinking with a fixed chat template for RL training compatibility.

Changes

The original Nemotron chat template strips <think> sections from assistant messages when rendering the conversation history. This violates the increasing-context requirement for multi-turn RL training, which expects each turn's rendered prompt to be a token-level extension of the previous turn's context (see the verifiers documentation).

This version uses a simplified chat template (based on willcb/Qwen3-14B) that preserves thinking tokens in the conversation history, making it compatible with multi-turn RL frameworks such as verifiers.
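The problem can be illustrated with a toy rendering function (a hypothetical stand-in for the actual Jinja chat template, not the template itself): when earlier <think> sections are stripped from the history, the context the model saw while generating turn 1 is no longer a prefix of the turn-2 prompt, which breaks the prefix property that multi-turn RL training relies on.

```python
import re

def render(messages, strip_think=False):
    """Render a conversation to a prompt string (toy stand-in for a chat template)."""
    parts = []
    for msg in messages:
        content = msg["content"]
        if strip_think and msg["role"] == "assistant":
            # Original-style behavior: drop <think>...</think> from the history
            content = re.sub(r"<think>.*?</think>\s*", "", content, flags=re.DOTALL)
        parts.append(f"<|{msg['role']}|>{content}")
    return "".join(parts)

turn1 = [
    {"role": "user", "content": "2+2?"},
    {"role": "assistant", "content": "<think>simple arithmetic</think>4"},
]
turn2 = turn1 + [{"role": "user", "content": "and 3+3?"}]

# Preserving thinking tokens: the turn-1 context (which the model actually
# generated, think tokens included) is a prefix of the turn-2 prompt.
assert render(turn2).startswith(render(turn1))

# Stripping thinking tokens: the assistant turn is rewritten in the history,
# so the turn-1 context is no longer a prefix of the turn-2 prompt.
assert not render(turn2, strip_think=True).startswith(render(turn1))
```

The modified template corresponds to the preserving case, so rollouts collected turn by turn remain consistent prefixes of each other.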

Model Details

  • Base Model: nvidia/Nemotron-Cascade-14B-Thinking
  • Architecture: Qwen3ForCausalLM
  • Model Type: qwen3
  • Parameters: 14B

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Both the weights and the modified chat template are loaded from this repo
model = AutoModelForCausalLM.from_pretrained("bdsaglam/Nemotron-Cascade-14B-Thinking")
tokenizer = AutoTokenizer.from_pretrained("bdsaglam/Nemotron-Cascade-14B-Thinking")

License

This model inherits the NVIDIA Open Model License from the base model.
