# Nemotron-Cascade-14B-Thinking (Modified Chat Template)
This is a modified version of nvidia/Nemotron-Cascade-14B-Thinking with a fixed chat template for RL training compatibility.
## Changes

The original Nemotron chat template strips `<think>` sections from earlier messages when rendering inputs. This violates the increasing-context requirement for multi-turn RL training, where each turn's token sequence must be an exact prefix of the next turn's input (see the verifiers documentation).

This version uses a simplified chat template (based on willcb/Qwen3-14B) that preserves thinking tokens in the conversation history, making it suitable for multi-turn RL training with tools such as verifiers.
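The difference can be sketched in plain Python. The two functions below are illustrative stand-ins for the Jinja templates, not the templates themselves; the ChatML-style `<|im_start|>`/`<|im_end|>` markers are assumed from the Qwen3 lineage:

```python
import re

def render_stripping(messages):
    """Mimics the original template's behavior: <think> sections are
    removed from message content before rendering, so the rendered
    history changes between turns."""
    parts = []
    for m in messages:
        content = re.sub(r"<think>.*?</think>\s*", "", m["content"], flags=re.DOTALL)
        parts.append(f"<|im_start|>{m['role']}\n{content}<|im_end|>")
    return "\n".join(parts)

def render_preserving(messages):
    """Mimics the modified template's behavior: every turn is kept
    verbatim, so the rendering of turn N is a prefix of the rendering
    of turn N+1 -- the property multi-turn RL trainers rely on."""
    return "\n".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    )
```

With the stripping variant, an assistant turn like `<think>...</think>answer` loses its `<think>` span on the next render, so previously computed token IDs for that turn no longer match; the preserving variant keeps them stable.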
## Model Details
- Base Model: nvidia/Nemotron-Cascade-14B-Thinking
- Architecture: Qwen3ForCausalLM
- Model Type: qwen3
- Parameters: 14B
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bdsaglam/Nemotron-Cascade-14B-Thinking")
tokenizer = AutoTokenizer.from_pretrained("bdsaglam/Nemotron-Cascade-14B-Thinking")
```
## License
This model inherits the NVIDIA Open Model License from the base model.