GooseReason-4B-Instruct — MLX 16-bit (Full Precision)

This is the full-precision MLX version of nvidia/Nemotron-Research-GooseReason-4B-Instruct, converted for inference with the MLX framework on Apple silicon.

Model Overview

| Attribute | Value |
|---|---|
| Original Model | nvidia/Nemotron-Research-GooseReason-4B-Instruct |
| Architecture | Qwen3 (4.4B parameters) |
| Precision | 16-bit (BFloat16, no quantization) |
| Base Model | Qwen3-4B-Instruct-2507 |
| Training Method | RLVR (Reinforcement Learning with Verifiable Rewards) |
| Max Sequence Length | 32,768 tokens |
| License | CC-BY-NC-4.0 |

About GooseReason-4B

Nemotron-Research-GooseReason-4B-Instruct is NVIDIA's reasoning model built on Qwen3-4B-Instruct-2507 using RLVR. It achieves strong performance on math, code, and STEM reasoning benchmarks while remaining compact at 4B parameters.

Key Capabilities

  • Math Reasoning: Strong performance on AIME 2025 and AMC benchmarks
  • Code Generation: Competitive on LiveCodeBench and HumanEval
  • STEM: Broad science and technical reasoning capabilities
  • Thinking Mode: Uses extended thinking (<think> tags) for complex reasoning tasks

Benchmark Highlights

| Benchmark | GooseReason-4B |
|---|---|
| AIME 2025 (avg@64) | 55.0 |
| AMC (avg@64) | 82.2 |
| LiveCodeBench v6 (pass@1) | 30.1 |
| GPQA Diamond (avg@8) | 47.5 |
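The avg@k and pass@1 figures are sampling-based metrics: avg@k averages accuracy over k samples per problem, while pass@k is commonly computed with the unbiased estimator introduced with HumanEval. A minimal sketch of that estimator (not part of this repo's evaluation code; `pass_at_k` is an illustrative helper):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn from n generations is correct, given c correct among n."""
    if n - c < k:
        # Fewer incorrect samples than k: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples per problem and 3 correct, pass@1 equals the raw accuracy.
print(round(pass_at_k(10, 3, 1), 3))  # -> 0.3
```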

Usage with MLX

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Nemotron-Research-GooseReason-4B-Instruct-MLX-16bit")

messages = [
    {"role": "user", "content": "Solve: What is the sum of all prime numbers less than 20?"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```

Enabling Extended Thinking

For complex reasoning tasks, the model uses <think> tags automatically. You can also prompt it explicitly:

```python
messages = [
    {
        "role": "system",
        "content": "Think step by step before answering."
    },
    {
        "role": "user",
        "content": "Find all positive integers n such that n^2 + 2n + 2 is divisible by 7."
    }
]
```
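When the response contains a reasoning trace, you may want to show only the final answer. A minimal post-processing sketch (`split_thinking` is a hypothetical helper, assuming the trace appears in a single `<think>...</think>` block):

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Separate a <think>...</think> reasoning trace from the final answer.
    Assumes at most one think block; returns ("", output) if none is found."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    thinking = match.group(1).strip()
    answer = (output[:match.start()] + output[match.end():]).strip()
    return thinking, answer

thinking, answer = split_thinking("<think>2 + 2 = 4</think>The answer is 4.")
print(answer)  # -> The answer is 4.
```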

All Available Formats

Acknowledgments
