Instructions to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Mungert/OpenCodeReasoning-Nemotron-32B-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

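# Load model directly (a GGUF repo needs an explicit gguf_file; the weights are dequantized on load)
from transformers import AutoModelForCausalLM, AutoTokenizer
gguf_file = "OpenCodeReasoning-Nemotron-32B-q4_k_m.gguf"
tokenizer = AutoTokenizer.from_pretrained("Mungert/OpenCodeReasoning-Nemotron-32B-GGUF", gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained("Mungert/OpenCodeReasoning-Nemotron-32B-GGUF", gguf_file=gguf_file, dtype="auto")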

llama-cpp-python

How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Mungert/OpenCodeReasoning-Nemotron-32B-GGUF",
	filename="OpenCodeReasoning-Nemotron-32B-bf16_q8_0.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
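
`create_chat_completion` returns an OpenAI-style completion dictionary rather than a plain string. A minimal sketch of pulling the reply text out of it, assuming the usual `choices[0]["message"]["content"]` layout; the helper name `first_reply` and the sample dictionary are illustrative and not part of llama-cpp-python:

```python
from typing import Any, Mapping


def first_reply(response: Mapping[str, Any]) -> str:
    """Return the assistant text of the first choice in a chat completion."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("completion contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for the dictionary returned by llm.create_chat_completion(...).
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Paris."}}
    ]
}
print(first_reply(sample))
```

With a loaded model, the same helper would be called as `first_reply(llm.create_chat_completion(messages=[...]))`.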


llama.cpp

How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M


vLLM

How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Mungert/OpenCodeReasoning-Nemotron-32B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Mungert/OpenCodeReasoning-Nemotron-32B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
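
The same request can be issued from Python using only the standard library. This is a sketch under the assumption that the server from the step above is listening on `localhost:8000`; the helper names `build_chat_request` and `send` are hypothetical. Building the request is kept separate from sending it so the payload can be inspected without a running server:

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON body; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


request = build_chat_request(
    "http://localhost:8000",
    "Mungert/OpenCodeReasoning-Nemotron-32B-GGUF",
    "What is the capital of France?",
)
print(request.full_url)
```

Calling `send(request)` once the server is up should return the decoded completion; the generous default timeout is an assumption, chosen because reasoning models can produce long outputs.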

Use Docker

docker model run hf.co/Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M

SGLang

How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Mungert/OpenCodeReasoning-Nemotron-32B-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Mungert/OpenCodeReasoning-Nemotron-32B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Mungert/OpenCodeReasoning-Nemotron-32B-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Mungert/OpenCodeReasoning-Nemotron-32B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with Ollama:
```
ollama run hf.co/Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M
```

Unsloth Studio

How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Mungert/OpenCodeReasoning-Nemotron-32B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Mungert/OpenCodeReasoning-Nemotron-32B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Mungert/OpenCodeReasoning-Nemotron-32B-GGUF to start chatting

Pi

How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
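
If `models.json` already lists other providers, overwriting it by hand can lose them. A small sketch that merges the entry shown above into an existing file; the function name `add_llama_cpp_provider` is hypothetical, and the schema is assumed to be exactly the one in the snippet above:

```python
import json
from pathlib import Path


def add_llama_cpp_provider(path: Path, model_id: str) -> dict:
    """Merge the llama-cpp provider entry into models.json, keeping other providers."""
    config = json.loads(path.read_text()) if path.exists() else {}
    providers = config.setdefault("providers", {})
    providers["llama-cpp"] = {
        "baseUrl": "http://localhost:8080/v1",
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    # Create ~/.pi/agent (or whichever parent directory) if it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

A typical call would be `add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json", "Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M")`.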

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with Docker Model Runner:
```
docker model run hf.co/Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M
```

Lemonade

How to use Mungert/OpenCodeReasoning-Nemotron-32B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Mungert/OpenCodeReasoning-Nemotron-32B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.OpenCodeReasoning-Nemotron-32B-GGUF-Q4_K_M

List all available models

lemonade list

Mungert committed on Sep 24, 2025

Commit 531511c (verified) · 0 parent(s)

Super-squash history to reclaim storage

Files changed (27)

.gitattributes +73 -0
OpenCodeReasoning-Nemotron-32B-bf16_q8_0.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-f16_q8_0.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-iq1_m.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-iq1_s.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-iq2_xs.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-iq2_xxs.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q2_k_m.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q2_k_s.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q3_k_m.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q3_k_s.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q4_0.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q4_1.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q4_k_m.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q4_k_s.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q5_0.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q5_1.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q5_k_m.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q5_k_s.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q6_k_m.gguf +3 -0
OpenCodeReasoning-Nemotron-32B-q8_0.gguf +3 -0
OpenCodeReasoning-Nemotron-32B.imatrix +3 -0
README.md +202 -0
bf16/OpenCodeReasoning-Nemotron-32B-bf16-00001-of-00002.gguf +3 -0
bf16/OpenCodeReasoning-Nemotron-32B-bf16-00002-of-00002.gguf +3 -0
f16/OpenCodeReasoning-Nemotron-32B-f16-00001-of-00002.gguf +3 -0
f16/OpenCodeReasoning-Nemotron-32B-f16-00002-of-00002.gguf +3 -0

.gitattributes ADDED

	@@ -0,0 +1,73 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+f16/OpenCodeReasoning-Nemotron-32B-f16-00001-of-00002.gguf filter=lfs diff=lfs merge=lfs -text
+f16/OpenCodeReasoning-Nemotron-32B-f16-00002-of-00002.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-f16_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-bf16_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-f16_q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-bf16_q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-f16_q4_k.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-bf16_q4_k.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q2_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q3_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q4_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q5_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q6_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q2_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q3_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q3_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q4_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q5_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q6_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q4_1.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q4_0_l.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q4_1_l.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q5_0.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q5_1.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q5_0_l.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q5_1_l.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-iq1_s.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-iq1_m.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-iq2_xs.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-iq2_xxs.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B.imatrix filter=lfs diff=lfs merge=lfs -text
+bf16/OpenCodeReasoning-Nemotron-32B-bf16-00001-of-00002.gguf filter=lfs diff=lfs merge=lfs -text
+bf16/OpenCodeReasoning-Nemotron-32B-bf16-00002-of-00002.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeReasoning-Nemotron-32B-q2_k_m.gguf filter=lfs diff=lfs merge=lfs -text

OpenCodeReasoning-Nemotron-32B-bf16_q8_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f432cc2ef44f3bb1b81cc7bd6f688a272fa6226cf2651f9dfcc512b6f6655e20
+size 46661602304
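
Each `.gguf` entry in this diff is a Git LFS pointer (spec version, content hash, size in bytes) rather than the model file itself. A minimal sketch of reading one, using the pointer above as input; the function name `parse_lfs_pointer` is illustrative:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()  # tolerate the leading "+" of a diff view
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "+version https://git-lfs.github.com/spec/v1\n"
    "+oid sha256:f432cc2ef44f3bb1b81cc7bd6f688a272fa6226cf2651f9dfcc512b6f6655e20\n"
    "+size 46661602304\n"
)
# Convert the byte count to GiB to judge the download size at a glance.
print(pointer["algorithm"], f"{pointer['size'] / 2**30:.2f} GiB")
```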

OpenCodeReasoning-Nemotron-32B-f16_q8_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb9f2cde469f8dc33c50ca9d05b93c3d256868c7c8c43d65fc1911893ccae0d5
+size 36280699904

OpenCodeReasoning-Nemotron-32B-iq1_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e996d801e0037195bea4623b9bcc43b1b0f1c3161dcef3dbe644d16e9d4033bf
+size 9742306592

OpenCodeReasoning-Nemotron-32B-iq1_s.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a921ec5f24446ba4d39d97b8b4952eac93bc1912f966e75c7ca1da7c7de836ef
+size 8992492832

OpenCodeReasoning-Nemotron-32B-iq2_xs.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fa663d4683f1a1ec18cadda2925f82d0b2e3c9a7357d5fefb9200e96150c4a34
+size 11016326432

OpenCodeReasoning-Nemotron-32B-iq2_xxs.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d236194cde966eb4b88f71035dad3ed4acc578abb279995f0b8c952577f179ab
+size 10124954912

OpenCodeReasoning-Nemotron-32B-q2_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:65534d66507efcf0905478eb98f2778c3a4a49a97a26d21ef6f3cd2bd920a349
+size 12573011232

OpenCodeReasoning-Nemotron-32B-q2_k_s.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7842c97c1639f8a320fbe24ded3c2c63e577c51b3edc8cfa2eeb0c0041168e9a
+size 11547413792

OpenCodeReasoning-Nemotron-32B-q3_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4ca41bdfd0ef7f43c3adf17dd177563062163e7c64c0841913772a4d38e0d5c0
+size 16091393312

OpenCodeReasoning-Nemotron-32B-q3_k_s.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9d5a266830d98030ef046fc9306df92d49b9347bd7d84a7bea758e3dbdd4269c
+size 14735658272

OpenCodeReasoning-Nemotron-32B-q4_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:031d643e792d91dc3bd57e5d828624e4c94252b794d30db37bd2209a3565baeb
+size 18439507232

OpenCodeReasoning-Nemotron-32B-q4_1.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:43c290d9981ca1cc0c0b418ef838fccc6e7c29b1f40eaf6c49d46ff62eac0291
+size 20487179552

OpenCodeReasoning-Nemotron-32B-q4_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0424f84d4c64113eb27e14dd899a69cff903624f9dcd1afba94978ecc10d2cdd
+size 19824979232

OpenCodeReasoning-Nemotron-32B-q4_k_s.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a4cdc67db6e4d9e52d9ab9c956c581dc54a646d3c2a526914e15c0c5fdb81635
+size 19161427232

OpenCodeReasoning-Nemotron-32B-q5_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a0efcd01db274e9e45972b47196711a7895d3a5d7a717fd4901be66953a8f13e
+size 22534851872

OpenCodeReasoning-Nemotron-32B-q5_1.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e581a16c7f0a836dbed52090703ac2ac24c1fb27e020dff6df7d054d2e16c114
+size 24582524192

OpenCodeReasoning-Nemotron-32B-q5_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4a7c51904a4c6b978654e7b0688b5c8c23b2f08eb16ec842b69cb35b3597404a
+size 23414304032

OpenCodeReasoning-Nemotron-32B-q5_k_s.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a1157919e054d63d5a2ab406cfa4ba2bf42445f89645f6b16b239872977f208e
+size 22741658912

OpenCodeReasoning-Nemotron-32B-q6_k_m.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b6da985756d6449568025ed8f694af9ba16b848b60f56e61ae251b1f2e463f5c
+size 26886155552

OpenCodeReasoning-Nemotron-32B-q8_0.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d2ac9e023e6c54f4fc6890678bacbe826dabc358925902208957e02b3459cc28
+size 34820885504
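
The pointer sizes above give a rough lower bound on the memory each quantization needs. The sketch below picks the largest file that fits a given byte budget. It covers only a subset of the files, with sizes copied from the pointers in this diff, and it is a weights-only estimate: the KV cache and runtime buffers need additional room, so treating the budget as "available memory minus some headroom" is an assumption left to the caller. The names `QUANT_SIZES` and `largest_fitting` are illustrative:

```python
from typing import Dict, Optional

# File sizes in bytes, copied from the LFS pointers in this commit (subset).
QUANT_SIZES: Dict[str, int] = {
    "iq1_s": 8992492832,
    "iq2_xs": 11016326432,
    "q3_k_m": 16091393312,
    "q4_k_m": 19824979232,
    "q5_k_m": 23414304032,
    "q6_k_m": 26886155552,
    "q8_0": 34820885504,
}


def largest_fitting(budget_bytes: int, sizes: Dict[str, int] = QUANT_SIZES) -> Optional[str]:
    """Return the name of the largest file not exceeding the budget, or None."""
    fitting = {name: size for name, size in sizes.items() if size <= budget_bytes}
    return max(fitting, key=fitting.get) if fitting else None


# Example: a 24 GiB device with 4 GiB set aside for cache and buffers.
print(largest_fitting(20 * 2**30))
```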

OpenCodeReasoning-Nemotron-32B.imatrix ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:81195c1a872922ad45359cfbaa038d9af32e74c78bad6b8e544a6ccd37a2cad0
+size 14957098

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model:
+- Qwen/Qwen2.5-32B-Instruct
+datasets:
+- nvidia/OpenCodeReasoning
+language:
+- en
+library_name: transformers
+license: apache-2.0
+tags:
+- nvidia
+- code
+pipeline_tag: text-generation
+---
+# OpenCodeReasoning-Nemotron-32B Overview
+## Description: <br>
+OpenCodeReasoning-Nemotron-32B is a large language model (LLM) derived from Qwen2.5-32B-Instruct (the reference model). It is post-trained for reasoning in code generation and supports a context length of 32K tokens. <br>
+This model is ready for commercial/non-commercial use. <br>
+![Evaluation Results](./results.png)
+## Results from [OpenCodeReasoning](https://arxiv.org/abs/2504.01943)
+The results below are averaged over **64 evaluations** on each benchmark.
+| Model                  | LiveCodeBench Avg. | CodeContest All |
+|------------------------|--------------------|-----------------|
+| DeepSeek-R1            | 65.6               | 26.2            |
+| QwQ-32B                | 61.3               | 20.2            |
+|                        |                    |                 |
+| **Distilled 7B+ Models** |                    |                 |
+|                        |                    |                 |
+| Bespoke-Stratos-7B     | 14.7               | 2.0             |
+| OpenThinker-7B         | 25.5               | 5.0             |
+| R1-Distill-Qwen-7B     | 38.0               | 11.1            |
+| OlympicCoder-7B        | 40.9               | 10.6            |
+| **OCR-Qwen-7B** | **48.5** | **16.3** |
+| **OCR-Qwen-7B-Instruct** | **51.3** | **18.1** |
+|                        |                    |                 |
+| **Distilled 14B+ Models**|                    |                 |
+|                        |                    |                 |
+| R1-Distill-Qwen-14B    | 51.3               | 17.6            |
+| **OCR-Qwen-14B** | **57.7** | **22.6** |
+| **OCR-Qwen-14B-Instruct**| **59.4** | **23.6** |
+|                        |                    |                 |
+| **Distilled 32B+ Models**|                    |                 |
+|                        |                    |                 |
+| Bespoke-Stratos-32B    | 30.1               | 6.3             |
+| OpenThinker-32B        | 54.1               | 16.4            |
+| R1-Distill-Qwen-32B    | 58.1               | 18.3            |
+| OlympicCoder-32B       | 57.4               | 18.0            |
+| **OCR-Qwen-32B** | **61.8** | **24.6** |
+| **OCR-Qwen-32B-Instruct**| **61.7** | **24.4** |
+## Reproducing our results
+* [Models](https://huggingface.co/collections/nvidia/opencodereasoning-2-68168f37cd7c6beb1e3f92e7)
+* [Dataset](https://huggingface.co/datasets/nvidia/OpenCodeReasoning)
+* [Paper](https://arxiv.org/abs/2504.01943)
+## How to use the models?
+To run inference on coding problems:
+````python
+import transformers
+import torch
+model_id = "nvidia/OpenCodeReasoning-Nemotron-32B"
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16},
+    device_map="auto",
+)
+prompt = """You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.
+Please use python programming language only.
+You must use ```python for just the final solution code block with the following format:
+```python
+# Your code here
+```
+{user}
+"""
+messages = [
+    {
+        "role": "user",
+        "content": prompt.format(user="Write a program to calculate the sum of the first $N$ fibonacci numbers")},
+]
+outputs = pipeline(
+    messages,
+    max_new_tokens=32768,
+)
+print(outputs[0]["generated_text"][-1]['content'])
+````
+## Citation
+If you find the data useful, please cite:
+```
+@article{ahmad2025opencodereasoning,
+      title={OpenCodeReasoning: Advancing Data Distillation for Competitive Coding},
+      author={Wasi Uddin Ahmad and Sean Narenthiran and Somshubra Majumdar and Aleksander Ficek and Siddhartha Jain and Jocelyn Huang and Vahid Noroozi and Boris Ginsburg},
+      year={2025},
+      eprint={2504.01943},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2504.01943},
+}
+```
+## Additional Information
+## Model Architecture: <br>
+Architecture Type: Dense decoder-only Transformer model
+Network Architecture: Qwen2.5-32B-Instruct
+<br>
+**OpenCodeReasoning-Nemotron-32B was developed based on Qwen2.5-32B-Instruct and has 32B model parameters.** <br>
+## Input: <br>
+**Input Type(s):** Text <br>
+**Input Format(s):** String <br>
+**Input Parameters:** One-Dimensional (1D) <br>
+**Other Properties Related to Input:** Context length up to 32,768 tokens <br>
+## Output: <br>
+**Output Type(s):** Text <br>
+**Output Format:** String <br>
+**Output Parameters:** One-Dimensional (1D) <br>
+**Other Properties Related to Output:** Context length up to 32,768 tokens <br>
+Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. <br>
+## Software Integration : <br>
+* Runtime Engine: NeMo 2.3.0 <br>
+* Recommended Hardware Microarchitecture Compatibility: <br>
+NVIDIA Ampere <br>
+NVIDIA Hopper <br>
+* Preferred/Supported Operating System(s): Linux <br>
+## Model Version(s):
+1.0 (4/25/2025)  <br>
+OpenCodeReasoning-Nemotron-7B<br>
+OpenCodeReasoning-Nemotron-14B<br>
+OpenCodeReasoning-Nemotron-32B<br>
+OpenCodeReasoning-Nemotron-32B-IOI<br>
+# Training and Evaluation Datasets: <br>
+## Training Dataset:
+The training corpus for OpenCodeReasoning-Nemotron-32B is the [OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) dataset, which is composed of competitive programming questions and DeepSeek-R1 generated responses.
+Data Collection Method: Hybrid: Automated, Human, Synthetic <br>
+Labeling Method: Hybrid: Automated, Human, Synthetic <br>
+Properties: 736k samples from OpenCodeReasoning (https://huggingface.co/datasets/nvidia/OpenCodeReasoning)
+## Evaluation Dataset:
+We used the datasets listed in the next section to evaluate OpenCodeReasoning-Nemotron-32B. <br>
+**Data Collection Method: Hybrid: Automated, Human, Synthetic <br>**
+**Labeling Method: Hybrid: Automated, Human, Synthetic <br>**
+### License/Terms of Use: <br>
+GOVERNING TERMS: Use of this model is governed by [Apache 2.0](https://huggingface.co/nvidia/OpenCode-Nemotron-2-7B/blob/main/LICENSE).
+### Deployment Geography:
+Global<br>
+### Use Case: <br>
+This model is intended for developers and researchers building LLMs. <br>
+### Release Date:  <br>
+Hugging Face [04/25/2025] via https://huggingface.co/nvidia/OpenCodeReasoning-Nemotron-32B/ <br>
+## Reference(s):
+[2504.01943] OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
+<br>
+## Inference:
+**Engine:** vLLM <br>
+**Test Hardware:** NVIDIA H100-80GB <br>
+## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.  When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+Please report security vulnerabilities or NVIDIA AI Concerns here.
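
The prompt template in the README asks the model to put only its final solution in a `python` fenced block, after its step-by-step reasoning. A minimal sketch of recovering that block from the generated text, assuming the model follows the requested format; `extract_solution` and the sample text are illustrative and not part of the model card:

```python
import re
from typing import Optional

TICKS = "`" * 3  # built indirectly so this example nests cleanly in Markdown
FENCE = re.compile(TICKS + r"python[ \t]*\n(.*?)" + TICKS, re.DOTALL)


def extract_solution(generated: str) -> Optional[str]:
    """Return the body of the last python fence in the text, or None if absent."""
    blocks = FENCE.findall(generated)
    return blocks[-1].strip() if blocks else None


# Hand-written stand-in for a model response.
sample = (
    "Reasoning about the recurrence goes here.\n"
    + TICKS + "python\n"
    + "n = int(input())\n"
    + "print(n)\n"
    + TICKS + "\n"
)
print(extract_solution(sample))
```

Taking the last match rather than the first is a design choice: if the reasoning itself quotes code, the final fence is the one the prompt designates as the answer.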

bf16/OpenCodeReasoning-Nemotron-32B-bf16-00001-of-00002.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05e5f6c1413c33f0c42aad1b5321d99478281085d2b5638e53c11d9554c6a11b
+size 45902462976

bf16/OpenCodeReasoning-Nemotron-32B-bf16-00002-of-00002.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:76c9405f5682d2eb9ecda68aa149d3c92d57f7162d200f2c922d70b90e313cdc
+size 19633507328

f16/OpenCodeReasoning-Nemotron-32B-f16-00001-of-00002.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:72eaa4877d3015fcce0c0e384b336727656fa4d5b8983f924ed15df89f437247
+size 45902462976

f16/OpenCodeReasoning-Nemotron-32B-f16-00002-of-00002.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bd2577f50faa451c7ee1f826d6ae8cd0e9652acdabb90fa55c0de7b8df9d4933
+size 19633507328
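
Because a Git LFS `oid` is the SHA-256 of the file contents, a downloaded `.gguf` can be checked against the pointers listed in this commit. The sketch below streams the file in chunks so that multi-gigabyte files do not have to fit in memory; the function names and the 1 MiB chunk size are assumptions made for illustration:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path: Path, expected_oid: str, expected_size: int) -> bool:
    """Compare the size first (cheap), then the streamed SHA-256 digest."""
    if path.stat().st_size != expected_size:
        return False
    return sha256_of(path) == expected_oid.removeprefix("sha256:")
```

For example, the second f16 shard would be checked with `matches_pointer(Path("OpenCodeReasoning-Nemotron-32B-f16-00002-of-00002.gguf"), "sha256:bd2577f50faa451c7ee1f826d6ae8cd0e9652acdabb90fa55c0de7b8df9d4933", 19633507328)`. `str.removeprefix` requires Python 3.9 or later.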