Quantizations of https://huggingface.co/nvidia/OpenCodeReasoning-Nemotron-1.1-7B

Open source inference clients/UIs

Closed source inference clients/UIs


From the original README


```yaml
base_model:
  - Qwen/Qwen2.5-7B-Instruct
datasets:
  - nvidia/OpenCodeReasoning
language:
  - en
library_name: transformers
tags:
  - nvidia
  - code
pipeline_tag: text-generation
```

OpenCodeReasoning-Nemotron-1.1-7B Overview

Description:

OpenCodeReasoning-Nemotron-1.1-7B is a large language model (LLM) derived from Qwen2.5-7B-Instruct (the reference model). It is a reasoning model post-trained for code generation. The model supports a context length of 64k tokens.

This model is ready for commercial/non-commercial use.

Results

The results below are the average of 64 evaluations on LiveCodeBench (v5) [2408-2501]; a sketch of this averaging follows the table.

| Model                              | Pass@1 |
|------------------------------------|--------|
| DeepSeek-R1-0528                   | 73.4   |
| DeepSeek-R1                        | 65.6   |
| QwQ-32B                            | 61.3   |
| **Distilled 7B+ Models**           |        |
| Bespoke-Stratos-7B                 | 14.7   |
| OpenThinker-7B                     | 25.5   |
| R1-Distill-Qwen-7B                 | 38.0   |
| OlympicCoder-7B                    | 40.9   |
| OpenCodeReasoning-Nemotron-7B      | 51.3   |
| OpenCodeReasoning-Nemotron-1.1-7B  | 55.5   |
| **Distilled 14B+ Models**          |        |
| R1-Distill-Qwen-14B                | 51.3   |
| OpenCodeReasoning-Nemotron-14B     | 59.4   |
| OpenCodeReasoning-Nemotron-1.1-14B | 65.9   |
| **Distilled 32B+ Models**          |        |
| Bespoke-Stratos-32B                | 30.1   |
| OpenThinker-32B                    | 54.1   |
| R1-Distill-Qwen-32B                | 58.1   |
| OlympicCoder-32B                   | 57.4   |
| OpenCodeReasoning-Nemotron-32B     | 61.7   |
| OpenCodeReasoning-Nemotron-1.1-32B | 69.9   |
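For context on how such a number is produced, here is a minimal, hypothetical sketch of averaging pass@1 over repeated benchmark runs. The `average_pass_at_1` function and the toy data are illustrative only; they are not NVIDIA's evaluation harness.

```python
def average_pass_at_1(runs: list[list[bool]]) -> float:
    """runs[i][j] is True if problem j was solved in evaluation run i.

    Pass@1 for a single run is the fraction of problems solved; the
    reported figure is the mean over all runs (64 in the table above).
    """
    per_run = [100.0 * sum(run) / len(run) for run in runs]
    return sum(per_run) / len(per_run)

# Toy example with 3 runs over 4 problems: per-run pass@1 is 75, 50, 75,
# so the averaged score is ~66.7.
print(average_pass_at_1([
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, False],
]))
```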

How to use the models?

To run inference on coding problems:

````python
import transformers
import torch

model_id = "nvidia/OpenCodeReasoning-Nemotron-1.1-7B"

# Load the model in bfloat16 and spread it across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Prompt template; the coding problem is substituted for {user}.
prompt = """You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.

Please use python programming language only.

You must use ```python for just the final solution code block with the following format:
```python
# Your code here
```

{user}
"""

messages = [
    {
        "role": "user",
        "content": prompt.format(user="Write a program to calculate the sum of the first $N$ fibonacci numbers"),
    },
]

# Generous token budget: the model emits a long reasoning trace before
# the final answer.
outputs = pipeline(
    messages,
    max_new_tokens=49152,
)
print(outputs[0]["generated_text"][-1]["content"])
````
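Since the prompt instructs the model to place only the final solution in a ```python fence, a downstream harness typically pulls the last such block out of the generated text. A minimal sketch of that step (the `extract_final_code` helper is hypothetical, not part of the model card):

```python
import re

def extract_final_code(text: str) -> str | None:
    # Find every ```python fenced block and keep the last one, which the
    # prompt designates as the final solution.
    blocks = re.findall(r"```python\s*(.*?)```", text, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None

# Usage: code = extract_final_code(outputs[0]["generated_text"][-1]["content"])
```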
GGUF
Model size: 8B params
Architecture: qwen2

Available quantizations: 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
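A minimal sketch of loading one of these GGUF quantizations with llama-cpp-python, one of the open-source clients that supports this format. The filename and quantization level are assumptions; substitute whichever file you downloaded.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical filename; pick the quant that fits your hardware.
llm = Llama(
    model_path="OpenCodeReasoning-Nemotron-1.1-7B.Q4_K_M.gguf",
    n_ctx=32768,  # the model supports up to 64k; lower this to fit in RAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```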
