Quantizations of https://huggingface.co/nvidia/OpenCodeReasoning-Nemotron-1.1-7B
Open source inference clients/UIs
Closed source inference clients/UIs
- LM Studio
- More will be added...
From the original readme:
```yaml
base_model:
- Qwen/Qwen2.5-7B-Instruct
datasets:
- nvidia/OpenCodeReasoning
language:
- en
library_name: transformers
tags:
- nvidia
- code
pipeline_tag: text-generation
```
OpenCodeReasoning-Nemotron-1.1-7B Overview
Description:
OpenCodeReasoning-Nemotron-1.1-7B is a large language model (LLM) derived from Qwen2.5-7B-Instruct (the reference model). It is a reasoning model post-trained for code generation. The model supports a context length of 64k tokens.
This model is ready for commercial/non-commercial use.
Results
The results below are the average of 64 evaluations on LiveCodeBench (v5) [2408-2501].
| Model | Pass@1 |
|---|---|
| DeepSeek-R1-0528 | 73.4 |
| DeepSeek-R1 | 65.6 |
| QwQ-32B | 61.3 |
| Distilled 7B+ Models | |
| Bespoke-Stratos-7B | 14.7 |
| OpenThinker-7B | 25.5 |
| R1-Distill-Qwen-7B | 38.0 |
| OlympicCoder-7B | 40.9 |
| OpenCodeReasoning-Nemotron-7B | 51.3 |
| OpenCodeReasoning-Nemotron-1.1-7B | 55.5 |
| Distilled 14B+ Models | |
| R1-Distill-Qwen-14B | 51.3 |
| OpenCodeReasoning-Nemotron-14B | 59.4 |
| OpenCodeReasoning-Nemotron-1.1-14B | 65.9 |
| Distilled 32B+ Models | |
| Bespoke-Stratos-32B | 30.1 |
| OpenThinker-32B | 54.1 |
| R1-Distill-Qwen-32B | 58.1 |
| OlympicCoder-32B | 57.4 |
| OpenCodeReasoning-Nemotron-32B | 61.7 |
| OpenCodeReasoning-Nemotron-1.1-32B | 69.9 |
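Pass@1 here is averaged over 64 samples per problem. For reference, a minimal sketch of the standard unbiased pass@k estimator (from the Codex paper; the function name and the example counts are illustrative, not taken from this card):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n = samples drawn, c = samples that pass."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the fraction of correct samples:
# pass_at_k(64, 35, 1) == 35/64
```

For k=1 the estimator is simply c/n, so averaging 64 single-sample evaluations and computing pass@1 from 64 samples give the same number.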
How to use the models?
To run inference on coding problems:
````python
import transformers
import torch

model_id = "nvidia/OpenCodeReasoning-Nemotron-1.1-7B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

prompt = """You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.

Please use python programming language only.

You must use ```python for just the final solution code block with the following format:
```python
# Your code here
```

{user}
"""

messages = [
    {
        "role": "user",
        "content": prompt.format(user="Write a program to calculate the sum of the first $N$ fibonacci numbers"),
    },
]

outputs = pipeline(
    messages,
    max_new_tokens=49152,
)
print(outputs[0]["generated_text"][-1]["content"])
````