Text Generation
Transformers
Safetensors
PyTorch
English
nvidia
reasoning
math
code
reinforcement learning
Instructions to use nvidia/AceReason-Nemotron-14B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nvidia/AceReason-Nemotron-14B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="nvidia/AceReason-Nemotron-14B")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("nvidia/AceReason-Nemotron-14B", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use nvidia/AceReason-Nemotron-14B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "nvidia/AceReason-Nemotron-14B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/AceReason-Nemotron-14B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/nvidia/AceReason-Nemotron-14B
- SGLang
How to use nvidia/AceReason-Nemotron-14B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "nvidia/AceReason-Nemotron-14B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/AceReason-Nemotron-14B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "nvidia/AceReason-Nemotron-14B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/AceReason-Nemotron-14B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use nvidia/AceReason-Nemotron-14B with Docker Model Runner:
docker model run hf.co/nvidia/AceReason-Nemotron-14B
Commit History
Update README_EVALUATION.md 8212cdd verified
Update README.md 5d7ec87 verified
Update README_EVALUATION.md 497f344 verified
Update README.md 8cf140d verified
Update README.md fbd3400 verified
Upload README_EVALUATION.md 6e8acad verified
Upload evaluation.tar.gz 5f8704e verified
Update README.md a787777 verified
Update README.md eaaedcf verified
Update README.md a425abb verified
Update README.md 2867598 verified
Update README.md f95ea76 verified
Update README.md c6233d7 verified
Update README.md 3e0e632 verified
initial commit 728fcb3
Yang Chen commited on