Running Claude Code with Local Models via Ollama (NVIDIA's Nemotron-3-Nano)
TL;DR: You can run Claude Code with local open-source models instead of the Claude API. This guide walks through setting up NVIDIA's Nemotron-3-Nano with Ollama for a fully offline, privacy-preserving experience.
Why Bother?
Ollama now provides an Anthropic-compatible API, which means you can point Claude Code at local models. Running locally gives you:
- Complete privacy: Your code never leaves your machine (not even metadata)
- No API costs: Inference is free (well, aside from electricity)
- Offline capability: Work without internet (perfect for planes, trains, or automobiles, provided you're the passenger!)
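Under the hood, Claude Code speaks Anthropic's Messages API, and the compatibility layer means Ollama accepts the same request shape. The payload below is a sketch of that format, with field names taken from Anthropic's published Messages API and the model tag from later in this guide:

```json
{
  "model": "nemotron-3-nano",
  "max_tokens": 256,
  "messages": [
    {"role": "user", "content": "Say hello in one word."}
  ]
}
```

Claude Code builds requests like this for you; the point is that nothing Ollama-specific is needed on the client side.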
Prerequisites
- 16GB+ RAM (32GB+ recommended)
- Node.js 18+
- Optional: NVIDIA GPU with CUDA for faster inference
Step 1: Install Ollama
Download from ollama.com or run the install script:
curl -fsSL https://ollama.com/install.sh | sh
Verify it's running:
curl http://localhost:11434
# Should return: "Ollama is running"
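If you want that check inside a script, it can be wrapped so it reports status without ever failing. A small sketch, assuming the default port 11434:

```shell
# Probe the default Ollama port; print a status line either way and
# exit 0, so this is safe to drop into larger setup scripts.
if curl -fsS --max-time 2 http://localhost:11434/ >/dev/null 2>&1; then
  echo "ollama: up"
else
  echo "ollama: down"
fi
```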
Step 2: Pull Nemotron-3-Nano
Nemotron-3-Nano is NVIDIA's 30B-parameter model with a hybrid Mixture-of-Experts architecture.
ollama pull nemotron-3-nano
Check that it works:
ollama run nemotron-3-nano "Hello"
Step 3: Install Claude Code
curl -fsSL https://claude.ai/install.sh | bash
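Confirm the CLI landed on your PATH before going further. A guarded check (the ~/.local/bin hint is an assumption about the native installer's default location):

```shell
# Print the installed version, or a hint if the binary isn't found.
if command -v claude >/dev/null 2>&1; then
  claude --version
else
  echo "claude not found; check that ~/.local/bin is on your PATH"
fi
```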
Step 4: Configure Environment Variables
This is the key step (and where most tutorials fall short). Add to your ~/.bashrc or ~/.zshrc:
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
Then source ~/.bashrc (or restart your terminal).
Alternatively, create ~/.claude/settings.json:
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
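A small script can generate and sanity-check that file in one go. This sketch writes to /tmp (my choice, not a Claude Code convention) so you can inspect the result before copying it to ~/.claude/settings.json:

```shell
# Write the settings fragment to a scratch location, then fail loudly
# if it isn't valid JSON. Nothing in ~/.claude is touched.
mkdir -p /tmp/claude-demo
cat > /tmp/claude-demo/settings.json <<'EOF'
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
EOF
python3 -m json.tool /tmp/claude-demo/settings.json >/dev/null && echo "settings.json OK"
```

Once it prints OK, copy the file into place with `cp /tmp/claude-demo/settings.json ~/.claude/settings.json`.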
Step 5: Run It
claude --model nemotron-3-nano
To verify it's truly local: disconnect from the internet and run a prompt. If you get a response (even a slow one), you're running fully offline.
Performance Expectations
Local inference is slower than API calls (sometimes dramatically so). On an M1 Max MacBook Pro with 64GB RAM, a simple "Hi" took about 55 seconds, and listing files in the current directory took around 2 minutes. A desktop with a beefy GPU will be faster, but temper your expectations on Apple Silicon.
Switching Back to Claude API
When you want the real thing again:
unset ANTHROPIC_BASE_URL
unset ANTHROPIC_AUTH_TOKEN
Or remove the env section from ~/.claude/settings.json.
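If you flip between local and hosted often, a pair of shell functions saves the retyping. These names are my own invention, not part of Claude Code; drop them in ~/.bashrc or ~/.zshrc:

```shell
# Hypothetical convenience toggles (the function names are made up).
claude_local() {
  export ANTHROPIC_BASE_URL="http://localhost:11434"
  export ANTHROPIC_AUTH_TOKEN="ollama"
}
claude_cloud() {
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN
}
```

Run claude_local before starting a local session, and claude_cloud to fall back to the hosted API.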
Troubleshooting
- Connection refused: Ensure Ollama is running (ollama serve)
- Model not found: Check ollama list and use the exact model name
- Slow responses: Expected on CPU (go make some coffee)
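The checks above can be rolled into a single triage script. A sketch that degrades gracefully when pieces are missing (the output wording is mine):

```shell
# Quick triage: report env config, server reachability, and installed models.
echo "base url: ${ANTHROPIC_BASE_URL:-unset}"
if curl -fsS --max-time 2 http://localhost:11434/ >/dev/null 2>&1; then
  echo "server: reachable"
else
  echo "server: unreachable (try: ollama serve)"
fi
if command -v ollama >/dev/null 2>&1; then
  ollama list
else
  echo "ollama binary not on PATH"
fi
```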