Running Claude Code with Local Models via Ollama (NVIDIA's Nemotron-3-Nano)
TL;DR: You can run Claude Code with local open-source models instead of the Claude API. This guide walks through setting up NVIDIA's Nemotron-3-Nano with Ollama for a fully offline, privacy-preserving experience.
Why Bother?
Ollama now provides an Anthropic-compatible API, which means you can point Claude Code at local models. Running locally gives you:
- Complete privacy: Your code never leaves your machine (not even metadata)
- No API costs: Inference is free (well, aside from electricity)
- Offline capability: Work without internet (perfect for planes, trains, or automobiles, provided you're the passenger!)
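Under the hood, Claude Code speaks Anthropic's Messages API, and the compatibility layer means Ollama accepts the same request shape. The payload below is a sketch of that format, with field names taken from Anthropic's published Messages API and the model tag from later in this guide:

```json
{
  "model": "nemotron-3-nano",
  "max_tokens": 256,
  "messages": [
    {"role": "user", "content": "Say hello in one word."}
  ]
}
```

Claude Code builds requests like this for you; the point is that nothing Ollama-specific is needed on the client side.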
Prerequisites
- 16GB+ RAM (32GB+ recommended)
- Node.js 18+
- Optional: NVIDIA GPU with CUDA for faster inference
Step 1: Install Ollama
Download from ollama.com or run the install script:
curl -fsSL https://ollama.com/install.sh | sh
Verify it's running:
curl http://localhost:11434
# Should return: "Ollama is running"
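If you want that check inside a script, it can be wrapped so it reports status without ever failing. A small sketch, assuming the default port 11434:

```shell
# Probe the default Ollama port; print a status line either way and
# exit 0, so this is safe to drop into larger setup scripts.
if curl -fsS --max-time 2 http://localhost:11434/ >/dev/null 2>&1; then
  echo "ollama: up"
else
  echo "ollama: down"
fi
```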
Step 2: Pull Nemotron-3-Nano
Nemotron-3-Nano is NVIDIA's 30B-parameter model with a hybrid Mixture-of-Experts architecture.
ollama pull nemotron-3-nano
Check that it works:
ollama run nemotron-3-nano "Hello"
Step 3: Install Claude Code
curl -fsSL https://claude.ai/install.sh | bash
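Confirm the CLI landed on your PATH before going further. A guarded check (the ~/.local/bin hint is an assumption about the native installer's default location):

```shell
# Print the installed version, or a hint if the binary isn't found.
if command -v claude >/dev/null 2>&1; then
  claude --version
else
  echo "claude not found; check that ~/.local/bin is on your PATH"
fi
```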
Step 4: Configure Environment Variables
This is the key step (and where most tutorials fall short). Add to your ~/.bashrc or ~/.zshrc:
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
Then source ~/.bashrc (or restart your terminal).
Alternatively, create ~/.claude/settings.json:
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
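A small script can generate and sanity-check that file in one go. This sketch writes to /tmp (my choice, not a Claude Code convention) so you can inspect the result before copying it to ~/.claude/settings.json:

```shell
# Write the settings fragment to a scratch location, then fail loudly
# if it isn't valid JSON. Nothing in ~/.claude is touched.
mkdir -p /tmp/claude-demo
cat > /tmp/claude-demo/settings.json <<'EOF'
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
EOF
python3 -m json.tool /tmp/claude-demo/settings.json >/dev/null && echo "settings.json OK"
```

Once it prints OK, copy the file into place with `cp /tmp/claude-demo/settings.json ~/.claude/settings.json`.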
Step 5: Run It
claude --model nemotron-3-nano
To verify it's truly local: disconnect from the internet and run a prompt. If you get a response (even a slow one), you're running fully offline.
Performance Expectations
Local inference is slower than API calls (sometimes dramatically so). On an M1 Max MacBook Pro with 64GB RAM, a simple "Hi" took about 55 seconds, and listing files in the current directory took around 2 minutes. A desktop with a beefy GPU will be faster, but temper your expectations on Apple Silicon.
Switching Back to Claude API
When you want the real thing again:
unset ANTHROPIC_BASE_URL
unset ANTHROPIC_AUTH_TOKEN
Or remove the env section from ~/.claude/settings.json.
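If you flip between local and hosted often, a pair of shell functions saves the retyping. These names are my own invention, not part of Claude Code; drop them in ~/.bashrc or ~/.zshrc:

```shell
# Hypothetical convenience toggles (the function names are made up).
claude_local() {
  export ANTHROPIC_BASE_URL="http://localhost:11434"
  export ANTHROPIC_AUTH_TOKEN="ollama"
}
claude_cloud() {
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN
}
```

Run claude_local before starting a local session, and claude_cloud to fall back to the hosted API.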
Troubleshooting
- Connection refused: Ensure Ollama is running (ollama serve)
- Model not found: Check ollama list and use the exact model name
- Slow responses: Expected on CPU (go make some coffee)
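The checks above can be rolled into a single triage script. A sketch that degrades gracefully when pieces are missing (the output wording is mine):

```shell
# Quick triage: report env config, server reachability, and installed models.
echo "base url: ${ANTHROPIC_BASE_URL:-unset}"
if curl -fsS --max-time 2 http://localhost:11434/ >/dev/null 2>&1; then
  echo "server: reachable"
else
  echo "server: unreachable (try: ollama serve)"
fi
if command -v ollama >/dev/null 2>&1; then
  ollama list
else
  echo "ollama binary not on PATH"
fi
```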