# Complete Ollama OpenELM Setup Guide

This guide provides complete instructions to set up a local Ollama instance with Apple's OpenELM model and use it via OpenAI/Anthropic compatible APIs.

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Quick Start (One Command)](#quick-start-one-command)
3. [Manual Setup](#manual-setup)
4. [Testing](#testing)
5. [API Usage](#api-usage)
6. [Docker Compose Setup](#docker-compose-setup)
7. [Troubleshooting](#troubleshooting)

---

## Prerequisites

### Required Software

- **Docker**: [Install Docker](https://docs.docker.com/get-docker/)
- **NVIDIA Driver**: For GPU support (check with `nvidia-smi`)
- **NVIDIA Container Toolkit**: [Install Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)

### Verify GPU Access

```bash
# Check NVIDIA driver
nvidia-smi

# Verify Docker can see the GPU
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```

---

## Quick Start (One Command)

### For Linux/macOS

```bash
# Download and run the complete setup script
curl -O https://raw.githubusercontent.com/your-repo/setup_ollama_openelm.sh
chmod +x setup_ollama_openelm.sh
./setup_ollama_openelm.sh
```

### For Windows (PowerShell)

```powershell
# Run each command manually (see Manual Setup below)
```

---

## Manual Setup

### Step 1: Start Ollama Container

```bash
# Start Ollama with GPU support
docker run -d \
  --name ollama \
  -v ollama:/root/.ollama \
  -p 127.0.0.1:11434:11434 \
  --gpus all \
  ollama/ollama

# Verify it's running
docker ps | grep ollama
```

### Step 2: Pull OpenELM Model

```bash
# Pull the 3B parameter model (2.1 GB)
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct

# Verify installation
docker exec ollama ollama list
```

Expected output:

```
NAME                               ID          SIZE      MODIFIED
apple/OpenELM-3B-Instruct:latest   abc123...   2.1 GB    About a minute ago
```

### Step 3: Install Python Dependencies

```bash
# Create a virtual environment (optional but recommended)
python3 -m venv ollama_env
source ollama_env/bin/activate    # Linux/macOS
# or: .\ollama_env\Scripts\activate    # Windows

# Install dependencies
pip install -r requirements_local.txt
```

### Step 4: Run the API Server

```bash
# Start the FastAPI server (runs on port 8001)
python app_ollama.py
```

Or using uvicorn directly:

```bash
uvicorn app_ollama:app --host 0.0.0.0 --port 8001
```

---

## Testing

### Test 1: Verify Ollama is Running

```bash
# Check Ollama status
curl http://127.0.0.1:11434/api/tags

# Should return something like:
# {"models":[{"name":"apple/OpenELM-3B-Instruct"...}]}
```

### Test 2: Quick Generation Test

```bash
# Test basic generation
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Say hello!", "stream": false}'
```

### Test 3: Run Test Scripts

```bash
# Make test scripts executable
chmod +x test_curl.sh

# Run curl tests
./test_curl.sh

# Run Python tests (requires the openai package)
python test_python.py
```

### Test 4: Test the Full API Server

```bash
# Test the OpenAI-format endpoint
curl -X POST http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'

# Test the Anthropic-format endpoint
curl -X POST http://127.0.0.1:8001/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```

---

## API Usage

### Using OpenAI SDK (Python)

```python
from openai import OpenAI

# Connect to local Ollama
client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="ollama",  # Any string works
)

# Basic usage
response = client.chat.completions.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing simply."}
    ],
    max_tokens=200,
    temperature=0.7
)
print(response.choices[0].message.content)
```

### Using Anthropic SDK (Python)

```python
import anthropic

# Connect to local Ollama via the API server.
# Note: the SDK appends /v1/messages itself, so base_url must not include /v1.
client = anthropic.Anthropic(
    base_url="http://127.0.0.1:8001",
    api_key="ollama",  # Any string works
)

# Basic usage
message = client.messages.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100
)
print(message.content[0].text)
```

### Using cURL

```bash
# Basic generation
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Your prompt here"}'

# Chat completion (OpenAI format)
curl http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Your prompt here"}],
    "max_tokens": 100
  }'
```

---

## Docker Compose Setup

For easier deployment, use Docker Compose:

### Step 1: Start All Services

```bash
# Start Ollama and the API server together
docker-compose up -d

# View logs
docker-compose logs -f
```

### Step 2: Access the Services

- **Ollama API**: http://localhost:11434
- **FastAPI Server**: http://localhost:8001

### Step 3: Stop Services

```bash
docker-compose down
```

---

## Troubleshooting

### Issue: GPU Not Detected

**Error**: `Error response from daemon: could not select device driver "" with capabilities: [[gpu]]`

**Solution**:

```bash
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### Issue: Model Download Fails

**Error**: `Error: pull model manifest`

**Solution**:

```bash
# Check network connection
curl -I https://huggingface.co

# Retry the pull
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct
```

### Issue: API Server Can't Connect to Ollama

**Error**: `Connection refused` or `Ollama not responding`

**Solution**:

```bash
# Check if Ollama is running
docker ps | grep ollama

# Check Ollama logs
docker logs ollama

# Restart Ollama
docker restart ollama
```

### Issue: Out of Memory

**Error**: `CUDA out of memory`

**Solution**:

- Reduce the `max_tokens` parameter
- Use smaller batch sizes
- Restart the Ollama container to free memory

### Issue: Port Already in Use

**Error**: `Address already in use`

**Solution**:

```bash
# Find the process using the port
lsof -i :11434                   # Linux/macOS
netstat -ano | findstr :11434    # Windows

# Kill the process or use a different port
```

---

## File Structure

```
ollama-openelm/
├── setup_ollama_openelm.sh   # Complete setup script
├── app_ollama.py             # FastAPI server
├── requirements_local.txt    # Python dependencies
├── docker-compose.yml        # Docker Compose configuration
├── Dockerfile.api            # API server Docker image
├── test_python.py            # Python test script
├── test_curl.sh              # cURL test script
└── README.md                 # This file
```

---

## Environment Variables

### For API Server

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_BASE_URL` | `http://127.0.0.1:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `apple/OpenELM-3B-Instruct` | Model name |
| `PORT` | `8001` | API server port |

### For Ollama Container

| Variable | Description |
|----------|-------------|
| `OLLAMA_HOST` | Address and port the Ollama server binds to |
| `OLLAMA_MODELS` | Path to model storage |

---

## Performance Tips

1. **GPU Memory**: The 3B model uses ~6GB of GPU memory
2. **CPU Inference**: Falls back to CPU if no GPU is available (slower)
3. **Output Length**: Use `num_predict` to control how many tokens are generated
4. **Temperature**: Lower values (0.0-0.5) give more deterministic output

---

## Additional Resources

- [Ollama Documentation](https://ollama.com/)
- [OpenELM Model Card](https://huggingface.co/apple/OpenELM-3B-Instruct)
- [OpenAI API Compatibility](https://platform.openai.com/docs/api-reference)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)

---

## License

This setup is provided for educational and research purposes. The OpenELM models from Apple are released under their respective licenses.
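The FastAPI server's main job is translating between the two request schemas: an Anthropic-style `/v1/messages` request carries the system prompt as a top-level field, while the OpenAI-style `/v1/chat/completions` format expects it as the first message. A minimal sketch of that mapping, using a hypothetical `to_openai_payload` helper (this is an illustration, not the actual `app_ollama.py` code):

```python
# Hypothetical helper illustrating the Anthropic -> OpenAI request
# translation the API server performs; not the real app_ollama.py code.

def to_openai_payload(anthropic_req: dict) -> dict:
    """Map an Anthropic /v1/messages request body onto an OpenAI
    /v1/chat/completions request body."""
    messages = list(anthropic_req["messages"])
    # Anthropic: system prompt is a top-level "system" field.
    # OpenAI: system prompt is the first message in the list.
    if "system" in anthropic_req:
        messages.insert(0, {"role": "system",
                            "content": anthropic_req["system"]})
    return {
        "model": anthropic_req["model"],
        "messages": messages,
        "max_tokens": anthropic_req["max_tokens"],
        "temperature": anthropic_req.get("temperature", 1.0),
    }

payload = to_openai_payload({
    "model": "apple/OpenELM-3B-Instruct",
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100,
})
print(payload["messages"][0]["role"])  # system
```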
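The "API Server Can't Connect to Ollama" failure in Troubleshooting often comes from a startup race: the API server probes Ollama before the container has finished booting. A small retry loop avoids the spurious `Connection refused`. Here is a sketch with an injectable probe function so it runs without a live server; `wait_for_ollama` is a hypothetical helper, not part of the project files:

```python
import time

# Hypothetical retry helper; the probe would normally GET /api/tags
# and return True on HTTP 200.
def wait_for_ollama(probe, retries: int = 10, delay: float = 1.0) -> bool:
    """Call `probe` until it returns True or retries are exhausted."""
    for _ in range(retries):
        if probe():
            return True
        time.sleep(delay)
    return False

# Demo with a stub probe that starts succeeding on the third attempt:
calls = {"n": 0}
def stub_probe():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for_ollama(stub_probe, retries=5, delay=0))  # True
```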
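The knobs from the Performance Tips section (`num_predict`, `temperature`) go under an `options` object in Ollama's native `/api/generate` request, not at the top level. A small builder showing how a tuned request body is assembled (`generate_request` is a hypothetical helper for illustration):

```python
import json

# Hypothetical builder for an Ollama /api/generate request body.
def generate_request(model: str, prompt: str, num_predict: int = 100,
                     temperature: float = 0.2) -> str:
    """Build the JSON body for Ollama's /api/generate, with sampling
    knobs nested under "options" as the native API expects."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_predict": num_predict,   # caps the number of generated tokens
            "temperature": temperature,   # low values = more deterministic
        },
    }
    return json.dumps(body)

# Could be passed to `curl -d` or an HTTP client against :11434/api/generate.
print(generate_request("apple/OpenELM-3B-Instruct", "Say hello!"))
```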