# Complete Ollama OpenELM Setup Guide

This guide provides complete instructions to set up a local Ollama instance with Apple's OpenELM model and use it via OpenAI/Anthropic compatible APIs.

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Quick Start (One Command)](#quick-start-one-command)
3. [Manual Setup](#manual-setup)
4. [Testing](#testing)
5. [API Usage](#api-usage)
6. [Docker Compose Setup](#docker-compose-setup)
7. [Troubleshooting](#troubleshooting)

---

## Prerequisites

### Required Software

- **Docker**: [Install Docker](https://docs.docker.com/get-docker/)
- **NVIDIA Driver**: For GPU support (check with `nvidia-smi`)
- **NVIDIA Container Toolkit**: [Install Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)

### Verify GPU Access

```bash
# Check NVIDIA driver
nvidia-smi

# Verify Docker can see the GPU
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```

---

## Quick Start (One Command)

### For Linux/macOS

```bash
# Download and run the complete setup script
curl -O https://raw.githubusercontent.com/your-repo/setup_ollama_openelm.sh
chmod +x setup_ollama_openelm.sh
./setup_ollama_openelm.sh
```

### For Windows (PowerShell)

```powershell
# Run each command manually (see Manual Setup below)
```

---

## Manual Setup

### Step 1: Start Ollama Container

```bash
# Start Ollama with GPU support
docker run -d \
  --name ollama \
  -v ollama:/root/.ollama \
  -p 127.0.0.1:11434:11434 \
  --gpus all \
  ollama/ollama

# Verify it's running
docker ps | grep ollama
```

### Step 2: Pull OpenELM Model

```bash
# Pull the 3B parameter model (2.1 GB)
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct

# Verify installation
docker exec ollama ollama list
```

Expected output:

```
NAME                               ID          SIZE      MODIFIED
apple/OpenELM-3B-Instruct:latest   abc123...   2.1 GB    About a minute ago
```

### Step 3: Install Python Dependencies

```bash
# Create a virtual environment (optional but recommended)
python3 -m venv ollama_env
source ollama_env/bin/activate    # Linux/macOS
# or: .\ollama_env\Scripts\activate    # Windows

# Install dependencies
pip install -r requirements_local.txt
```

### Step 4: Run the API Server

```bash
# Start the FastAPI server (runs on port 8001)
python app_ollama.py
```

Or using uvicorn directly:

```bash
uvicorn app_ollama:app --host 0.0.0.0 --port 8001
```

---

## Testing

### Test 1: Verify Ollama is Running

```bash
# Check Ollama status
curl http://127.0.0.1:11434/api/tags

# Should return something like:
# {"models":[{"name":"apple/OpenELM-3B-Instruct"...}]}
```

### Test 2: Quick Generation Test

```bash
# Test basic generation
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Say hello!", "stream": false}'
```

### Test 3: Run Test Scripts

```bash
# Make test scripts executable
chmod +x test_curl.sh

# Run curl tests
./test_curl.sh

# Run Python tests (requires the openai package)
python test_python.py
```

### Test 4: Test the Full API Server

```bash
# Test the OpenAI-format endpoint
curl -X POST http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'

# Test the Anthropic-format endpoint
curl -X POST http://127.0.0.1:8001/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```

---

## API Usage

### Using OpenAI SDK (Python)

```python
from openai import OpenAI

# Connect to local Ollama
client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="ollama",  # Any string works
)

# Basic usage
response = client.chat.completions.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing simply."}
    ],
    max_tokens=200,
    temperature=0.7
)
print(response.choices[0].message.content)
```

### Using Anthropic SDK (Python)

```python
import anthropic

# Connect to local Ollama via the API server.
# Note: the SDK appends /v1/messages itself, so base_url must not include /v1.
client = anthropic.Anthropic(
    base_url="http://127.0.0.1:8001",
    api_key="ollama",  # Any string works
)

# Basic usage
message = client.messages.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100
)
print(message.content[0].text)
```

### Using cURL

```bash
# Basic generation
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Your prompt here"}'

# Chat completion (OpenAI format)
curl http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Your prompt here"}],
    "max_tokens": 100
  }'
```

---

## Docker Compose Setup

For easier deployment, use Docker Compose:

### Step 1: Start All Services

```bash
# Start Ollama and the API server together
docker-compose up -d

# View logs
docker-compose logs -f
```

### Step 2: Access the Services

- **Ollama API**: http://localhost:11434
- **FastAPI Server**: http://localhost:8001

### Step 3: Stop Services

```bash
docker-compose down
```

---

## Troubleshooting

### Issue: GPU Not Detected

**Error**: `Error response from daemon: could not select device driver "" with capabilities: [[gpu]]`

**Solution**:

```bash
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### Issue: Model Download Fails

**Error**: `Error: pull model manifest`

**Solution**:

```bash
# Check network connection
curl -I https://huggingface.co

# Retry the pull
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct
```

### Issue: API Server Can't Connect to Ollama

**Error**: `Connection refused` or `Ollama not responding`

**Solution**:

```bash
# Check if Ollama is running
docker ps | grep ollama

# Check Ollama logs
docker logs ollama

# Restart Ollama
docker restart ollama
```

### Issue: Out of Memory

**Error**: `CUDA out of memory`

**Solution**:

- Reduce the `max_tokens` parameter
- Use smaller batch sizes
- Restart the Ollama container to free memory

### Issue: Port Already in Use

**Error**: `Address already in use`

**Solution**:

```bash
# Find the process using the port
lsof -i :11434                   # Linux/macOS
netstat -ano | findstr :11434    # Windows

# Kill the process or use a different port
```

---

## File Structure

```
ollama-openelm/
├── setup_ollama_openelm.sh   # Complete setup script
├── app_ollama.py             # FastAPI server
├── requirements_local.txt    # Python dependencies
├── docker-compose.yml        # Docker Compose configuration
├── Dockerfile.api            # API server Docker image
├── test_python.py            # Python test script
├── test_curl.sh              # cURL test script
└── README.md                 # This file
```

---

## Environment Variables

### For API Server

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_BASE_URL` | `http://127.0.0.1:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `apple/OpenELM-3B-Instruct` | Model name |
| `PORT` | `8001` | API server port |

### For Ollama Container

| Variable | Description |
|----------|-------------|
| `OLLAMA_HOST` | Address and port the Ollama server binds to |
| `OLLAMA_MODELS` | Path to model storage |

---

## Performance Tips

1. **GPU Memory**: The 3B model uses ~6GB of GPU memory
2. **CPU Inference**: Falls back to CPU if no GPU is available (slower)
3. **Output Length**: Use `num_predict` to control how many tokens are generated
4. **Temperature**: Lower values (0.0-0.5) give more deterministic output

---

## Additional Resources

- [Ollama Documentation](https://ollama.com/)
- [OpenELM Model Card](https://huggingface.co/apple/OpenELM-3B-Instruct)
- [OpenAI API Compatibility](https://platform.openai.com/docs/api-reference)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)

---

## License

This setup is provided for educational and research purposes. The OpenELM models from Apple are released under their respective licenses.
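The FastAPI server's main job is translating between the two request schemas: an Anthropic-style `/v1/messages` request carries the system prompt as a top-level field, while the OpenAI-style `/v1/chat/completions` format expects it as the first message. A minimal sketch of that mapping, using a hypothetical `to_openai_payload` helper (this is an illustration, not the actual `app_ollama.py` code):

```python
# Hypothetical helper illustrating the Anthropic -> OpenAI request
# translation the API server performs; not the real app_ollama.py code.

def to_openai_payload(anthropic_req: dict) -> dict:
    """Map an Anthropic /v1/messages request body onto an OpenAI
    /v1/chat/completions request body."""
    messages = list(anthropic_req["messages"])
    # Anthropic: system prompt is a top-level "system" field.
    # OpenAI: system prompt is the first message in the list.
    if "system" in anthropic_req:
        messages.insert(0, {"role": "system",
                            "content": anthropic_req["system"]})
    return {
        "model": anthropic_req["model"],
        "messages": messages,
        "max_tokens": anthropic_req["max_tokens"],
        "temperature": anthropic_req.get("temperature", 1.0),
    }

payload = to_openai_payload({
    "model": "apple/OpenELM-3B-Instruct",
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100,
})
print(payload["messages"][0]["role"])  # system
```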
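The "API Server Can't Connect to Ollama" failure in Troubleshooting often comes from a startup race: the API server probes Ollama before the container has finished booting. A small retry loop avoids the spurious `Connection refused`. Here is a sketch with an injectable probe function so it runs without a live server; `wait_for_ollama` is a hypothetical helper, not part of the project files:

```python
import time

# Hypothetical retry helper; the probe would normally GET /api/tags
# and return True on HTTP 200.
def wait_for_ollama(probe, retries: int = 10, delay: float = 1.0) -> bool:
    """Call `probe` until it returns True or retries are exhausted."""
    for _ in range(retries):
        if probe():
            return True
        time.sleep(delay)
    return False

# Demo with a stub probe that starts succeeding on the third attempt:
calls = {"n": 0}
def stub_probe():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for_ollama(stub_probe, retries=5, delay=0))  # True
```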
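The knobs from the Performance Tips section (`num_predict`, `temperature`) go under an `options` object in Ollama's native `/api/generate` request, not at the top level. A small builder showing how a tuned request body is assembled (`generate_request` is a hypothetical helper for illustration):

```python
import json

# Hypothetical builder for an Ollama /api/generate request body.
def generate_request(model: str, prompt: str, num_predict: int = 100,
                     temperature: float = 0.2) -> str:
    """Build the JSON body for Ollama's /api/generate, with sampling
    knobs nested under "options" as the native API expects."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_predict": num_predict,   # caps the number of generated tokens
            "temperature": temperature,   # low values = more deterministic
        },
    }
    return json.dumps(body)

# Could be passed to `curl -d` or an HTTP client against :11434/api/generate.
print(generate_request("apple/OpenELM-3B-Instruct", "Say hello!"))
```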