# Complete Ollama OpenELM Setup Guide
This guide provides complete instructions for setting up a local Ollama instance with Apple's OpenELM model and using it through OpenAI- and Anthropic-compatible APIs.

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Quick Start (One Command)](#quick-start-one-command)
3. [Manual Setup](#manual-setup)
4. [Testing](#testing)
5. [API Usage](#api-usage)
6. [Docker Compose Setup](#docker-compose-setup)
7. [Troubleshooting](#troubleshooting)

---

## Prerequisites

### Required Software

- **Docker**: [Install Docker](https://docs.docker.com/get-docker/)
- **NVIDIA Driver**: required for GPU support (check with `nvidia-smi`)
- **NVIDIA Container Toolkit**: [Install Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
### Verify GPU Access

```bash
# Check that the NVIDIA driver is installed and working
nvidia-smi

# Verify Docker can see the GPU (any recent CUDA base image works)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```
---

## Quick Start (One Command)

### For Linux/macOS

```bash
# Download and run the complete setup script
curl -O https://raw.githubusercontent.com/your-repo/setup_ollama_openelm.sh
chmod +x setup_ollama_openelm.sh
./setup_ollama_openelm.sh
```

### For Windows (PowerShell)

```powershell
# There is no one-command script for Windows;
# run each command from Manual Setup below individually
```
---

## Manual Setup

### Step 1: Start Ollama Container

```bash
# Start Ollama with GPU support
docker run -d \
  --name ollama \
  -v ollama:/root/.ollama \
  -p 127.0.0.1:11434:11434 \
  --gpus all \
  ollama/ollama

# Verify it's running
docker ps | grep ollama
```

### Step 2: Pull OpenELM Model

```bash
# Pull the 3B parameter model (2.1 GB)
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct

# Verify installation
docker exec ollama ollama list
```

Expected output:

```
NAME                              ID         SIZE    MODIFIED
apple/OpenELM-3B-Instruct:latest  abc123...  2.1 GB  About a minute ago
```

### Step 3: Install Python Dependencies

```bash
# Create a virtual environment (optional but recommended)
python3 -m venv ollama_env
source ollama_env/bin/activate      # Linux/macOS
# or: .\ollama_env\Scripts\activate # Windows

# Install dependencies
pip install -r requirements_local.txt
```
### Step 4: Run the API Server

```bash
# Start the FastAPI server (listens on port 8001)
python app_ollama.py
```

Or using uvicorn directly:

```bash
uvicorn app_ollama:app --host 0.0.0.0 --port 8001
```
---

## Testing

### Test 1: Verify Ollama is Running

```bash
# Check Ollama status
curl http://127.0.0.1:11434/api/tags

# Should return something like:
# {"models":[{"name":"apple/OpenELM-3B-Instruct"...}]}
```

### Test 2: Quick Generation Test

```bash
# Test basic generation
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Say hello!", "stream": false}'
```

### Test 3: Run Test Scripts

```bash
# Make the test script executable
chmod +x test_curl.sh

# Run curl tests
./test_curl.sh

# Run Python tests (requires the openai package)
python test_python.py
```

### Test 4: Test the Full API Server

```bash
# Test the OpenAI-format endpoint
curl -X POST http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'

# Test the Anthropic-format endpoint
curl -X POST http://127.0.0.1:8001/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```
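The two endpoints above return differently shaped JSON. A small helper (hypothetical, for illustration only) shows where the assistant text lives in each format:

```python
def extract_text(response: dict) -> str:
    """Pull the assistant text out of an OpenAI- or Anthropic-style response."""
    if "choices" in response:  # OpenAI chat-completions format
        return response["choices"][0]["message"]["content"]
    return response["content"][0]["text"]  # Anthropic messages format

# Example payloads shaped like the two endpoints' responses
openai_style = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
anthropic_style = {"content": [{"type": "text", "text": "Hello!"}], "role": "assistant"}

print(extract_text(openai_style))     # → Hello!
print(extract_text(anthropic_style))  # → Hello!
```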
---

## API Usage

### Using OpenAI SDK (Python)

```python
from openai import OpenAI

# Connect to the local Ollama server
client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="ollama",  # any string works; Ollama ignores it
)

# Basic usage
response = client.chat.completions.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing simply."}
    ],
    max_tokens=200,
    temperature=0.7
)
print(response.choices[0].message.content)
```

### Using Anthropic SDK (Python)

```python
import anthropic

# Connect to the local API server.
# Note: the SDK appends /v1/messages to base_url itself,
# so base_url must NOT already end in /v1.
client = anthropic.Anthropic(
    base_url="http://127.0.0.1:8001",
    api_key="ollama",  # any string works; the local server ignores it
)

# Basic usage
message = client.messages.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100
)
print(message.content[0].text)
```
### Using cURL

```bash
# Basic generation (Ollama's native API)
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Your prompt here"}'

# Chat completion (OpenAI format, via the API server)
curl http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Your prompt here"}],
    "max_tokens": 100
  }'
```

---

## Docker Compose Setup

For easier deployment, use Docker Compose:

### Step 1: Start All Services

```bash
# Start Ollama and the API server together
docker-compose up -d

# View logs
docker-compose logs -f
```

### Step 2: Access the Services

- **Ollama API**: http://localhost:11434
- **FastAPI Server**: http://localhost:8001

### Step 3: Stop Services

```bash
docker-compose down
```
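The repo ships a `docker-compose.yml`; if you need to recreate it, a minimal sketch along these lines would wire the two services together (the service names and the `Dockerfile.api` build stanza are assumptions based on the file list in this guide):

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  api:
    build:
      context: .
      dockerfile: Dockerfile.api
    ports:
      - "8001:8001"
    environment:
      # Inside the compose network, the Ollama service is reachable by name
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama:
```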
---

## Troubleshooting

### Issue: GPU Not Detected

**Error**: `Error response from daemon: could not select device driver "" with capabilities: [[gpu]]`

**Solution**:

```bash
# Install the NVIDIA Container Toolkit
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### Issue: Model Download Fails

**Error**: `Error: pull model manifest`

**Solution**:

```bash
# Check the network connection
curl -I https://huggingface.co

# Retry the pull
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct
```

### Issue: API Server Can't Connect to Ollama

**Error**: `Connection refused` or `Ollama not responding`

**Solution**:

```bash
# Check if Ollama is running
docker ps | grep ollama

# Check Ollama logs
docker logs ollama

# Restart Ollama
docker restart ollama
```

### Issue: Out of Memory

**Error**: `CUDA out of memory`

**Solution**:

- Reduce the `max_tokens` parameter
- Use smaller batch sizes
- Restart the Ollama container to free memory

### Issue: Port Already in Use

**Error**: `Address already in use`

**Solution**:

```bash
# Find the process using the port
lsof -i :11434                 # Linux/macOS
netstat -ano | findstr :11434  # Windows

# Kill the process or use a different port
```
---

## File Structure

```
ollama-openelm/
├── setup_ollama_openelm.sh   # Complete setup script
├── app_ollama.py             # FastAPI server
├── requirements_local.txt    # Python dependencies
├── docker-compose.yml        # Docker Compose configuration
├── Dockerfile.api            # API server Docker image
├── test_python.py            # Python test script
├── test_curl.sh              # cURL test script
└── README.md                 # This file
```
---

## Environment Variables

### For API Server

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_BASE_URL` | `http://127.0.0.1:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `apple/OpenELM-3B-Instruct` | Model name |
| `PORT` | `8001` | API server port |
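Inside `app_ollama.py`, these variables would typically be read with `os.environ.get`, falling back to the defaults in the table above (a sketch of the pattern; the actual file may differ):

```python
import os

# Read configuration from the environment, falling back to documented defaults
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434")
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "apple/OpenELM-3B-Instruct")
PORT = int(os.environ.get("PORT", "8001"))

print(OLLAMA_BASE_URL, OLLAMA_MODEL, PORT)
```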
### For Ollama Container

| Variable | Description |
|----------|-------------|
| `OLLAMA_HOST` | Bind address the Ollama server listens on (default `127.0.0.1:11434`) |
| `OLLAMA_MODELS` | Path to model storage |

---

## Performance Tips

1. **GPU Memory**: The 3B model uses ~6 GB of GPU memory
2. **CPU Inference**: Falls back to CPU if no GPU is available (slower)
3. **Output Length**: Use `num_predict` to cap how many tokens are generated
4. **Temperature**: Lower values (0.0-0.5) give more deterministic output
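Tips 3 and 4 map onto the `options` field of Ollama's native `/api/generate` request body. Building the payload in Python makes the mapping explicit (the prompt and parameter values here are only examples):

```python
import json

# Request body for Ollama's /api/generate endpoint:
# num_predict caps output length, low temperature makes output more deterministic
payload = {
    "model": "apple/OpenELM-3B-Instruct",
    "prompt": "Summarize this in one sentence: ...",
    "stream": False,
    "options": {
        "num_predict": 128,   # maximum tokens to generate
        "temperature": 0.2,   # 0.0-0.5 range for deterministic output
    },
}
print(json.dumps(payload, indent=2))
```

The printed JSON can be sent as-is, e.g. `curl http://127.0.0.1:11434/api/generate -d "$(python build_payload.py)"` (script name hypothetical).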
---

## Additional Resources

- [Ollama Documentation](https://ollama.com/)
- [OpenELM Model Card](https://huggingface.co/apple/OpenELM-3B-Instruct)
- [OpenAI API Compatibility](https://platform.openai.com/docs/api-reference)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)

---

## License

This setup is provided for educational and research purposes. The OpenELM models from Apple are released under their respective licenses.