# Complete Ollama OpenELM Setup Guide
This guide walks through setting up a local Ollama instance with Apple's OpenELM model and exposing it through OpenAI- and Anthropic-compatible APIs.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Quick Start (One Command)](#quick-start-one-command)
3. [Manual Setup](#manual-setup)
4. [Testing](#testing)
5. [API Usage](#api-usage)
6. [Docker Compose Setup](#docker-compose-setup)
7. [Troubleshooting](#troubleshooting)
---
## Prerequisites
### Required Software
- **Docker**: [Install Docker](https://docs.docker.com/get-docker/)
- **NVIDIA Driver**: For GPU support (check with `nvidia-smi`)
- **NVIDIA Container Toolkit**: [Install Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
### Verify GPU Access
```bash
# Check NVIDIA driver
nvidia-smi
# Verify Docker can see the GPU (the toolkit mounts the driver into the container)
docker run --rm --gpus all ubuntu nvidia-smi
```
---
## Quick Start (One Command)
### For Linux/macOS
```bash
# Download and run the complete setup script
# (replace the placeholder URL with the real repository path)
curl -O https://raw.githubusercontent.com/your-repo/setup_ollama_openelm.sh
chmod +x setup_ollama_openelm.sh
./setup_ollama_openelm.sh
```
### For Windows (PowerShell)
```powershell
# Run each command manually (see Manual Setup below)
```
---
## Manual Setup
### Step 1: Start Ollama Container
```bash
# Start Ollama with GPU support
docker run -d \
  --name ollama \
  -v ollama:/root/.ollama \
  -p 127.0.0.1:11434:11434 \
  --gpus all \
  ollama/ollama

# Verify it's running
docker ps | grep ollama
```
### Step 2: Pull OpenELM Model
```bash
# Pull the 3B parameter model (2.1 GB)
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct
# Verify installation
docker exec ollama ollama list
```
Expected output:
```
NAME ID SIZE MODIFIED
apple/OpenELM-3B-Instruct:latest abc123... 2.1 GB About a minute ago
```
### Step 3: Install Python Dependencies
```bash
# Create virtual environment (optional but recommended)
python3 -m venv ollama_env
source ollama_env/bin/activate # Linux/macOS
# or: .\ollama_env\Scripts\activate # Windows
# Install dependencies
pip install -r requirements_local.txt
```
### Step 4: Run the API Server
```bash
# Start the FastAPI server (runs on port 8001)
python app_ollama.py
```
Or using uvicorn directly:
```bash
uvicorn app_ollama:app --host 0.0.0.0 --port 8001
```
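Before moving on to the tests, it helps to wait until the server actually accepts connections. A minimal stdlib-only readiness probe (a sketch; the URL you poll is whatever endpoint your setup exposes):

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url: str, timeout: float = 30.0) -> bool:
    """Poll `url` until the server answers, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=2)
            return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if this route errored
        except (urllib.error.URLError, OSError):
            time.sleep(0.5)  # not up yet; retry
    return False
```

For example, `wait_for_server("http://127.0.0.1:11434/api/tags")` checks Ollama, and `wait_for_server("http://127.0.0.1:8001/docs")` checks the FastAPI server (FastAPI serves its interactive docs at `/docs` by default).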
---
## Testing
### Test 1: Verify Ollama is Running
```bash
# Check Ollama status
curl http://127.0.0.1:11434/api/tags
# Should return something like:
# {"models":[{"name":"apple/OpenELM-3B-Instruct"...}]}
```
### Test 2: Quick Generation Test
```bash
# Test basic generation
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Say hello!", "stream": false}'
```
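With `"stream": false`, `/api/generate` returns a single JSON object whose `response` field holds the generated text, alongside timing counters such as `eval_count` and `eval_duration` (in nanoseconds). A small parser sketch:

```python
import json

def parse_generate_response(raw: str) -> dict:
    """Extract the text and a tokens/sec estimate from a
    non-streaming /api/generate response body."""
    data = json.loads(raw)
    result = {"text": data.get("response", ""), "done": data.get("done", False)}
    # eval_count / eval_duration (ns) are reported once generation finishes
    if "eval_count" in data and data.get("eval_duration"):
        result["tokens_per_sec"] = data["eval_count"] / (data["eval_duration"] / 1e9)
    return result

# A fabricated sample body in the shape Ollama returns
sample = ('{"model": "apple/OpenELM-3B-Instruct", "response": "Hello!", '
          '"done": true, "eval_count": 5, "eval_duration": 250000000}')
print(parse_generate_response(sample))
```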
### Test 3: Run Test Scripts
```bash
# Make test scripts executable
chmod +x test_curl.sh
# Run curl tests
./test_curl.sh
# Run Python tests (requires openai package)
python test_python.py
```
### Test 4: Test the Full API Server
```bash
# Test OpenAI format endpoint
curl -X POST http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'

# Test Anthropic format endpoint
curl -X POST http://127.0.0.1:8001/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```
---
## API Usage
### Using OpenAI SDK (Python)
```python
from openai import OpenAI

# Connect to local Ollama
client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="ollama",  # Any string works
)

# Basic usage
response = client.chat.completions.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing simply."}
    ],
    max_tokens=200,
    temperature=0.7
)
print(response.choices[0].message.content)
```
### Using Anthropic SDK (Python)
```python
import anthropic

# Connect to the local API server.
# Note: the SDK appends /v1/messages to base_url itself, so no /v1 here.
client = anthropic.Anthropic(
    base_url="http://127.0.0.1:8001",
    api_key="ollama",  # Any string works
)

# Basic usage
message = client.messages.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100
)
print(message.content[0].text)
```
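Under the hood, a bridge server like `app_ollama.py` has to translate Anthropic-style request bodies into the OpenAI chat format. A hypothetical sketch of that mapping (the actual server may differ; the key difference is that Anthropic carries the system prompt as a top-level `system` field rather than a message):

```python
def anthropic_to_openai(payload: dict) -> dict:
    """Map an Anthropic /v1/messages body onto an OpenAI
    /v1/chat/completions body. Illustrative sketch only."""
    messages = list(payload.get("messages", []))
    # Anthropic puts the system prompt at the top level; OpenAI
    # expects it as the first message with role "system".
    if "system" in payload:
        messages.insert(0, {"role": "system", "content": payload["system"]})
    return {
        "model": payload["model"],
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 256),
        "temperature": payload.get("temperature", 0.7),
    }

req = {
    "model": "apple/OpenELM-3B-Instruct",
    "system": "Be brief.",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100,
}
print(anthropic_to_openai(req))
```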
### Using cURL
```bash
# Basic generation
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Your prompt here"}'

# Chat completion (OpenAI format)
curl http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apple/OpenELM-3B-Instruct",
    "messages": [{"role": "user", "content": "Your prompt here"}],
    "max_tokens": 100
  }'
```
---
## Docker Compose Setup
For easier deployment, use Docker Compose:
### Step 1: Start All Services
```bash
# Start Ollama and API server together
docker-compose up -d
# View logs
docker-compose logs -f
```
### Step 2: Access the Services
- **Ollama API**: http://localhost:11434
- **FastAPI Server**: http://localhost:8001
### Step 3: Stop Services
```bash
docker-compose down
```
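If you don't have the repository's `docker-compose.yml` at hand, a minimal version looks roughly like this (service names, the `Dockerfile.api` build, and the GPU reservation are assumptions based on the setup above):

```yaml
# Hypothetical docker-compose.yml sketch; adjust to your repository's actual file
services:
  ollama:
    image: ollama/ollama
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  api:
    build:
      context: .
      dockerfile: Dockerfile.api
    ports:
      - "8001:8001"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:
```

Note that inside the Compose network the API server reaches Ollama at `http://ollama:11434` (the service name), not `127.0.0.1`.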
---
## Troubleshooting
### Issue: GPU Not Detected
**Error**: `Error response from daemon: could not select device driver "" with capabilities: [[gpu]]`
**Solution**:
```bash
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
### Issue: Model Download Fails
**Error**: `Error: pull model manifest`
**Solution**:
```bash
# Check network connection
curl -I https://huggingface.co
# Retry the pull (interrupted downloads resume)
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct
```
### Issue: API Server Can't Connect to Ollama
**Error**: `Connection refused` or `Ollama not responding`
**Solution**:
```bash
# Check if Ollama is running
docker ps | grep ollama
# Check Ollama logs
docker logs ollama
# Restart Ollama
docker restart ollama
```
### Issue: Out of Memory
**Error**: `CUDA out of memory`
**Solution**:
- Reduce `max_tokens` parameter
- Use smaller batch sizes
- Restart the Ollama container to free memory
### Issue: Port Already in Use
**Error**: `Address already in use`
**Solution**:
```bash
# Find the process using the port
lsof -i :11434 # Linux/macOS
netstat -ano | findstr :11434 # Windows
# Kill the process or use a different port
```
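A quick cross-platform alternative to `lsof`/`netstat` is a short stdlib probe that tries to connect to the port:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

# Check the two ports this setup uses
for port in (11434, 8001):
    print(port, "in use" if port_in_use(port) else "free")
```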
---
## File Structure
```
ollama-openelm/
β”œβ”€β”€ setup_ollama_openelm.sh # Complete setup script
β”œβ”€β”€ app_ollama.py # FastAPI server
β”œβ”€β”€ requirements_local.txt # Python dependencies
β”œβ”€β”€ docker-compose.yml # Docker Compose configuration
β”œβ”€β”€ Dockerfile.api # API server Docker image
β”œβ”€β”€ test_python.py # Python test script
β”œβ”€β”€ test_curl.sh # cURL test script
└── README_LOCAL.md # This file
```
---
## Environment Variables
### For API Server
| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_BASE_URL` | `http://127.0.0.1:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `apple/OpenELM-3B-Instruct` | Model name |
| `PORT` | `8001` | API server port |
### For Ollama Container
| Variable | Description |
|----------|-------------|
| `OLLAMA_HOST` | Address the Ollama server binds to (and clients connect to) |
| `OLLAMA_MODELS` | Path to model storage |
---
## Performance Tips
1. **GPU Memory**: The 3B model uses ~6GB GPU memory
2. **CPU Inference**: Falls back to CPU if no GPU available (slower)
3. **Output Length**: Use `num_predict` (Ollama's equivalent of `max_tokens`) to cap generated tokens
4. **Temperature**: Lower values (0.0-0.5) for more deterministic output
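Tips 3 and 4 map onto Ollama's `options` object in the request body. A sketch of a payload builder that applies them:

```python
def build_generate_payload(prompt: str, *, max_tokens: int = 128,
                           temperature: float = 0.2) -> dict:
    """Assemble a non-streaming /api/generate body applying the tips above:
    num_predict caps output length, low temperature keeps output deterministic."""
    return {
        "model": "apple/OpenELM-3B-Instruct",
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": max_tokens, "temperature": temperature},
    }

print(build_generate_payload("Summarize Ollama in one sentence."))
```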
---
## Additional Resources
- [Ollama Documentation](https://ollama.com/)
- [OpenELM Model Card](https://huggingface.co/apple/OpenELM-3B-Instruct)
- [OpenAI API Compatibility](https://platform.openai.com/docs/api-reference)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
---
## License
This setup is provided for educational and research purposes. The OpenELM models from Apple are released under their respective licenses.