
Complete Ollama OpenELM Setup Guide

This guide provides complete instructions for setting up a local Ollama instance with Apple's OpenELM model and using it through OpenAI- and Anthropic-compatible APIs.

Table of Contents

  1. Prerequisites
  2. Quick Start (One Command)
  3. Manual Setup
  4. Testing
  5. API Usage
  6. Docker Compose Setup
  7. Troubleshooting

Prerequisites

Required Software

  • Docker (with the NVIDIA Container Toolkit for GPU support)
  • NVIDIA GPU driver (optional; Ollama falls back to CPU without it)
  • Python 3.8+ with pip
  • curl (used by the test scripts)

Verify GPU Access

# Check NVIDIA driver
nvidia-smi

# Verify Docker can see GPU
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Quick Start (One Command)

For Linux/macOS

# Download and run the complete setup script
curl -O https://raw.githubusercontent.com/your-repo/setup_ollama_openelm.sh
chmod +x setup_ollama_openelm.sh
./setup_ollama_openelm.sh

For Windows (PowerShell)

# Run each command manually (see Manual Setup below)

Manual Setup

Step 1: Start Ollama Container

# Start Ollama with GPU support
docker run -d \
    --name ollama \
    -v ollama:/root/.ollama \
    -p 127.0.0.1:11434:11434 \
    --gpus all \
    ollama/ollama

# Verify it's running
docker ps | grep ollama

Step 2: Pull OpenELM Model

# Pull the 3B parameter model (2.1 GB)
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct

# Verify installation
docker exec ollama ollama list

Expected output:

NAME                               ID         SIZE    MODIFIED
apple/OpenELM-3B-Instruct:latest   abc123...  2.1 GB  About a minute ago

Step 3: Install Python Dependencies

# Create virtual environment (optional but recommended)
python3 -m venv ollama_env
source ollama_env/bin/activate  # Linux/macOS
# or: .\ollama_env\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements_local.txt

Step 4: Run the API Server

# Start the FastAPI server (runs on port 8001)
python app_ollama.py

Or using uvicorn directly:

uvicorn app_ollama:app --host 0.0.0.0 --port 8001

Testing

Test 1: Verify Ollama is Running

# Check Ollama status
curl http://127.0.0.1:11434/api/tags

# Should return something like:
# {"models":[{"name":"apple/OpenELM-3B-Instruct"...}]}
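The same check can be done from Python with only the standard library. The sketch below queries the /api/tags endpoint and extracts the installed model names; the JSON shape assumed here matches the example response above:

```python
import json
import urllib.request

def list_model_names(tags_json: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [m.get("name", "") for m in tags_json.get("models", [])]

def check_ollama(base_url: str = "http://127.0.0.1:11434") -> list:
    """Return installed model names, or raise if Ollama is unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return list_model_names(json.load(resp))
```

With Ollama running, `check_ollama()` should include `apple/OpenELM-3B-Instruct` in its result.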

Test 2: Quick Generation Test

# Test basic generation
curl http://127.0.0.1:11434/api/generate \
    -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Say hello!", "stream": false}'
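The test above sets `"stream": false` to get a single JSON body. With `"stream": true` (Ollama's default when the flag is omitted), /api/generate emits one JSON object per line instead. A small sketch of reassembling the streamed text, using the `response`/`done` field names from Ollama's streaming format:

```python
import json
from typing import Iterable

def assemble_stream(lines: Iterable) -> str:
    """Join the 'response' fragments of an Ollama NDJSON stream until 'done'."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```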

Test 3: Run Test Scripts

# Make test scripts executable
chmod +x test_curl.sh

# Run curl tests
./test_curl.sh

# Run Python tests (requires openai package)
python test_python.py

Test 4: Test the Full API Server

# Test OpenAI format endpoint
curl -X POST http://127.0.0.1:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "apple/OpenELM-3B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100
    }'

# Test Anthropic format endpoint
curl -X POST http://127.0.0.1:8001/v1/messages \
    -H "Content-Type: application/json" \
    -d '{
        "model": "apple/OpenELM-3B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100
    }'
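The /v1/messages endpoint presumably translates Anthropic-style requests into the OpenAI chat format before forwarding them. The actual logic lives in app_ollama.py; the hypothetical helper below is only a sketch of that translation, based on the public difference between the two formats (Anthropic carries the system prompt as a top-level field, OpenAI as the first message):

```python
def anthropic_to_openai(body: dict) -> dict:
    """Map an Anthropic /v1/messages request to an OpenAI chat.completions one."""
    messages = list(body.get("messages", []))
    if body.get("system"):
        # Anthropic: top-level "system" field -> OpenAI: leading system message
        messages.insert(0, {"role": "system", "content": body["system"]})
    return {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens", 256),
        "temperature": body.get("temperature", 0.7),
    }
```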

API Usage

Using OpenAI SDK (Python)

from openai import OpenAI

# Connect to local Ollama
client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",
    api_key="ollama",  # Any string works
)

# Basic usage
response = client.chat.completions.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing simply."}
    ],
    max_tokens=200,
    temperature=0.7
)

print(response.choices[0].message.content)

Using Anthropic SDK (Python)

import anthropic

# Connect to local Ollama (via API server)
client = anthropic.Anthropic(
    base_url="http://127.0.0.1:8001/v1",
    api_key="ollama",  # Any string works
)

# Basic usage
message = client.messages.create(
    model="apple/OpenELM-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100
)

print(message.content[0].text)

Using cURL

# Basic generation
curl http://127.0.0.1:11434/api/generate \
    -d '{"model": "apple/OpenELM-3B-Instruct", "prompt": "Your prompt here"}'

# Chat completion (OpenAI format)
curl http://127.0.0.1:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "apple/OpenELM-3B-Instruct",
        "messages": [{"role": "user", "content": "Your prompt here"}],
        "max_tokens": 100
    }'
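If neither SDK is installed, the same chat call can be made with just the Python standard library. The sketch below builds the identical request the curl example sends; actually sending it requires the API server from Step 4 to be running, so only construction is shown:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str,
                       max_tokens: int = 100) -> urllib.request.Request:
    """Build a POST request matching the OpenAI-format curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send: urllib.request.urlopen(build_chat_request(...), timeout=60)
```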

Docker Compose Setup

For easier deployment, use Docker Compose:

Step 1: Start All Services

# Start Ollama and API server together
docker-compose up -d

# View logs
docker-compose logs -f

Step 2: Access the Services

  • Ollama API: http://127.0.0.1:11434
  • API server (OpenAI/Anthropic compatible): http://127.0.0.1:8001

Step 3: Stop Services

docker-compose down

Troubleshooting

Issue: GPU Not Detected

Error: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]

Solution:

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Issue: Model Download Fails

Error: Error: pull model manifest

Solution:

# Check network connection
curl -I https://huggingface.co

# Retry the pull
docker exec -it ollama ollama pull apple/OpenELM-3B-Instruct

Issue: API Server Can't Connect to Ollama

Error: Connection refused or Ollama not responding

Solution:

# Check if Ollama is running
docker ps | grep ollama

# Check Ollama logs
docker logs ollama

# Restart Ollama
docker restart ollama
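Transient connection errors are common right after `docker restart` while the model reloads. A small retry helper with exponential backoff (a generic sketch, not part of app_ollama.py) can absorb them in client scripts:

```python
import time

def with_retries(fn, attempts: int = 5, base_delay: float = 0.5):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))
```

For example, wrap a health check: `with_retries(lambda: urllib.request.urlopen("http://127.0.0.1:11434/api/tags", timeout=5))`.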

Issue: Out of Memory

Error: CUDA out of memory

Solution:

  • Reduce max_tokens parameter
  • Use smaller batch sizes
  • Restart the Ollama container to free memory

Issue: Port Already in Use

Error: Address already in use

Solution:

# Find the process using the port
lsof -i :11434  # Linux/macOS
netstat -ano | findstr :11434  # Windows

# Kill the process or use a different port
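The same check can be scripted portably (Linux, macOS, and Windows) by attempting a TCP connection from Python:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on a successful connection
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(11434)` tells you whether the default Ollama port is taken.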

File Structure

ollama-openelm/
├── setup_ollama_openelm.sh    # Complete setup script
├── app_ollama.py              # FastAPI server
├── requirements_local.txt     # Python dependencies
├── docker-compose.yml         # Docker Compose configuration
├── Dockerfile.api             # API server Docker image
├── test_python.py             # Python test script
├── test_curl.sh               # cURL test script
└── README.md                  # This file

Environment Variables

For API Server

Variable         Default                    Description
OLLAMA_BASE_URL  http://127.0.0.1:11434     Ollama server URL
OLLAMA_MODEL     apple/OpenELM-3B-Instruct  Model name
PORT             8001                       API server port
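Resolving these variables in Python with the defaults above (a sketch of what app_ollama.py is assumed to do at startup):

```python
import os

def load_config(env=None) -> dict:
    """Resolve API-server settings from environment variables with defaults."""
    if env is None:
        env = os.environ
    return {
        "base_url": env.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434"),
        "model": env.get("OLLAMA_MODEL", "apple/OpenELM-3B-Instruct"),
        "port": int(env.get("PORT", "8001")),
    }
```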

For Ollama Container

Variable       Description
OLLAMA_HOST    Address and port the Ollama server binds to
OLLAMA_MODELS  Path to model storage

Performance Tips

  1. GPU Memory: The 3B model uses ~6 GB of GPU memory
  2. CPU Inference: Ollama falls back to CPU if no GPU is available (slower)
  3. Output Length: Use the num_predict option to cap the number of generated tokens
  4. Temperature: Lower values (0.0-0.5) give more deterministic output
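The length and temperature tips map onto Ollama's `options` object in the /api/generate payload; `num_predict` and `temperature` are Ollama's documented option keys. A small payload builder as a sketch:

```python
def generate_payload(model: str, prompt: str, num_predict: int = 128,
                     temperature: float = 0.3) -> dict:
    """Build an /api/generate body with length and determinism controls."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_predict": num_predict,   # caps output length in tokens
            "temperature": temperature,   # lower = more deterministic
        },
    }
```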

Additional Resources

  • Ollama documentation (REST API and Docker usage)
  • Apple OpenELM model card on Hugging Face
  • FastAPI documentation

License

This setup is provided for educational and research purposes. The OpenELM models from Apple are released under their respective licenses.