---
title: Ollama Generate API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Ollama Generate API

A simple REST API for text generation using Ollama models on Hugging Face Spaces.

## Features

- 🦙 Generate text using Ollama models
- 🎛️ Configurable parameters (temperature, top_p, max_tokens)
- 📊 Health monitoring
- 🚀 Simple and lightweight API
## API Endpoints |
|
|
|
|
|
### Health Check |
|
|
- `GET /health` - Check if Ollama service is running |
|
|
- `GET /` - API information and usage examples |
|
|
|
|
|
### Text Generation |
|
|
- `POST /generate` - Generate text completion |
|
|
|
|
|
## Usage Examples |
|
|
|
|
|
### Check Health |
|
|
```bash |
|
|
curl "https://your-space.hf.space/health" |
|
|
``` |
|
|
|
|
|
### Generate Text |
|
|
```bash |
|
|
curl -X POST "https://your-space.hf.space/generate" \ |
|
|
-H "Content-Type: application/json" \ |
|
|
-d '{ |
|
|
"model": "tinyllama", |
|
|
"prompt": "The future of AI is", |
|
|
"temperature": 0.7, |
|
|
"max_tokens": 100 |
|
|
}' |
|
|
``` |
|
|
|
|
|
### API Information |
|
|
```bash |
|
|
curl "https://your-space.hf.space/" |
|
|
``` |
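
The same calls can be made from Python. A minimal client sketch using only the standard library; `BASE_URL` is a placeholder for your Space's URL, and the `build_payload`/`generate` helpers are illustrative, not part of the API:

```python
import json
import urllib.request

BASE_URL = "https://your-space.hf.space"  # placeholder: replace with your Space URL


def build_payload(prompt, model="tinyllama", temperature=0.7, top_p=0.9, max_tokens=100):
    """Assemble the JSON body expected by POST /generate."""
    return {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }


def generate(prompt, **kwargs):
    """Send a generation request and return the decoded JSON response."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Requires a deployed Space:
# print(generate("The future of AI is")["response"])
```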

## Request Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | string | required | Model name (e.g., "tinyllama") |
| `prompt` | string | required | Input text prompt |
| `temperature` | float | 0.7 | Sampling temperature (0.0-2.0) |
| `top_p` | float | 0.9 | Top-p sampling (0.0-1.0) |
| `max_tokens` | integer | 512 | Maximum tokens to generate (1-4096) |
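
The documented ranges can be checked client-side before sending a request, which gives faster feedback than a round trip to the server. A minimal sketch; the `validate_params` helper is hypothetical, and the server may enforce its own limits regardless:

```python
def validate_params(temperature=0.7, top_p=0.9, max_tokens=512):
    """Check that sampling parameters fall within the documented ranges."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0.0, 2.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    return {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}


# Defaults pass; out-of-range values raise ValueError:
# validate_params()                  # ok
# validate_params(temperature=3.0)   # ValueError
```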
## Supported Models |
|
|
|
|
|
This API works with any Ollama model. Recommended lightweight models for Hugging Face Spaces: |
|
|
|
|
|
- `tinyllama` - Very small and fast (~600MB) |
|
|
- `phi` - Small but capable (~1.6GB) |
|
|
- `llama2:7b` - Larger but more capable (~3.8GB) |
|
|
|
|
|
## Interactive Documentation |
|
|
|
|
|
Once deployed, visit `/docs` for interactive API documentation powered by FastAPI. |
|
|
|
|
|
## Setup Notes |
|
|
|
|
|
- The startup script automatically pulls the `tinyllama` model |
|
|
- First generation may be slower as the model loads |
|
|
- Lightweight models are recommended for better performance on limited resources |
|
|
|
|
|
## Example Response |
|
|
|
|
|
```json |
|
|
{ |
|
|
"model": "tinyllama", |
|
|
"response": "The future of AI is bright and full of possibilities...", |
|
|
"done": true, |
|
|
"total_duration": 1234567890, |
|
|
"load_duration": 123456789, |
|
|
"prompt_eval_count": 10, |
|
|
"eval_count": 25 |
|
|
} |
|
|
``` |
|
|
|
|
|
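
Assuming Ollama's convention of reporting durations in nanoseconds, a response like the one above can be turned into human-readable stats. A small illustrative sketch; the `summarize` helper is not part of the API:

```python
def summarize(resp):
    """Derive readable stats from a /generate response.

    Assumes duration fields are nanoseconds (Ollama's convention).
    """
    total_s = resp["total_duration"] / 1e9
    return {
        "total_seconds": round(total_s, 3),
        "tokens_generated": resp["eval_count"],
        "tokens_per_second": round(resp["eval_count"] / total_s, 1),
    }


# The example response from above:
example = {
    "model": "tinyllama",
    "response": "The future of AI is bright and full of possibilities...",
    "done": True,
    "total_duration": 1234567890,
    "load_duration": 123456789,
    "prompt_eval_count": 10,
    "eval_count": 25,
}

# summarize(example) -> roughly 1.235 s total, 25 tokens generated
```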
## Resource Requirements |
|
|
|
|
|
- **TinyLlama**: ~1GB RAM, very fast |
|
|
- **Phi models**: ~2GB RAM, good balance |
|
|
- **Llama2 7B**: ~8GB RAM, high quality |
|
|
|
|
|
For Hugging Face Spaces free tier, stick with TinyLlama or Phi models for best performance. |