---
title: Gemme4
emoji: 💎
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Gemma 4 E2B FastAPI

FastAPI wrapper around a llama.cpp server running Gemma 4 E2B Instruct (multimodal).

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Server health + model info |
| GET | `/v1/models` | List models |
| POST | `/v1/chat/completions` | OpenAI-compatible chat (streaming supported) |
| POST | `/chat` | Simplified chat |
| POST | `/generate` | Text generation from a prompt |
| POST | `/vision` | Multimodal: text + image (URL or base64) |

## Usage

### Chat

```bash
curl -X POST https:///chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 512}'
```

### Vision

```bash
curl -X POST https:///vision \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is in this image?", "image": "https://example.com/image.jpg"}'
```

### Streaming

```bash
curl -X POST https:///chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Tell me a story"}], "stream": true}'
```
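### Python client (sketch)

The curl examples above translate directly to any HTTP client. A minimal stdlib-only sketch for the `/chat` endpoint, assuming the request body shown above (`messages`, `max_tokens`); `build_chat_request` and `BASE_URL` are illustrative names, not part of the API:

```python
import json
import urllib.request

def build_chat_request(base_url: str, messages: list, max_tokens: int = 512) -> urllib.request.Request:
    """Prepare a POST request for the /chat endpoint."""
    body = json.dumps({"messages": messages, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (fill in your Space URL, which the README's examples leave blank):
# with urllib.request.urlopen(build_chat_request(BASE_URL, [{"role": "user", "content": "Hello!"}])) as r:
#     print(json.load(r))
```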
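For `/vision` with a local file rather than a URL, the image goes in as base64. A sketch of building that body, assuming the server accepts a plain base64 string in the `image` field (it may instead expect a data URI; check the server's docs):

```python
import base64
import json

def vision_payload(prompt: str, image_bytes: bytes) -> str:
    """Build the JSON body for /vision from raw image bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"prompt": prompt, "image": encoded})

# Example: a few fake bytes stand in for a real JPEG here.
body = vision_payload("What is in this image?", b"\xff\xd8\xff\xe0fake")
```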
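When `"stream": true` is set, the response arrives incrementally. Assuming the stream follows the OpenAI server-sent-events convention used by `/v1/chat/completions` (`data: `-prefixed JSON chunks ending with `data: [DONE]`), the text deltas can be reassembled like this:

```python
import json

def extract_deltas(sse_body: str) -> str:
    """Concatenate content deltas from an OpenAI-style SSE response body."""
    out = []
    for line in sse_body.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        out.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(out)

sample = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n'
    'data: [DONE]\n'
)
# extract_deltas(sample) → "Hello"
```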