---
title: Gemma4
emoji: 💎
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---
# Gemma 4 E2B FastAPI

FastAPI wrapper around a llama.cpp server running Gemma 4 E2B Instruct (multimodal).

## Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Server health + model info |
| GET | `/v1/models` | List models |
| POST | `/v1/chat/completions` | OpenAI-compatible chat (streaming supported) |
| POST | `/chat` | Simplified chat |
| POST | `/generate` | Text generation from a prompt |
| POST | `/vision` | Multimodal: text + image (URL or base64) |
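Since `/v1/chat/completions` is OpenAI-compatible, it can be called from Python by posting a standard chat body. A minimal sketch below builds the request payload; the `build_chat_payload` helper and the commented `requests` call are illustrative, not part of this API, and `<space-url>` is a placeholder as above.

```python
import json


def build_chat_payload(messages, max_tokens=512, stream=False):
    """Build an OpenAI-style request body for /v1/chat/completions."""
    return {"messages": messages, "max_tokens": max_tokens, "stream": stream}


payload = build_chat_payload([{"role": "user", "content": "Hello!"}])
print(json.dumps(payload))

# To actually send it (requires the `requests` package and a running Space):
# import requests
# r = requests.post("https://<space-url>/v1/chat/completions", json=payload, timeout=60)
# print(r.json()["choices"][0]["message"]["content"])  # standard OpenAI response shape
```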
## Usage

### Chat

```bash
curl -X POST https://<space-url>/chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 512}'
```
### Vision

```bash
curl -X POST https://<space-url>/vision \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is in this image?", "image": "https://example.com/image.jpg"}'
```
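The endpoints table also mentions base64 input for `/vision`. A short sketch of building that request is below; it assumes the raw base64 string goes in the same `"image"` field used for URLs, which this README does not confirm.

```python
import base64


def build_vision_payload(prompt, image_bytes):
    """Build a /vision request body with an inline base64 image.

    Assumption: the endpoint accepts bare base64 in the same "image"
    field it uses for URLs (not confirmed by this README).
    """
    return {
        "prompt": prompt,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }


# Fake PNG header bytes stand in for a real image file.
vision_payload = build_vision_payload("What is in this image?", b"\x89PNG\r\n")
```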
### Streaming

```bash
curl -X POST https://<space-url>/chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Tell me a story"}], "stream": true}'
```
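Consuming the stream from Python means parsing server-sent events. The sketch below assumes OpenAI-style `data: {...}` chunks terminated by `data: [DONE]`, which is the usual format for llama.cpp's OpenAI-compatible streaming; whether the simplified `/chat` endpoint uses exactly this format is an assumption.

```python
import json


def parse_sse_chunks(lines):
    """Yield content deltas from OpenAI-style SSE lines.

    Assumption: chunks follow the standard OpenAI streaming schema
    ({"choices": [{"delta": {"content": ...}}]}), ending with [DONE].
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separators
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta


# Example input in the assumed format:
sample = [
    'data: {"choices":[{"delta":{"content":"Once"}}]}',
    'data: {"choices":[{"delta":{"content":" upon"}}]}',
    "data: [DONE]",
]
story = "".join(parse_sse_chunks(sample))  # "Once upon"
```

With a live Space, the same parser would consume `requests.post(..., stream=True).iter_lines(decode_unicode=True)`.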