---
title: Gemma 3 270M Text Generation API
emoji: 🚀
colorFrom: purple
colorTo: indigo
sdk: docker
sdk_version: "0.0.1"
app_file: app.py
pinned: false
---
# Gemma 3 270M FastAPI Inference
This project provides a FastAPI-based inference server for the Google Gemma 3 270M language model, built on llama-cpp-python. Blocking model calls run in a thread pool so the asynchronous request handlers stay responsive, requests are rate-limited, and the model is loaded in GGUF format to keep response times low.
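The thread-pool pattern described above can be sketched with the standard library alone. This is a minimal sketch, not the Space's `app.py`: `blocking_generate` is a hypothetical stand-in for a blocking llama-cpp-python generation call, and the single-worker pool is an assumption. It shows that while generation runs in a worker thread, the event loop can still answer a lightweight request such as a health check.

```python
import asyncio
import time
from collections.abc import Awaitable
from concurrent.futures import ThreadPoolExecutor


def blocking_generate(prompt: str) -> str:
    """Hypothetical stand-in for a blocking llama-cpp-python call."""
    time.sleep(0.3)  # simulate token generation
    return f"completion for: {prompt}"


# One worker thread (assumed size): generation calls are serialized.
executor = ThreadPoolExecutor(max_workers=1)


async def generate(prompt: str) -> str:
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the pool; awaiting the result yields
    # control back to the event loop instead of freezing it.
    return await loop.run_in_executor(executor, blocking_generate, prompt)


async def health() -> str:
    # A cheap handler that should never wait behind generation.
    return "ok"


async def main() -> list[str]:
    order: list[str] = []

    async def track(name: str, coro: Awaitable[str]) -> None:
        await coro
        order.append(name)  # record which request finished first

    await asyncio.gather(
        track("generate", generate("Hello")),
        track("health", health()),
    )
    return order


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['health', 'generate']
```

Using one worker serializes generation, which is the cautious choice when a single model instance is shared between requests. If `generate` called `blocking_generate` directly instead of going through the pool, the event loop would be blocked and the health check would finish last.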
This Space hosts the **Google Gemma 3 270M** model behind a FastAPI backend with:
- **High Performance**: llama-cpp-python with the GGUF model format for fast inference
- **Rate Limiting**: IP-based request throttling
- **Flexible Input**: Support for both chat messages and direct prompts
- **Monitoring**: Built-in health checks and metrics
- **Production Ready**: Comprehensive error handling and logging
- **Configurable**: Configuration via environment variables, with CPU/GPU support
- **Docker Support**: Ready-to-deploy containerization
- **Memory Efficient**: Quantized GGUF models for reduced memory usage
## Usage
### Health Check
```bash
curl https://<your-space>/health
```