---
title: Gemma 3 270M Text Generation API
emoji: 🤖
colorFrom: purple
colorTo: indigo
sdk: docker
sdk_version: "0.0.1"
app_file: app.py
pinned: false
---

# Gemma 3 270M FastAPI Inference

This project provides a high-performance FastAPI-based inference server for the Google Gemma 3 270M language model using llama-cpp-python. It features thread-pool-based asynchronous processing, rate limiting, and optimized GGUF model loading for fast response times.

This Space hosts the **Google Gemma 3 270M** model behind a FastAPI backend with:

- ⚡ **High Performance**: llama-cpp-python with the GGUF model format for faster inference
- 🔒 **Rate Limiting**: IP-based request throttling
- 🎛️ **Flexible Input**: Support for both chat messages and direct prompts
- 📊 **Monitoring**: Built-in health checks and metrics
- 🚀 **Production Ready**: Comprehensive error handling and logging
- 🔧 **Configurable**: Environment-based configuration with CPU/GPU support
- 🐳 **Docker Support**: Ready-to-deploy containerization
- 💾 **Memory Efficient**: GGUF quantized models for reduced memory usage

## Usage

### Health Check

```bash
curl https:///health
```
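Since the server accepts both chat messages and direct prompts, the two request shapes can be sketched as below. This is a minimal sketch only: the endpoint path (`/generate`) and field names (`messages`, `prompt`, `max_tokens`) are assumptions, not confirmed by this README — check `app.py` for the actual schema.

```python
import json

# Hypothetical request bodies for the generation endpoint.
# The endpoint path and field names are assumptions; see app.py
# for the schema the server actually exposes.

# 1) Chat-style input: a list of role/content messages
chat_payload = {
    "messages": [
        {"role": "user", "content": "Explain GGUF quantization in one sentence."}
    ],
    "max_tokens": 128,
}

# 2) Direct-prompt input: a single raw prompt string
prompt_payload = {
    "prompt": "Explain GGUF quantization in one sentence.",
    "max_tokens": 128,
}

# Either body would be POSTed as JSON, for example:
#   requests.post(f"{base_url}/generate", json=chat_payload)
print(json.dumps(chat_payload, indent=2))
```

Chat-style input lets the server apply the model's chat template for you, while the direct-prompt form passes text to the model unmodified.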