---
title: Gemma 3 270M Text Generation API
emoji: 🤖
colorFrom: purple
colorTo: indigo
sdk: docker
sdk_version: "0.0.1"
app_file: app.py
pinned: false
---

# Gemma 3 270M FastAPI Inference

This project provides a high-performance FastAPI-based inference server for the Google Gemma 3 270M language model using llama-cpp-python. It features thread-pool-based asynchronous processing, rate limiting, and optimized GGUF model loading for fast response times.

This Space hosts the **Google Gemma 3 270M** model behind a FastAPI backend with:

- ⚡ **High Performance**: llama-cpp-python with the GGUF model format for faster inference
- 🔒 **Rate Limiting**: IP-based request throttling
- 🎛️ **Flexible Input**: Support for both chat messages and direct prompts
- 📊 **Monitoring**: Built-in health checks and metrics
- 🚀 **Production Ready**: Comprehensive error handling and logging
- 🔧 **Configurable**: Environment-based configuration with CPU/GPU support
- 🐳 **Docker Support**: Ready-to-deploy containerization
- 💾 **Memory Efficient**: GGUF quantized models for reduced memory usage

## Usage

### Health Check

```bash
curl https:///health
```
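Since the server accepts both chat messages and direct prompts, the two request shapes can be sketched as below. This is a minimal sketch only: the endpoint path (`/generate`) and field names (`messages`, `prompt`, `max_tokens`) are assumptions, not confirmed by this README — check `app.py` for the actual schema.

```python
import json

# Hypothetical request bodies for the generation endpoint.
# The endpoint path and field names are assumptions; see app.py
# for the schema the server actually exposes.

# 1) Chat-style input: a list of role/content messages
chat_payload = {
    "messages": [
        {"role": "user", "content": "Explain GGUF quantization in one sentence."}
    ],
    "max_tokens": 128,
}

# 2) Direct-prompt input: a single raw prompt string
prompt_payload = {
    "prompt": "Explain GGUF quantization in one sentence.",
    "max_tokens": 128,
}

# Either body would be POSTed as JSON, for example:
#   requests.post(f"{base_url}/generate", json=chat_payload)
print(json.dumps(chat_payload, indent=2))
```

Chat-style input lets the server apply the model's chat template for you, while the direct-prompt form passes text to the model unmodified.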