---
title: Gemma 3 270M Text Generation API
emoji: 🤗
colorFrom: purple
colorTo: indigo
sdk: docker
sdk_version: "0.0.1"
app_file: app.py
pinned: false
---
# Gemma 3 270M FastAPI Inference
This project provides a high-performance FastAPI-based inference server for the Google Gemma 3 270M language model using llama-cpp-python. It features thread-pool based asynchronous processing, rate limiting, and optimized GGUF model loading for fast response times.
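The "thread-pool based asynchronous processing" mentioned above refers to a standard pattern for serving a blocking library like llama-cpp-python from an async framework: the CPU-bound generation call runs in a worker thread so the event loop stays responsive. Below is a minimal sketch of that pattern on its own, outside FastAPI; the `blocking_generate` stub stands in for the actual llama-cpp call and is not taken from this project's code.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# A small pool bounds concurrent generations; one worker per model
# instance is a common choice since llama.cpp inference is CPU-heavy.
executor = ThreadPoolExecutor(max_workers=2)

def blocking_generate(prompt: str) -> str:
    # Placeholder for the synchronous llama-cpp-python call.
    return f"echo: {prompt}"

async def generate(prompt: str) -> str:
    # Offload the blocking call so the asyncio event loop (and thus
    # other HTTP requests) is not stalled while the model runs.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_generate, prompt)

if __name__ == "__main__":
    print(asyncio.run(generate("hi")))
```

Inside a FastAPI route the same offloading can also be done with `fastapi.concurrency.run_in_threadpool`, which uses the framework's own thread pool.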
This Space hosts the **Google Gemma 3 270M** model behind a FastAPI backend with:
- ⚡ **High Performance**: llama-cpp-python with GGUF model format for faster inference
- 🔒 **Rate Limiting**: IP-based request throttling
- 🎛️ **Flexible Input**: Support for both chat messages and direct prompts
- 📊 **Monitoring**: Built-in health checks and metrics
- 🚀 **Production Ready**: Comprehensive error handling and logging
- 🔧 **Configurable**: Environment-based configuration with CPU/GPU support
- 🐳 **Docker Support**: Ready-to-deploy containerization
- 💾 **Memory Efficient**: GGUF quantized models for reduced memory usage
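Since the API accepts both chat messages and direct prompts, the two request bodies presumably differ in shape. The sketch below constructs both variants; the field names (`messages`, `prompt`, `max_tokens`) follow common OpenAI-style conventions and are assumptions, not this Space's documented schema.

```python
import json

# Hypothetical chat-style body: a list of role/content messages.
chat_payload = {
    "messages": [{"role": "user", "content": "Explain GGUF in one sentence."}],
    "max_tokens": 64,
}

# Hypothetical direct-prompt body: a single raw string.
prompt_payload = {
    "prompt": "Explain GGUF in one sentence.",
    "max_tokens": 64,
}

# Serialize as they would be sent in a POST request.
print(json.dumps(chat_payload))
print(json.dumps(prompt_payload))
```

Check the API documentation exposed by the Space (e.g. the FastAPI `/docs` page) for the actual endpoint paths and schema.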
## Usage

### Health Check

```bash
curl https://<your-space>/health