---
title: Gemma 3 270M Text Generation API
emoji: πŸ€–
colorFrom: purple
colorTo: indigo
sdk: docker
sdk_version: "0.0.1"
app_file: app.py
pinned: false
---
# Gemma 3 270M FastAPI Inference
This Space hosts the **Google Gemma 3 270M** language model behind a high-performance FastAPI inference server built on llama-cpp-python. It features thread-pool-based asynchronous processing, rate limiting, and optimized GGUF model loading for fast response times, and offers:
- ⚑ **High Performance**: llama-cpp-python with GGUF model format for faster inference
- πŸ”’ **Rate Limiting**: IP-based request throttling
- πŸŽ›οΈ **Flexible Input**: Support for both chat messages and direct prompts
- πŸ“Š **Monitoring**: Built-in health checks and metrics
- πŸš€ **Production Ready**: Comprehensive error handling and logging
- πŸ”§ **Configurable**: Environment-based configuration with CPU/GPU support
- 🐳 **Docker Support**: Ready-to-deploy containerization
- πŸ’Ύ **Memory Efficient**: GGUF quantized models for reduced memory usage
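The thread-pool approach mentioned above can be sketched as follows. This is a minimal illustration, not the actual `app.py`: the blocking llama-cpp-python call is stubbed out with a placeholder function (in the real server it would be something like `Llama(model_path=...).create_chat_completion(...)` on the loaded GGUF model), and names such as `blocking_generate` are hypothetical.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Placeholder for the blocking llama-cpp-python call. In the real app this
# would invoke the loaded GGUF model, e.g. llm.create_chat_completion(...).
def blocking_generate(prompt: str) -> str:
    return f"echo: {prompt}"

# llama.cpp inference is synchronous, so a small thread pool keeps the
# FastAPI event loop responsive while a request is being generated.
pool = ThreadPoolExecutor(max_workers=2)

async def generate(prompt: str) -> str:
    # Offload the blocking call so other requests (health checks,
    # metrics) are still served concurrently.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, blocking_generate, prompt)

if __name__ == "__main__":
    print(asyncio.run(generate("hello")))
```

An async FastAPI route handler would simply `await generate(...)`; the pool size bounds how many generations run concurrently on the CPU.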
## Usage
### Health Check
```bash
curl https://<your-space>/health