---
title: Gemma 3 270M Text Generation API
emoji: πŸ€–
colorFrom: purple
colorTo: indigo
sdk: docker
sdk_version: "0.0.1"
app_file: app.py
pinned: false
---

# Gemma 3 270M FastAPI Inference

This project provides a high-performance FastAPI-based inference server for the Google Gemma 3 270M language model using llama-cpp-python. It features thread-pool-based asynchronous request handling, IP-based rate limiting, and optimized GGUF model loading for fast response times.
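
The thread-pool approach matters because llama-cpp-python's generation call is blocking: running it directly in an async endpoint would stall the event loop. A minimal sketch of the pattern, using a hypothetical `blocking_generate` stand-in for the real model call (the actual server invokes llama-cpp-python here instead):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a blocking llama-cpp-python generation call.
def blocking_generate(prompt: str) -> str:
    time.sleep(0.05)  # simulate CPU-bound token generation
    return f"echo: {prompt}"

# A small dedicated pool keeps model calls off the event loop,
# so the server can keep accepting requests during inference.
executor = ThreadPoolExecutor(max_workers=2)

async def generate(prompt: str) -> str:
    loop = asyncio.get_running_loop()
    # Offload the blocking call; the coroutine suspends until it finishes.
    return await loop.run_in_executor(executor, blocking_generate, prompt)

async def main() -> None:
    # Two requests overlap instead of serializing behind one another.
    results = await asyncio.gather(generate("hi"), generate("there"))
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
```

Inside a FastAPI endpoint the same `await loop.run_in_executor(...)` call applies unchanged; only the surrounding route decorator differs.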

This Space hosts the **Google Gemma 3 270M** model behind a FastAPI backend with:

- ⚑ **High Performance**: llama-cpp-python with GGUF model format for faster inference
- πŸ”’ **Rate Limiting**: IP-based request throttling
- πŸŽ›οΈ **Flexible Input**: Support for both chat messages and direct prompts
- πŸ“Š **Monitoring**: Built-in health checks and metrics
- πŸš€ **Production Ready**: Comprehensive error handling and logging
- πŸ”§ **Configurable**: Environment-based configuration with CPU/GPU support
- 🐳 **Docker Support**: Ready-to-deploy containerization
- πŸ’Ύ **Memory Efficient**: GGUF quantized models for reduced memory usage
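
The IP-based throttling above can be implemented in several ways; one common scheme is a per-IP token bucket. The sketch below is illustrative only (class name, rate, and capacity are assumptions, not the Space's actual implementation):

```python
import time
from collections import defaultdict

class TokenBucket:
    """Illustrative per-IP token bucket rate limiter."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        # Each new IP starts with a full bucket.
        self.tokens = defaultdict(lambda: float(capacity))
        self.last = defaultdict(time.monotonic)

    def allow(self, ip: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last[ip]
        self.last[ip] = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[ip] = min(self.capacity, self.tokens[ip] + elapsed * self.rate)
        if self.tokens[ip] >= 1:
            self.tokens[ip] -= 1
            return True
        return False
```

A request handler would call `allow(client_ip)` before doing any work and return HTTP 429 when it is `False`.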

## Usage

### Health Check

```bash
curl https://<your-space>/health