---
title: Gemma 3 270M Text Generation API
emoji: 🤗
colorFrom: purple
colorTo: indigo
sdk: docker
sdk_version: "0.0.1"
app_file: app.py
pinned: false
---
# Gemma 3 270M FastAPI Inference
This project provides a high-performance FastAPI-based inference server for the Google Gemma 3 270M language model using llama-cpp-python. It features thread-pool based asynchronous processing, rate limiting, and optimized GGUF model loading for fast response times.
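The "thread-pool based asynchronous processing" mentioned above refers to a standard pattern for serving a blocking library like llama-cpp-python from an async framework: the CPU-bound generation call runs in a worker thread so the event loop stays responsive. Below is a minimal sketch of that pattern on its own, outside FastAPI; the `blocking_generate` stub stands in for the actual llama-cpp call and is not taken from this project's code.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# A small pool bounds concurrent generations; one worker per model
# instance is a common choice since llama.cpp inference is CPU-heavy.
executor = ThreadPoolExecutor(max_workers=2)

def blocking_generate(prompt: str) -> str:
    # Placeholder for the synchronous llama-cpp-python call.
    return f"echo: {prompt}"

async def generate(prompt: str) -> str:
    # Offload the blocking call so the asyncio event loop (and thus
    # other HTTP requests) is not stalled while the model runs.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_generate, prompt)

if __name__ == "__main__":
    print(asyncio.run(generate("hi")))
```

Inside a FastAPI route the same offloading can also be done with `fastapi.concurrency.run_in_threadpool`, which uses the framework's own thread pool.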
This Space hosts the **Google Gemma 3 270M** model behind a FastAPI backend with:
- ⚡ **High Performance**: llama-cpp-python with GGUF model format for faster inference
- 🔒 **Rate Limiting**: IP-based request throttling
- 🎛️ **Flexible Input**: Support for both chat messages and direct prompts
- 📊 **Monitoring**: Built-in health checks and metrics
- 🚀 **Production Ready**: Comprehensive error handling and logging
- 🔧 **Configurable**: Environment-based configuration with CPU/GPU support
- 🐳 **Docker Support**: Ready-to-deploy containerization
- 💾 **Memory Efficient**: GGUF quantized models for reduced memory usage
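Since the API accepts both chat messages and direct prompts, the two request bodies presumably differ in shape. The sketch below constructs both variants; the field names (`messages`, `prompt`, `max_tokens`) follow common OpenAI-style conventions and are assumptions, not this Space's documented schema.

```python
import json

# Hypothetical chat-style body: a list of role/content messages.
chat_payload = {
    "messages": [{"role": "user", "content": "Explain GGUF in one sentence."}],
    "max_tokens": 64,
}

# Hypothetical direct-prompt body: a single raw string.
prompt_payload = {
    "prompt": "Explain GGUF in one sentence.",
    "max_tokens": 64,
}

# Serialize as they would be sent in a POST request.
print(json.dumps(chat_payload))
print(json.dumps(prompt_payload))
```

Check the API documentation exposed by the Space (e.g. the FastAPI `/docs` page) for the actual endpoint paths and schema.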
## Usage

### Health Check

```bash
curl https://<your-space>/health