khushalcodiste committed
Commit 6c84960 · 1 Parent(s): 6bfa874

Add application file
Files changed (13)
  1. .dockerignore +26 -0
  2. .env +9 -0
  3. .env.example +9 -0
  4. .gitignore +10 -0
  5. .python-version +1 -0
  6. Dockerfile +33 -0
  7. README.md +213 -10
  8. docker-compose.yml +24 -0
  9. main.py +163 -0
  10. pyproject.toml +7 -0
  11. requirements.txt +6 -0
  12. setup.sh +51 -0
  13. uv.lock +8 -0
.dockerignore ADDED
@@ -0,0 +1,26 @@
+ __pycache__
+ *.pyc
+ *.pyo
+ *.pyd
+ .Python
+ env
+ venv
+ .venv
+ pip-log.txt
+ pip-delete-this-directory.txt
+ .tox
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.log
+ .git
+ .mypy_cache
+ .pytest_cache
+ .hypothesis
+ .env
+ .env.local
+ .env.production
+ .env.staging
.env ADDED
@@ -0,0 +1,9 @@
+ # HuggingFace model configuration
+ MODEL_NAME=google/gemma-4-E2B-it
+
+ # Application configuration
+ APP_HOST=0.0.0.0
+ APP_PORT=8001
+
+ # Logging
+ LOG_LEVEL=INFO
.env.example ADDED
@@ -0,0 +1,9 @@
+ # HuggingFace model configuration
+ MODEL_NAME=google/gemma-4-E2B-it
+
+ # Application configuration
+ APP_HOST=0.0.0.0
+ APP_PORT=8001
+
+ # Logging
+ LOG_LEVEL=INFO
.gitignore ADDED
@@ -0,0 +1,10 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
.python-version ADDED
@@ -0,0 +1 @@
+ 3.11
Dockerfile ADDED
@@ -0,0 +1,33 @@
+ # Base image (3.11 to match .python-version and pyproject.toml)
+ FROM python:3.11-slim
+
+ # Install system dependencies (including curl for the healthcheck)
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     curl \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Create a non-root user (HF Spaces requirement)
+ RUN useradd -m -u 1000 user
+
+ # Set working directory
+ WORKDIR /home/user/app
+
+ # Copy requirements first (for layer caching)
+ COPY --chown=user requirements.txt .
+
+ # Install dependencies
+ RUN pip install --no-cache-dir --upgrade pip && \
+     pip install --no-cache-dir -r requirements.txt
+
+ # Copy the application
+ COPY --chown=user . .
+
+ # Switch to the non-root user
+ USER user
+
+ # Expose port (default 8001, configurable via APP_PORT)
+ EXPOSE 8001
+
+ # Run FastAPI; the shell form lets ${APP_PORT:-8001} expand at runtime
+ CMD ["sh", "-c", "uvicorn main:app --host 0.0.0.0 --port ${APP_PORT:-8001}"]
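The `CMD` uses the shell form on purpose: `${APP_PORT:-8001}` is expanded by `sh` at container start, falling back to the default when the variable is unset or empty. A quick sketch of that expansion rule:

```shell
# ${VAR:-default} expands to $VAR when set and non-empty, else to the default
unset APP_PORT
echo "${APP_PORT:-8001}"   # 8001 while APP_PORT is unset

APP_PORT=9000
echo "${APP_PORT:-8001}"   # 9000 once APP_PORT is set
```

The exec-form `CMD ["uvicorn", …]` would not perform this expansion, which is why the port could not be made configurable without the `sh -c` wrapper.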
README.md CHANGED
@@ -1,10 +1,213 @@
- ---
- title: Gemme4
- emoji: 🐢
- colorFrom: indigo
- colorTo: blue
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Gemma4 FastAPI Application
+
+ A FastAPI application that integrates with HuggingFace to serve the Gemma-4-E2B model via REST API endpoints.
+
+ ## Features
+
+ - **Text Generation**: Generate text using Gemma-4's advanced reasoning capabilities
+ - **Chat Interface**: Interactive chat with conversation memory
+ - **Thinking Mode**: Enable Gemma-4's internal reasoning process
+ - **Streaming Support**: Real-time streaming responses
+ - **Health Monitoring**: Service health checks and model status
+ - **Docker Containerization**: Easy deployment with Docker Compose
+ - **GPU Support**: Automatic GPU detection and optimization
+ - **Local Execution**: No cloud dependencies; runs entirely on your hardware
+
+ ## Prerequisites
+
+ - Docker and Docker Compose
+ - At least 8GB RAM (16GB recommended for optimal performance)
+ - NVIDIA GPU with CUDA support (optional; CPU mode available)
+ - HuggingFace account (optional, for faster downloads)
+
+ ## Quick Start
+
+ 1. **Clone the repository**
+    ```bash
+    git clone <repository-url>
+    cd gemma4-fastapi
+    ```
+
+ 2. **Configure environment** (optional)
+    ```bash
+    cp .env.example .env
+    # Edit .env with your preferred settings if desired
+    ```
+
+ 3. **Run the setup script**
+    ```bash
+    chmod +x setup.sh
+    ./setup.sh
+    ```
+
+    Or manually:
+    ```bash
+    # Build and start the application
+    docker compose up --build -d
+
+    # Wait for the application to be ready
+    # The first startup may take several minutes as the model downloads
+    sleep 120
+    curl http://localhost:8001/api/health
+    ```
+
+ 4. **Test the API**
+    ```bash
+    curl http://localhost:8001/api/health
+    ```
+
+ ## API Endpoints
+
+ ### Health Check
+ - `GET /api/health` - Check service and model status
+
+ ### Text Generation
+ - `POST /api/generate` - Generate text from a prompt
+
+ ### Chat
+ - `POST /api/chat` - Chat with the model
+
+ ## API Usage Examples
+
+ ### Text Generation
+ ```bash
+ curl -X POST "http://localhost:8001/api/generate" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "prompt": "Explain quantum computing in simple terms",
+     "think": false,
+     "stream": false
+   }'
+ ```
+
+ ### Chat
+ ```bash
+ curl -X POST "http://localhost:8001/api/chat" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "messages": [
+       {"role": "user", "content": "Hello, how are you?"}
+     ],
+     "think": false,
+     "stream": false
+   }'
+ ```
+
+ ### Streaming Response
+ ```bash
+ curl -X POST "http://localhost:8001/api/generate" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "prompt": "Write a short story",
+     "stream": true
+   }'
+ ```
+
+ ## Configuration
+
+ Environment variables in `.env`:
+
+ - `MODEL_NAME`: HuggingFace model to use (default: `google/gemma-4-E2B-it`)
+ - `APP_HOST`: FastAPI host (default: 0.0.0.0)
+ - `APP_PORT`: FastAPI port (default: 8001)
+ - `LOG_LEVEL`: Logging level (default: INFO)
+
+ ## Available Models
+
+ The application works with any causal language model from HuggingFace. Some recommended options:
+
+ - `google/gemma-4-E2B-it` - Efficient 2B instruction-tuned model (default)
+ - `google/gemma-2-2b-it` - Gemma 2 2B instruction-tuned
+ - `google/gemma-2-9b` - Gemma 2 9B for better quality
+ - `meta-llama/Llama-2-7b` - Llama 2 7B
+ - Any other causal language model from HuggingFace
+
+ ## Development
+
+ ### Local Development (without Docker)
+
+ 1. **Create a virtual environment**
+    ```bash
+    python -m venv venv
+    source venv/bin/activate  # On Windows: venv\Scripts\activate
+    ```
+
+ 2. **Install dependencies**
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 3. **Run the application**
+    ```bash
+    uvicorn main:app --reload --port 8001
+    ```
+
+ ### Running Tests
+
+ ```bash
+ pytest
+ ```
+
+ ## Docker Commands
+
+ ```bash
+ # Build the image
+ docker compose build
+
+ # Start services
+ docker compose up
+
+ # Start in background
+ docker compose up -d
+
+ # View logs
+ docker compose logs -f
+
+ # Stop services
+ docker compose down
+
+ # Rebuild and restart
+ docker compose up --build --force-recreate
+ ```
+
+ ## Troubleshooting
+
+ ### Model Download Issues
+ If the model is taking a long time to download on first startup:
+ - The model is being downloaded from HuggingFace (this can take 10+ minutes depending on your connection)
+ - You can monitor progress in the logs: `docker compose logs -f gemma4-app`
+ - The model cache is stored in a Docker volume, so subsequent startups are much faster
+
+ ### Memory Issues
+ If you encounter out-of-memory errors:
+ - The model weights are large; the E2B variant has 2B parameters (~5-6GB download)
+ - Ensure you have at least 16GB total RAM available
+ - For CPU-only mode, consider using a smaller model variant
+
+ ### Connection Issues
+ - Verify the API is running: `curl http://localhost:8001/api/health`
+ - Check Docker network: `docker compose ps`
+ - View logs: `docker compose logs gemma4-app`
+
+ ### GPU Not Being Used
+ - Check that the NVIDIA Docker runtime is installed: `docker run --rm --runtime=nvidia nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi`
+ - Verify the container has GPU access: `docker compose logs gemma4-app` (should show "Using device: cuda")
+
+ ## API Documentation
+
+ Once running, visit `http://localhost:8001/docs` for interactive API documentation (Swagger UI).
+
+ ## Performance Tips
+
+ 1. **GPU Usage**: If you have an NVIDIA GPU with CUDA, the app will automatically use it for faster inference
+ 2. **Model Caching**: The model is cached in a Docker volume after the first download
+ 3. **Streaming**: For long generations, use streaming mode so tokens arrive as they are produced
+ 4. **Memory Management**: Keep the container memory settings high enough for smooth operation
+
+ ## License
+
+ [Add your license here]
+
+ ## Contributing
+
+ [Add contribution guidelines here]
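Beyond curl, the service can be called from Python with only the standard library. This is a minimal sketch: the route and field names (`/generate`, `prompt`, `max_tokens`, `temperature`, `response`) follow `main.py` in this commit rather than the `/api/...` paths documented above, so adjust `API_URL` to whichever routes your deployment actually exposes.

```python
import json
import urllib.request

# Route per main.py in this commit; change to /api/generate if your
# deployment uses the README's route names instead.
API_URL = "http://localhost:8001/generate"


def build_payload(prompt: str, max_tokens: int = 100, temperature: float = 0.7) -> dict:
    """Mirror the GenerateRequest fields from main.py."""
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}


def generate(prompt: str, **kwargs) -> str:
    """POST a prompt to the running service and return the generated text."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["response"]


# Usage (with the stack running):
#   generate("Explain quantum computing in simple terms", max_tokens=200)
```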
docker-compose.yml ADDED
@@ -0,0 +1,24 @@
+ version: '3.8'
+
+ services:
+   gemma4-app:
+     build: .
+     container_name: gemma4-api
+     ports:
+       - "${APP_PORT:-8001}:${APP_PORT:-8001}"
+     environment:
+       - MODEL_NAME=${MODEL_NAME:-google/gemma-4-E2B-it}
+       - APP_PORT=${APP_PORT:-8001}
+       - LOG_LEVEL=${LOG_LEVEL:-INFO}
+       - HF_HOME=/home/user/.cache/huggingface
+     healthcheck:
+       test: ["CMD", "curl", "-f", "http://localhost:${APP_PORT:-8001}/"]
+       interval: 30s
+       timeout: 10s
+       retries: 3
+       start_period: 120s
+     volumes:
+       - model_cache:/home/user/.cache/huggingface
+
+ volumes:
+   model_cache:
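Because every setting in the compose file uses `${VAR:-default}` interpolation, values can be overridden from the shell or a `.env` file without editing the YAML. A one-off override might look like this (the model name and port here are illustrative, not recommendations):

```shell
# Override the model and port for a single run (illustrative values)
MODEL_NAME=google/gemma-2-2b-it APP_PORT=9000 docker compose up -d --build
```

Compose reads `.env` in the project directory automatically, so persistent overrides belong there rather than on the command line.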
main.py ADDED
@@ -0,0 +1,163 @@
+ import time
+ import logging
+ import os
+ from typing import Optional
+
+ from fastapi import FastAPI, Request, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.responses import JSONResponse
+ from pydantic import BaseModel
+
+ from transformers import pipeline
+
+ # =========================
+ # 🔥 LOGGING CONFIG
+ # =========================
+ logging.basicConfig(
+     level=os.getenv("LOG_LEVEL", "INFO"),
+     format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
+ )
+
+ logger = logging.getLogger("gemma-api")
+
+ # =========================
+ # 🚀 APP INIT
+ # =========================
+ app = FastAPI(
+     title="Gemma 4 API",
+     version="1.0.0",
+ )
+
+ # =========================
+ # 🌐 CORS CONFIG
+ # =========================
+ origins = [
+     "*",  # ⚠️ change in production
+ ]
+
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=origins,
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # =========================
+ # ⏱️ REQUEST LOGGING MIDDLEWARE
+ # =========================
+ @app.middleware("http")
+ async def log_requests(request: Request, call_next):
+     start_time = time.time()
+
+     logger.info(f"➡️ Incoming request: {request.method} {request.url}")
+
+     try:
+         response = await call_next(request)
+     except Exception as e:
+         logger.exception(f"❌ Unhandled error: {str(e)}")
+         raise
+
+     process_time = time.time() - start_time
+     logger.info(f"⬅️ Completed in {process_time:.4f}s | Status: {response.status_code}")
+
+     response.headers["X-Process-Time"] = str(process_time)
+     return response
+
+ # =========================
+ # 📦 MODEL LOADING
+ # =========================
+ MODEL_NAME = os.getenv("MODEL_NAME", "google/gemma-4-E2B-it")
+
+ pipe = None
+
+
+ @app.on_event("startup")
+ def load_model():
+     global pipe
+     try:
+         logger.info(f"🔄 Loading model {MODEL_NAME}...")
+
+         pipe = pipeline(
+             "text-generation",
+             model=MODEL_NAME,
+             device_map="auto",
+             torch_dtype="auto",
+         )
+
+         logger.info("✅ Model loaded successfully")
+
+     except Exception:
+         logger.exception("❌ Failed to load model")
+         raise
+
+
+ # =========================
+ # 📥 REQUEST MODEL
+ # =========================
+ class GenerateRequest(BaseModel):
+     prompt: str
+     max_tokens: Optional[int] = 100
+     temperature: Optional[float] = 0.7
+
+
+ # =========================
+ # 📤 GENERATION ENDPOINT
+ # =========================
+ @app.post("/generate")
+ async def generate(req: GenerateRequest):
+     if pipe is None:
+         raise HTTPException(status_code=500, detail="Model not loaded")
+
+     try:
+         logger.info(f"🧠 Generating for prompt: {req.prompt[:50]}...")
+
+         output = pipe(
+             req.prompt,
+             max_new_tokens=req.max_tokens,
+             temperature=req.temperature,
+             do_sample=True,
+         )
+
+         result = output[0]["generated_text"]
+
+         logger.info("✅ Generation successful")
+
+         return {
+             "success": True,
+             "response": result
+         }
+
+     except Exception as e:
+         logger.exception("❌ Generation failed")
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ # =========================
+ # ❤️ HEALTH CHECK
+ # =========================
+ @app.get("/")
+ @app.get("/api/health")
+ async def health():
+     return {
+         "status": "ok",
+         "model_loaded": pipe is not None
+     }
+
+
+ # =========================
+ # ❗ GLOBAL ERROR HANDLER
+ # =========================
+ @app.exception_handler(Exception)
+ async def global_exception_handler(request: Request, exc: Exception):
+     logger.exception(f"🔥 Global error: {str(exc)}")
+
+     return JSONResponse(
+         status_code=500,
+         content={
+             "success": False,
+             "error": str(exc)
+         },
+     )
pyproject.toml ADDED
@@ -0,0 +1,7 @@
+ [project]
+ name = "gemma4"
+ version = "0.1.0"
+ description = "Add your description here"
+ readme = "README.md"
+ requires-python = ">=3.11"
+ dependencies = []
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ fastapi
+ uvicorn[standard]
+ torch
+ accelerate
+ sentencepiece
+ transformers>=4.42.0
setup.sh ADDED
@@ -0,0 +1,51 @@
+ #!/bin/bash
+
+ # Setup script for the Gemma4 FastAPI application
+
+ echo "Setting up Gemma4 FastAPI application..."
+
+ # Check if Docker is installed
+ if ! command -v docker &> /dev/null; then
+     echo "Error: Docker is not installed. Please install Docker first."
+     exit 1
+ fi
+
+ # Check if Docker Compose is available
+ if ! docker compose version &> /dev/null; then
+     echo "Error: Docker Compose plugin is not available. Please install Docker Compose or enable the Docker Compose plugin."
+     exit 1
+ fi
+
+ # Load .env values if present
+ if [ -f .env ]; then
+     set -o allexport
+     . .env
+     set +o allexport
+ fi
+
+ MODEL_NAME=${MODEL_NAME:-google/gemma-4-E2B-it}
+ APP_PORT=${APP_PORT:-8001}
+
+ echo "Building Docker images..."
+ docker compose build
+
+ echo "Starting FastAPI application..."
+ docker compose up -d
+
+ echo "Waiting for application to be ready..."
+ until curl -sSf http://localhost:${APP_PORT}/ >/dev/null 2>&1; do
+     printf '.'
+     sleep 3
+ done
+
+ echo ""
+ echo "Setup complete!"
+ echo ""
+ echo "API will be available at: http://localhost:${APP_PORT}"
+ echo "API documentation at: http://localhost:${APP_PORT}/docs"
+ echo ""
+ echo "To check health: curl http://localhost:${APP_PORT}/"
+ echo "To check logs: docker compose logs -f"
+ echo "To stop services: docker compose down"
+ echo ""
+ echo "Note: First-time startup may take several minutes while the model downloads."
uv.lock ADDED
@@ -0,0 +1,8 @@
+ version = 1
+ revision = 3
+ requires-python = ">=3.11"
+
+ [[package]]
+ name = "gemma4"
+ version = "0.1.0"
+ source = { virtual = "." }