Commit 3a5ea54 by khushalcodiste · Parent(s): 6c84960

Add application file

Files changed (3):
  1. .gitattributes +9 -33
  2. .gitignore +1 -0
  3. README.md +69 -45
.gitattributes CHANGED

@@ -1,35 +1,11 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
+*.py text eol=lf
+*.sh text eol=lf
+*.yml text eol=lf
+*.yaml text eol=lf
+*.json text eol=lf
+*.md text eol=lf
+*.txt text eol=lf
+
+# HuggingFace model cache should not be committed
 *.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
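The diff above drops the broad LFS patterns in favor of LF line-ending normalization for source files, while `*.bin` and `*.safetensors` stay under Git LFS. A quick sanity check of the new rules can be run in a throwaway repo (a sketch; `main.py` and `model.bin` are illustrative filenames, and `git` is assumed to be installed):

```shell
# Sketch: confirm the new .gitattributes rules in a throwaway repo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf '%s\n' '*.py text eol=lf' '*.bin filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Source files are treated as text and normalized to LF:
git check-attr text eol -- main.py
# main.py: text: set
# main.py: eol: lf

# Model weights still route through the LFS filter:
git check-attr filter -- model.bin
# model.bin: filter: lfs
```

`git check-attr` resolves attributes without needing any commits, so this works immediately after `git init`.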
.gitignore CHANGED

@@ -8,3 +8,4 @@ wheels/
 
 # Virtual environments
 .venv
+Auth.txt
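The new `Auth.txt` entry keeps a local credentials file out of version control. Whether an ignore rule actually matches can be verified with `git check-ignore` (a sketch in a throwaway repo; `git` assumed installed):

```shell
# Sketch: verify Auth.txt is now ignored.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf '%s\n' '.venv' 'Auth.txt' > .gitignore
touch Auth.txt

# -v reports which .gitignore line matched:
git check-ignore -v Auth.txt
# .gitignore:2:Auth.txt	Auth.txt
```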
README.md CHANGED

@@ -1,24 +1,35 @@
+---
+title: Gemma4 FastAPI API
+emoji: 🚀
+colorFrom: purple
+colorTo: blue
+sdk: docker
+sdk_version: "latest"
+app_file: main.py
+pinned: false
+short_description: Text generation with Gemma-4-E2B via FastAPI
+---
+
 # Gemma4 FastAPI Application
 
-A FastAPI application that integrates with HuggingFace to serve the Gemma-4-E2B model via REST API endpoints.
+A FastAPI application that serves the Gemma-4-E2B model via REST API endpoints with enterprise-grade reliability and monitoring.
 
 ## Features
 
 - **Text Generation**: Generate text using Gemma-4's advanced reasoning capabilities
-- **Chat Interface**: Interactive chat with conversation memory
-- **Thinking Mode**: Enable Gemma-4's internal reasoning process
-- **Streaming Support**: Real-time streaming responses
 - **Health Monitoring**: Service health checks and model status
 - **Docker Containerization**: Easy deployment with Docker Compose
-- **GPU Support**: Automatic GPU detection and optimization
-- **Local Execution**: No cloud dependencies, runs entirely on your hardware
+- **CPU/GPU Support**: Automatic device detection
+- **Local Execution**: No cloud dependencies, runs on your hardware
+- **FastAPI**: Interactive API documentation at `/docs` and `/redoc`
 
 ## Prerequisites
 
 - Docker and Docker Compose
-- At least 8GB RAM (16GB recommended for optimal performance)
-- NVIDIA GPU with CUDA support (optional, CPU mode available)
-- HuggingFace account (optional, for faster downloads)
+- At least 4GB RAM (8GB+ recommended)
+- CPU: Works on any modern CPU
+- Optional: NVIDIA GPU for faster inference (can be enabled via docker-compose config)
+- HuggingFace account (optional, for faster model downloads)
 
 ## Quick Start
 
@@ -48,58 +59,57 @@ A FastAPI application that integrates with HuggingFace to serve the Gemma-4-E2B
    # Wait for the application to be ready
   # The first startup may take several minutes as the model downloads
   sleep 120
-  curl http://localhost:8001/api/health
+  curl http://localhost:8001/
   ```
 
4. **Test the API**
   ```bash
-  curl http://localhost:8001/api/health
+  curl http://localhost:8001/
   ```
 
 ## API Endpoints
 
 ### Health Check
-- `GET /api/health` - Check service and model status
+- `GET /` - Check service and model status
+- `GET /api/health` - Alias for health check
 
 ### Text Generation
-- `POST /api/generate` - Generate text from a prompt
-
-### Chat
-- `POST /api/chat` - Chat with the model
+- `POST /generate` - Generate text from a prompt
 
 ## API Usage Examples
 
+### Health Check
+```bash
+curl http://localhost:8001/
+```
+
 ### Text Generation
 ```bash
-curl -X POST "http://localhost:8001/api/generate" \
+curl -X POST "http://localhost:8001/generate" \
   -H "Content-Type: application/json" \
   -d '{
     "prompt": "Explain quantum computing in simple terms",
-    "think": false,
-    "stream": false
+    "max_tokens": 200,
+    "temperature": 0.7
   }'
 ```
 
-### Chat
-```bash
-curl -X POST "http://localhost:8001/api/chat" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "messages": [
-      {"role": "user", "content": "Hello, how are you?"}
-    ],
-    "think": false,
-    "stream": false
-  }'
+Response:
+```json
+{
+  "success": true,
+  "response": "Quantum computing is..."
+}
 ```
 
-### Streaming Response
+### Generate with Different Parameters
 ```bash
-curl -X POST "http://localhost:8001/api/generate" \
+curl -X POST "http://localhost:8001/generate" \
   -H "Content-Type: application/json" \
   -d '{
-    "prompt": "Write a short story",
-    "stream": true
+    "prompt": "Write a poem about AI",
+    "max_tokens": 150,
+    "temperature": 0.9
   }'
 ```
 
@@ -114,13 +124,14 @@ Environment variables in `.env`:
 
 ## Available Models
 
-The application works with any causal language model from HuggingFace. Some recommended options:
+The application works with any causal language model from HuggingFace. Default and recommended options:
 
-- `google/gemma-4-E2B` - Efficient 2B model (default)
+- `google/gemma-4-E2B` - Efficient 2B model (default, lightweight)
 - `google/gemma-2-2b-it` - Gemma 2 2B instruction-tuned
-- `google/gemma-2-9b` - Gemma 2 9B for better quality
+- `google/gemma-2-9b` - Gemma 2 9B (better quality, needs more RAM)
 - `meta-llama/Llama-2-7b` - Llama 2 7B
-- Any other causal language model from HuggingFace
+
+To change models, update the `MODEL_NAME` environment variable in `.env`.
 
 ## Development
 
@@ -189,9 +200,22 @@ If you encounter out-of-memory errors:
 - Check Docker network: `docker compose ps`
 - View logs: `docker compose logs gemma4-app`
 
-### GPU Not Being Used
-- Check that NVIDIA Docker runtime is installed: `docker run --rm --runtime=nvidia nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi`
-- Verify the container has GPU access: `docker compose logs gemma4-app` (should show "Using device: cuda")
+### Application Won't Start
+- Check system resources: `docker stats`
+- Ensure port 8001 is available: `lsof -i :8001` (or `netstat -ano | findstr :8001` on Windows)
+- Increase Docker memory if needed via Docker Desktop settings
+
+## HuggingFace Spaces Deployment
+
+This repository is configured for deployment on [HuggingFace Spaces](https://huggingface.co/spaces):
+
+1. Fork or clone this repository
+2. Create a new Space on HuggingFace (Docker SDK)
+3. Connect your repository
+4. The app will deploy automatically
+5. Access via the Space URL
+
+Note: First startup may take 5-10 minutes as the model downloads.
 
 ## API Documentation
 
@@ -199,10 +223,10 @@ Once running, visit `http://localhost:8001/docs` for interactive API documentation
 
 ## Performance Tips
 
-1. **GPU Usage**: If you have an NVIDIA GPU with CUDA, the app will automatically use it for faster inference
-2. **Model Caching**: The model is cached in Docker after first download
-3. **Batch Processing**: For best performance with multiple requests, use streaming mode
-4. **Memory Management**: Keep the container memory settings high enough for smooth operation
+1. **CPU-based**: Current setup uses CPU (optimized for resource efficiency)
+2. **Model Caching**: The model is cached after first download
+3. **Memory**: 4GB minimum, 8GB+ recommended for smooth operation
+4. **Inference Speed**: Depends on hardware; typically 10-30 tokens/second on modern CPUs
 
 ## License
 
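The new Quick Start waits a fixed `sleep 120` before hitting the health check, but the first model download can take anywhere from a couple of minutes to ten. A small polling helper is more robust: it is a sketch assuming the health endpoint from this README at `http://localhost:8001/` and that `curl` is available.

```shell
# Sketch: poll the health endpoint instead of sleeping a fixed 120 seconds.
# Assumes the service described in the README on localhost:8001.
wait_for_api() {
  url=${1:-http://localhost:8001/}
  tries=${2:-60}        # 60 attempts x 5s = wait up to 5 minutes
  i=1
  while :; do
    # -f makes curl fail on HTTP errors, so a 5xx during startup keeps polling
    if curl -fsS "$url" > /dev/null 2>&1; then
      echo "API is up"
      return 0
    fi
    [ "$i" -ge "$tries" ] && break
    i=$((i + 1))
    sleep 5             # first startup is slow while the model downloads
  done
  echo "API did not come up after $tries attempts" >&2
  return 1
}

# Usage: wait_for_api && curl -X POST http://localhost:8001/generate ...
```

With the defaults this gives roughly the same upper bound as the README's `sleep 120` plus headroom, but returns as soon as the service answers.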