goabonga committed on
Commit b98ed7e · unverified · 0 Parent(s)

Initial commit: HF Inference API with Gradio interface


- FastAPI REST API for model inference (/predict, /health endpoints)
- Gradio web interface for interactive testing
- Two inference modes: HF Inference API (lightweight) or local model
- Support for multiple tasks: text-classification, text-generation, summarization, translation, fill-mask, question-answering
- Docker support for containerized deployment
- Ready for Hugging Face Spaces deployment
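The `/predict` endpoint listed above takes a JSON body with `inputs` (a single string or a list of strings) and optional `parameters`; a minimal sketch of building such a payload, matching the `InferenceRequest` schema defined in `app/models.py`:

```python
import json

# Hypothetical /predict payload: "inputs" may be a string or a list of
# strings, "parameters" is an optional dict of task-specific options.
payload = {
    "inputs": ["I love this!", "This is terrible."],
    "parameters": {},
}
body = json.dumps(payload)

# The server deserializes the body back into the same structure.
decoded = json.loads(body)
assert decoded["inputs"] == ["I love this!", "This is terrible."]
```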

Files changed (12)
  1. .env.example +40 -0
  2. .gitignore +25 -0
  3. Dockerfile +27 -0
  4. README.md +303 -0
  5. app.py +76 -0
  6. app/__init__.py +0 -0
  7. app/config.py +30 -0
  8. app/inference.py +128 -0
  9. app/main.py +86 -0
  10. app/models.py +29 -0
  11. requirements-dev.txt +10 -0
  12. requirements.txt +4 -0
.env.example ADDED
@@ -0,0 +1,40 @@
+ # Hugging Face Inference API Configuration
+
+ # ============================================
+ # Mode: API (recommended) or Local
+ # ============================================
+
+ # Use HF Inference API (true) or load model locally (false)
+ HF_USE_API=true
+
+ # HF API token (get it from https://huggingface.co/settings/tokens)
+ # Required if HF_USE_API=true
+ HF_API_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
+
+ # ============================================
+ # Model Configuration
+ # ============================================
+
+ # Model to use (any Hugging Face model ID)
+ HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english
+
+ # Task type (text-classification, text-generation, summarization, etc.)
+ HF_TASK=text-classification
+
+
+ # ============================================
+ # Server Configuration
+ # ============================================
+
+ HF_HOST=0.0.0.0
+ HF_PORT=8000
+
+ # ============================================
+ # Local Mode Only (ignored if HF_USE_API=true)
+ # ============================================
+
+ # Device (cpu, cuda, cuda:0, etc.)
+ HF_DEVICE=cpu
+
+ # Maximum batch size for inference
+ HF_MAX_BATCH_SIZE=32
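In the application, these values are read by pydantic-settings (see `app/config.py`). As a rough stdlib sketch of the mapping — not the library's actual parser — each non-comment line splits on the first `=`:

```python
def parse_env(text: str) -> dict[str, str]:
    """Minimal .env parser: skip blanks/comments, split on first '='."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

env = parse_env("# Mode\nHF_USE_API=true\nHF_PORT=8000\n")
assert env == {"HF_USE_API": "true", "HF_PORT": "8000"}
```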
.gitignore ADDED
@@ -0,0 +1,25 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ venv/
+ .venv/
+ ENV/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+
+ # Environment
+ .env
+
+ # Models cache
+ .cache/
+ models/
+
+ # Logs
+ *.log
Dockerfile ADDED
@@ -0,0 +1,27 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first for layer caching
+ COPY requirements.txt requirements-dev.txt ./
+ RUN pip install --no-cache-dir -r requirements-dev.txt
+
+ # Copy application code
+ COPY app/ ./app/
+
+ # Create non-root user
+ RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
+ USER appuser
+
+ # Set environment variables
+ ENV HF_HOME=/app/.cache/huggingface
+ ENV TRANSFORMERS_CACHE=/app/.cache/huggingface
+
+ EXPOSE 8000
+
+ CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
README.md ADDED
@@ -0,0 +1,303 @@
+ ---
+ title: HF Inference API
+ emoji: 🤗
+ colorFrom: yellow
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 6.2.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # Hugging Face Inference API
+
+ REST API and Gradio interface for Hugging Face model inference.
+
+ ## Features
+
+ - **Two inference modes**: HF Inference API (lightweight) or local model loading
+ - **REST API**: FastAPI with automatic OpenAPI documentation
+ - **Gradio UI**: Web interface for interactive testing
+ - **HF Spaces ready**: Deploy directly to Hugging Face Spaces
+
+ ## Quick Start
+
+ ### 1. Installation
+
+ ```bash
+ # Create virtual environment
+ python -m venv venv
+ source venv/bin/activate
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # For local model inference (optional)
+ pip install transformers torch
+
+ # Copy and configure environment
+ cp .env.example .env
+ ```
+
+ ### 2. Configure
+
+ Edit `.env` with your settings:
+
+ ```bash
+ # Use HF Inference API (recommended)
+ HF_USE_API=true
+ HF_API_TOKEN=hf_xxxxxxxxxxxxx
+
+ # Or load models locally
+ HF_USE_API=false
+ ```
+
+ ### 3. Run
+
+ ```bash
+ # Option A: REST API (FastAPI)
+ python -m app.main
+
+ # Option B: Gradio interface
+ python app.py
+ ```
+
+ ## Running Options
+
+ ### REST API (FastAPI)
+
+ ```bash
+ python -m app.main
+ ```
+
+ - URL: http://localhost:8000
+ - Swagger: http://localhost:8000/docs
+ - ReDoc: http://localhost:8000/redoc
+
+ ### Gradio Interface
+
+ ```bash
+ python app.py
+ ```
+
+ - URL: http://localhost:7860
+
+ ### Docker
+
+ ```bash
+ # Build
+ docker build -t hf-inference-api .
+
+ # Run with HF API
+ docker run -p 8000:8000 \
+   -e HF_USE_API=true \
+   -e HF_API_TOKEN=hf_xxxxx \
+   -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
+   hf-inference-api
+
+ # Run with local model
+ docker run -p 8000:8000 \
+   -e HF_USE_API=false \
+   -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
+   hf-inference-api
+ ```
+
+ ### Hugging Face Spaces
+
+ 1. Create a new Space at https://huggingface.co/new-space
+ 2. Select **Gradio** as SDK
+ 3. Push these files:
+    - `app.py`
+    - `requirements.txt`
+    - `app/` folder
+ 4. Add `HF_API_TOKEN` in Space Settings > Secrets
+
+ ## API Endpoints
+
+ ### Health Check
+
+ ```bash
+ curl http://localhost:8000/health
+ ```
+
+ Response:
+ ```json
+ {
+   "status": "ok",
+   "model_loaded": true,
+   "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
+ }
+ ```
+
+ ### Inference
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" \
+   -d '{"inputs": "I love this product!"}'
+ ```
+
+ Response:
+ ```json
+ {
+   "predictions": [[{"label": "POSITIVE", "score": 0.9998}]],
+   "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
+ }
+ ```
+
+ ### Batch Inference
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" \
+   -d '{"inputs": ["I love this!", "This is terrible."]}'
+ ```
+
+ ### With Parameters
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" \
+   -d '{
+     "inputs": "The capital of France is",
+     "parameters": {"max_new_tokens": 50}
+   }'
+ ```
+
+ ## Configuration
+
+ ### Environment Variables
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `HF_USE_API` | `true` | Use HF Inference API (`true`) or local model (`false`) |
+ | `HF_API_TOKEN` | `None` | HF API token (required if `HF_USE_API=true`) |
+ | `HF_MODEL_NAME` | `distilbert-base-uncased-finetuned-sst-2-english` | Hugging Face model ID |
+ | `HF_TASK` | `text-classification` | Pipeline task type |
+ | `HF_HOST` | `0.0.0.0` | Server host |
+ | `HF_PORT` | `8000` | Server port |
+ | `HF_DEVICE` | `cpu` | Device for local inference (`cpu`, `cuda`, `cuda:0`) |
+ | `HF_MAX_BATCH_SIZE` | `32` | Maximum batch size for local inference |
+
+ ### Inference Modes
+
+ #### HF Inference API (Recommended)
+
+ ```bash
+ HF_USE_API=true
+ HF_API_TOKEN=hf_xxxxxxxxxxxxx
+ ```
+
+ Pros:
+ - No model download required
+ - Lightweight (no torch/transformers)
+ - Fast startup
+ - Free tier available
+
+ Cons:
+ - Requires internet connection
+ - Rate limits on free tier
+ - API token required
+
+ #### Local Model
+
+ ```bash
+ HF_USE_API=false
+ ```
+
+ Requires additional dependencies:
+ ```bash
+ pip install transformers torch
+ ```
+
+ Pros:
+ - No internet required after download
+ - No rate limits
+ - Full control
+
+ Cons:
+ - Large dependencies (~2GB for torch)
+ - Model download on first run
+ - More RAM/CPU required
+
+ ## Supported Tasks
+
+ | Task | Description | Example Model |
+ |------|-------------|---------------|
+ | `text-classification` | Classify text into categories | `distilbert-base-uncased-finetuned-sst-2-english` |
+ | `sentiment-analysis` | Analyze sentiment (alias for text-classification) | `nlptown/bert-base-multilingual-uncased-sentiment` |
+ | `text-generation` | Generate text from a prompt | `gpt2`, `mistralai/Mistral-7B-v0.1` |
+ | `summarization` | Summarize long text | `facebook/bart-large-cnn` |
+ | `translation` | Translate text | `Helsinki-NLP/opus-mt-en-fr` |
+ | `fill-mask` | Fill in masked tokens | `bert-base-uncased` |
+ | `question-answering` | Answer questions given context | `deepset/roberta-base-squad2` |
+ | `feature-extraction` | Extract embeddings | `sentence-transformers/all-MiniLM-L6-v2` |
+
+ ## Project Structure
+
+ ```
+ hf-inference-api/
+ ├── app/
+ │   ├── __init__.py
+ │   ├── config.py        # Settings (pydantic-settings)
+ │   ├── inference.py     # Inference engine (API + local)
+ │   ├── main.py          # FastAPI application
+ │   └── models.py        # Pydantic models
+ ├── app.py               # Gradio interface
+ ├── .env.example         # Environment template
+ ├── .gitignore
+ ├── Dockerfile
+ ├── README.md
+ └── requirements.txt
+ ```
+
+ ## Examples
+
+ ### Text Classification
+
+ ```bash
+ HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english
+ HF_TASK=text-classification
+ ```
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" -d '{"inputs": "I love this movie!"}'
+ ```
+
+ ### Text Generation
+
+ ```bash
+ HF_MODEL_NAME=gpt2
+ HF_TASK=text-generation
+ ```
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" -d '{"inputs": "Once upon a time", "parameters": {"max_new_tokens": 50}}'
+ ```
+
+ ### Summarization
+
+ ```bash
+ HF_MODEL_NAME=facebook/bart-large-cnn
+ HF_TASK=summarization
+ ```
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" -d '{"inputs": "Long article text here..."}'
+ ```
+
+ ### Translation (EN -> FR)
+
+ ```bash
+ HF_MODEL_NAME=Helsinki-NLP/opus-mt-en-fr
+ HF_TASK=translation
+ ```
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" -d '{"inputs": "Hello, how are you?"}'
+ ```
app.py ADDED
@@ -0,0 +1,76 @@
+ """Gradio interface for Hugging Face inference."""
+
+ import gradio as gr
+ from huggingface_hub import InferenceClient
+
+ try:
+     import spaces
+     SPACES_AVAILABLE = True
+ except ImportError:
+     SPACES_AVAILABLE = False
+
+ from app.config import get_settings
+
+ settings = get_settings()
+ client = InferenceClient(model=settings.model_name, token=settings.api_token)
+
+
+ def _predict(text: str) -> str:
+     """Run inference on the input text."""
+     if not text.strip():
+         return "Please enter some text."
+
+     task = settings.task
+
+     try:
+         if task in ("text-classification", "sentiment-analysis"):
+             results = client.text_classification(text)
+             output = "\n".join(
+                 [f"{r['label']}: {r['score']:.2%}" for r in results]
+             )
+         elif task == "text-generation":
+             output = client.text_generation(text, max_new_tokens=100)
+         elif task == "summarization":
+             output = client.summarization(text)
+         elif task == "translation":
+             output = client.translation(text)
+         elif task == "fill-mask":
+             results = client.fill_mask(text)
+             output = "\n".join(
+                 [f"{r['token_str']}: {r['score']:.2%}" for r in results]
+             )
+         else:
+             output = str(client.post(json={"inputs": text}))
+
+         return output
+     except Exception as e:
+         return f"Error: {e}"
+
+
+ # Apply @spaces.GPU decorator only on HF Spaces
+ if SPACES_AVAILABLE:
+     predict = spaces.GPU(duration=60)(_predict)
+ else:
+     predict = _predict
+
+
+ demo = gr.Interface(
+     fn=predict,
+     inputs=gr.Textbox(
+         label="Input Text",
+         placeholder="Enter text here...",
+         lines=4,
+     ),
+     outputs=gr.Textbox(label="Result", lines=6),
+     title="Hugging Face Inference",
+     description=f"Model: **{settings.model_name}** | Task: **{settings.task}**",
+     examples=[
+         ["I love this product! It's amazing."],
+         ["This is the worst experience ever."],
+         ["The weather is nice today."],
+     ],
+     flagging_mode="never",
+ )
+
+ if __name__ == "__main__":
+     demo.launch(server_name="0.0.0.0", server_port=7860)
app/__init__.py ADDED
File without changes
app/config.py ADDED
@@ -0,0 +1,30 @@
+ """Configuration settings for the inference API."""
+
+ from functools import lru_cache
+
+ from pydantic_settings import BaseSettings
+
+
+ class Settings(BaseSettings):
+     """Application settings loaded from environment variables."""
+
+     model_name: str = "distilbert-base-uncased-finetuned-sst-2-english"
+     task: str = "text-classification"
+     host: str = "0.0.0.0"
+     port: int = 8000
+     max_batch_size: int = 32
+     device: str = "cpu"
+
+     # HF Inference API settings
+     use_api: bool = True  # True = use HF API, False = load model locally
+     api_token: str | None = None  # HF API token (required if use_api=True)
+
+     class Config:
+         env_file = ".env"
+         env_prefix = "HF_"
+
+
+ @lru_cache
+ def get_settings() -> Settings:
+     """Get cached settings instance."""
+     return Settings()
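`get_settings` is wrapped in `@lru_cache`, so every caller shares one `Settings` instance rather than re-reading the environment. The effect in isolation (a plain class stands in for the pydantic `Settings`):

```python
from functools import lru_cache

class Settings:
    """Stand-in for the pydantic Settings class in app/config.py."""
    def __init__(self) -> None:
        self.model_name = "distilbert-base-uncased-finetuned-sst-2-english"

@lru_cache
def get_settings() -> Settings:
    return Settings()

# Repeated calls return the same cached object, not a new one.
assert get_settings() is get_settings()
```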
app/inference.py ADDED
@@ -0,0 +1,128 @@
+ """Inference engine using Hugging Face API or local transformers."""
+
+ import logging
+ from typing import Any
+
+ from huggingface_hub import InferenceClient
+
+ from .config import Settings
+
+ logger = logging.getLogger(__name__)
+
+
+ class InferenceEngine:
+     """Handles model loading and inference."""
+
+     def __init__(self, settings: Settings) -> None:
+         """Initialize the inference engine."""
+         self.settings = settings
+         self.client: InferenceClient | None = None
+         self.pipeline = None
+         self.model_loaded = False
+         self.use_api = settings.use_api
+
+     def load_model(self) -> None:
+         """Load the model (API client or local pipeline)."""
+         if self.use_api:
+             self._init_api_client()
+         else:
+             self._init_local_pipeline()
+
+     def _init_api_client(self) -> None:
+         """Initialize the HF Inference API client."""
+         logger.info(
+             "Initializing HF Inference API client for model: %s",
+             self.settings.model_name,
+         )
+         self.client = InferenceClient(
+             model=self.settings.model_name,
+             token=self.settings.api_token,
+         )
+         self.model_loaded = True
+         logger.info("HF Inference API client ready")
+
+     def _init_local_pipeline(self) -> None:
+         """Load the model locally using transformers."""
+         try:
+             from transformers import pipeline
+         except ImportError:
+             raise ImportError(
+                 "transformers and torch are required for local inference. "
+                 "Install them with: pip install transformers torch"
+             )
+
+         logger.info(
+             "Loading local model: %s for task: %s",
+             self.settings.model_name,
+             self.settings.task,
+         )
+         self.pipeline = pipeline(
+             task=self.settings.task,
+             model=self.settings.model_name,
+             device=self.settings.device if self.settings.device != "cpu" else -1,
+         )
+         self.model_loaded = True
+         logger.info("Local model loaded successfully")
+
+     def predict(
+         self, inputs: str | list[str], parameters: dict[str, Any] | None = None
+     ) -> list[Any]:
+         """Run inference on the input(s)."""
+         if not self.model_loaded:
+             raise RuntimeError("Model not loaded")
+
+         if self.use_api:
+             return self._predict_api(inputs, parameters)
+         else:
+             return self._predict_local(inputs, parameters)
+
+     def _predict_api(
+         self, inputs: str | list[str], parameters: dict[str, Any] | None = None
+     ) -> list[Any]:
+         """Run inference using HF Inference API."""
+         params = parameters or {}
+         task = self.settings.task
+
+         if isinstance(inputs, str):
+             inputs_list = [inputs]
+         else:
+             inputs_list = inputs
+
+         results = []
+         for text in inputs_list:
+             result = self._call_api(task, text, params)
+             results.append(result)
+
+         return results
+
+     def _call_api(self, task: str, text: str, params: dict[str, Any]) -> Any:
+         """Call the appropriate API method based on task."""
+         if task in ("text-classification", "sentiment-analysis"):
+             return self.client.text_classification(text, **params)
+         elif task == "text-generation":
+             return self.client.text_generation(text, **params)
+         elif task == "summarization":
+             return self.client.summarization(text, **params)
+         elif task == "translation":
+             return self.client.translation(text, **params)
+         elif task == "fill-mask":
+             return self.client.fill_mask(text, **params)
+         elif task == "question-answering":
+             context = params.pop("context", "")
+             return self.client.question_answering(question=text, context=context)
+         elif task == "feature-extraction":
+             return self.client.feature_extraction(text, **params)
+         else:
+             # Generic post for unsupported tasks
+             return self.client.post(json={"inputs": text, **params})
+
+     def _predict_local(
+         self, inputs: str | list[str], parameters: dict[str, Any] | None = None
+     ) -> list[Any]:
+         """Run inference using local transformers pipeline."""
+         params = parameters or {}
+         results = self.pipeline(inputs, **params)
+
+         if isinstance(inputs, str):
+             return [results]
+         return results
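`_call_api` in app/inference.py routes on the task name with an if/elif chain. An equivalent structure, should the task list grow, is a dispatch table; the handlers below are hypothetical lambdas standing in for the `InferenceClient` methods, only the routing pattern is the point:

```python
from typing import Any, Callable

# Hypothetical handlers standing in for InferenceClient methods.
HANDLERS: dict[str, Callable[..., Any]] = {
    "text-classification": lambda text, **p: [{"label": "POSITIVE", "score": 0.99}],
    "summarization": lambda text, **p: text[:20] + "...",
}

def call_api(task: str, text: str, params: dict[str, Any]) -> Any:
    handler = HANDLERS.get(task)
    if handler is None:
        raise ValueError(f"Unsupported task: {task}")
    return handler(text, **params)

result = call_api("text-classification", "I love this!", {})
assert result[0]["label"] == "POSITIVE"
```

New tasks become one-line table entries instead of another `elif` branch.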
app/main.py ADDED
@@ -0,0 +1,86 @@
+ """Main FastAPI application for Hugging Face inference API."""
+
+ import logging
+ from contextlib import asynccontextmanager
+
+ from fastapi import FastAPI, HTTPException
+
+ from .config import get_settings
+ from .inference import InferenceEngine
+ from .models import HealthResponse, InferenceRequest, InferenceResponse
+
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+ )
+ logger = logging.getLogger(__name__)
+
+ settings = get_settings()
+ engine = InferenceEngine(settings)
+
+
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     """Handle application startup and shutdown."""
+     logger.info("Starting inference API...")
+     engine.load_model()
+     yield
+     logger.info("Shutting down inference API...")
+
+
+ app = FastAPI(
+     title="Hugging Face Inference API",
+     description="REST API for Hugging Face model inference",
+     version="1.0.0",
+     lifespan=lifespan,
+ )
+
+
+ @app.get("/health", response_model=HealthResponse)
+ async def health_check() -> HealthResponse:
+     """Check API and model health status."""
+     return HealthResponse(
+         status="ok",
+         model_loaded=engine.model_loaded,
+         model_name=settings.model_name if engine.model_loaded else None,
+     )
+
+
+ @app.post("/predict", response_model=InferenceResponse)
+ async def predict(request: InferenceRequest) -> InferenceResponse:
+     """Run inference on the provided input(s)."""
+     if not engine.model_loaded:
+         raise HTTPException(status_code=503, detail="Model not loaded")
+
+     try:
+         predictions = engine.predict(request.inputs, request.parameters)
+         return InferenceResponse(
+             predictions=predictions,
+             model_name=settings.model_name,
+         )
+     except Exception as e:
+         logger.exception("Inference failed")
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ @app.get("/")
+ async def root():
+     """Root endpoint with API information."""
+     return {
+         "name": "Hugging Face Inference API",
+         "version": "1.0.0",
+         "model": settings.model_name,
+         "task": settings.task,
+         "docs": "/docs",
+     }
+
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     uvicorn.run(
+         "app.main:app",
+         host=settings.host,
+         port=settings.port,
+         reload=True,
+     )
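The `lifespan` handler in app/main.py is FastAPI's startup/shutdown hook: the code before `yield` runs once before the server accepts requests (that is where `engine.load_model()` happens), the code after runs on shutdown. The underlying `asynccontextmanager` pattern, runnable without FastAPI:

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []

@asynccontextmanager
async def lifespan(app):
    events.append("startup")   # engine.load_model() runs here in app/main.py
    yield
    events.append("shutdown")

async def run_server():
    # FastAPI enters the context before serving and exits it on shutdown.
    async with lifespan(app=None):
        events.append("serving")

asyncio.run(run_server())
assert events == ["startup", "serving", "shutdown"]
```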
app/models.py ADDED
@@ -0,0 +1,29 @@
+ """Pydantic models for API requests and responses."""
+
+ from typing import Any
+
+ from pydantic import BaseModel, Field
+
+
+ class InferenceRequest(BaseModel):
+     """Request model for inference endpoint."""
+
+     inputs: str | list[str] = Field(..., description="Text input(s) for inference")
+     parameters: dict[str, Any] = Field(
+         default_factory=dict, description="Optional model parameters"
+     )
+
+
+ class InferenceResponse(BaseModel):
+     """Response model for inference endpoint."""
+
+     predictions: list[Any] = Field(..., description="Model predictions")
+     model_name: str = Field(..., description="Name of the model used")
+
+
+ class HealthResponse(BaseModel):
+     """Response model for health check endpoint."""
+
+     status: str = "ok"
+     model_loaded: bool = False
+     model_name: str | None = None
requirements-dev.txt ADDED
@@ -0,0 +1,10 @@
+ # Full requirements for local development
+ -r requirements.txt
+
+ fastapi>=0.109.0
+ uvicorn[standard]>=0.27.0
+ gradio>=4.0.0
+
+ # Local inference (optional - only needed if HF_USE_API=false)
+ # transformers>=4.37.0
+ # torch>=2.1.0
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ # Requirements for HF Spaces deployment
+ huggingface_hub>=0.20.0
+ pydantic>=2.5.0
+ pydantic-settings>=2.1.0