---
title: HF Inference API
emoji: πŸ€—
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: mit
---

# Hugging Face Inference API

REST API and Gradio interface for Hugging Face model inference.

## Features

- Two inference modes: HF Inference API (lightweight) or local model loading
- REST API: FastAPI with automatic OpenAPI documentation
- Gradio UI: web interface for interactive testing
- HF Spaces ready: deploy directly to Hugging Face Spaces

## Quick Start

### 1. Installation

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# For local model inference (optional)
pip install transformers torch

# Copy and configure environment
cp .env.example .env
```

### 2. Configure

Edit `.env` with your settings:

```bash
# Use HF Inference API (recommended)
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx

# Or load models locally
HF_USE_API=false
```

### 3. Run

```bash
# Option A: REST API (FastAPI)
python -m app.main

# Option B: Gradio interface
python app.py
```

## Running Options

### REST API (FastAPI)

```bash
python -m app.main
```

### Gradio Interface

```bash
python app.py
```

### Docker

```bash
# Build
docker build -t hf-inference-api .

# Run with HF API
docker run -p 8000:8000 \
  -e HF_USE_API=true \
  -e HF_API_TOKEN=hf_xxxxx \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api

# Run with local model
docker run -p 8000:8000 \
  -e HF_USE_API=false \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api
```

## Hugging Face Spaces

1. Create a new Space at https://huggingface.co/new-space
2. Select Gradio as the SDK
3. Push these files:
   - `app.py`
   - `requirements.txt`
   - the `app/` folder
4. Add `HF_API_TOKEN` under Space Settings > Secrets

## API Endpoints

### Health Check

```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "ok",
  "model_loaded": true,
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```

### Inference

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this product!"}'
```

Response:

```json
{
  "predictions": [[{"label": "POSITIVE", "score": 0.9998}]],
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
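The `/predict` endpoint can also be called from Python. A minimal client sketch using only the standard library; the URL and payload shape follow the curl examples above, and the helper names (`build_payload`, `predict`) are illustrative, not part of the project's code:

```python
import json
from urllib import request

API_URL = "http://localhost:8000/predict"

def build_payload(inputs, parameters=None):
    """Assemble the JSON body the /predict endpoint expects."""
    body = {"inputs": inputs}
    if parameters:
        body["parameters"] = parameters
    return body

def predict(inputs, parameters=None):
    """POST the payload to the running server and return the decoded JSON response."""
    data = json.dumps(build_payload(inputs, parameters)).encode("utf-8")
    req = request.Request(API_URL, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

`inputs` may be a single string or a list of strings, matching the batch example below.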

### Batch Inference

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["I love this!", "This is terrible."]}'
```

### With Parameters

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 50}
  }'
```

## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `HF_USE_API` | `true` | Use the HF Inference API (`true`) or a local model (`false`) |
| `HF_API_TOKEN` | None | HF API token (required if `HF_USE_API=true`) |
| `HF_MODEL_NAME` | `cardiffnlp/twitter-roberta-base-sentiment-latest` | Hugging Face model ID |
| `HF_TASK` | `text-classification` | Pipeline task type |
| `HF_HOST` | `0.0.0.0` | Server host |
| `HF_PORT` | `8000` | Server port |
| `HF_DEVICE` | `cpu` | Device for local inference (`cpu`, `cuda`, `cuda:0`) |
| `HF_MAX_BATCH_SIZE` | `32` | Maximum batch size for local inference |
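`HF_MAX_BATCH_SIZE` caps the batch size in local mode; a client sending many inputs can pre-chunk them the same way. A sketch (the `batch_inputs` helper is hypothetical, not part of the project's API):

```python
import os

def batch_inputs(inputs, max_batch_size=None):
    """Split an input list into chunks of at most HF_MAX_BATCH_SIZE items."""
    if max_batch_size is None:
        max_batch_size = int(os.environ.get("HF_MAX_BATCH_SIZE", "32"))
    return [inputs[i:i + max_batch_size]
            for i in range(0, len(inputs), max_batch_size)]
```

Each chunk can then be sent as one `/predict` request with a list-valued `inputs`.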

## Inference Modes

### HF Inference API (Recommended)

```bash
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx
```

Pros:

- No model download required
- Lightweight (no torch/transformers)
- Fast startup
- Free tier available

Cons:

- Requires an internet connection
- Rate limits on the free tier
- API token required
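In API mode, requests are forwarded to Hugging Face's hosted endpoint at `https://api-inference.huggingface.co/models/<model>`, authenticated with a Bearer token. A sketch of building such a request with the standard library (which HTTP client this project uses internally is an assumption):

```python
import json
from urllib import request

def hf_api_request(model_name, inputs, token):
    """Build a request against the hosted HF Inference API for one model."""
    url = f"https://api-inference.huggingface.co/models/{model_name}"
    data = json.dumps({"inputs": inputs}).encode("utf-8")
    return request.Request(url, data=data, headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    })

# Sending it is a plain urlopen call:
# with request.urlopen(hf_api_request("gpt2", "Hello", "hf_xxx")) as resp:
#     result = json.loads(resp.read())
```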

### Local Model

```bash
HF_USE_API=false
```

Requires additional dependencies:

```bash
pip install transformers torch
```

Pros:

- No internet required after download
- No rate limits
- Full control

Cons:

- Large dependencies (~2GB for torch)
- Model download on first run
- More RAM/CPU required
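For local mode, `transformers.pipeline()` accepts the device as an integer (`-1` for CPU, the GPU index otherwise), so the `HF_DEVICE` string needs translating. A sketch of how that mapping might look (the helper name is hypothetical; the actual logic in `app/config.py` may differ):

```python
def resolve_device(device_str: str) -> int:
    """Map an HF_DEVICE value (cpu, cuda, cuda:N) to the integer index
    accepted by transformers' pipeline(device=...): -1 for CPU, N for GPU N."""
    if device_str == "cpu":
        return -1
    if device_str == "cuda":
        return 0  # default to the first GPU
    if device_str.startswith("cuda:"):
        return int(device_str.split(":", 1)[1])
    raise ValueError(f"Unsupported HF_DEVICE value: {device_str!r}")
```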

## Supported Tasks

| Task | Description | Example Model |
|---|---|---|
| `text-classification` | Classify text into categories | `distilbert-base-uncased-finetuned-sst-2-english` |
| `sentiment-analysis` | Analyze sentiment (alias for `text-classification`) | `nlptown/bert-base-multilingual-uncased-sentiment` |
| `text-generation` | Generate text from a prompt | `gpt2`, `mistralai/Mistral-7B-v0.1` |
| `summarization` | Summarize long text | `facebook/bart-large-cnn` |
| `translation` | Translate text | `Helsinki-NLP/opus-mt-en-fr` |
| `fill-mask` | Fill in masked tokens | `bert-base-uncased` |
| `question-answering` | Answer questions given context | `deepset/roberta-base-squad2` |
| `feature-extraction` | Extract embeddings | `sentence-transformers/all-MiniLM-L6-v2` |

## Project Structure

```
hf-inference-api/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ config.py        # Settings (pydantic-settings)
β”‚   β”œβ”€β”€ inference.py     # Inference engine (API + local)
β”‚   β”œβ”€β”€ main.py          # FastAPI application
β”‚   └── models.py        # Pydantic models
β”œβ”€β”€ app.py               # Gradio interface
β”œβ”€β”€ .env.example         # Environment template
β”œβ”€β”€ .gitignore
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ README.md
└── requirements.txt
```

## Examples

Each example sets the model and task in the server's environment (`.env`), then queries the running server. The `Content-Type` header is required; without it, curl sends form-encoded data and FastAPI rejects the request.

### Text Classification

```bash
# Server configuration (.env)
HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english
HF_TASK=text-classification

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this movie!"}'
```

### Text Generation

```bash
# Server configuration (.env)
HF_MODEL_NAME=gpt2
HF_TASK=text-generation

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Once upon a time", "parameters": {"max_new_tokens": 50}}'
```

### Summarization

```bash
# Server configuration (.env)
HF_MODEL_NAME=facebook/bart-large-cnn
HF_TASK=summarization

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Long article text here..."}'
```

### Translation (EN -> FR)

```bash
# Server configuration (.env)
HF_MODEL_NAME=Helsinki-NLP/opus-mt-en-fr
HF_TASK=translation

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello, how are you?"}'
```