---
title: HF Inference API
emoji: 🤗
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: mit
---
# Hugging Face Inference API

REST API and Gradio interface for Hugging Face model inference.
## Features

- **Two inference modes**: HF Inference API (lightweight) or local model loading
- **REST API**: FastAPI with automatic OpenAPI documentation
- **Gradio UI**: Web interface for interactive testing
- **HF Spaces ready**: Deploy directly to Hugging Face Spaces
## Quick Start

### 1. Installation

```bash
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# For local model inference (optional)
pip install transformers torch

# Copy and configure environment
cp .env.example .env
```
### 2. Configure

Edit `.env` with your settings:

```bash
# Use HF Inference API (recommended)
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx

# Or load models locally
HF_USE_API=false
```
### 3. Run

```bash
# Option A: REST API (FastAPI)
python -m app.main

# Option B: Gradio interface
python app.py
```
## Running Options

### REST API (FastAPI)

```bash
python -m app.main
```

- URL: http://localhost:8000
- Swagger: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

### Gradio Interface

```bash
python app.py
```
### Docker

```bash
# Build
docker build -t hf-inference-api .

# Run with HF API
docker run -p 8000:8000 \
  -e HF_USE_API=true \
  -e HF_API_TOKEN=hf_xxxxx \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api

# Run with local model
docker run -p 8000:8000 \
  -e HF_USE_API=false \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api
```
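The same run modes can be captured declaratively. The compose file below is a hypothetical sketch (it is not part of this repo); the service name and the use of an `${HF_API_TOKEN}` substitution from the host environment are illustrative choices.

```yaml
# docker-compose.yml — illustrative, not shipped with the repo
services:
  inference:
    build: .
    ports:
      - "8000:8000"
    environment:
      HF_USE_API: "true"
      HF_API_TOKEN: ${HF_API_TOKEN}   # read from the host environment
      HF_MODEL_NAME: distilbert-base-uncased-finetuned-sst-2-english
```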
### Hugging Face Spaces

1. Create a new Space at https://huggingface.co/new-space
2. Select **Gradio** as the SDK
3. Push these files: `app.py`, `requirements.txt`, and the `app/` folder
4. Add `HF_API_TOKEN` in Space Settings > Secrets
## API Endpoints

### Health Check

```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "ok",
  "model_loaded": true,
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
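For readiness probes or scripts, the same check can be done from Python. This is a minimal sketch, assuming the server runs at the default address; the helper names are illustrative, not part of the repo.

```python
import json
import urllib.request


def parse_health(data):
    """Return True when a /health payload reports an OK status and a loaded model."""
    return data.get("status") == "ok" and bool(data.get("model_loaded"))


def check_health(base_url="http://localhost:8000"):
    """Fetch /health from a running server and evaluate the payload."""
    with urllib.request.urlopen(base_url + "/health", timeout=5) as resp:
        return parse_health(json.load(resp))
```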
### Inference

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this product!"}'
```

Response:

```json
{
  "predictions": [[{"label": "POSITIVE", "score": 0.9998}]],
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```

### Batch Inference

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["I love this!", "This is terrible."]}'
```

### With Parameters

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 50}
  }'
```
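The curl calls above translate directly into a small stdlib-only Python client. This is a sketch, assuming the default server address; `build_payload` and `predict` are illustrative names, not part of the repo's API.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # assumed default from the config table


def build_payload(inputs, parameters=None):
    """Build the /predict body: inputs may be one string or a batch (list)."""
    body = {"inputs": inputs}
    if parameters:
        body["parameters"] = parameters
    return body


def predict(inputs, parameters=None, url=API_URL):
    """POST to /predict against a running server and return the parsed JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(inputs, parameters)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```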
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `HF_USE_API` | `true` | Use HF Inference API (`true`) or local model (`false`) |
| `HF_API_TOKEN` | None | HF API token (required if `HF_USE_API=true`) |
| `HF_MODEL_NAME` | `cardiffnlp/twitter-roberta-base-sentiment-latest` | Hugging Face model ID |
| `HF_TASK` | `text-classification` | Pipeline task type |
| `HF_HOST` | `0.0.0.0` | Server host |
| `HF_PORT` | `8000` | Server port |
| `HF_DEVICE` | `cpu` | Device for local inference (`cpu`, `cuda`, `cuda:0`) |
| `HF_MAX_BATCH_SIZE` | `32` | Maximum batch size for local inference |
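The repo's `config.py` uses pydantic-settings; the dataclass below is a simplified, dependency-free sketch of the same variables and defaults, useful for seeing how each `HF_*` value is typed and parsed. The class and helper names are illustrative.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


def _env(name, default):
    """Read an environment variable with a fallback default."""
    return os.getenv(name, default)


@dataclass
class Settings:
    """Sketch of the HF_* settings above; the real config.py uses pydantic-settings."""
    use_api: bool = field(default_factory=lambda: _env("HF_USE_API", "true").lower() == "true")
    api_token: Optional[str] = field(default_factory=lambda: os.getenv("HF_API_TOKEN"))
    model_name: str = field(default_factory=lambda: _env(
        "HF_MODEL_NAME", "cardiffnlp/twitter-roberta-base-sentiment-latest"))
    task: str = field(default_factory=lambda: _env("HF_TASK", "text-classification"))
    host: str = field(default_factory=lambda: _env("HF_HOST", "0.0.0.0"))
    port: int = field(default_factory=lambda: int(_env("HF_PORT", "8000")))
    device: str = field(default_factory=lambda: _env("HF_DEVICE", "cpu"))
    max_batch_size: int = field(default_factory=lambda: int(_env("HF_MAX_BATCH_SIZE", "32")))
```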
## Inference Modes

### HF Inference API (Recommended)

```bash
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx
```

Pros:

- No model download required
- Lightweight (no torch/transformers)
- Fast startup
- Free tier available

Cons:

- Requires internet connection
- Rate limits on free tier
- API token required

### Local Model

```bash
HF_USE_API=false
```

Requires additional dependencies:

```bash
pip install transformers torch
```

Pros:

- No internet required after download
- No rate limits
- Full control

Cons:

- Large dependencies (~2GB for torch)
- Model download on first run
- More RAM/CPU required
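A dual-mode design like this typically branches once at startup: API mode only needs an HTTP client, while local mode imports `transformers` lazily so the lightweight deployment never pays for it. The sketch below illustrates that pattern under those assumptions; the class and method names are illustrative, not the repo's actual `inference.py` API.

```python
import json
import urllib.request


class InferenceEngine:
    """Illustrative dual-mode engine: remote HF Inference API or local pipeline."""

    HF_API_BASE = "https://api-inference.huggingface.co/models/"

    def __init__(self, model_name, use_api=True, api_token=None,
                 task="text-classification"):
        self.model_name = model_name
        self.use_api = use_api
        self.api_token = api_token
        if not use_api:
            # Heavy import only in local mode, so API-only deployments stay light.
            from transformers import pipeline
            self._pipe = pipeline(task, model=model_name)

    @property
    def api_url(self):
        return self.HF_API_BASE + self.model_name

    def predict(self, inputs, parameters=None):
        if self.use_api:
            payload = {"inputs": inputs}
            if parameters:
                payload["parameters"] = parameters
            req = urllib.request.Request(
                self.api_url,
                data=json.dumps(payload).encode(),
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Bearer {self.api_token}",
                },
            )
            with urllib.request.urlopen(req, timeout=30) as resp:
                return json.load(resp)
        return self._pipe(inputs, **(parameters or {}))
```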
## Supported Tasks

| Task | Description | Example Model |
|---|---|---|
| `text-classification` | Classify text into categories | `distilbert-base-uncased-finetuned-sst-2-english` |
| `sentiment-analysis` | Analyze sentiment (alias for `text-classification`) | `nlptown/bert-base-multilingual-uncased-sentiment` |
| `text-generation` | Generate text from a prompt | `gpt2`, `mistralai/Mistral-7B-v0.1` |
| `summarization` | Summarize long text | `facebook/bart-large-cnn` |
| `translation` | Translate text | `Helsinki-NLP/opus-mt-en-fr` |
| `fill-mask` | Fill in masked tokens | `bert-base-uncased` |
| `question-answering` | Answer questions given context | `deepset/roberta-base-squad2` |
| `feature-extraction` | Extract embeddings | `sentence-transformers/all-MiniLM-L6-v2` |
## Project Structure

```
hf-inference-api/
├── app/
│   ├── __init__.py
│   ├── config.py        # Settings (pydantic-settings)
│   ├── inference.py     # Inference engine (API + local)
│   ├── main.py          # FastAPI application
│   └── models.py        # Pydantic models
├── app.py               # Gradio interface
├── .env.example         # Environment template
├── .gitignore
├── Dockerfile
├── README.md
└── requirements.txt
```
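Given the `/predict` and `/health` payloads shown earlier, `app/models.py` plausibly defines schemas along these lines. This is a hedged sketch assuming pydantic; the exact class and field names in the repo may differ.

```python
from typing import Any, Dict, List, Optional, Union

from pydantic import BaseModel


class PredictRequest(BaseModel):
    """Body of POST /predict: one string or a batch of strings."""
    inputs: Union[str, List[str]]
    parameters: Optional[Dict[str, Any]] = None


class PredictResponse(BaseModel):
    predictions: Any
    model_name: str


class HealthResponse(BaseModel):
    status: str
    model_loaded: bool
    model_name: str
```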
## Examples

### Text Classification

```bash
HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english
HF_TASK=text-classification
```

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this movie!"}'
```

### Text Generation

```bash
HF_MODEL_NAME=gpt2
HF_TASK=text-generation
```

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Once upon a time", "parameters": {"max_new_tokens": 50}}'
```

### Summarization

```bash
HF_MODEL_NAME=facebook/bart-large-cnn
HF_TASK=summarization
```

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Long article text here..."}'
```

### Translation (EN -> FR)

```bash
HF_MODEL_NAME=Helsinki-NLP/opus-mt-en-fr
HF_TASK=translation
```

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello, how are you?"}'
```