---
title: HF Inference API
emoji: πŸ€—
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: mit
---
# Hugging Face Inference API
REST API and Gradio interface for Hugging Face model inference.
## Features
- **Two inference modes**: HF Inference API (lightweight) or local model loading
- **REST API**: FastAPI with automatic OpenAPI documentation
- **Gradio UI**: Web interface for interactive testing
- **HF Spaces ready**: Deploy directly to Hugging Face Spaces
## Quick Start
### 1. Installation
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# For local model inference (optional)
pip install transformers torch
# Copy and configure environment
cp .env.example .env
```
### 2. Configure
Edit `.env` with your settings:
```bash
# Use HF Inference API (recommended)
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx
# Or load models locally
HF_USE_API=false
```
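In the app itself, `.env` loading is handled by the configured tooling (the project structure lists pydantic-settings), but for quick scripts a minimal stand-in parser can be sketched with the standard library alone. This is a simplified assumption: it ignores quoting, `export` prefixes, and multi-line values.

```python
import os

def load_dotenv_minimal(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ.

    Simplified sketch: skips blank lines and comments; no quoting,
    'export' prefixes, or multi-line values.
    """
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # no .env file: fall back to real environment variables
    for key, value in values.items():
        os.environ.setdefault(key, value)  # real env vars take precedence
    return values
```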
### 3. Run
```bash
# Option A: REST API (FastAPI)
python -m app.main
# Option B: Gradio interface
python app.py
```
## Running Options
### REST API (FastAPI)
```bash
python -m app.main
```
- URL: http://localhost:8000
- Swagger: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
### Gradio Interface
```bash
python app.py
```
- URL: http://localhost:7860
### Docker
```bash
# Build
docker build -t hf-inference-api .
# Run with HF API
docker run -p 8000:8000 \
  -e HF_USE_API=true \
  -e HF_API_TOKEN=hf_xxxxx \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api
# Run with local model
docker run -p 8000:8000 \
  -e HF_USE_API=false \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api
```
### Hugging Face Spaces
1. Create a new Space at https://huggingface.co/new-space
2. Select **Gradio** as SDK
3. Push these files:
- `app.py`
- `requirements.txt`
- `app/` folder
4. Add `HF_API_TOKEN` in Space Settings > Secrets
## API Endpoints
### Health Check
```bash
curl http://localhost:8000/health
```
Response:
```json
{
  "status": "ok",
  "model_loaded": true,
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
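A readiness probe against this endpoint might parse the response like so (stdlib only; the field names follow the response shown above, and `check_health` assumes a server is running):

```python
import json
from urllib.request import urlopen

def parse_health(payload: str) -> bool:
    """Return True when the service reports status ok and a loaded model."""
    data = json.loads(payload)
    return data.get("status") == "ok" and data.get("model_loaded") is True

def check_health(base_url="http://localhost:8000"):
    """Hit /health and report readiness (requires a running server)."""
    with urlopen(f"{base_url}/health") as resp:
        return parse_health(resp.read().decode("utf-8"))
```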
### Inference
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this product!"}'
```
Response:
```json
{
  "predictions": [[{"label": "POSITIVE", "score": 0.9998}]],
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
### Batch Inference
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["I love this!", "This is terrible."]}'
```
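Since `inputs` may be either a single string or a list of strings, a server typically normalizes both shapes to a list before batching. A hypothetical helper (not necessarily how `app/models.py` does it) could look like:

```python
from typing import List, Union

def normalize_inputs(inputs: Union[str, List[str]]) -> List[str]:
    """Accept a single string or a list of strings; always return a list."""
    if isinstance(inputs, str):
        return [inputs]
    if isinstance(inputs, list) and all(isinstance(i, str) for i in inputs):
        return inputs
    raise ValueError("inputs must be a string or a list of strings")
```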
### With Parameters
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 50}
  }'
```
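The same calls can be made from Python using only the standard library; the endpoint and payload shape mirror the curl examples above (a `requests`-based client would look similar). `predict` needs a running server; `build_request` does not.

```python
import json
from urllib.request import Request, urlopen

def build_request(inputs, parameters=None, base_url="http://localhost:8000"):
    """Build a POST /predict request mirroring the curl examples."""
    payload = {"inputs": inputs}
    if parameters:
        payload["parameters"] = parameters
    return Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(inputs, parameters=None, base_url="http://localhost:8000"):
    """Send the request and decode the JSON response (requires a running server)."""
    with urlopen(build_request(inputs, parameters, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```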
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `HF_USE_API` | `true` | Use HF Inference API (`true`) or local model (`false`) |
| `HF_API_TOKEN` | `None` | HF API token (required if `HF_USE_API=true`) |
| `HF_MODEL_NAME` | `cardiffnlp/twitter-roberta-base-sentiment-latest` | Hugging Face model ID |
| `HF_TASK` | `text-classification` | Pipeline task type |
| `HF_HOST` | `0.0.0.0` | Server host |
| `HF_PORT` | `8000` | Server port |
| `HF_DEVICE` | `cpu` | Device for local inference (`cpu`, `cuda`, `cuda:0`) |
| `HF_MAX_BATCH_SIZE` | `32` | Maximum batch size for local inference |
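The project's `app/config.py` uses pydantic-settings for this; purely as an illustration of how the table above maps to typed settings, here is a stdlib-only stand-in with the same variable names and defaults (the real class may differ in detail):

```python
import os
from dataclasses import dataclass, field
from typing import Optional

def _env_bool(name: str, default: bool) -> bool:
    """Read an env var as a boolean ('1', 'true', 'yes' count as True)."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

@dataclass
class Settings:
    use_api: bool = field(default_factory=lambda: _env_bool("HF_USE_API", True))
    api_token: Optional[str] = field(default_factory=lambda: os.getenv("HF_API_TOKEN"))
    model_name: str = field(default_factory=lambda: os.getenv(
        "HF_MODEL_NAME", "cardiffnlp/twitter-roberta-base-sentiment-latest"))
    task: str = field(default_factory=lambda: os.getenv("HF_TASK", "text-classification"))
    host: str = field(default_factory=lambda: os.getenv("HF_HOST", "0.0.0.0"))
    port: int = field(default_factory=lambda: int(os.getenv("HF_PORT", "8000")))
    device: str = field(default_factory=lambda: os.getenv("HF_DEVICE", "cpu"))
    max_batch_size: int = field(default_factory=lambda: int(os.getenv("HF_MAX_BATCH_SIZE", "32")))
```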
### Inference Modes
#### HF Inference API (Recommended)
```bash
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx
```
Pros:
- No model download required
- Lightweight (no torch/transformers)
- Fast startup
- Free tier available
Cons:
- Requires internet connection
- Rate limits on free tier
- API token required
#### Local Model
```bash
HF_USE_API=false
```
Requires additional dependencies:
```bash
pip install transformers torch
```
Pros:
- No internet required after download
- No rate limits
- Full control
Cons:
- Large dependencies (~2GB for torch)
- Model download on first run
- More RAM/CPU required
## Supported Tasks
| Task | Description | Example Model |
|------|-------------|---------------|
| `text-classification` | Classify text into categories | `distilbert-base-uncased-finetuned-sst-2-english` |
| `sentiment-analysis` | Analyze sentiment (alias for text-classification) | `nlptown/bert-base-multilingual-uncased-sentiment` |
| `text-generation` | Generate text from prompt | `gpt2`, `mistralai/Mistral-7B-v0.1` |
| `summarization` | Summarize long text | `facebook/bart-large-cnn` |
| `translation` | Translate text | `Helsinki-NLP/opus-mt-en-fr` |
| `fill-mask` | Fill in masked tokens | `bert-base-uncased` |
| `question-answering` | Answer questions given context | `deepset/roberta-base-squad2` |
| `feature-extraction` | Extract embeddings | `sentence-transformers/all-MiniLM-L6-v2` |
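As a quick reference for what `/predict` bodies might look like per task, here is an illustrative payload builder. The shapes are assumptions for demonstration: most tasks take a plain string as `inputs`, while question-answering conventionally takes a `{"question", "context"}` pair, and the generation parameters echo the earlier examples.

```python
def example_payload(task: str) -> dict:
    """Return an illustrative /predict payload for a given task.

    Examples only: what a model actually accepts depends on the task
    and the model, not on this helper.
    """
    examples = {
        "text-classification": {"inputs": "I love this product!"},
        "sentiment-analysis": {"inputs": "I love this product!"},
        "text-generation": {"inputs": "Once upon a time",
                            "parameters": {"max_new_tokens": 50}},
        "summarization": {"inputs": "Long article text here..."},
        "translation": {"inputs": "Hello, how are you?"},
        "fill-mask": {"inputs": "Paris is the [MASK] of France."},
        "question-answering": {"inputs": {"question": "What is the capital of France?",
                                          "context": "The capital of France is Paris."}},
        "feature-extraction": {"inputs": "Embed this sentence."},
    }
    if task not in examples:
        raise KeyError(f"unknown task: {task}")
    return examples[task]
```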
## Project Structure
```
hf-inference-api/
β”œβ”€β”€ app/
│   β”œβ”€β”€ __init__.py
│   β”œβ”€β”€ config.py        # Settings (pydantic-settings)
│   β”œβ”€β”€ inference.py     # Inference engine (API + local)
│   β”œβ”€β”€ main.py          # FastAPI application
│   └── models.py        # Pydantic models
β”œβ”€β”€ app.py               # Gradio interface
β”œβ”€β”€ .env.example         # Environment template
β”œβ”€β”€ .gitignore
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ README.md
└── requirements.txt
```
## Examples
### Text Classification
```bash
HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english
HF_TASK=text-classification
```
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this movie!"}'
```
### Text Generation
```bash
HF_MODEL_NAME=gpt2
HF_TASK=text-generation
```
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Once upon a time", "parameters": {"max_new_tokens": 50}}'
```
### Summarization
```bash
HF_MODEL_NAME=facebook/bart-large-cnn
HF_TASK=summarization
```
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Long article text here..."}'
```
### Translation (EN -> FR)
```bash
HF_MODEL_NAME=Helsinki-NLP/opus-mt-en-fr
HF_TASK=translation
```
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello, how are you?"}'
```