---
title: HF Inference API
emoji: πŸ€—
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: mit
---
# Hugging Face Inference API
REST API and Gradio interface for Hugging Face model inference.
## Features
- **Two inference modes**: HF Inference API (lightweight) or local model loading
- **REST API**: FastAPI with automatic OpenAPI documentation
- **Gradio UI**: Web interface for interactive testing
- **HF Spaces ready**: Deploy directly to Hugging Face Spaces
## Quick Start
### 1. Installation
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# For local model inference (optional)
pip install transformers torch
# Copy and configure environment
cp .env.example .env
```
### 2. Configure
Edit `.env` with your settings:
```bash
# Use HF Inference API (recommended)
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx
# Or load models locally
HF_USE_API=false
```
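In the app itself, `.env` loading is handled by the configured tooling (the project structure lists pydantic-settings), but for quick scripts a minimal stand-in parser can be sketched with the standard library alone. This is a simplified assumption: it ignores quoting, `export` prefixes, and multi-line values.

```python
import os

def load_dotenv_minimal(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ.

    Simplified sketch: skips blank lines and comments; no quoting,
    'export' prefixes, or multi-line values.
    """
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # no .env file: fall back to real environment variables
    for key, value in values.items():
        os.environ.setdefault(key, value)  # real env vars take precedence
    return values
```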
### 3. Run
```bash
# Option A: REST API (FastAPI)
python -m app.main
# Option B: Gradio interface
python app.py
```
## Running Options
### REST API (FastAPI)
```bash
python -m app.main
```
- URL: http://localhost:8000
- Swagger: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
### Gradio Interface
```bash
python app.py
```
- URL: http://localhost:7860
### Docker
```bash
# Build
docker build -t hf-inference-api .
# Run with HF API
docker run -p 8000:8000 \
  -e HF_USE_API=true \
  -e HF_API_TOKEN=hf_xxxxx \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api
# Run with local model
docker run -p 8000:8000 \
  -e HF_USE_API=false \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api
```
### Hugging Face Spaces
1. Create a new Space at https://huggingface.co/new-space
2. Select **Gradio** as SDK
3. Push these files:
- `app.py`
- `requirements.txt`
- `app/` folder
4. Add `HF_API_TOKEN` in Space Settings > Secrets
## API Endpoints
### Health Check
```bash
curl http://localhost:8000/health
```
Response:
```json
{
  "status": "ok",
  "model_loaded": true,
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
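A readiness probe against this endpoint might parse the response like so (stdlib only; the field names follow the response shown above, and `check_health` assumes a server is running):

```python
import json
from urllib.request import urlopen

def parse_health(payload: str) -> bool:
    """Return True when the service reports status ok and a loaded model."""
    data = json.loads(payload)
    return data.get("status") == "ok" and data.get("model_loaded") is True

def check_health(base_url="http://localhost:8000"):
    """Hit /health and report readiness (requires a running server)."""
    with urlopen(f"{base_url}/health") as resp:
        return parse_health(resp.read().decode("utf-8"))
```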
### Inference
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this product!"}'
```
Response:
```json
{
  "predictions": [[{"label": "POSITIVE", "score": 0.9998}]],
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
### Batch Inference
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["I love this!", "This is terrible."]}'
```
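Since `inputs` may be either a single string or a list of strings, a server typically normalizes both shapes to a list before batching. A hypothetical helper (not necessarily how `app/models.py` does it) could look like:

```python
from typing import List, Union

def normalize_inputs(inputs: Union[str, List[str]]) -> List[str]:
    """Accept a single string or a list of strings; always return a list."""
    if isinstance(inputs, str):
        return [inputs]
    if isinstance(inputs, list) and all(isinstance(i, str) for i in inputs):
        return inputs
    raise ValueError("inputs must be a string or a list of strings")
```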
### With Parameters
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 50}
  }'
```
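The same calls can be made from Python using only the standard library; the endpoint and payload shape mirror the curl examples above (a `requests`-based client would look similar). `predict` needs a running server; `build_request` does not.

```python
import json
from urllib.request import Request, urlopen

def build_request(inputs, parameters=None, base_url="http://localhost:8000"):
    """Build a POST /predict request mirroring the curl examples."""
    payload = {"inputs": inputs}
    if parameters:
        payload["parameters"] = parameters
    return Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(inputs, parameters=None, base_url="http://localhost:8000"):
    """Send the request and decode the JSON response (requires a running server)."""
    with urlopen(build_request(inputs, parameters, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```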
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `HF_USE_API` | `true` | Use HF Inference API (`true`) or local model (`false`) |
| `HF_API_TOKEN` | `None` | HF API token (required if `HF_USE_API=true`) |
| `HF_MODEL_NAME` | `cardiffnlp/twitter-roberta-base-sentiment-latest` | Hugging Face model ID |
| `HF_TASK` | `text-classification` | Pipeline task type |
| `HF_HOST` | `0.0.0.0` | Server host |
| `HF_PORT` | `8000` | Server port |
| `HF_DEVICE` | `cpu` | Device for local inference (`cpu`, `cuda`, `cuda:0`) |
| `HF_MAX_BATCH_SIZE` | `32` | Maximum batch size for local inference |
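The project's `app/config.py` uses pydantic-settings for this; purely as an illustration of how the table above maps to typed settings, here is a stdlib-only stand-in with the same variable names and defaults (the real class may differ in detail):

```python
import os
from dataclasses import dataclass, field
from typing import Optional

def _env_bool(name: str, default: bool) -> bool:
    """Read an env var as a boolean ('1', 'true', 'yes' count as True)."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

@dataclass
class Settings:
    use_api: bool = field(default_factory=lambda: _env_bool("HF_USE_API", True))
    api_token: Optional[str] = field(default_factory=lambda: os.getenv("HF_API_TOKEN"))
    model_name: str = field(default_factory=lambda: os.getenv(
        "HF_MODEL_NAME", "cardiffnlp/twitter-roberta-base-sentiment-latest"))
    task: str = field(default_factory=lambda: os.getenv("HF_TASK", "text-classification"))
    host: str = field(default_factory=lambda: os.getenv("HF_HOST", "0.0.0.0"))
    port: int = field(default_factory=lambda: int(os.getenv("HF_PORT", "8000")))
    device: str = field(default_factory=lambda: os.getenv("HF_DEVICE", "cpu"))
    max_batch_size: int = field(default_factory=lambda: int(os.getenv("HF_MAX_BATCH_SIZE", "32")))
```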
### Inference Modes
#### HF Inference API (Recommended)
```bash
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx
```
Pros:
- No model download required
- Lightweight (no torch/transformers)
- Fast startup
- Free tier available
Cons:
- Requires internet connection
- Rate limits on free tier
- API token required
#### Local Model
```bash
HF_USE_API=false
```
Requires additional dependencies:
```bash
pip install transformers torch
```
Pros:
- No internet required after download
- No rate limits
- Full control
Cons:
- Large dependencies (~2GB for torch)
- Model download on first run
- More RAM/CPU required
## Supported Tasks
| Task | Description | Example Model |
|------|-------------|---------------|
| `text-classification` | Classify text into categories | `distilbert-base-uncased-finetuned-sst-2-english` |
| `sentiment-analysis` | Analyze sentiment (alias for text-classification) | `nlptown/bert-base-multilingual-uncased-sentiment` |
| `text-generation` | Generate text from prompt | `gpt2`, `mistralai/Mistral-7B-v0.1` |
| `summarization` | Summarize long text | `facebook/bart-large-cnn` |
| `translation` | Translate text | `Helsinki-NLP/opus-mt-en-fr` |
| `fill-mask` | Fill in masked tokens | `bert-base-uncased` |
| `question-answering` | Answer questions given context | `deepset/roberta-base-squad2` |
| `feature-extraction` | Extract embeddings | `sentence-transformers/all-MiniLM-L6-v2` |
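As a quick reference for what `/predict` bodies might look like per task, here is an illustrative payload builder. The shapes are assumptions for demonstration: most tasks take a plain string as `inputs`, while question-answering conventionally takes a `{"question", "context"}` pair, and the generation parameters echo the earlier examples.

```python
def example_payload(task: str) -> dict:
    """Return an illustrative /predict payload for a given task.

    Examples only: what a model actually accepts depends on the task
    and the model, not on this helper.
    """
    examples = {
        "text-classification": {"inputs": "I love this product!"},
        "sentiment-analysis": {"inputs": "I love this product!"},
        "text-generation": {"inputs": "Once upon a time",
                            "parameters": {"max_new_tokens": 50}},
        "summarization": {"inputs": "Long article text here..."},
        "translation": {"inputs": "Hello, how are you?"},
        "fill-mask": {"inputs": "Paris is the [MASK] of France."},
        "question-answering": {"inputs": {"question": "What is the capital of France?",
                                          "context": "The capital of France is Paris."}},
        "feature-extraction": {"inputs": "Embed this sentence."},
    }
    if task not in examples:
        raise KeyError(f"unknown task: {task}")
    return examples[task]
```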
## Project Structure
```
hf-inference-api/
β”œβ”€β”€ app/
│   β”œβ”€β”€ __init__.py
│   β”œβ”€β”€ config.py        # Settings (pydantic-settings)
│   β”œβ”€β”€ inference.py     # Inference engine (API + local)
│   β”œβ”€β”€ main.py          # FastAPI application
│   └── models.py        # Pydantic models
β”œβ”€β”€ app.py               # Gradio interface
β”œβ”€β”€ .env.example         # Environment template
β”œβ”€β”€ .gitignore
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ README.md
└── requirements.txt
```
## Examples
### Text Classification
```bash
HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english
HF_TASK=text-classification
```
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this movie!"}'
```
### Text Generation
```bash
HF_MODEL_NAME=gpt2
HF_TASK=text-generation
```
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Once upon a time", "parameters": {"max_new_tokens": 50}}'
```
### Summarization
```bash
HF_MODEL_NAME=facebook/bart-large-cnn
HF_TASK=summarization
```
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Long article text here..."}'
```
### Translation (EN -> FR)
```bash
HF_MODEL_NAME=Helsinki-NLP/opus-mt-en-fr
HF_TASK=translation
```
```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello, how are you?"}'
```