---
title: HF Inference API
emoji: 🤗
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: mit
---
# Hugging Face Inference API

REST API and Gradio interface for Hugging Face model inference.

## Features

- **Two inference modes**: HF Inference API (lightweight) or local model loading
- **REST API**: FastAPI with automatic OpenAPI documentation
- **Gradio UI**: Web interface for interactive testing
- **HF Spaces ready**: Deploy directly to Hugging Face Spaces
## Quick Start

### 1. Installation

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# For local model inference (optional)
pip install transformers torch

# Copy and configure environment
cp .env.example .env
```

### 2. Configure

Edit `.env` with your settings:

```bash
# Use HF Inference API (recommended)
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx

# Or load models locally
HF_USE_API=false
```
### 3. Run

```bash
# Option A: REST API (FastAPI)
python -m app.main

# Option B: Gradio interface
python app.py
```
## Running Options

### REST API (FastAPI)

```bash
python -m app.main
```

- URL: http://localhost:8000
- Swagger: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

### Gradio Interface

```bash
python app.py
```

- URL: http://localhost:7860
### Docker

```bash
# Build the image
docker build -t hf-inference-api .

# Run with the HF Inference API backend
docker run -p 8000:8000 \
  -e HF_USE_API=true \
  -e HF_API_TOKEN=hf_xxxxx \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api

# Run with a local model
docker run -p 8000:8000 \
  -e HF_USE_API=false \
  -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
  hf-inference-api
```
### Hugging Face Spaces

1. Create a new Space at https://huggingface.co/new-space
2. Select **Gradio** as the SDK
3. Push these files:
   - `app.py`
   - `requirements.txt`
   - `app/` folder
4. Add `HF_API_TOKEN` in Space Settings > Secrets
## API Endpoints

### Health Check

```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "ok",
  "model_loaded": true,
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```

### Inference

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this product!"}'
```

Response:

```json
{
  "predictions": [[{"label": "POSITIVE", "score": 0.9998}]],
  "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
}
```
### Batch Inference

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["I love this!", "This is terrible."]}'
```
### With Parameters

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 50}
  }'
```
## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `HF_USE_API` | `true` | Use HF Inference API (`true`) or local model (`false`) |
| `HF_API_TOKEN` | `None` | HF API token (required if `HF_USE_API=true`) |
| `HF_MODEL_NAME` | `cardiffnlp/twitter-roberta-base-sentiment-latest` | Hugging Face model ID |
| `HF_TASK` | `text-classification` | Pipeline task type |
| `HF_HOST` | `0.0.0.0` | Server host |
| `HF_PORT` | `8000` | Server port |
| `HF_DEVICE` | `cpu` | Device for local inference (`cpu`, `cuda`, `cuda:0`) |
| `HF_MAX_BATCH_SIZE` | `32` | Maximum batch size for local inference |
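The real `app/config.py` uses pydantic-settings; as a rough stdlib-only sketch, the table above corresponds to reading the environment like this (the `load_settings` function is illustrative):

```python
import os

def load_settings(env=None):
    """Read the HF_* variables with the defaults from the table above."""
    if env is None:
        env = os.environ
    return {
        "use_api": env.get("HF_USE_API", "true").lower() == "true",
        "api_token": env.get("HF_API_TOKEN"),  # None unless set
        "model_name": env.get(
            "HF_MODEL_NAME", "cardiffnlp/twitter-roberta-base-sentiment-latest"),
        "task": env.get("HF_TASK", "text-classification"),
        "host": env.get("HF_HOST", "0.0.0.0"),
        "port": int(env.get("HF_PORT", "8000")),
        "device": env.get("HF_DEVICE", "cpu"),
        "max_batch_size": int(env.get("HF_MAX_BATCH_SIZE", "32")),
    }

print(load_settings({})["port"])  # 8000
```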
### Inference Modes

#### HF Inference API (Recommended)

```bash
HF_USE_API=true
HF_API_TOKEN=hf_xxxxxxxxxxxxx
```

Pros:

- No model download required
- Lightweight dependencies (no torch/transformers)
- Fast startup
- Free tier available

Cons:

- Requires an internet connection
- Rate limits on the free tier
- API token required

#### Local Model

```bash
HF_USE_API=false
```

Requires additional dependencies:

```bash
pip install transformers torch
```

Pros:

- No internet access required after the initial download
- No rate limits
- Full control over the model

Cons:

- Large dependencies (~2 GB for torch)
- Model download on first run
- More RAM/CPU required
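This trade-off is what `app/inference.py` dispatches on. A toy skeleton of the dual-mode dispatch, with stubs standing in for the HF API call and the local transformers pipeline:

```python
class InferenceEngine:
    """Simplified sketch: choose the HF Inference API or a local model
    based on the HF_USE_API flag. Both backends are stubbed here."""

    def __init__(self, use_api, api_token=None):
        if use_api and not api_token:
            raise ValueError("HF_API_TOKEN is required when HF_USE_API=true")
        self.use_api = use_api
        self.api_token = api_token

    def predict(self, inputs):
        if self.use_api:
            return self._predict_via_api(inputs)
        return self._predict_locally(inputs)

    def _predict_via_api(self, inputs):
        # Real code would POST to the HF Inference API with the token.
        return {"backend": "api", "inputs": inputs}

    def _predict_locally(self, inputs):
        # Real code would run a transformers pipeline loaded at startup.
        return {"backend": "local", "inputs": inputs}
```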
## Supported Tasks

| Task | Description | Example Model |
|------|-------------|---------------|
| `text-classification` | Classify text into categories | `distilbert-base-uncased-finetuned-sst-2-english` |
| `sentiment-analysis` | Analyze sentiment (alias for text-classification) | `nlptown/bert-base-multilingual-uncased-sentiment` |
| `text-generation` | Generate text from a prompt | `gpt2`, `mistralai/Mistral-7B-v0.1` |
| `summarization` | Summarize long text | `facebook/bart-large-cnn` |
| `translation` | Translate text | `Helsinki-NLP/opus-mt-en-fr` |
| `fill-mask` | Fill in masked tokens | `bert-base-uncased` |
| `question-answering` | Answer questions given context | `deepset/roberta-base-squad2` |
| `feature-extraction` | Extract embeddings | `sentence-transformers/all-MiniLM-L6-v2` |
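Most of these tasks take a plain string as `inputs`, but question-answering pipelines expect a question/context pair. Assuming the server forwards dict inputs unchanged, the payload would look like this (the helper is hypothetical):

```python
def qa_payload(question, context):
    """Inputs for a question-answering model such as
    deepset/roberta-base-squad2: a dict, not a plain string."""
    return {"inputs": {"question": question, "context": context}}

print(qa_payload("Where is the Eiffel Tower?",
                 "The Eiffel Tower is in Paris."))
```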
## Project Structure

```
hf-inference-api/
├── app/
│   ├── __init__.py
│   ├── config.py       # Settings (pydantic-settings)
│   ├── inference.py    # Inference engine (API + local)
│   ├── main.py         # FastAPI application
│   └── models.py       # Pydantic models
├── app.py              # Gradio interface
├── .env.example        # Environment template
├── .gitignore
├── Dockerfile
├── README.md
└── requirements.txt
```
## Examples

### Text Classification

```bash
HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english
HF_TASK=text-classification
```

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this movie!"}'
```

### Text Generation

```bash
HF_MODEL_NAME=gpt2
HF_TASK=text-generation
```

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Once upon a time", "parameters": {"max_new_tokens": 50}}'
```

### Summarization

```bash
HF_MODEL_NAME=facebook/bart-large-cnn
HF_TASK=summarization
```

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Long article text here..."}'
```

### Translation (EN -> FR)

```bash
HF_MODEL_NAME=Helsinki-NLP/opus-mt-en-fr
HF_TASK=translation
```

```bash
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello, how are you?"}'
```