goabonga committed on
Commit b98ed7e · unverified · 0 Parent(s)

Initial commit: HF Inference API with Gradio interface


- FastAPI REST API for model inference (/predict, /health endpoints)
- Gradio web interface for interactive testing
- Two inference modes: HF Inference API (lightweight) or local model
- Support for multiple tasks: text-classification, text-generation, summarization, translation, fill-mask, question-answering
- Docker support for containerized deployment
- Ready for Hugging Face Spaces deployment
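The `/predict` endpoint listed above takes a JSON body with `inputs` (a single string or a list of strings) and optional `parameters`; a minimal sketch of building such a payload, matching the `InferenceRequest` schema defined in `app/models.py`:

```python
import json

# Hypothetical /predict payload: "inputs" may be a string or a list of
# strings, "parameters" is an optional dict of task-specific options.
payload = {
    "inputs": ["I love this!", "This is terrible."],
    "parameters": {},
}
body = json.dumps(payload)

# The server deserializes the body back into the same structure.
decoded = json.loads(body)
assert decoded["inputs"] == ["I love this!", "This is terrible."]
```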

Files changed (12)
  1. .env.example +40 -0
  2. .gitignore +25 -0
  3. Dockerfile +27 -0
  4. README.md +303 -0
  5. app.py +76 -0
  6. app/__init__.py +0 -0
  7. app/config.py +30 -0
  8. app/inference.py +128 -0
  9. app/main.py +86 -0
  10. app/models.py +29 -0
  11. requirements-dev.txt +10 -0
  12. requirements.txt +4 -0
.env.example ADDED
@@ -0,0 +1,40 @@
+ # Hugging Face Inference API Configuration
+
+ # ============================================
+ # Mode: API (recommended) or Local
+ # ============================================
+
+ # Use HF Inference API (true) or load model locally (false)
+ HF_USE_API=true
+
+ # HF API token (get it from https://huggingface.co/settings/tokens)
+ # Required if HF_USE_API=true
+ HF_API_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx
+
+ # ============================================
+ # Model Configuration
+ # ============================================
+
+ # Model to use (any Hugging Face model ID)
+ HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english
+
+ # Task type (text-classification, text-generation, summarization, etc.)
+ HF_TASK=text-classification
+
+
+ # ============================================
+ # Server Configuration
+ # ============================================
+
+ HF_HOST=0.0.0.0
+ HF_PORT=8000
+
+ # ============================================
+ # Local Mode Only (ignored if HF_USE_API=true)
+ # ============================================
+
+ # Device (cpu, cuda, cuda:0, etc.)
+ HF_DEVICE=cpu
+
+ # Maximum batch size for inference
+ HF_MAX_BATCH_SIZE=32
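In the application, these values are read by pydantic-settings (see `app/config.py`). As a rough stdlib sketch of the mapping — not the library's actual parser — each non-comment line splits on the first `=`:

```python
def parse_env(text: str) -> dict[str, str]:
    """Minimal .env parser: skip blanks/comments, split on first '='."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

env = parse_env("# Mode\nHF_USE_API=true\nHF_PORT=8000\n")
assert env == {"HF_USE_API": "true", "HF_PORT": "8000"}
```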
.gitignore ADDED
@@ -0,0 +1,25 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ venv/
+ .venv/
+ ENV/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+
+ # Environment
+ .env
+
+ # Models cache
+ .cache/
+ models/
+
+ # Logs
+ *.log
Dockerfile ADDED
@@ -0,0 +1,27 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first for layer caching
+ COPY requirements.txt requirements-dev.txt ./
+ RUN pip install --no-cache-dir -r requirements-dev.txt
+
+ # Copy application code
+ COPY app/ ./app/
+
+ # Create non-root user
+ RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
+ USER appuser
+
+ # Set environment variables
+ ENV HF_HOME=/app/.cache/huggingface
+ ENV TRANSFORMERS_CACHE=/app/.cache/huggingface
+
+ EXPOSE 8000
+
+ CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
README.md ADDED
@@ -0,0 +1,303 @@
+ ---
+ title: HF Inference API
+ emoji: 🤗
+ colorFrom: yellow
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 6.2.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # Hugging Face Inference API
+
+ REST API and Gradio interface for Hugging Face model inference.
+
+ ## Features
+
+ - **Two inference modes**: HF Inference API (lightweight) or local model loading
+ - **REST API**: FastAPI with automatic OpenAPI documentation
+ - **Gradio UI**: Web interface for interactive testing
+ - **HF Spaces ready**: Deploy directly to Hugging Face Spaces
+
+ ## Quick Start
+
+ ### 1. Installation
+
+ ```bash
+ # Create virtual environment
+ python -m venv venv
+ source venv/bin/activate
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # For local model inference (optional)
+ pip install transformers torch
+
+ # Copy and configure environment
+ cp .env.example .env
+ ```
+
+ ### 2. Configure
+
+ Edit `.env` with your settings:
+
+ ```bash
+ # Use HF Inference API (recommended)
+ HF_USE_API=true
+ HF_API_TOKEN=hf_xxxxxxxxxxxxx
+
+ # Or load models locally
+ HF_USE_API=false
+ ```
+
+ ### 3. Run
+
+ ```bash
+ # Option A: REST API (FastAPI)
+ python -m app.main
+
+ # Option B: Gradio interface
+ python app.py
+ ```
+
+ ## Running Options
+
+ ### REST API (FastAPI)
+
+ ```bash
+ python -m app.main
+ ```
+
+ - URL: http://localhost:8000
+ - Swagger: http://localhost:8000/docs
+ - ReDoc: http://localhost:8000/redoc
+
+ ### Gradio Interface
+
+ ```bash
+ python app.py
+ ```
+
+ - URL: http://localhost:7860
+
+ ### Docker
+
+ ```bash
+ # Build
+ docker build -t hf-inference-api .
+
+ # Run with HF API
+ docker run -p 8000:8000 \
+   -e HF_USE_API=true \
+   -e HF_API_TOKEN=hf_xxxxx \
+   -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
+   hf-inference-api
+
+ # Run with local model
+ docker run -p 8000:8000 \
+   -e HF_USE_API=false \
+   -e HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english \
+   hf-inference-api
+ ```
+
+ ### Hugging Face Spaces
+
+ 1. Create a new Space at https://huggingface.co/new-space
+ 2. Select **Gradio** as SDK
+ 3. Push these files:
+    - `app.py`
+    - `requirements.txt`
+    - `app/` folder
+ 4. Add `HF_API_TOKEN` in Space Settings > Secrets
+
+ ## API Endpoints
+
+ ### Health Check
+
+ ```bash
+ curl http://localhost:8000/health
+ ```
+
+ Response:
+ ```json
+ {
+   "status": "ok",
+   "model_loaded": true,
+   "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
+ }
+ ```
+
+ ### Inference
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" \
+   -d '{"inputs": "I love this product!"}'
+ ```
+
+ Response:
+ ```json
+ {
+   "predictions": [[{"label": "POSITIVE", "score": 0.9998}]],
+   "model_name": "distilbert-base-uncased-finetuned-sst-2-english"
+ }
+ ```
+
+ ### Batch Inference
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" \
+   -d '{"inputs": ["I love this!", "This is terrible."]}'
+ ```
+
+ ### With Parameters
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" \
+   -d '{
+     "inputs": "The capital of France is",
+     "parameters": {"max_new_tokens": 50}
+   }'
+ ```
+
+ ## Configuration
+
+ ### Environment Variables
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `HF_USE_API` | `true` | Use HF Inference API (`true`) or local model (`false`) |
+ | `HF_API_TOKEN` | `None` | HF API token (required if `HF_USE_API=true`) |
+ | `HF_MODEL_NAME` | `distilbert-base-uncased-finetuned-sst-2-english` | Hugging Face model ID |
+ | `HF_TASK` | `text-classification` | Pipeline task type |
+ | `HF_HOST` | `0.0.0.0` | Server host |
+ | `HF_PORT` | `8000` | Server port |
+ | `HF_DEVICE` | `cpu` | Device for local inference (`cpu`, `cuda`, `cuda:0`) |
+ | `HF_MAX_BATCH_SIZE` | `32` | Maximum batch size for local inference |
+
+ ### Inference Modes
+
+ #### HF Inference API (Recommended)
+
+ ```bash
+ HF_USE_API=true
+ HF_API_TOKEN=hf_xxxxxxxxxxxxx
+ ```
+
+ Pros:
+ - No model download required
+ - Lightweight (no torch/transformers)
+ - Fast startup
+ - Free tier available
+
+ Cons:
+ - Requires internet connection
+ - Rate limits on free tier
+ - API token required
+
+ #### Local Model
+
+ ```bash
+ HF_USE_API=false
+ ```
+
+ Requires additional dependencies:
+ ```bash
+ pip install transformers torch
+ ```
+
+ Pros:
+ - No internet required after download
+ - No rate limits
+ - Full control
+
+ Cons:
+ - Large dependencies (~2GB for torch)
+ - Model download on first run
+ - More RAM/CPU required
+
+ ## Supported Tasks
+
+ | Task | Description | Example Model |
+ |------|-------------|---------------|
+ | `text-classification` | Classify text into categories | `distilbert-base-uncased-finetuned-sst-2-english` |
+ | `sentiment-analysis` | Analyze sentiment (alias for text-classification) | `nlptown/bert-base-multilingual-uncased-sentiment` |
+ | `text-generation` | Generate text from a prompt | `gpt2`, `mistralai/Mistral-7B-v0.1` |
+ | `summarization` | Summarize long text | `facebook/bart-large-cnn` |
+ | `translation` | Translate text | `Helsinki-NLP/opus-mt-en-fr` |
+ | `fill-mask` | Fill in masked tokens | `bert-base-uncased` |
+ | `question-answering` | Answer questions given context | `deepset/roberta-base-squad2` |
+ | `feature-extraction` | Extract embeddings | `sentence-transformers/all-MiniLM-L6-v2` |
+
+ ## Project Structure
+
+ ```
+ hf-inference-api/
+ ├── app/
+ │   ├── __init__.py
+ │   ├── config.py        # Settings (pydantic-settings)
+ │   ├── inference.py     # Inference engine (API + local)
+ │   ├── main.py          # FastAPI application
+ │   └── models.py        # Pydantic models
+ ├── app.py               # Gradio interface
+ ├── .env.example         # Environment template
+ ├── .gitignore
+ ├── Dockerfile
+ ├── README.md
+ └── requirements.txt
+ ```
+
+ ## Examples
+
+ ### Text Classification
+
+ ```bash
+ HF_MODEL_NAME=distilbert-base-uncased-finetuned-sst-2-english
+ HF_TASK=text-classification
+ ```
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" -d '{"inputs": "I love this movie!"}'
+ ```
+
+ ### Text Generation
+
+ ```bash
+ HF_MODEL_NAME=gpt2
+ HF_TASK=text-generation
+ ```
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" -d '{"inputs": "Once upon a time", "parameters": {"max_new_tokens": 50}}'
+ ```
+
+ ### Summarization
+
+ ```bash
+ HF_MODEL_NAME=facebook/bart-large-cnn
+ HF_TASK=summarization
+ ```
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" -d '{"inputs": "Long article text here..."}'
+ ```
+
+ ### Translation (EN -> FR)
+
+ ```bash
+ HF_MODEL_NAME=Helsinki-NLP/opus-mt-en-fr
+ HF_TASK=translation
+ ```
+
+ ```bash
+ curl -X POST http://localhost:8000/predict \
+   -H "Content-Type: application/json" -d '{"inputs": "Hello, how are you?"}'
+ ```
app.py ADDED
@@ -0,0 +1,76 @@
+ """Gradio interface for Hugging Face inference."""
+
+ import gradio as gr
+ from huggingface_hub import InferenceClient
+
+ try:
+     import spaces
+     SPACES_AVAILABLE = True
+ except ImportError:
+     SPACES_AVAILABLE = False
+
+ from app.config import get_settings
+
+ settings = get_settings()
+ client = InferenceClient(model=settings.model_name, token=settings.api_token)
+
+
+ def _predict(text: str) -> str:
+     """Run inference on the input text."""
+     if not text.strip():
+         return "Please enter some text."
+
+     task = settings.task
+
+     try:
+         if task in ("text-classification", "sentiment-analysis"):
+             results = client.text_classification(text)
+             output = "\n".join(
+                 [f"{r['label']}: {r['score']:.2%}" for r in results]
+             )
+         elif task == "text-generation":
+             output = client.text_generation(text, max_new_tokens=100)
+         elif task == "summarization":
+             output = client.summarization(text)
+         elif task == "translation":
+             output = client.translation(text)
+         elif task == "fill-mask":
+             results = client.fill_mask(text)
+             output = "\n".join(
+                 [f"{r['token_str']}: {r['score']:.2%}" for r in results]
+             )
+         else:
+             output = str(client.post(json={"inputs": text}))
+
+         return output
+     except Exception as e:
+         return f"Error: {e}"
+
+
+ # Apply @spaces.GPU decorator only on HF Spaces
+ if SPACES_AVAILABLE:
+     predict = spaces.GPU(duration=60)(_predict)
+ else:
+     predict = _predict
+
+
+ demo = gr.Interface(
+     fn=predict,
+     inputs=gr.Textbox(
+         label="Input Text",
+         placeholder="Enter text here...",
+         lines=4,
+     ),
+     outputs=gr.Textbox(label="Result", lines=6),
+     title="Hugging Face Inference",
+     description=f"Model: **{settings.model_name}** | Task: **{settings.task}**",
+     examples=[
+         ["I love this product! It's amazing."],
+         ["This is the worst experience ever."],
+         ["The weather is nice today."],
+     ],
+     flagging_mode="never",
+ )
+
+ if __name__ == "__main__":
+     demo.launch(server_name="0.0.0.0", server_port=7860)
app/__init__.py ADDED
File without changes
app/config.py ADDED
@@ -0,0 +1,30 @@
+ """Configuration settings for the inference API."""
+
+ from functools import lru_cache
+
+ from pydantic_settings import BaseSettings
+
+
+ class Settings(BaseSettings):
+     """Application settings loaded from environment variables."""
+
+     model_name: str = "distilbert-base-uncased-finetuned-sst-2-english"
+     task: str = "text-classification"
+     host: str = "0.0.0.0"
+     port: int = 8000
+     max_batch_size: int = 32
+     device: str = "cpu"
+
+     # HF Inference API settings
+     use_api: bool = True  # True = use HF API, False = load model locally
+     api_token: str | None = None  # HF API token (required if use_api=True)
+
+     class Config:
+         env_file = ".env"
+         env_prefix = "HF_"
+
+
+ @lru_cache
+ def get_settings() -> Settings:
+     """Get cached settings instance."""
+     return Settings()
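`get_settings` is wrapped in `@lru_cache`, so every caller shares one `Settings` instance rather than re-reading the environment. The effect in isolation (a plain class stands in for the pydantic `Settings`):

```python
from functools import lru_cache

class Settings:
    """Stand-in for the pydantic Settings class in app/config.py."""
    def __init__(self) -> None:
        self.model_name = "distilbert-base-uncased-finetuned-sst-2-english"

@lru_cache
def get_settings() -> Settings:
    return Settings()

# Repeated calls return the same cached object, not a new one.
assert get_settings() is get_settings()
```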
app/inference.py ADDED
@@ -0,0 +1,128 @@
+ """Inference engine using Hugging Face API or local transformers."""
+
+ import logging
+ from typing import Any
+
+ from huggingface_hub import InferenceClient
+
+ from .config import Settings
+
+ logger = logging.getLogger(__name__)
+
+
+ class InferenceEngine:
+     """Handles model loading and inference."""
+
+     def __init__(self, settings: Settings) -> None:
+         """Initialize the inference engine."""
+         self.settings = settings
+         self.client: InferenceClient | None = None
+         self.pipeline = None
+         self.model_loaded = False
+         self.use_api = settings.use_api
+
+     def load_model(self) -> None:
+         """Load the model (API client or local pipeline)."""
+         if self.use_api:
+             self._init_api_client()
+         else:
+             self._init_local_pipeline()
+
+     def _init_api_client(self) -> None:
+         """Initialize the HF Inference API client."""
+         logger.info(
+             "Initializing HF Inference API client for model: %s",
+             self.settings.model_name,
+         )
+         self.client = InferenceClient(
+             model=self.settings.model_name,
+             token=self.settings.api_token,
+         )
+         self.model_loaded = True
+         logger.info("HF Inference API client ready")
+
+     def _init_local_pipeline(self) -> None:
+         """Load the model locally using transformers."""
+         try:
+             from transformers import pipeline
+         except ImportError:
+             raise ImportError(
+                 "transformers and torch are required for local inference. "
+                 "Install them with: pip install transformers torch"
+             )
+
+         logger.info(
+             "Loading local model: %s for task: %s",
+             self.settings.model_name,
+             self.settings.task,
+         )
+         self.pipeline = pipeline(
+             task=self.settings.task,
+             model=self.settings.model_name,
+             device=self.settings.device if self.settings.device != "cpu" else -1,
+         )
+         self.model_loaded = True
+         logger.info("Local model loaded successfully")
+
+     def predict(
+         self, inputs: str | list[str], parameters: dict[str, Any] | None = None
+     ) -> list[Any]:
+         """Run inference on the input(s)."""
+         if not self.model_loaded:
+             raise RuntimeError("Model not loaded")
+
+         if self.use_api:
+             return self._predict_api(inputs, parameters)
+         else:
+             return self._predict_local(inputs, parameters)
+
+     def _predict_api(
+         self, inputs: str | list[str], parameters: dict[str, Any] | None = None
+     ) -> list[Any]:
+         """Run inference using HF Inference API."""
+         params = parameters or {}
+         task = self.settings.task
+
+         if isinstance(inputs, str):
+             inputs_list = [inputs]
+         else:
+             inputs_list = inputs
+
+         results = []
+         for text in inputs_list:
+             result = self._call_api(task, text, params)
+             results.append(result)
+
+         return results
+
+     def _call_api(self, task: str, text: str, params: dict[str, Any]) -> Any:
+         """Call the appropriate API method based on task."""
+         if task in ("text-classification", "sentiment-analysis"):
+             return self.client.text_classification(text, **params)
+         elif task == "text-generation":
+             return self.client.text_generation(text, **params)
+         elif task == "summarization":
+             return self.client.summarization(text, **params)
+         elif task == "translation":
+             return self.client.translation(text, **params)
+         elif task == "fill-mask":
+             return self.client.fill_mask(text, **params)
+         elif task == "question-answering":
+             context = params.pop("context", "")
+             return self.client.question_answering(question=text, context=context)
+         elif task == "feature-extraction":
+             return self.client.feature_extraction(text, **params)
+         else:
+             # Generic post for unsupported tasks
+             return self.client.post(json={"inputs": text, **params})
+
+     def _predict_local(
+         self, inputs: str | list[str], parameters: dict[str, Any] | None = None
+     ) -> list[Any]:
+         """Run inference using local transformers pipeline."""
+         params = parameters or {}
+         results = self.pipeline(inputs, **params)
+
+         if isinstance(inputs, str):
+             return [results]
+         return results
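`_call_api` in app/inference.py routes on the task name with an if/elif chain. An equivalent structure, should the task list grow, is a dispatch table; the handlers below are hypothetical lambdas standing in for the `InferenceClient` methods, only the routing pattern is the point:

```python
from typing import Any, Callable

# Hypothetical handlers standing in for InferenceClient methods.
HANDLERS: dict[str, Callable[..., Any]] = {
    "text-classification": lambda text, **p: [{"label": "POSITIVE", "score": 0.99}],
    "summarization": lambda text, **p: text[:20] + "...",
}

def call_api(task: str, text: str, params: dict[str, Any]) -> Any:
    handler = HANDLERS.get(task)
    if handler is None:
        raise ValueError(f"Unsupported task: {task}")
    return handler(text, **params)

result = call_api("text-classification", "I love this!", {})
assert result[0]["label"] == "POSITIVE"
```

New tasks become one-line table entries instead of another `elif` branch.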
app/main.py ADDED
@@ -0,0 +1,86 @@
+ """Main FastAPI application for Hugging Face inference API."""
+
+ import logging
+ from contextlib import asynccontextmanager
+
+ from fastapi import FastAPI, HTTPException
+
+ from .config import get_settings
+ from .inference import InferenceEngine
+ from .models import HealthResponse, InferenceRequest, InferenceResponse
+
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+ )
+ logger = logging.getLogger(__name__)
+
+ settings = get_settings()
+ engine = InferenceEngine(settings)
+
+
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     """Handle application startup and shutdown."""
+     logger.info("Starting inference API...")
+     engine.load_model()
+     yield
+     logger.info("Shutting down inference API...")
+
+
+ app = FastAPI(
+     title="Hugging Face Inference API",
+     description="REST API for Hugging Face model inference",
+     version="1.0.0",
+     lifespan=lifespan,
+ )
+
+
+ @app.get("/health", response_model=HealthResponse)
+ async def health_check() -> HealthResponse:
+     """Check API and model health status."""
+     return HealthResponse(
+         status="ok",
+         model_loaded=engine.model_loaded,
+         model_name=settings.model_name if engine.model_loaded else None,
+     )
+
+
+ @app.post("/predict", response_model=InferenceResponse)
+ async def predict(request: InferenceRequest) -> InferenceResponse:
+     """Run inference on the provided input(s)."""
+     if not engine.model_loaded:
+         raise HTTPException(status_code=503, detail="Model not loaded")
+
+     try:
+         predictions = engine.predict(request.inputs, request.parameters)
+         return InferenceResponse(
+             predictions=predictions,
+             model_name=settings.model_name,
+         )
+     except Exception as e:
+         logger.exception("Inference failed")
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ @app.get("/")
+ async def root():
+     """Root endpoint with API information."""
+     return {
+         "name": "Hugging Face Inference API",
+         "version": "1.0.0",
+         "model": settings.model_name,
+         "task": settings.task,
+         "docs": "/docs",
+     }
+
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     uvicorn.run(
+         "app.main:app",
+         host=settings.host,
+         port=settings.port,
+         reload=True,
+     )
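The `lifespan` handler in app/main.py is FastAPI's startup/shutdown hook: the code before `yield` runs once before the server accepts requests (that is where `engine.load_model()` happens), the code after runs on shutdown. The underlying `asynccontextmanager` pattern, runnable without FastAPI:

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []

@asynccontextmanager
async def lifespan(app):
    events.append("startup")   # engine.load_model() runs here in app/main.py
    yield
    events.append("shutdown")

async def run_server():
    # FastAPI enters the context before serving and exits it on shutdown.
    async with lifespan(app=None):
        events.append("serving")

asyncio.run(run_server())
assert events == ["startup", "serving", "shutdown"]
```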
app/models.py ADDED
@@ -0,0 +1,29 @@
+ """Pydantic models for API requests and responses."""
+
+ from typing import Any
+
+ from pydantic import BaseModel, Field
+
+
+ class InferenceRequest(BaseModel):
+     """Request model for inference endpoint."""
+
+     inputs: str | list[str] = Field(..., description="Text input(s) for inference")
+     parameters: dict[str, Any] = Field(
+         default_factory=dict, description="Optional model parameters"
+     )
+
+
+ class InferenceResponse(BaseModel):
+     """Response model for inference endpoint."""
+
+     predictions: list[Any] = Field(..., description="Model predictions")
+     model_name: str = Field(..., description="Name of the model used")
+
+
+ class HealthResponse(BaseModel):
+     """Response model for health check endpoint."""
+
+     status: str = "ok"
+     model_loaded: bool = False
+     model_name: str | None = None
requirements-dev.txt ADDED
@@ -0,0 +1,10 @@
+ # Full requirements for local development
+ -r requirements.txt
+
+ fastapi>=0.109.0
+ uvicorn[standard]>=0.27.0
+ gradio>=4.0.0
+
+ # Local inference (optional - only needed if HF_USE_API=false)
+ # transformers>=4.37.0
+ # torch>=2.1.0
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ # Requirements for HF Spaces deployment
+ huggingface_hub>=0.20.0
+ pydantic>=2.5.0
+ pydantic-settings>=2.1.0