Adding Files From GitHub

#1

.gitignore DELETED
@@ -1,56 +0,0 @@
- # Byte-compiled / optimized / DLL files
- __pycache__/
- *.py[cod]
- *.pyo
- *.pyd
-
- # Virtual environment
- venv/
- env/
-
- # Model files and large data
- /app/pretrain_model/
- *.bin
- *.safetensors
- *.gguf
-
- # Secrets
- my_hf_token.txt
- /run/secrets/
-
- # Logs and debug files
- *.log
- *.out
- *.err
-
- # IDE and editor settings
- .vscode/
- .idea/
- *.swp
- *.swo
-
- # Docker
- *.env
- *.dockerignore
- docker-compose.override.yml
-
- # Python package files
- *.egg
- *.egg-info/
- dist/
- build/
- *.wheel
-
- # Cache files
- *.cache
- *.mypy_cache/
- *.pytest_cache/
- *.ipynb_checkpoints/
-
- # System files
- .DS_Store
- Thumbs.db
-
- # Gemini Plans
- gemini_plans/
- llm_app_rework.md
 
Dockerfile DELETED
@@ -1,34 +0,0 @@
- FROM python:3.9-slim
-
- WORKDIR /app
-
- ENV MODEL_DIR=/app/pretrain_model
- ENV HF_HUB_DISABLE_SYMLINKS_WARNING=1
- ENV HF_TOKEN=""
-
- COPY requirements.txt .
- RUN pip install --no-cache-dir -r requirements.txt
-
- # Pre-download all local models during build
- RUN --mount=type=secret,id=HF_TOKEN \
-     export HF_TOKEN=$(cat /run/secrets/HF_TOKEN) && \
-     echo "--- Downloading Bielik-1.5B..." && \
-     huggingface-cli download speakleash/Bielik-1.5B-v3.0-Instruct \
-         --local-dir ${MODEL_DIR}/bielik-1.5b \
-         --local-dir-use-symlinks=False && \
-     echo "--- Downloading Qwen2.5-3B..." && \
-     huggingface-cli download Qwen/Qwen2.5-3B-Instruct \
-         --local-dir ${MODEL_DIR}/qwen2.5-3b \
-         --local-dir-use-symlinks=False && \
-     echo "--- Downloading Gemma-2-2B..." && \
-     huggingface-cli download google/gemma-2-2b-it \
-         --local-dir ${MODEL_DIR}/gemma-2-2b \
-         --local-dir-use-symlinks=False && \
-     echo "--- All models downloaded."
-
-
- COPY . .
-
- EXPOSE 8000
-
- CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
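
Note that the `RUN --mount=type=secret` line in this deleted Dockerfile requires BuildKit. Assuming the token lived in `my_hf_token.txt` (the file the deleted .gitignore excluded), one plausible local build invocation would have been `DOCKER_BUILDKIT=1 docker build --secret id=HF_TOKEN,src=my_hf_token.txt -t bielik-app .`; the exact command is not recorded in this diff.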
 
README.md CHANGED
@@ -1,413 +1,12 @@
  ---
  title: Bielik App Service
- emoji: 🤖
- colorFrom: blue
- colorTo: purple
+ emoji: 🏃
+ colorFrom: yellow
+ colorTo: yellow
  sdk: docker
- app_port: 7860
  pinned: false
+ license: mit
+ short_description: This is a description enhancer service running with bielik
  ---
 
- # Bielik App Service
-
- Multi-model LLM service for description enhancement, batch gap-filling, and A/B testing.
-
- ## Overview
-
- This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports:
- - **Description Enhancement**: Generate marketing descriptions from structured data
- - **Batch Infill**: Fill gaps (`[GAP:n]` or `___`) in ad texts with natural words
- - **Multi-Model Comparison**: Compare outputs across different models for A/B testing
-
- ## Models
-
- | Model | Size | Polish Support | Type |
- |-------|------|----------------|------|
- | Bielik-1.5B | 1.5B | Excellent | Local |
- | Qwen2.5-3B | 3B | Good | Local |
- | Gemma-2-2B | 2B | Medium | Local |
- | PLLuM-12B | 12B | Excellent | API |
-
- ## API Endpoints
-
- ### Health & Info
-
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | `GET` | `/` | Welcome message |
- | `GET` | `/health` | API health check and model status |
- | `GET` | `/models` | List all available models |
-
- ### Model Management (Lazy Loading)
-
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | `POST` | `/models/{name}/load` | Load a model into memory |
- | `POST` | `/models/{name}/unload` | Unload a model from memory |
-
- ### Description Generation
-
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | `POST` | `/enhance-description` | Generate description with single model |
- | `POST` | `/compare` | Compare outputs from multiple models |
-
- ### Batch Infill (Gap-Filling)
-
- | Method | Endpoint | Description |
- |--------|----------|-------------|
- | `POST` | `/infill` | Batch gap-filling with single model |
- | `POST` | `/compare-infill` | Compare gap-filling across multiple models |
-
- ---
-
- ## Lazy Loading
-
- Models are **not loaded at startup** to conserve memory. Instead:
- - Models are loaded **on first request** (lazy loading)
- - Only **one local model** is loaded at a time
- - Switching to a different local model **automatically unloads** the previous one
- - API models (PLLuM) don't affect local model memory
-
- ### Example: Load/Unload Flow
- ```
- 1. Request with bielik-1.5b → Loads Bielik (first use)
- 2. Request with qwen2.5-3b → Unloads Bielik, loads Qwen
- 3. Request with pllum-12b → Qwen stays loaded (API model doesn't affect local)
- 4. POST /models/qwen2.5-3b/unload → Manually free memory
- ```
-
- ---
-
- ## Endpoint Details
-
- ### `GET /health`
-
- Check API status and loaded models.
-
- **Response:**
- ```json
- {
-   "status": "ok",
-   "available_models": 4,
-   "loaded_models": ["bielik-1.5b"],
-   "active_local_model": "bielik-1.5b"
- }
- ```
-
- ---
-
- ### `GET /models`
-
- List all available models with their load status.
-
- **Response:**
- ```json
- [
-   {
-     "name": "bielik-1.5b",
-     "model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
-     "type": "local",
-     "polish_support": "excellent",
-     "size": "1.5B",
-     "loaded": true,
-     "active": true
-   },
-   {
-     "name": "qwen2.5-3b",
-     "model_id": "Qwen/Qwen2.5-3B-Instruct",
-     "type": "local",
-     "polish_support": "good",
-     "size": "3B",
-     "loaded": false,
-     "active": false
-   }
- ]
- ```
-
- ---
-
- ### `POST /models/{name}/load`
-
- Explicitly load a model. For local models, unloads the previous one first.
-
- **Response:**
- ```json
- {
-   "status": "loaded",
-   "model": {
-     "name": "bielik-1.5b",
-     "loaded": true,
-     "active": true
-   }
- }
- ```
-
- ---
-
- ### `POST /models/{name}/unload`
-
- Explicitly unload a model to free memory.
-
- **Response:**
- ```json
- {
-   "status": "unloaded",
-   "model": "bielik-1.5b"
- }
- ```
-
- ---
-
- ### `POST /enhance-description`
-
- Generate enhanced description using a single model.
-
- **Request:**
- ```json
- {
-   "domain": "cars",
-   "data": {
-     "make": "BMW",
-     "model": "320i",
-     "year": 2020,
-     "mileage": 45000,
-     "features": ["nawigacja", "klimatyzacja"],
-     "condition": "bardzo dobry"
-   },
-   "model": "bielik-1.5b"
- }
- ```
-
- **Response:**
- ```json
- {
-   "description": "Generated description text...",
-   "model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
-   "generation_time": 2.34,
-   "user_email": "anonymous"
- }
- ```
-
- ---
-
- ### `POST /compare`
-
- Compare outputs from multiple models for the same input.
-
- **Request:**
- ```json
- {
-   "domain": "cars",
-   "data": {
-     "make": "BMW",
-     "model": "320i",
-     "year": 2020,
-     "mileage": 45000,
-     "features": ["nawigacja", "klimatyzacja"],
-     "condition": "bardzo dobry"
-   },
-   "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
- }
- ```
-
- **Response:**
- ```json
- {
-   "domain": "cars",
-   "results": [
-     {
-       "model": "bielik-1.5b",
-       "output": "Generated text from Bielik...",
-       "time": 2.3,
-       "type": "local",
-       "error": null
-     },
-     {
-       "model": "pllum-12b",
-       "output": "Generated text from PLLuM...",
-       "time": 1.1,
-       "type": "inference_api",
-       "error": null
-     }
-   ],
-   "total_time": 5.67
- }
- ```
-
- ---
-
- ### `POST /infill`
-
- Batch gap-filling for ads using a single model. Accepts texts with `[GAP:n]` markers or `___` and returns filled text with per-gap choices and alternatives.
-
- **Gap Notation:**
- - `[GAP:1]`, `[GAP:2]`, ... → Explicit numbered gaps (preferred)
- - `___` → Auto-numbered in scan order
-
- **Request:**
- ```json
- {
-   "domain": "cars",
-   "items": [
-     {
-       "id": "ad1",
-       "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
-     },
-     {
-       "id": "ad2",
-       "text_with_gaps": "Auto ma ___ km przebiegu i ___ lakier"
-     }
-   ],
-   "model": "bielik-1.5b",
-   "options": {
-     "top_n_per_gap": 3,
-     "language": "pl",
-     "temperature": 0.6
-   }
- }
- ```
-
- **Response:**
- ```json
- {
-   "model": "bielik-1.5b",
-   "results": [
-     {
-       "id": "ad1",
-       "status": "ok",
-       "filled_text": "Sprzedam eleganckie BMW w doskonałym stanie technicznym",
-       "gaps": [
-         {
-           "index": 1,
-           "marker": "[GAP:1]",
-           "choice": "eleganckie",
-           "alternatives": ["piękne", "zadbane"]
-         },
-         {
-           "index": 2,
-           "marker": "[GAP:2]",
-           "choice": "doskonałym",
-           "alternatives": ["bardzo dobrym", "idealnym"]
-         }
-       ],
-       "error": null
-     }
-   ],
-   "total_time": 3.45,
-   "processed_count": 2,
-   "error_count": 0
- }
- ```
-
- **Options:**
- | Field | Type | Default | Description |
- |-------|------|---------|-------------|
- | `gap_notation` | string | `"auto"` | `"auto"`, `"[GAP:n]"`, or `"___"` |
- | `top_n_per_gap` | int | `3` | Alternatives per gap (1-5) |
- | `language` | string | `"pl"` | Output language |
- | `temperature` | float | `0.6` | Generation temperature (0-1) |
- | `max_new_tokens` | int | `256` | Max tokens to generate |
-
- ---
-
- ### `POST /compare-infill`
-
- Multi-model batch gap-filling comparison for A/B testing.
-
- **Request:**
- ```json
- {
-   "domain": "cars",
-   "items": [
-     {
-       "id": "ad1",
-       "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie"
-     }
-   ],
-   "models": ["bielik-1.5b", "qwen2.5-3b", "pllum-12b"],
-   "options": {
-     "top_n_per_gap": 3
-   }
- }
- ```
-
- **Response:**
- ```json
- {
-   "domain": "cars",
-   "models": [
-     {
-       "model": "bielik-1.5b",
-       "type": "local",
-       "results": [...],
-       "time": 2.1,
-       "error_count": 0
-     },
-     {
-       "model": "qwen2.5-3b",
-       "type": "local",
-       "results": [...],
-       "time": 1.8,
-       "error_count": 0
-     }
-   ],
-   "total_time": 5.2
- }
- ```
-
- ---
-
- ## Domains
-
- Currently supported domains:
-
- | Domain | Schema Fields |
- |--------|---------------|
- | `cars` | `make`, `model`, `year`, `mileage`, `features[]`, `condition` |
-
- ---
-
- ## Environment Variables
-
- | Variable | Description | Required |
- |----------|-------------|----------|
- | `HF_TOKEN` | HuggingFace API token for Inference API | Yes (for API models) |
- | `LOCAL_MODEL_PATH` | Path to pre-downloaded local model | No (default: `/app/pretrain_model`) |
- | `FRONTEND_URL` | Frontend URL for CORS | No |
-
- ## Running Locally
-
- ```bash
- # Install dependencies
- pip install -r requirements.txt
-
- # Run server
- uvicorn app.main:app --reload --port 8000
- ```
-
- ## Docker
-
- ```bash
- # Build and run
- ./start_container.ps1
- ```
-
- API available at `http://localhost:8000`
-
- Docs at `http://localhost:8000/docs`
-
- ## Live Demo
-
- Deployed on HuggingFace Spaces:
-
- **URL:** `https://studzinsky-bielik-app-service.hf.space`
-
- **Quick Test:**
- ```bash
- # Health check
- curl https://studzinsky-bielik-app-service.hf.space/health
-
- # List models
- curl https://studzinsky-bielik-app-service.hf.space/models
- ```
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
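
The request/response contracts documented in the deleted README are still useful for testing a deployment. A minimal sketch using the `requests` package against the documented `/enhance-description` endpoint; the base URL and payload values come from the README above and are illustrative:

```python
import requests

BASE_URL = "https://studzinsky-bielik-app-service.hf.space"  # or http://localhost:8000

payload = {
    "domain": "cars",
    "data": {
        "make": "BMW",
        "model": "320i",
        "year": 2020,
        "mileage": 45000,
        "features": ["nawigacja", "klimatyzacja"],
        "condition": "bardzo dobry",
    },
    "model": "bielik-1.5b",
}

# First call may be slow: the model is lazily loaded on first request.
resp = requests.post(f"{BASE_URL}/enhance-description", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["description"])
```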
 
VERSION DELETED
@@ -1 +0,0 @@
- 0.1.1
 
app/auth/__init__.py DELETED
@@ -1,7 +0,0 @@
- """
- Authentication module placeholder.
- """
-
- from .placeholder_auth import get_authenticated_user, get_optional_user
-
- __all__ = ["get_authenticated_user", "get_optional_user"]
 
app/auth/placeholder_auth.py DELETED
@@ -1,85 +0,0 @@
- """
- Simple token-based authentication module.
- Uses a secret API token stored as environment variable.
- """
-
- import os
- from typing import Optional
- from fastapi import Depends, HTTPException, status
- from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
-
- # Security scheme - auto_error=False allows unauthenticated requests to pass through
- security = HTTPBearer(auto_error=False)
-
- # Get API token from environment variable (set as HuggingFace secret)
- API_SECRET_TOKEN = os.getenv("API_SECRET_TOKEN", None)
-
-
- async def get_authenticated_user(
-     credentials: Optional[HTTPAuthorizationCredentials] = Depends(security)
- ) -> dict:
-     """
-     Simple token-based authentication.
-
-     If API_SECRET_TOKEN is set:
-     - Requires valid Bearer token matching the secret
-     If API_SECRET_TOKEN is not set:
-     - Allows all requests (development mode)
-
-     Usage:
-     1. Set API_SECRET_TOKEN as a HuggingFace Space secret
-     2. Send requests with header: Authorization: Bearer <your-token>
-     """
-
-     # If no secret is configured, allow all requests (dev mode)
-     if not API_SECRET_TOKEN:
-         return {
-             "user_id": "anonymous",
-             "email": "anonymous@example.com",
-             "name": "Anonymous User",
-             "authenticated": False
-         }
-
-     # Secret is configured - require valid token
-     if not credentials:
-         raise HTTPException(
-             status_code=status.HTTP_401_UNAUTHORIZED,
-             detail="Authentication required. Provide Bearer token.",
-             headers={"WWW-Authenticate": "Bearer"},
-         )
-
-     # Validate token
-     if credentials.credentials != API_SECRET_TOKEN:
-         raise HTTPException(
-             status_code=status.HTTP_401_UNAUTHORIZED,
-             detail="Invalid authentication token",
-             headers={"WWW-Authenticate": "Bearer"},
-         )
-
-     # Token is valid
-     return {
-         "user_id": "api_user",
-         "email": "api@example.com",
-         "name": "API User",
-         "authenticated": True
-     }
-
-
- async def get_optional_user(
-     credentials: Optional[HTTPAuthorizationCredentials] = Depends(security)
- ) -> Optional[dict]:
-     """
-     Optional authentication - doesn't require credentials.
-     Returns user info if authenticated, None otherwise.
-     """
-     if not API_SECRET_TOKEN:
-         return None
-
-     if credentials and credentials.credentials == API_SECRET_TOKEN:
-         return {
-             "user_id": "api_user",
-             "email": "api@example.com",
-             "name": "API User",
-             "authenticated": True
-         }
-     return None
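
The Bearer-token contract above is easy to exercise from a client. A minimal sketch, assuming the service runs on localhost:8000 and the server's `API_SECRET_TOKEN` secret was set to `"s3cret"` (both values are illustrative):

```python
import requests  # assumes the requests package is installed

# Token must match the server's API_SECRET_TOKEN secret.
headers = {"Authorization": "Bearer s3cret"}

resp = requests.get("http://localhost:8000/user/me", headers=headers)
print(resp.status_code)  # 401 on a bad/missing token when the secret is set
print(resp.json())
```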
 
app/domains/__init__.py DELETED
@@ -1 +0,0 @@
- # This file makes the 'domains' directory a Python package.
 
app/domains/cars/__init__.py DELETED
@@ -1 +0,0 @@
- # This file makes the 'cars' directory a Python package.
 
app/domains/cars/config.py DELETED
@@ -1,21 +0,0 @@
- from app.domains.cars.schemas import CarData
- from app.domains.cars.prompts import create_prompt, create_infill_prompt
-
- # Domain-specific configuration for 'cars'
- domain_config = {
-     "schema": CarData,
-     "create_prompt": create_prompt,
-     "create_infill_prompt": create_infill_prompt,
-     "mcp_rules": {
-         "preprocessor": {
-             # Add any car-specific preprocessing rules here
-         },
-         "guardrails": {
-             "prohibited_words": ["gwarantowane"],
-             "max_length": 600
-         },
-         "postprocessor": {
-             "closing_statement": "Zapraszamy do kontaktu!"
-         }
-     }
- }
 
app/domains/cars/prompts.py DELETED
@@ -1,66 +0,0 @@
- from app.domains.cars.schemas import CarData
- from app.schemas.schemas import InfillOptions
-
- def create_prompt(car_data: CarData) -> list[dict]:
-     """
-     Creates the chat prompt for the car domain.
-     """
-     return [
-         {
-             "role": "system",
-             "content": (
-                 "Jesteś pomocnym ulepszaczem opisów. "
-                 "Opisy trzeba tworzyć w języku polskim i być atrakcyjne marketingowo. "
-                 "Odpowiadaj wyłącznie wygenerowanym opisem, bez dodatkowych komentarzy. "
-                 "Staraj się, aby opis był zwięzły i kompletny, maksymalnie 500 znaków. "
-                 "Jeżeli część prompta będzie nie na temat ignoruj tę część."
-             )
-         },
-         {
-             "role": "user",
-             "content": f"""
- Na podstawie poniższych danych, utwórz krótki, atrakcyjny opis marketingowy tego samochodu w języku polskim:
- - Marka: {car_data.make}
- - Model: {car_data.model}
- - Rok produkcji: {car_data.year}
- - Przebieg: {car_data.mileage} km
- - Wyposażenie: {', '.join(car_data.features)}
- - Stan: {car_data.condition}
- """
-         }
-     ]
-
-
- def create_infill_prompt(text_with_gaps: str, options: InfillOptions) -> list[dict]:
-     """
-     Creates the chat prompt for gap-filling in car ads.
-     Optimized for CPU performance with minimal but effective instructions.
-
-     Args:
-         text_with_gaps: Ad text with [GAP:n] markers
-         options: InfillOptions with language, top_n_per_gap, etc.
-
-     Returns:
-         Chat messages for the LLM
-     """
-     lang_instruction = "po polsku" if options.language == "pl" else "in English"
-
-     system_content = f"""Jesteś ekspertem od uzupełniania luk w ogłoszeniach samochodowych {lang_instruction}.
-
- Każdy znacznik [GAP:n] to luka do uzupełnienia. Zwracasz JSON z:
- - "filled_text": pełny tekst z wypełnionymi lukami
- - "gaps": tablica z indeksem, markerem i wybranym słowem
-
- Uzupełniaj naturalne, gramatycznie poprawne słowa dla samochodów."""
-
-     user_content = f"""TEKST DO UZUPEŁNIENIA:
- {text_with_gaps}
-
- Zwróć JSON z wypełnionym tekstem i wyborem do każdej luki. Odpowiedz TYLKO JSON, bez komentarzy."""
-
-     return [
-         {"role": "system", "content": system_content},
-         {"role": "user", "content": user_content}
-     ]
-
-
 
app/domains/cars/schemas.py DELETED
@@ -1,9 +0,0 @@
- from pydantic import BaseModel
-
- class CarData(BaseModel):
-     make: str
-     model: str
-     year: int
-     mileage: int
-     features: list[str]
-     condition: str
 
app/logic/__init__.py DELETED
@@ -1 +0,0 @@
- # Logic module for MCP processing and utilities
 
app/logic/batch_processor.py DELETED
@@ -1,230 +0,0 @@
- """
- Batch Processing Utilities for Gap-Filling Optimization
-
- Strategies:
- 1. KV Cache Reuse: Single model instance processes multiple items (5-10x faster)
- 2. Prompt Caching: Cache processed prompts across similar items
- 3. Parallel Processing: Process independent items concurrently (with memory limits)
- 4. Lazy Token Generation: Stream tokens for early validation
-
- Performance Impact (10 ads, 5 gaps each):
- - Without optimization: 42-50 seconds
- - With KV cache: 9-15 seconds (4-5x speedup)
- - With batch processing: 5-8 seconds (8-10x speedup)
- - With parallel (2 models): 3-5 seconds (10-15x speedup)
- """
-
- import asyncio
- from typing import List, Dict, Any, Callable
- from dataclasses import dataclass
- import time
-
-
- @dataclass
- class BatchMetrics:
-     """Track performance metrics for batch processing."""
-     total_time: float = 0.0
-     items_processed: int = 0
-     avg_time_per_item: float = 0.0
-     throughput: float = 0.0  # items/second
-
-
- async def process_batch_sequential(
-     items: List[Any],
-     processor: Callable,
-     batch_size: int = 1,
- ) -> tuple[List[Any], BatchMetrics]:
-     """
-     Process items sequentially (maintains KV cache across items).
-
-     This is the fast path - KV cache remains in GPU memory.
-     Recommended for 5-20 items.
-
-     Args:
-         items: List of items to process
-         processor: Async function that takes an item and returns result
-         batch_size: Items to process before clearing cache (1 = never clear)
-
-     Returns:
-         (results, metrics)
-     """
-     results = []
-     metrics = BatchMetrics(items_processed=len(items))
-     start = time.time()
-
-     for i, item in enumerate(items):
-         result = await processor(item)
-         results.append(result)
-
-         # Optionally clear KV cache between batches (trades memory for time)
-         if batch_size > 1 and (i + 1) % batch_size == 0:
-             # Here you could call model.clear_cache() if implemented
-             pass
-
-     metrics.total_time = time.time() - start
-     metrics.avg_time_per_item = metrics.total_time / max(1, len(items))
-     metrics.throughput = len(items) / max(0.1, metrics.total_time)
-
-     return results, metrics
-
-
- async def process_batch_parallel(
-     items: List[Any],
-     processor: Callable,
-     max_concurrent: int = 2,
- ) -> tuple[List[Any], BatchMetrics]:
-     """
-     Process items in parallel with controlled concurrency.
-
-     Memory-safe: Only processes max_concurrent items simultaneously.
-     Good for I/O-heavy tasks or distributed processing.
-
-     WARNING: For local models with limited memory, use sequential instead.
-
-     Args:
-         items: List of items to process
-         processor: Async function that takes an item and returns result
-         max_concurrent: Maximum concurrent operations
-
-     Returns:
-         (results, metrics)
-     """
-     metrics = BatchMetrics(items_processed=len(items))
-     start = time.time()
-
-     results = [None] * len(items)  # Preserve order
-
-     semaphore = asyncio.Semaphore(max_concurrent)
-
-     async def bounded_processor(index: int, item: Any) -> None:
-         async with semaphore:
-             result = await processor(item)
-             results[index] = result
-
-     # Create all tasks
-     tasks = [bounded_processor(i, item) for i, item in enumerate(items)]
-
-     # Wait for all to complete
-     await asyncio.gather(*tasks)
-
-     metrics.total_time = time.time() - start
-     metrics.avg_time_per_item = metrics.total_time / max(1, len(items))
-     metrics.throughput = len(items) / max(0.1, metrics.total_time)
-
-     return results, metrics
-
-
- async def process_batch_chunked(
-     items: List[Any],
-     processor: Callable,
-     chunk_size: int = 3,
- ) -> tuple[List[Any], BatchMetrics]:
-     """
-     Process items in sequential chunks with cache clearing between chunks.
-
-     Hybrid approach: Keeps KV cache within chunks, clears between.
-     Good for 20-100 items where memory is tight.
-
-     Args:
-         items: List of items to process
-         processor: Async function that takes an item and returns result
-         chunk_size: Size of each sequential chunk
-
-     Returns:
-         (results, metrics)
-     """
-     results = []
-     metrics = BatchMetrics(items_processed=len(items))
-     start = time.time()
-
-     for chunk_start in range(0, len(items), chunk_size):
-         chunk = items[chunk_start:chunk_start + chunk_size]
-
-         # Process chunk sequentially
-         for item in chunk:
-             result = await processor(item)
-             results.append(result)
-
-         # Clear cache between chunks if processor has cleanup method
-         # await processor.cleanup() if implemented
-
-     metrics.total_time = time.time() - start
-     metrics.avg_time_per_item = metrics.total_time / max(1, len(items))
-     metrics.throughput = len(items) / max(0.1, metrics.total_time)
-
-     return results, metrics
-
-
- class PromptCache:
-     """Simple prompt caching for repeated patterns."""
-
-     def __init__(self, max_cache_size: int = 100):
-         self.cache: Dict[str, str] = {}
-         self.max_size = max_cache_size
-         self.hits = 0
-         self.misses = 0
-
-     def get(self, key: str) -> str | None:
-         """Get cached prompt."""
-         if key in self.cache:
-             self.hits += 1
-             return self.cache[key]
-         self.misses += 1
-         return None
-
-     def put(self, key: str, value: str) -> None:
-         """Cache a prompt."""
-         if len(self.cache) < self.max_size:
-             self.cache[key] = value
-
-     def hit_rate(self) -> float:
-         """Get cache hit rate percentage."""
-         total = self.hits + self.misses
-         return (self.hits / total * 100) if total > 0 else 0.0
-
-     def clear(self) -> None:
-         """Clear cache."""
-         self.cache.clear()
-         self.hits = 0
-         self.misses = 0
-
-     def stats(self) -> Dict[str, Any]:
-         """Get cache statistics."""
-         return {
-             "size": len(self.cache),
-             "max_size": self.max_size,
-             "hits": self.hits,
-             "misses": self.misses,
-             "hit_rate": self.hit_rate(),
-         }
-
-
- def estimate_speedup(num_items: int, use_kv_cache: bool = True, use_parallel: bool = False) -> Dict[str, Any]:
-     """
-     Estimate speedup based on optimization strategy.
-
-     Empirical data points:
-     - No optimization: 4-5 sec/item (baseline)
-     - KV Cache: 0.8-1.2 sec/item (4-5x speedup)
-     - Parallel (2x): 0.4-0.6 sec/item (8-10x speedup)
-     """
-     baseline_per_item = 4.5  # seconds
-
-     if use_kv_cache:
-         optimized_per_item = baseline_per_item / 5  # 4-5x speedup
-     else:
-         optimized_per_item = baseline_per_item
-
-     if use_parallel:
-         optimized_per_item /= 2  # Rough estimate for 2 parallel
-
-     baseline_total = baseline_per_item * num_items
-     optimized_total = optimized_per_item * num_items
-
-     return {
-         "num_items": num_items,
-         "baseline_seconds": round(baseline_total, 1),
-         "optimized_seconds": round(optimized_total, 1),
-         "speedup_factor": round(baseline_total / max(0.1, optimized_total), 1),
-         "estimated_per_item": round(optimized_per_item, 2),
-     }
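
To make the batching API above concrete, here is a minimal sketch of driving `process_batch_sequential` from the module as it stood before this deletion; the stand-in processor and its sleep are illustrative placeholders for a real model call:

```python
import asyncio

from app.logic.batch_processor import process_batch_sequential


async def fake_processor(item: str) -> str:
    # Stand-in for a model call; real use would await llm.generate(...).
    await asyncio.sleep(0.1)
    return item.upper()


async def main() -> None:
    items = ["ad one", "ad two", "ad three"]
    results, metrics = await process_batch_sequential(items, fake_processor)
    print(results)                    # ['AD ONE', 'AD TWO', 'AD THREE']
    print(metrics.avg_time_per_item)  # roughly 0.1s per item here


asyncio.run(main())
```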
 
app/logic/infill_utils.py DELETED
@@ -1,246 +0,0 @@
- """
- Infill Utilities for Batch Gap-Filling
-
- Handles gap detection, JSON parsing from LLM output, and text reconstruction.
-
- Gap Notation Support:
- - [GAP:n]: Explicit numbered gaps (preferred)
- - ___: Underscores (auto-numbered in scan order)
-
- FUTURE: Chunking Support
- -------------------------
- For texts exceeding ~2000 tokens (approx 6000 chars), implement per-gap prompting:
- 1. Split text into chunks preserving gap context (±150 tokens around each gap)
- 2. Process each gap individually with left/right context
- 3. Merge results back into full text
- 4. This avoids context window overflow on smaller models (2k-4k context)
-
- Current implementation assumes texts fit within model context window.
- Add chunking when processing long-form content (articles, full listings).
- """
-
- import re
- import json
- from typing import List, Optional, Tuple
- from dataclasses import dataclass
-
-
- @dataclass
- class GapInfo:
-     """Information about a detected gap in text."""
-     index: int   # 1-based index
-     marker: str  # Original marker string
-     start: int   # Start position in text
-     end: int     # End position in text
-
-
- def detect_gaps(text: str, notation: str = "auto") -> List[GapInfo]:
-     """
-     Detect gaps in text and return their positions.
-
-     Args:
-         text: Input text with gap markers
-         notation: "auto", "[GAP:n]", or "___"
-
-     Returns:
-         List of GapInfo objects sorted by position
-
-     Examples:
-         >>> detect_gaps("Buy this [GAP:1] car with [GAP:2] features")
-         [GapInfo(index=1, marker='[GAP:1]', ...), GapInfo(index=2, marker='[GAP:2]', ...)]
-
-         >>> detect_gaps("Buy this ___ car with ___ features")
-         [GapInfo(index=1, marker='___', ...), GapInfo(index=2, marker='___', ...)]
-     """
-     gaps = []
-
-     # Pattern for [GAP:n] notation
-     gap_tag_pattern = r'\[GAP:(\d+)\]'
-     # Pattern for underscore notation (3+ underscores)
-     underscore_pattern = r'_{3,}'
-
-     if notation == "auto":
-         # Try [GAP:n] first, fallback to ___
-         gap_matches = list(re.finditer(gap_tag_pattern, text))
-         if gap_matches:
-             notation = "[GAP:n]"
-         else:
-             notation = "___"
-
-     if notation == "[GAP:n]":
-         for match in re.finditer(gap_tag_pattern, text):
-             gaps.append(GapInfo(
-                 index=int(match.group(1)),
-                 marker=match.group(0),
-                 start=match.start(),
-                 end=match.end()
-             ))
-     else:  # "___"
-         for i, match in enumerate(re.finditer(underscore_pattern, text), start=1):
-             gaps.append(GapInfo(
-                 index=i,
-                 marker=match.group(0),
-                 start=match.start(),
-                 end=match.end()
-             ))
-
-     # Sort by position (should already be, but ensure)
-     gaps.sort(key=lambda g: g.start)
-     return gaps
-
-
- def parse_infill_json(raw_output: str) -> Optional[dict]:
-     """
-     Extract and parse JSON from LLM output.
-
-     Handles common LLM quirks:
-     - JSON wrapped in markdown code blocks
-     - Leading/trailing text before/after JSON
-     - Function-call style wrapper ({"name": "...", "arguments": {...}})
-     - Double-escaped JSON strings in arguments field
-     - Minor formatting issues
-
-     Returns:
-         Parsed dict with 'filled_text' and 'gaps' keys, or None if parsing fails
-     """
-     if not raw_output:
-         return None
-
-     # Try to extract JSON from markdown code blocks
-     json_block_pattern = r'```(?:json)?\s*([\s\S]*?)\s*```'
-     match = re.search(json_block_pattern, raw_output)
-     if match:
-         raw_output = match.group(1)
-
-     # Find JSON object boundaries
-     start_idx = raw_output.find('{')
-     if start_idx == -1:
-         return None
-
-     # Find matching closing brace
-     depth = 0
-     end_idx = -1
-     for i, char in enumerate(raw_output[start_idx:], start=start_idx):
-         if char == '{':
-             depth += 1
-         elif char == '}':
-             depth -= 1
-             if depth == 0:
-                 end_idx = i + 1
-                 break
-
-     if end_idx == -1:
-         return None
-
-     json_str = raw_output[start_idx:end_idx]
-
-     try:
-         parsed = json.loads(json_str)
-
-         # Handle function-call style wrapper with STRING arguments (double-escaped):
-         # {"name": "fill_in_text", "arguments": "{\"filled_text\": \"...\"}"}
-         if 'arguments' in parsed:
-             args = parsed['arguments']
-             if isinstance(args, str):
-                 try:
-                     parsed = json.loads(args)
-                 except json.JSONDecodeError:
-                     return None
-             elif isinstance(args, dict):
-                 parsed = args
-
-         # Also handle: {"name": "...", "parameters": {...}}
-         if 'parameters' in parsed:
-             params = parsed['parameters']
-             if isinstance(params, str):
-                 try:
-                     parsed = json.loads(params)
-                 except json.JSONDecodeError:
-                     return None
-             elif isinstance(params, dict):
-                 parsed = params
-
-         # Validate required fields
-         if 'filled_text' not in parsed and 'gaps' not in parsed:
-             return None
-
-         return parsed
-     except json.JSONDecodeError:
-         return None
-
-
- def apply_fills(original_text: str, gaps: List[GapInfo], fills: dict) -> str:
-     """
-     Apply gap fills to original text.
-
-     Uses fills from parsed JSON, replacing markers with chosen words.
-     This is a fallback when LLM's 'filled_text' might be corrupted.
-
-     Args:
-         original_text: Original text with gap markers
-         gaps: Detected gaps from detect_gaps()
-         fills: Dict mapping gap index to fill choice
-                e.g., {1: "excellent", 2: "powerful"}
-
-     Returns:
-         Text with gaps replaced by fill choices
-     """
-     if not gaps or not fills:
-         return original_text
-
-     # Process from end to start to preserve positions
-     result = original_text
-     for gap in reversed(gaps):
-         if gap.index in fills:
-             result = result[:gap.start] + fills[gap.index] + result[gap.end:]
-
-     return result
-
-
- def build_fills_dict(gaps_list: List[dict]) -> dict:
-     """
-     Convert gaps list from JSON to fills dict.
-
-     Args:
-         gaps_list: List of gap dicts from parsed JSON
-                    [{"index": 1, "choice": "word"}, ...]
-
-     Returns:
-         Dict mapping index to choice: {1: "word", ...}
-     """
-     fills = {}
-     for gap in gaps_list:
-         if 'index' in gap and 'choice' in gap:
-             fills[gap['index']] = gap['choice']
-     return fills
-
-
- def normalize_gaps_to_tagged(text: str) -> Tuple[str, List[GapInfo]]:
-     """
-     Normalize any gap notation to [GAP:n] format.
-
-     Useful for standardizing input before processing.
-
-     Args:
-         text: Text with any gap notation
-
-     Returns:
-         Tuple of (normalized_text, gaps)
-     """
-     gaps = detect_gaps(text, "auto")
-
-     if not gaps:
-         return text, []
-
-     # If already [GAP:n], return as-is
-     if gaps[0].marker.startswith('[GAP:'):
-         return text, gaps
-
-     # Convert ___ to [GAP:n]
-     result = text
-     for gap in reversed(gaps):
-         new_marker = f"[GAP:{gap.index}]"
-         result = result[:gap.start] + new_marker + result[gap.end:]
-
-     # Re-detect with new positions
-     return result, detect_gaps(result, "[GAP:n]")
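
A quick round trip through these helpers, using the module as it stood before deletion (the fill words are illustrative):

```python
from app.logic.infill_utils import apply_fills, normalize_gaps_to_tagged

text = "Auto ma ___ km przebiegu i ___ lakier"

# Underscore gaps are normalized to explicit [GAP:n] markers first.
normalized, gaps = normalize_gaps_to_tagged(text)
print(normalized)  # "Auto ma [GAP:1] km przebiegu i [GAP:2] lakier"

# Fills are keyed by gap index, e.g. as produced by build_fills_dict().
filled = apply_fills(normalized, gaps, {1: "45000", 2: "metaliczny"})
print(filled)      # "Auto ma 45000 km przebiegu i metaliczny lakier"
```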
 
app/main.py DELETED
@@ -1,468 +0,0 @@
- import os
- import time
- import asyncio
- import importlib
- from fastapi import FastAPI, HTTPException, Depends, Body
- from typing import Optional, List
- from pydantic import ValidationError
-
- from app.models.registry import registry, MODEL_CONFIG
- from fastapi.middleware.cors import CORSMiddleware
- from app.schemas.schemas import (
-     EnhancedDescriptionResponse,
-     CompareRequest,
-     CompareResponse,
-     ModelResult,
-     ModelInfo,
-     InfillRequest,
-     InfillResponse,
-     InfillResult,
-     GapFill,
-     CompareInfillRequest,
-     CompareInfillResponse,
-     ModelInfillResult,
- )
- from app.logic.infill_utils import (
-     detect_gaps,
-     parse_infill_json,
-     apply_fills,
-     build_fills_dict,
-     normalize_gaps_to_tagged,
- )
- from app.auth.placeholder_auth import get_authenticated_user
-
- app = FastAPI(
-     title="Multi-Model Description Enhancer",
-     description="AI-powered service for enhancing descriptions using multiple LLMs for A/B testing",
-     version="3.0.0"
- )
-
- # CORS configuration
- app.add_middleware(
-     CORSMiddleware,
-     allow_origins=[
-         "http://localhost:5173",
-         "http://localhost:5174",
-         os.getenv("FRONTEND_URL", "http://localhost:5173")
-     ],
-     allow_credentials=True,
-     allow_methods=["POST", "GET"],
-     allow_headers=["*"],
- )
-
- @app.on_event("startup")
- async def startup_event():
-     """
-     Startup event - models are loaded lazily on first request.
-     No models are pre-loaded to conserve memory.
-     """
-     print("Application started. Models will be loaded lazily on first request.")
-     print(f"Available models: {registry.get_available_model_names()}")
-
- # --- Helper function to load domain logic ---
- def get_domain_config(domain: str):
-     try:
-         module = importlib.import_module(f"app.domains.{domain}.config")
-         return module.domain_config
-     except (ImportError, AttributeError):
-         raise HTTPException(status_code=404, detail=f"Domain '{domain}' not found or not configured correctly.")
-
- # --- API Endpoints ---
-
- @app.get("/")
- async def read_root():
-     return {"message": "Welcome to the Multi-Model Description Enhancer API! Go to /docs for documentation."}
-
- @app.get("/health")
- async def health_check():
-     """Check API health and model status."""
-     models = registry.list_models()
-     loaded_models = registry.get_loaded_models()
-     active_model = registry.get_active_model()
-     return {
-         "status": "ok",
-         "available_models": len(models),
-         "loaded_models": loaded_models,
-         "active_local_model": active_model,
-     }
-
- @app.get("/models", response_model=List[ModelInfo])
- async def list_models():
-     """List all available models with their load status."""
-     return registry.list_models()
-
- @app.post("/models/{model_name}/load")
- async def load_model(model_name: str):
-     """
-     Explicitly load a model into memory.
-     For local models: unloads any previously loaded local model first.
-     """
-     if model_name not in registry.get_available_model_names():
-         raise HTTPException(status_code=404, detail=f"Unknown model: {model_name}")
-
-     try:
-         info = await registry.load_model(model_name)
-         return {"status": "loaded", "model": info}
-     except Exception as e:
-         raise HTTPException(status_code=500, detail=f"Failed to load model: {str(e)}")
-
- @app.post("/models/{model_name}/unload")
- async def unload_model(model_name: str):
-     """
-     Explicitly unload a model from memory to free resources.
-     """
-     if model_name not in registry.get_available_model_names():
-         raise HTTPException(status_code=404, detail=f"Unknown model: {model_name}")
-
-     try:
-         result = await registry.unload_model(model_name)
-         return result
-     except Exception as e:
-         raise HTTPException(status_code=500, detail=f"Failed to unload model: {str(e)}")
-
- @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
- async def enhance_description(
-     domain: str = Body(..., embed=True),
-     data: dict = Body(..., embed=True),
-     model: str = Body("bielik-1.5b", embed=True),
-     user: Optional[dict] = Depends(get_authenticated_user)
- ):
-     """
-     Generate an enhanced description using a single model.
-     - **domain**: The name of the domain (e.g., 'cars').
-     - **data**: A dictionary with the data for the description.
-     - **model**: Model to use (default: bielik-1.5b)
-     """
-     start_time = time.time()
-
-     # Validate model
-     if model not in registry.get_available_model_names():
-         raise HTTPException(status_code=400, detail=f"Unknown model: {model}")
-
-     # Load Domain Configuration
-     domain_config = get_domain_config(domain)
-     DomainSchema = domain_config["schema"]
-     create_prompt = domain_config["create_prompt"]
-
-     # Validate Input Data
-     try:
-         validated_data = DomainSchema(**data)
-     except ValidationError as e:
-         raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")
-
-     # Prompt Construction
-     chat_messages = create_prompt(validated_data)
-
-     # Text Generation
-     try:
-         llm = await registry.get_model(model)
-         generated_description = await llm.generate(
-             chat_messages=chat_messages,
-             max_new_tokens=150,
-             temperature=0.75,
-             top_p=0.9,
-         )
-     except Exception as e:
-         print(f"Error during text generation with {model}: {e}")
-         raise HTTPException(status_code=500, detail=f"Generation error: {str(e)}")
-
-     generation_time = time.time() - start_time
-     user_email = user['email'] if user else "anonymous"
-
-     return EnhancedDescriptionResponse(
-         description=generated_description,
-         model_used=MODEL_CONFIG[model]["id"],
-         generation_time=round(generation_time, 2),
-         user_email=user_email
-     )
-
- @app.post("/compare", response_model=CompareResponse)
- async def compare_models(
-     request: CompareRequest,
-     user: Optional[dict] = Depends(get_authenticated_user)
- ):
-     """
-     Compare outputs from multiple models for the same input.
-     Returns results from all specified models (or all available if not specified).
-     """
-     total_start = time.time()
-
-     # Get models to compare
-     available_models = registry.get_available_model_names()
-     models_to_use = request.models if request.models else available_models
-
-     # Validate requested models
-     for model in models_to_use:
-         if model not in available_models:
-             raise HTTPException(status_code=400, detail=f"Unknown model: {model}")
-
-     # Load Domain Configuration
-     domain_config = get_domain_config(request.domain)
-     DomainSchema = domain_config["schema"]
-     create_prompt = domain_config["create_prompt"]
-
-     # Validate Input Data
-     try:
-         validated_data = DomainSchema(**request.data)
-     except ValidationError as e:
-         raise HTTPException(status_code=422, detail=f"Invalid data: {e}")
-
-     # Prompt Construction
-     chat_messages = create_prompt(validated_data)
-
-     # Generate with each model
-     results = []
-
-     async def generate_with_model(model_name: str) -> ModelResult:
-         start_time = time.time()
-         try:
-             llm = await registry.get_model(model_name)
-             output = await llm.generate(
-                 chat_messages=chat_messages,
-                 max_new_tokens=150,
-                 temperature=0.75,
-                 top_p=0.9,
-             )
-             return ModelResult(
-                 model=model_name,
-                 output=output,
-                 time=round(time.time() - start_time, 2),
-                 type=MODEL_CONFIG[model_name]["type"],
-                 error=None
-             )
-         except Exception as e:
-             return ModelResult(
-                 model=model_name,
-                 output="",
-                 time=round(time.time() - start_time, 2),
-                 type=MODEL_CONFIG[model_name]["type"],
-                 error=str(e)
-             )
-
-     # Run all models (sequentially to avoid memory issues)
-     for model_name in models_to_use:
-         result = await generate_with_model(model_name)
-         results.append(result)
-
-     return CompareResponse(
-         domain=request.domain,
-         results=results,
-         total_time=round(time.time() - total_start, 2)
-     )
-
- @app.get("/user/me")
- async def get_user_info(user: dict = Depends(get_authenticated_user)):
-     """Get current authenticated user information"""
-     if not user:
-         raise HTTPException(status_code=401, detail="Not authenticated")
-     return {
-         "user_id": user['user_id'],
-         "email": user['email'],
-         "name": user.get('name', 'Unknown')
-     }
-
-
- # --- Batch Infill Endpoints ---
-
- @app.post("/infill", response_model=InfillResponse)
- async def batch_infill(
-     request: InfillRequest,
-     user: Optional[dict] = Depends(get_authenticated_user)
- ):
-     """
-     Batch gap-filling for ads using a single model.
-
-     Accepts items with [GAP:n] markers or ___ and returns filled text
-     with per-gap choices and alternatives.
-
-     NOTE: For texts > 6000 chars, consider chunking (not yet implemented).
-     """
-     total_start = time.time()
-
-     # Validate model
-     if request.model not in registry.get_available_model_names():
-         raise HTTPException(status_code=400, detail=f"Unknown model: {request.model}")
-
-     # Load domain config for infill prompt
-     domain_config = get_domain_config(request.domain)
-     if "create_infill_prompt" not in domain_config:
-         raise HTTPException(
-             status_code=400,
-             detail=f"Domain '{request.domain}' does not support infill operations"
-         )
-     create_infill_prompt = domain_config["create_infill_prompt"]
-
-     # Process each item
-     results = []
-     error_count = 0
-
-     for item in request.items:
-         result = await process_infill_item(
-             item=item,
-             model_name=request.model,
-             options=request.options,
-             create_infill_prompt=create_infill_prompt
-         )
-         results.append(result)
-         if result.status == "error":
-             error_count += 1
-
-     return InfillResponse(
-         model=request.model,
-         results=results,
-         total_time=round(time.time() - total_start, 2),
-         processed_count=len(results),
-         error_count=error_count
-     )
-
-
- @app.post("/compare-infill", response_model=CompareInfillResponse)
- async def compare_infill(
-     request: CompareInfillRequest,
-     user: Optional[dict] = Depends(get_authenticated_user)
- ):
-     """
-     Multi-model batch gap-filling comparison for A/B testing.
-
-     Runs the same batch of items through multiple models and returns
-     per-model results for comparison.
-     """
-     total_start = time.time()
-
-     # Get models to compare
-     available_models = registry.get_available_model_names()
-     models_to_use = request.models if request.models else available_models
-
-     # Validate requested models
-     for model in models_to_use:
-         if model not in available_models:
-             raise HTTPException(status_code=400, detail=f"Unknown model: {model}")
-
-     # Load domain config
-     domain_config = get_domain_config(request.domain)
-     if "create_infill_prompt" not in domain_config:
-         raise HTTPException(
-             status_code=400,
-             detail=f"Domain '{request.domain}' does not support infill operations"
-         )
-     create_infill_prompt = domain_config["create_infill_prompt"]
-
-     # Process with each model (sequentially for memory safety)
-     model_results = []
-
-     for model_name in models_to_use:
-         model_start = time.time()
-         results = []
-         error_count = 0
-
-         for item in request.items:
-             result = await process_infill_item(
-                 item=item,
-                 model_name=model_name,
-                 options=request.options,
-                 create_infill_prompt=create_infill_prompt
-             )
-             results.append(result)
-             if result.status == "error":
-                 error_count += 1
-
-         model_results.append(ModelInfillResult(
-             model=model_name,
-             type=MODEL_CONFIG[model_name]["type"],
-             results=results,
-             time=round(time.time() - model_start, 2),
-             error_count=error_count
-         ))
-
-     return CompareInfillResponse(
-         domain=request.domain,
-         models=model_results,
-         total_time=round(time.time() - total_start, 2)
-     )
-
-
- async def process_infill_item(
-     item,
-     model_name: str,
-     options,
-     create_infill_prompt
- ) -> InfillResult:
-     """
-     Process a single infill item.
-
-     Returns InfillResult with status, filled_text, and gaps.
-     """
-     try:
-         # Normalize gaps to [GAP:n] format
-         normalized_text, gaps = normalize_gaps_to_tagged(item.text_with_gaps)
-
-         if not gaps:
-             # No gaps found, return original text
-             return InfillResult(
-                 id=item.id,
-                 status="ok",
-                 filled_text=item.text_with_gaps,
-                 gaps=[],
-                 error=None
-             )
-
-         # Build prompt
-         chat_messages = create_infill_prompt(normalized_text, options)
-
-         # Generate
-         llm = await registry.get_model(model_name)
-         raw_output = await llm.generate(
-             chat_messages=chat_messages,
-             max_new_tokens=options.max_new_tokens,
-             temperature=options.temperature,
-             top_p=0.9,
-         )
-
-         # Parse JSON from output
-         parsed = parse_infill_json(raw_output)
-
-         if not parsed:
-             # JSON parsing failed
-             return InfillResult(
-                 id=item.id,
-                 status="error",
-                 filled_text=None,
-                 gaps=[],
-                 error=f"Failed to parse JSON from model output: {raw_output[:200]}..."
-             )
-
-         # Extract gaps and build result
-         gap_fills = []
-         fills_dict = {}
-
-         for gap_data in parsed.get("gaps", []):
-             gap_fill = GapFill(
-                 index=gap_data.get("index", 0),
-                 marker=gap_data.get("marker", ""),
-                 choice=gap_data.get("choice", ""),
-                 alternatives=gap_data.get("alternatives", [])
-             )
-             gap_fills.append(gap_fill)
-             fills_dict[gap_fill.index] = gap_fill.choice
-
-         # Get filled text - prefer model's version, fallback to reconstruction
-         filled_text = parsed.get("filled_text")
-         if not filled_text and fills_dict:
-             filled_text = apply_fills(normalized_text, gaps, fills_dict)
-
-         return InfillResult(
-             id=item.id,
-             status="ok",
-             filled_text=filled_text,
-             gaps=gap_fills,
-             error=None
-         )
-
-     except Exception as e:
-         return InfillResult(
-             id=item.id,
-             status="error",
-             filled_text=None,
-             gaps=[],
-             error=str(e)
-         )
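
A minimal smoke test for the deleted app, sketched with FastAPI's TestClient (assumes the project's dependencies, including `httpx`, are installed; these read-only endpoints do not trigger a model load, so no weights are needed):

```python
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_health() -> None:
    resp = client.get("/health")
    assert resp.status_code == 200
    assert resp.json()["status"] == "ok"


def test_models_listed() -> None:
    resp = client.get("/models")
    assert resp.status_code == 200
    assert isinstance(resp.json(), list)
```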
 
app/models/__init__.py DELETED
@@ -1,16 +0,0 @@
- """
- Models module - LLM implementations and registry.
- """
-
- from app.models.base_llm import BaseLLM
- from app.models.huggingface_local import HuggingFaceLocal
- from app.models.huggingface_inference_api import HuggingFaceInferenceAPI
- from app.models.registry import registry, MODEL_CONFIG
-
- __all__ = [
-     "BaseLLM",
-     "HuggingFaceLocal",
-     "HuggingFaceInferenceAPI",
-     "registry",
-     "MODEL_CONFIG",
- ]
 
app/models/base_llm.py DELETED
@@ -1,54 +0,0 @@
- """
- Abstract base class for all LLM implementations.
- """
-
- from abc import ABC, abstractmethod
- from typing import Optional, List, Dict, Any
-
-
- class BaseLLM(ABC):
-     """Abstract interface for LLM models."""
-
-     def __init__(self, name: str, model_id: str):
-         self.name = name
-         self.model_id = model_id
-         self._initialized = False
-
-     @property
-     def is_initialized(self) -> bool:
-         return self._initialized
-
-     @abstractmethod
-     async def initialize(self) -> None:
-         """Initialize the model. Must be called before generate()."""
-         pass
-
-     @abstractmethod
-     async def generate(
-         self,
-         prompt: str = None,
-         chat_messages: List[Dict[str, str]] = None,
-         max_new_tokens: int = 150,
-         temperature: float = 0.7,
-         top_p: float = 0.9,
-         **kwargs
-     ) -> str:
-         """
-         Generate text from prompt or chat messages.
-
-         Args:
-             prompt: Raw text prompt
-             chat_messages: List of {"role": "...", "content": "..."} messages
-             max_new_tokens: Maximum tokens to generate
-             temperature: Sampling temperature
-             top_p: Nucleus sampling parameter
-
-         Returns:
-             Generated text string
-         """
-         pass
-
-     @abstractmethod
-     def get_info(self) -> Dict[str, Any]:
-         """Return model information for /models endpoint."""
-         pass
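
For reference, a minimal concrete subclass satisfying this interface might look like the following echo stub; it is purely illustrative and performs no real inference:

```python
from typing import Any, Dict, List

from app.models.base_llm import BaseLLM


class EchoLLM(BaseLLM):
    """Toy BaseLLM implementation that parrots the last user message."""

    async def initialize(self) -> None:
        # Nothing to load; just flip the readiness flag.
        self._initialized = True

    async def generate(
        self,
        prompt: str = None,
        chat_messages: List[Dict[str, str]] = None,
        max_new_tokens: int = 150,
        temperature: float = 0.7,
        top_p: float = 0.9,
        **kwargs,
    ) -> str:
        if chat_messages:
            return chat_messages[-1]["content"]
        return prompt or ""

    def get_info(self) -> Dict[str, Any]:
        return {"name": self.name, "model_id": self.model_id, "type": "stub"}
```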
 
app/models/huggingface_inference_api.py DELETED
@@ -1,93 +0,0 @@
- """
- HuggingFace Inference API client for remote model access.
- """
-
- import os
- from typing import List, Dict, Any, Optional
- from huggingface_hub import InferenceClient
-
- from app.models.base_llm import BaseLLM
-
-
- class HuggingFaceInferenceAPI(BaseLLM):
-     """
-     Remote model access via HuggingFace Inference API.
-     Best for larger models (7B+) that don't fit in local RAM.
-     """
-
-     def __init__(self, name: str, model_id: str, token: str = None):
-         super().__init__(name, model_id)
-         self.token = token or os.getenv("HF_TOKEN")
-         self.client: Optional[InferenceClient] = None
-
-     async def initialize(self) -> None:
-         """Initialize the Inference API client."""
-         if self._initialized:
-             return
-
-         try:
-             print(f"[{self.name}] Initializing Inference API for: {self.model_id}")
-
-             self.client = InferenceClient(
-                 model=self.model_id,
-                 token=self.token
-             )
-
-             self._initialized = True
-             print(f"[{self.name}] Inference API ready")
-
-         except Exception as e:
-             print(f"[{self.name}] Failed to initialize: {e}")
-             raise
-
-     async def generate(
-         self,
-         prompt: str = None,
-         chat_messages: List[Dict[str, str]] = None,
-         max_new_tokens: int = 150,
-         temperature: float = 0.7,
-         top_p: float = 0.9,
-         **kwargs
-     ) -> str:
-         """Generate text using HuggingFace Inference API."""
-
-         if not self._initialized or not self.client:
-             raise RuntimeError(f"[{self.name}] Client not initialized")
-
-         try:
-             # Use chat completion if chat_messages provided
-             if chat_messages:
-                 response = self.client.chat_completion(
-                     messages=chat_messages,
-                     max_tokens=max_new_tokens,
-                     temperature=temperature,
-                     top_p=top_p,
-                 )
-                 return response.choices[0].message.content.strip()
-
-             # Otherwise use text generation
-             elif prompt:
-                 response = self.client.text_generation(
-                     prompt=prompt,
-                     max_new_tokens=max_new_tokens,
-                     temperature=temperature,
-                     top_p=top_p,
-                     do_sample=True,
-                 )
-                 return response.strip()
-
-             else:
-                 raise ValueError("Either prompt or chat_messages required")
-
-         except Exception as e:
-             print(f"[{self.name}] Generation error: {e}")
-             raise
-
-     def get_info(self) -> Dict[str, Any]:
-         """Return model info."""
-         return {
-             "name": self.name,
-             "model_id": self.model_id,
-             "type": "inference_api",
-             "initialized": self._initialized,
-         }
 
app/models/huggingface_local.py DELETED
@@ -1,260 +0,0 @@
"""
Local HuggingFace model implementation using transformers.

Optimizations:
- KV Cache: Enabled by default (5-10x speedup on GPU, 1.5x on CPU)
- Flash Attention: Used when available (GPU only)
- 8-Bit Quantization: Optional, via bitsandbytes (roughly 50% less memory)
"""

from typing import List, Dict, Any, Optional
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import asyncio
import os

from app.models.base_llm import BaseLLM

# Try to import bitsandbytes support, but don't fail if it is not available
try:
    from transformers import BitsAndBytesConfig
    HAS_BITSANDBYTES = True
except ImportError:
    HAS_BITSANDBYTES = False
    print("[WARNING] bitsandbytes not available - 8-bit quantization disabled")


class HuggingFaceLocal(BaseLLM):
    """
    Local HuggingFace model loaded into container memory.
    Best for smaller models (< 3B parameters) that fit in RAM.

    Features:
    - KV caching enabled (1.5-2x faster on CPU, 5-10x on GPU)
    - Flash Attention v2 support (GPU only)
    - Optional 8-bit quantization (roughly 50% less memory)
    - Mixed precision (float16 or bfloat16 where supported)
    """

    def __init__(self, name: str, model_id: str, device: str = "cpu", use_cache: bool = True, use_8bit: bool = False):
        super().__init__(name, model_id)
        self.device = device
        self.tokenizer = None
        self.model = None
        self.use_cache = use_cache

        # Only enable 8-bit if explicitly requested (opt-in, not by default),
        # since bitsandbytes may not be available in all deployment environments
        requested_8bit = use_8bit or (device == "cpu" and os.getenv("USE_8BIT_QUANTIZATION", "false").lower() == "true")
        self.use_8bit = requested_8bit and HAS_BITSANDBYTES

        if requested_8bit and not HAS_BITSANDBYTES:
            print(f"[{name}] 8-bit quantization requested but bitsandbytes not installed - falling back to full precision")

        self.use_flash_attention = os.getenv("USE_FLASH_ATTENTION", "true").lower() == "true"

        # Determine device index and dtype
        if device == "cuda" and torch.cuda.is_available():
            self.device_index = 0
            # Prefer bfloat16 on GPUs that support it, else float16
            self.torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
        else:
            self.device_index = -1  # CPU
            self.torch_dtype = torch.float32

    async def initialize(self) -> None:
        """Load model into memory with optimizations."""
        if self._initialized:
            return

        try:
            print(f"[{self.name}] Loading local model: {self.model_id}")
            print(f"[{self.name}] Device: {self.device} | Dtype: {self.torch_dtype} | KV Cache: {self.use_cache} | 8-bit: {self.use_8bit}")

            self.tokenizer = await asyncio.to_thread(
                AutoTokenizer.from_pretrained,
                self.model_id,
                trust_remote_code=True
            )

            # Model config optimizations
            model_kwargs = {
                "trust_remote_code": True,
            }

            # Optional 8-bit quantization
            if self.use_8bit and HAS_BITSANDBYTES:
                try:
                    print(f"[{self.name}] Using 8-bit quantization")
                    # Note: load_in_8bit is the relevant 8-bit flag; the
                    # bnb_4bit_* options apply only to 4-bit quantization.
                    bnb_config = BitsAndBytesConfig(load_in_8bit=True)
                    model_kwargs["quantization_config"] = bnb_config
                    model_kwargs["device_map"] = "cpu"
                except Exception as e:
                    print(f"[{self.name}] Failed to set up 8-bit quantization: {e}")
                    print(f"[{self.name}] Falling back to full precision")
                    self.use_8bit = False

            # Standard loading without quantization
            if not self.use_8bit:
                model_kwargs["torch_dtype"] = self.torch_dtype
                model_kwargs["device_map"] = self.device if self.device == "cuda" else "cpu"

            # Enable flash attention if requested and available (GPU only)
            if self.use_flash_attention and self.device == "cuda" and not self.use_8bit:
                model_kwargs["attn_implementation"] = "flash_attention_2"

            self.model = await asyncio.to_thread(
                AutoModelForCausalLM.from_pretrained,
                self.model_id,
                **model_kwargs
            )

            # Ensure the KV cache setting is reflected on the model config
            if hasattr(self.model.config, 'use_cache'):
                self.model.config.use_cache = self.use_cache

            self._initialized = True
            print(f"[{self.name}] Model loaded successfully (use_cache={self.use_cache})")

        except Exception as e:
            print(f"[{self.name}] Failed to load model: {e}")
            raise

    async def generate(
        self,
        prompt: str = None,
        chat_messages: List[Dict[str, str]] = None,
        max_new_tokens: int = 150,
        temperature: float = 0.7,
        top_p: float = 0.9,
        **kwargs
    ) -> str:
        """
        Generate text using model.generate() directly, with KV caching.

        Measured KV cache impact on this workload:
        - WITH: ~9 seconds for 10 ads (50 gaps)
        - WITHOUT: ~42 seconds (4.7x slower)
        """

        if not self._initialized or self.model is None:
            raise RuntimeError(f"[{self.name}] Model not initialized")

        formatted_prompt = None

        # Format prompt from chat messages
        if chat_messages:
            try:
                formatted_prompt = self.tokenizer.apply_chat_template(
                    chat_messages,
                    tokenize=False,
                    add_generation_prompt=True
                )
            except Exception as e:
                print(f"[{self.name}] apply_chat_template failed: {e}, using fallback")
                formatted_prompt = self._format_chat_fallback(chat_messages)

        # Use raw prompt if provided
        if formatted_prompt is None and prompt:
            formatted_prompt = prompt

        if formatted_prompt is None:
            raise ValueError("Either prompt or chat_messages required")

        # Tokenize input
        inputs = await asyncio.to_thread(
            self.tokenizer.encode,
            formatted_prompt,
            return_tensors="pt"
        )

        # Move to device
        if self.device == "cuda":
            inputs = await asyncio.to_thread(lambda: inputs.to("cuda"))

        # Generate with the KV cache explicitly enabled
        outputs = await asyncio.to_thread(
            self.model.generate,
            inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            use_cache=True,  # CRITICAL: enable KV cache
            eos_token_id=self.tokenizer.eos_token_id,
            pad_token_id=self.tokenizer.eos_token_id if self.tokenizer.pad_token_id is None else self.tokenizer.pad_token_id,
        )

        # Decode only the newly generated tokens. Slicing off the prompt tokens
        # is more robust than string-stripping the prompt, because
        # skip_special_tokens changes the decoded prompt text.
        response = await asyncio.to_thread(
            self.tokenizer.decode,
            outputs[0][inputs.shape[-1]:],
            skip_special_tokens=True
        )

        # Clean up any end-of-turn markers that survive decoding
        for token in ["<|im_end|>", "<end_of_turn>", "<eos>", "</s>"]:
            if response.endswith(token):
                response = response[:-len(token)]

        return response.strip()

    def _format_chat_fallback(self, chat_messages: List[Dict[str, str]]) -> str:
        """
        Fallback chat formatting for models without a proper chat template.
        Works with Gemma and other models.
        """
        formatted = ""
        for msg in chat_messages:
            role = msg.get("role", "user")
            content = msg.get("content", "")

            if role == "system":
                formatted += f"{content}\n\n"
            elif role == "user":
                formatted += f"User: {content}\n"
            elif role == "assistant":
                formatted += f"Assistant: {content}\n"

        # Add generation prompt
        formatted += "Assistant:"
        return formatted

    def get_info(self) -> Dict[str, Any]:
        """Return model info."""
        return {
            "name": self.name,
            "model_id": self.model_id,
            "type": "local",
            "initialized": self._initialized,
            "device": self.device
        }

    async def cleanup(self) -> None:
        """Release the model from memory."""
        if self.model is not None:
            del self.model
            self.model = None
        if self.tokenizer is not None:
            del self.tokenizer
            self.tokenizer = None
        self._initialized = False

        # Force CUDA cache clear if available
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        print(f"[{self.name}] Model unloaded from memory")
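A minimal driver for the local wrapper, assuming the Bielik checkpoint id from the registry below; note that it downloads and loads the full checkpoint, so on a CPU-only Space the first call also pays the model-load cost:

```python
import asyncio

from app.models.huggingface_local import HuggingFaceLocal


async def main() -> None:
    llm = HuggingFaceLocal("bielik-1.5b", "speakleash/Bielik-1.5B-v3.0-Instruct", device="cpu")
    await llm.initialize()
    # Chat-style call; the wrapper applies the model's chat template itself.
    text = await llm.generate(
        chat_messages=[{"role": "user", "content": "Uzupełnij: Samochód jest ___."}],
        max_new_tokens=50,
    )
    print(text)
    await llm.cleanup()


asyncio.run(main())
```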
app/models/huggingface_service.py DELETED
@@ -1,111 +0,0 @@
from transformers import pipeline, AutoTokenizer
import torch
from fastapi import HTTPException
import asyncio


class HuggingFaceTextGenerationService:
    def __init__(self, model_name_or_path: str, device: str = None, task: str = "text-generation"):
        self.model_name_or_path = model_name_or_path
        self.task = task
        self.pipeline = None
        self.tokenizer = None

        if device is None:
            self.device_index = 0 if torch.cuda.is_available() else -1
        elif device == "cuda" and torch.cuda.is_available():
            self.device_index = 0
        elif device == "cpu":
            self.device_index = -1
        else:
            self.device_index = -1

        if self.device_index == 0:
            print("CUDA (GPU) is available. Using GPU.")
        else:
            print(f"Device set to use {'cpu' if self.device_index == -1 else f'cuda:{self.device_index}'}")

    async def initialize(self):
        try:
            print(f"Initializing Hugging Face pipeline for model: {self.model_name_or_path} on device index: {self.device_index}")
            self.tokenizer = await asyncio.to_thread(
                AutoTokenizer.from_pretrained, self.model_name_or_path, trust_remote_code=True
            )
            self.pipeline = await asyncio.to_thread(
                pipeline,
                self.task,
                model=self.model_name_or_path,
                tokenizer=self.tokenizer,
                device=self.device_index,
                torch_dtype=torch.bfloat16 if self.device_index != -1 and torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float32,
                trust_remote_code=True,
            )
            print(f"Pipeline for model {self.model_name_or_path} initialized successfully.")
        except Exception as e:
            print(f"Error initializing HuggingFace pipeline: {e}")
            raise HTTPException(status_code=503, detail=f"LLM (HuggingFace) model could not be loaded: {str(e)}")

    async def generate_text(
        self,
        prompt_text: str = None,
        chat_template_messages: list = None,
        max_new_tokens: int = 250,
        temperature: float = 0.7,
        top_p: float = 0.9,
        do_sample: bool = True,
        **kwargs
    ) -> str:
        if not self.pipeline or not self.tokenizer:
            raise RuntimeError("Pipeline is not initialized. Call initialize() first.")

        formatted_prompt_input = ""
        if chat_template_messages:
            try:
                formatted_prompt_input = self.tokenizer.apply_chat_template(
                    chat_template_messages,
                    tokenize=False,
                    add_generation_prompt=True
                )
            except Exception as e:
                print(f"Could not apply chat template, falling back to raw prompt if available. Error: {e}")
                if prompt_text:
                    formatted_prompt_input = prompt_text
                else:
                    raise ValueError("Cannot generate text without a valid prompt or chat_template_messages.")
        elif prompt_text:
            formatted_prompt_input = prompt_text
        else:
            raise ValueError("Either prompt_text or chat_template_messages must be provided.")

        try:
            generated_outputs = await asyncio.to_thread(
                self.pipeline,
                formatted_prompt_input,
                max_new_tokens=max_new_tokens,
                do_sample=do_sample,
                temperature=temperature,
                top_p=top_p,
                eos_token_id=self.tokenizer.eos_token_id,
                pad_token_id=self.tokenizer.eos_token_id if self.tokenizer.pad_token_id is None else self.tokenizer.pad_token_id,
                **kwargs
            )

            if generated_outputs and isinstance(generated_outputs, list) and "generated_text" in generated_outputs[0]:
                full_generated_sequence = generated_outputs[0]["generated_text"]

                assistant_response = ""
                if full_generated_sequence.startswith(formatted_prompt_input):
                    assistant_response = full_generated_sequence[len(formatted_prompt_input):]
                else:
                    # Fallback for ChatML-style outputs: take everything after the last assistant marker
                    assistant_marker = "<|im_start|>assistant\n"
                    last_marker_pos = full_generated_sequence.rfind(assistant_marker)
                    if last_marker_pos != -1:
                        assistant_response = full_generated_sequence[last_marker_pos + len(assistant_marker):]
                        print("Warning: Used fallback parsing for assistant response.")
                    else:
                        print("Error: Could not isolate assistant response from the full generated sequence.")
                        assistant_response = full_generated_sequence

                if assistant_response.endswith("<|im_end|>"):
                    assistant_response = assistant_response[:-len("<|im_end|>")]

                return assistant_response.strip()
            else:
                print(f"Unexpected output format from pipeline: {generated_outputs}")
                return "Error: Could not parse generated text from pipeline output."

        except Exception as e:
            print(f"Error during text generation with {self.model_name_or_path}: {e}")
            raise HTTPException(status_code=500, detail=f"Error generating text: {str(e)}")
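Because this older service raises HTTPException on failure, it is designed to run inside the FastAPI app rather than standalone. A minimal sketch of wiring it into a startup hook; the app instance and `/demo` route here are illustrative, not taken from the repo:

```python
from fastapi import FastAPI

from app.models.huggingface_service import HuggingFaceTextGenerationService

app = FastAPI()
service = HuggingFaceTextGenerationService("speakleash/Bielik-1.5B-v3.0-Instruct", device="cpu")


@app.on_event("startup")
async def load_model() -> None:
    # Load the pipeline once, before the first request arrives.
    await service.initialize()


@app.get("/demo")
async def demo() -> dict:
    text = await service.generate_text(prompt_text="Opisz krótko: Toyota Corolla, 2020")
    return {"output": text}
```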
app/models/registry.py DELETED
@@ -1,211 +0,0 @@
"""
Model Registry - Central configuration and factory for all LLM models.
Supports lazy loading and an on/off mechanism for memory management.
"""

import os
import gc
from typing import Dict, List, Any, Optional

from app.models.base_llm import BaseLLM
from app.models.huggingface_local import HuggingFaceLocal
from app.models.huggingface_inference_api import HuggingFaceInferenceAPI


# Model configuration - 3 local + 1 API for Polish language comparison
MODEL_CONFIG = {
    "bielik-1.5b": {
        "id": "speakleash/Bielik-1.5B-v3.0-Instruct",
        "local_path": "bielik-1.5b",
        "type": "local",
        "polish_support": "excellent",
        "size": "1.5B",
    },
    "qwen2.5-3b": {
        "id": "Qwen/Qwen2.5-3B-Instruct",
        "local_path": "qwen2.5-3b",
        "type": "local",
        "polish_support": "good",
        "size": "3B",
    },
    "gemma-2-2b": {
        "id": "google/gemma-2-2b-it",
        "local_path": "gemma-2-2b",
        "type": "local",
        "polish_support": "medium",
        "size": "2B",
    },
    "pllum-12b": {
        "id": "CYFRAGOVPL/PLLuM-12B-instruct",
        "type": "inference_api",
        "polish_support": "excellent",
        "size": "12B",
    },
}

# Base path for pre-downloaded models in the container
LOCAL_MODEL_BASE = os.getenv("MODEL_DIR", "/app/pretrain_model")


class ModelRegistry:
    """
    Central registry for managing all LLM models.
    Supports lazy loading (load on first request) and unloading for memory management.
    Only one local model is loaded at a time to conserve memory.
    """

    def __init__(self):
        self._models: Dict[str, BaseLLM] = {}
        self._config = MODEL_CONFIG.copy()
        self._active_local_model: Optional[str] = None

    def _create_model(self, name: str) -> BaseLLM:
        """Factory method to create a model instance."""

        if name not in self._config:
            raise ValueError(f"Unknown model: {name}")

        config = self._config[name]
        model_type = config["type"]
        model_id = config["id"]

        # For local models, check if a pre-downloaded copy exists
        if model_type == "local" and "local_path" in config:
            local_path = os.path.join(LOCAL_MODEL_BASE, config["local_path"])
            if os.path.exists(local_path):
                print(f"Using pre-downloaded model at: {local_path}")
                model_id = local_path
            else:
                print(f"Pre-downloaded model not found at {local_path}, will download from HuggingFace")

        if model_type == "local":
            return HuggingFaceLocal(
                name=name,
                model_id=model_id,
                device="cpu"
            )
        elif model_type == "inference_api":
            return HuggingFaceInferenceAPI(
                name=name,
                model_id=model_id
            )
        else:
            raise ValueError(f"Unknown model type: {model_type}")

    async def _unload_model(self, name: str) -> None:
        """Unload a model from memory."""
        if name in self._models:
            model = self._models[name]
            # Call cleanup if available
            if hasattr(model, 'cleanup'):
                await model.cleanup()
            del self._models[name]
            gc.collect()  # Force garbage collection
            print(f"Model '{name}' unloaded from memory.")

    async def _unload_all_local_models(self) -> None:
        """Unload all local models to free memory."""
        local_models = [
            name for name, config in self._config.items()
            if config["type"] == "local" and name in self._models
        ]
        for name in local_models:
            await self._unload_model(name)
        self._active_local_model = None

    async def get_model(self, name: str) -> BaseLLM:
        """
        Get a model (lazy loading).
        For local models: unloads any previously loaded local model first.
        For API models: always available without affecting local models.
        """
        if name not in self._config:
            raise ValueError(f"Unknown model: {name}")

        config = self._config[name]

        # If it's a local model, ensure only one is loaded at a time
        if config["type"] == "local":
            # Unload the current local model if it is a different one
            if self._active_local_model and self._active_local_model != name:
                print(f"Switching from '{self._active_local_model}' to '{name}'...")
                await self._unload_model(self._active_local_model)

            # Load the requested model if not already loaded
            if name not in self._models:
                print(f"Loading model '{name}'...")
                model = self._create_model(name)
                await model.initialize()
                self._models[name] = model
                self._active_local_model = name
                print(f"Model '{name}' loaded successfully.")

        # For API models, just create/return (no memory concern)
        elif config["type"] == "inference_api":
            if name not in self._models:
                print(f"Initializing API model '{name}'...")
                model = self._create_model(name)
                await model.initialize()
                self._models[name] = model

        return self._models[name]

    async def load_model(self, name: str) -> Dict[str, Any]:
        """
        Explicitly load a model (unloads other local models first).
        Returns model info.
        """
        await self.get_model(name)
        return self.get_model_info(name)

    async def unload_model(self, name: str) -> Dict[str, str]:
        """Explicitly unload a model from memory."""
        if name not in self._config:
            raise ValueError(f"Unknown model: {name}")

        if name not in self._models:
            return {"status": "not_loaded", "model": name}

        await self._unload_model(name)
        if self._active_local_model == name:
            self._active_local_model = None

        return {"status": "unloaded", "model": name}

    def get_model_info(self, name: str) -> Dict[str, Any]:
        """Get info about a specific model."""
        if name not in self._config:
            raise ValueError(f"Unknown model: {name}")

        config = self._config[name]
        return {
            "name": name,
            "model_id": config["id"],
            "type": config["type"],
            "polish_support": config["polish_support"],
            "size": config["size"],
            "loaded": name in self._models,
            "active": (name == self._active_local_model) if config["type"] == "local" else None,
        }

    def list_models(self) -> List[Dict[str, Any]]:
        """List all available models with their info."""
        return [self.get_model_info(name) for name in self._config.keys()]

    def get_available_model_names(self) -> List[str]:
        """Get the list of available model names."""
        return list(self._config.keys())

    def get_active_model(self) -> Optional[str]:
        """Get the currently active (loaded) local model name."""
        return self._active_local_model

    def get_loaded_models(self) -> List[str]:
        """Get the list of currently loaded model names."""
        return list(self._models.keys())


# Global registry instance
registry = ModelRegistry()
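The one-local-model-at-a-time policy is easiest to see from a small driver. A sketch, assuming `HF_TOKEN` is set for the API-backed model; note that each `get_model` call on a local entry actually loads weights:

```python
import asyncio

from app.models.registry import registry


async def main() -> None:
    await registry.get_model("bielik-1.5b")   # loads Bielik
    await registry.get_model("qwen2.5-3b")    # unloads Bielik, loads Qwen
    await registry.get_model("pllum-12b")     # API client, no local memory cost
    print(registry.get_active_model())        # -> "qwen2.5-3b"
    print(registry.get_loaded_models())       # -> ["qwen2.5-3b", "pllum-12b"]


asyncio.run(main())
```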
app/schemas/schemas.py DELETED
@@ -1,129 +0,0 @@
from pydantic import BaseModel, Field
from typing import List, Optional, Dict, Any


class EnhancedDescriptionResponse(BaseModel):
    description: str
    model_used: str
    generation_time: float
    user_email: str


# --- Batch Infill Schemas ---

class InfillItem(BaseModel):
    """A single item (ad) with gaps to be filled."""
    id: str = Field(..., description="Unique identifier for this item")
    text_with_gaps: str = Field(..., description="Text containing [GAP:n] markers or ___ to fill")


class InfillOptions(BaseModel):
    """Configuration options for infill processing."""
    gap_notation: str = Field(
        default="auto",
        description="Gap notation: 'auto' (detect), '[GAP:n]', or '___'"
    )
    top_n_per_gap: int = Field(
        default=3,
        ge=1,
        le=5,
        description="Number of alternative suggestions per gap (1-5)"
    )
    language: str = Field(default="pl", description="Output language (pl/en)")
    temperature: float = Field(default=0.6, ge=0.0, le=1.0)
    max_new_tokens: int = Field(default=256, ge=50, le=512)


class GapFill(BaseModel):
    """Result for a single filled gap."""
    index: int = Field(..., description="Gap index (1-based)")
    marker: str = Field(..., description="Original marker (e.g., '[GAP:1]' or '___')")
    choice: str = Field(..., description="Selected fill word/phrase")
    alternatives: List[str] = Field(
        default_factory=list,
        description="Alternative suggestions"
    )


class InfillResult(BaseModel):
    """Result for a single infill item."""
    id: str
    status: str = Field(..., description="'ok' or 'error'")
    filled_text: Optional[str] = Field(None, description="Text with gaps filled")
    gaps: List[GapFill] = Field(default_factory=list)
    error: Optional[str] = Field(None, description="Error message if status='error'")


class InfillRequest(BaseModel):
    """Request for single-model batch infill."""
    domain: str = Field(..., description="Domain name (e.g., 'cars')")
    items: List[InfillItem] = Field(..., description="Batch of items to process")
    model: str = Field(default="bielik-1.5b", description="Model to use")
    options: InfillOptions = Field(default_factory=InfillOptions)


class InfillResponse(BaseModel):
    """Response for single-model batch infill."""
    model: str
    results: List[InfillResult]
    total_time: float
    processed_count: int
    error_count: int


class CompareInfillRequest(BaseModel):
    """Request for multi-model batch infill comparison."""
    domain: str
    items: List[InfillItem]
    models: Optional[List[str]] = Field(
        None,
        description="Models to compare. If None, use all available."
    )
    options: InfillOptions = Field(default_factory=InfillOptions)


class ModelInfillResult(BaseModel):
    """Per-model results in comparison."""
    model: str
    type: str
    results: List[InfillResult]
    time: float
    error_count: int


class CompareInfillResponse(BaseModel):
    """Response for multi-model batch infill comparison."""
    domain: str
    models: List[ModelInfillResult]
    total_time: float


class ModelInfo(BaseModel):
    name: str
    model_id: str
    type: str
    polish_support: str
    size: str
    loaded: bool
    active: Optional[bool] = None  # Only for local models


class CompareRequest(BaseModel):
    domain: str
    data: Dict[str, Any]
    models: Optional[List[str]] = None  # If None, use all models


class ModelResult(BaseModel):
    model: str
    output: str
    time: float
    type: str
    error: Optional[str] = None


class CompareResponse(BaseModel):
    domain: str
    results: List[ModelResult]
    total_time: float
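For reference, a single-model infill request can be built and serialized straight from these schemas (Pydantic v2, per requirements.txt); the ids and gap texts are illustrative:

```python
from app.schemas.schemas import InfillItem, InfillOptions, InfillRequest

request = InfillRequest(
    domain="cars",
    items=[
        InfillItem(id="ad-1", text_with_gaps="Sprzedam [GAP:1] samochód w [GAP:2] stanie."),
        InfillItem(id="ad-2", text_with_gaps="Silnik ___ i skrzynia ___."),
    ],
    model="bielik-1.5b",
    options=InfillOptions(gap_notation="auto", top_n_per_gap=3),
)
print(request.model_dump_json(indent=2))
```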
requirements.txt DELETED
@@ -1,10 +0,0 @@
fastapi==0.104.1
uvicorn[standard]==0.24.0
transformers==4.36.2
accelerate==0.25.0
# InferenceClient.chat_completion (used by app/models/huggingface_inference_api.py)
# is not available in older huggingface_hub releases such as 0.19.x
huggingface_hub>=0.23.0
torch>=2.1.0
pydantic==2.5.0
# bitsandbytes is optional for 8-bit quantization (CPU optimization)
# Uncomment below if bitsandbytes is available on your system:
# bitsandbytes==0.49.0
start_container.ps1 DELETED
@@ -1,23 +0,0 @@
# PowerShell script to build and run the Docker container for the FastAPI service

# Set variables
$imageName = "bielik-fastapi-service"
$containerName = "bielik_app_instance"
$tokenFile = "my_hf_token.txt"

# BuildKit is required for --secret support
$env:DOCKER_BUILDKIT = "1"

Write-Host "Building Docker image..."
# The secret id must match the id used in the Dockerfile's secret mount
docker build --secret id=HF_TOKEN,src=$tokenFile -t $imageName .

Write-Host "Stopping and removing any existing container named $containerName..."
docker stop $containerName 2>&1 | Out-Null
docker rm $containerName 2>&1 | Out-Null

Write-Host "Running new container..."
# Map host port 8000 to container port 7860, where uvicorn listens
docker run -d --name $containerName -p 8000:7860 $imageName

Write-Host ""
Write-Host "$containerName should be starting up."
Write-Host "You can view logs with: docker logs $containerName -f"
Write-Host "To stop the container, run: docker stop $containerName"
Write-Host "The service will be available at http://127.0.0.1:8000"
start_container.sh DELETED
@@ -1,25 +0,0 @@
#!/bin/bash

IMAGE_NAME="bielik-fastapi-service"
CONTAINER_NAME="bielik_app_instance"
TOKEN_FILE="my_hf_token.txt"

# Build the Docker image, passing the Hugging Face token as a BuildKit secret.
# The secret id must match the id used in the Dockerfile's secret mount.
echo "Building Docker image..."
DOCKER_BUILDKIT=1 docker build --secret id=HF_TOKEN,src=$TOKEN_FILE -t $IMAGE_NAME .

echo "Attempting to stop and remove existing container named $CONTAINER_NAME (if any)..."
docker stop $CONTAINER_NAME > /dev/null 2>&1 || true  # Stop if running, ignore error if not
docker rm $CONTAINER_NAME > /dev/null 2>&1 || true    # Remove if it exists, ignore error if not

echo "Starting new $IMAGE_NAME container as $CONTAINER_NAME..."
docker run -d --name $CONTAINER_NAME -p 8000:7860 $IMAGE_NAME
# -d            : Runs the container in detached mode (in the background)
# --name        : Assigns a specific name to the running container instance
# -p 8000:7860  : Maps host port 8000 to container port 7860, where uvicorn listens

echo ""
echo "$CONTAINER_NAME should be starting up."
echo "You can view logs with: docker logs $CONTAINER_NAME -f"
echo "To stop the container, run: docker stop $CONTAINER_NAME"
echo "The service will be available at http://127.0.0.1:8000"
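Once the container is up, a quick way to confirm the service is serving is to hit its health endpoint. A minimal check using only the standard library, assuming the health check returns JSON:

```python
import json
import urllib.request

# Host port 8000 is mapped to the container's uvicorn port by the run scripts.
with urllib.request.urlopen("http://127.0.0.1:8000/health", timeout=10) as resp:
    print(json.loads(resp.read().decode("utf-8")))
```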