---
title: Bielik App Service
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Bielik App Service

Multi-model LLM service for description enhancement, batch gap-filling, and A/B testing.

## Overview

This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports:

- **Description Enhancement**: Generate marketing descriptions from structured data
- **Batch Infill**: Fill gaps (`[GAP:n]` or `___`) in ad texts with natural words
- **Multi-Model Comparison**: Compare outputs across different models for A/B testing

## Models
| Model | Size | Polish Support | Type |
|-------|------|----------------|------|
| Bielik-1.5B | 1.5B | Excellent | Local |
| Qwen2.5-3B | 3B | Good | Local |
| Gemma-2-2B | 2B | Medium | Local |
| PLLuM-12B | 12B | Excellent | API |
## API Endpoints

### Health & Info

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Welcome message |
| `GET` | `/health` | API health check and model status |
| `GET` | `/models` | List all available models |

### Model Management (Lazy Loading)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/models/{name}/load` | Load a model into memory |
| `POST` | `/models/{name}/unload` | Unload a model from memory |

### Description Generation

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/enhance-description` | Generate a description with a single model |
| `POST` | `/compare` | Compare outputs from multiple models |

### Batch Infill (Gap-Filling)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/infill` | Batch gap-filling with a single model |
| `POST` | `/compare-infill` | Compare gap-filling across multiple models |
---

## Lazy Loading

Models are **not loaded at startup** to conserve memory. Instead:

- Models are loaded **on first request** (lazy loading)
- Only **one local model** is loaded at a time
- Switching to a different local model **automatically unloads** the previous one
- API models (PLLuM) don't affect local model memory

### Example: Load/Unload Flow

```
1. Request with bielik-1.5b → Loads Bielik (first use)
2. Request with qwen2.5-3b → Unloads Bielik, loads Qwen
3. Request with pllum-12b → Qwen stays loaded (API model doesn't affect local)
4. POST /models/qwen2.5-3b/unload → Manually free memory
```
---

## Endpoint Details

### `GET /health`

Check API status and loaded models.

**Response:**

```json
{
  "status": "ok",
  "available_models": 4,
  "loaded_models": ["bielik-1.5b"],
  "active_local_model": "bielik-1.5b"
}
```
---

### `GET /models`

List all available models with their load status.

**Response:**

```json
[
  {
    "name": "bielik-1.5b",
    "model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
    "type": "local",
    "polish_support": "excellent",
    "size": "1.5B",
    "loaded": true,
    "active": true
  },
  {
    "name": "qwen2.5-3b",
    "model_id": "Qwen/Qwen2.5-3B-Instruct",
    "type": "local",
    "polish_support": "good",
    "size": "3B",
    "loaded": false,
    "active": false
  }
]
```
---

### `POST /models/{name}/load`

Explicitly load a model. For local models, the previously loaded local model is unloaded first.

**Response:**

```json
{
  "status": "loaded",
  "model": {
    "name": "bielik-1.5b",
    "loaded": true,
    "active": true
  }
}
```
---

### `POST /models/{name}/unload`

Explicitly unload a model to free memory.

**Response:**

```json
{
  "status": "unloaded",
  "model": "bielik-1.5b"
}
```
---

### `POST /enhance-description`

Generate an enhanced description using a single model.

**Request:**

```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "model": "bielik-1.5b"
}
```

**Response:**

```json
{
  "description": "Generated description text...",
  "model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
  "generation_time": 2.34,
  "user_email": "anonymous"
}
```
---

### `POST /compare`

Compare outputs from multiple models for the same input.

**Request:**

```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}
```

**Response:**

```json
{
  "domain": "cars",
  "results": [
    {
      "model": "bielik-1.5b",
      "output": "Generated text from Bielik...",
      "time": 2.3,
      "type": "local",
      "error": null
    },
    {
      "model": "pllum-12b",
      "output": "Generated text from PLLuM...",
      "time": 1.1,
      "type": "inference_api",
      "error": null
    }
  ],
  "total_time": 5.67
}
```
---

### `POST /infill`

Batch gap-filling for ads using a single model. Accepts texts with `[GAP:n]` markers or `___` and returns filled text with per-gap choices and alternatives.

**Gap Notation:**

- `[GAP:1]`, `[GAP:2]`, ... → Explicit numbered gaps (preferred)
- `___` → Auto-numbered in scan order
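A sketch of how the two notations can be normalized on the client side: explicit `[GAP:n]` markers are kept, and `___` blanks are auto-numbered in scan order. Continuing the numbering after any existing markers is an assumption here, not documented service behavior:

```python
import re

def normalize_gaps(text: str) -> str:
    """Rewrite ___ blanks as numbered [GAP:n] markers, in scan order."""
    # Start counting after the highest explicit marker already present.
    counter = max((int(n) for n in re.findall(r"\[GAP:(\d+)\]", text)), default=0)
    def number_blank(_match):
        nonlocal counter
        counter += 1
        return f"[GAP:{counter}]"
    return re.sub(r"_{3,}", number_blank, text)

print(normalize_gaps("Auto ma ___ km przebiegu i ___ lakier"))
# Auto ma [GAP:1] km przebiegu i [GAP:2] lakier
```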
**Request:**

```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
    },
    {
      "id": "ad2",
      "text_with_gaps": "Auto ma ___ km przebiegu i ___ lakier"
    }
  ],
  "model": "bielik-1.5b",
  "options": {
    "top_n_per_gap": 3,
    "language": "pl",
    "temperature": 0.6
  }
}
```

**Response:**

```json
{
  "model": "bielik-1.5b",
  "results": [
    {
      "id": "ad1",
      "status": "ok",
      "filled_text": "Sprzedam eleganckie BMW w doskonałym stanie technicznym",
      "gaps": [
        {
          "index": 1,
          "marker": "[GAP:1]",
          "choice": "eleganckie",
          "alternatives": ["piękne", "zadbane"]
        },
        {
          "index": 2,
          "marker": "[GAP:2]",
          "choice": "doskonałym",
          "alternatives": ["bardzo dobrym", "idealnym"]
        }
      ],
      "error": null
    }
  ],
  "total_time": 3.45,
  "processed_count": 2,
  "error_count": 0
}
```
**Options:**

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `gap_notation` | string | `"auto"` | `"auto"`, `"[GAP:n]"`, or `"___"` |
| `top_n_per_gap` | int | `3` | Alternatives per gap (1-5) |
| `language` | string | `"pl"` | Output language |
| `temperature` | float | `0.6` | Generation temperature (0-1) |
| `max_new_tokens` | int | `256` | Max tokens to generate |
---

### `POST /compare-infill`

Multi-model batch gap-filling comparison for A/B testing.

**Request:**

```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie"
    }
  ],
  "models": ["bielik-1.5b", "qwen2.5-3b", "pllum-12b"],
  "options": {
    "top_n_per_gap": 3
  }
}
```

**Response:**

```json
{
  "domain": "cars",
  "models": [
    {
      "model": "bielik-1.5b",
      "type": "local",
      "results": [...],
      "time": 2.1,
      "error_count": 0
    },
    {
      "model": "qwen2.5-3b",
      "type": "local",
      "results": [...],
      "time": 1.8,
      "error_count": 0
    }
  ],
  "total_time": 5.2
}
```
---

## Domains

Currently supported domains:

| Domain | Schema Fields |
|--------|---------------|
| `cars` | `make`, `model`, `year`, `mileage`, `features[]`, `condition` |

---

## Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `HF_TOKEN` | HuggingFace API token for Inference API | Yes (for API models) |
| `LOCAL_MODEL_PATH` | Path to pre-downloaded local model | No (default: `/app/pretrain_model`) |
| `FRONTEND_URL` | Frontend URL for CORS | No |
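For example, a local shell session might be configured like this (all values are placeholders, not real credentials):

```shell
# Placeholder values; substitute your own before starting the service.
export HF_TOKEN="hf_xxxxxxxx"                  # required only for API models (PLLuM)
export LOCAL_MODEL_PATH="/app/pretrain_model"  # default path, shown for clarity
export FRONTEND_URL="http://localhost:3000"    # optional; enables CORS for the frontend
```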
## Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Run server
uvicorn app.main:app --reload --port 8000
```

## Docker

```bash
# Build and run (PowerShell script)
./start_container.ps1
```

The API is then available at `http://localhost:8000`, with interactive docs at `http://localhost:8000/docs`.
## Live Demo

Deployed on HuggingFace Spaces:

**URL:** `https://studzinsky-bielik-app-service.hf.space`

**Quick Test:**

```bash
# Health check
curl https://studzinsky-bielik-app-service.hf.space/health

# List models
curl https://studzinsky-bielik-app-service.hf.space/models
```