---
title: Bielik App Service
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Bielik App Service
Multi-model LLM service for description enhancement, batch gap-filling, and A/B testing.
## Overview
This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports:
- **Description Enhancement**: Generate marketing descriptions from structured data
- **Batch Infill**: Fill gaps (`[GAP:n]` or `___`) in ad texts with natural words
- **Multi-Model Comparison**: Compare outputs across different models for A/B testing
## Models
| Model | Size | Polish Support | Type |
|-------|------|----------------|------|
| Bielik-1.5B | 1.5B | Excellent | Local |
| Qwen2.5-3B | 3B | Good | Local |
| Gemma-2-2B | 2B | Medium | Local |
| PLLuM-12B | 12B | Excellent | API |
## API Endpoints
### Health & Info
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Welcome message |
| `GET` | `/health` | API health check and model status |
| `GET` | `/models` | List all available models |
### Model Management (Lazy Loading)
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/models/{name}/load` | Load a model into memory |
| `POST` | `/models/{name}/unload` | Unload a model from memory |
### Description Generation
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/enhance-description` | Generate description with single model |
| `POST` | `/compare` | Compare outputs from multiple models |
### Batch Infill (Gap-Filling)
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/infill` | Batch gap-filling with single model |
| `POST` | `/compare-infill` | Compare gap-filling across multiple models |
---
## Lazy Loading
Models are **not loaded at startup** to conserve memory. Instead:
- Models are loaded **on first request** (lazy loading)
- Only **one local model** is loaded at a time
- Switching to a different local model **automatically unloads** the previous one
- API models (PLLuM) don't affect local model memory
### Example: Load/Unload Flow
```
1. Request with bielik-1.5b → Loads Bielik (first use)
2. Request with qwen2.5-3b → Unloads Bielik, loads Qwen
3. Request with pllum-12b → Qwen stays loaded (API model doesn't affect local)
4. POST /models/qwen2.5-3b/unload → Manually free memory
```
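The single-local-model policy above can be sketched as a small registry. This is an illustrative sketch only (class and method names are hypothetical, not the service's actual code): loading a local model evicts the previously active one, while API-backed models never touch local memory.

```python
# Hypothetical sketch of the lazy-loading policy described above;
# the real service code may be structured differently.
class ModelRegistry:
    def __init__(self):
        self.loaded = {}          # name -> loaded flag (stand-in for a model handle)
        self.active_local = None  # at most one local model resides in memory

    def load(self, name: str, model_type: str):
        if model_type == "local":
            # Swapping local models frees the previous one first.
            if self.active_local and self.active_local != name:
                self.unload(self.active_local)
            self.active_local = name
        self.loaded[name] = True  # stand-in for the real load call
        return name

    def unload(self, name: str):
        self.loaded.pop(name, None)
        if self.active_local == name:
            self.active_local = None

registry = ModelRegistry()
registry.load("bielik-1.5b", "local")        # first use: loads Bielik
registry.load("qwen2.5-3b", "local")         # unloads Bielik, loads Qwen
registry.load("pllum-12b", "inference_api")  # Qwen stays loaded
```

After this sequence only `qwen2.5-3b` (local) and `pllum-12b` (API) are marked loaded, matching the flow in the example above.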
---
## Endpoint Details
### `GET /health`
Check API status and loaded models.
**Response:**
```json
{
"status": "ok",
"available_models": 4,
"loaded_models": ["bielik-1.5b"],
"active_local_model": "bielik-1.5b"
}
```
---
### `GET /models`
List all available models with their load status.
**Response:**
```json
[
{
"name": "bielik-1.5b",
"model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
"type": "local",
"polish_support": "excellent",
"size": "1.5B",
"loaded": true,
"active": true
},
{
"name": "qwen2.5-3b",
"model_id": "Qwen/Qwen2.5-3B-Instruct",
"type": "local",
"polish_support": "good",
"size": "3B",
"loaded": false,
"active": false
}
]
```
---
### `POST /models/{name}/load`
Explicitly load a model. For local models, unloads the previous one first.
**Response:**
```json
{
"status": "loaded",
"model": {
"name": "bielik-1.5b",
"loaded": true,
"active": true
}
}
```
---
### `POST /models/{name}/unload`
Explicitly unload a model to free memory.
**Response:**
```json
{
"status": "unloaded",
"model": "bielik-1.5b"
}
```
---
### `POST /enhance-description`
Generate an enhanced description using a single model.
**Request:**
```json
{
"domain": "cars",
"data": {
"make": "BMW",
"model": "320i",
"year": 2020,
"mileage": 45000,
"features": ["nawigacja", "klimatyzacja"],
"condition": "bardzo dobry"
},
"model": "bielik-1.5b"
}
```
**Response:**
```json
{
"description": "Generated description text...",
"model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
"generation_time": 2.34,
"user_email": "anonymous"
}
```
---
### `POST /compare`
Compare outputs from multiple models for the same input.
**Request:**
```json
{
"domain": "cars",
"data": {
"make": "BMW",
"model": "320i",
"year": 2020,
"mileage": 45000,
"features": ["nawigacja", "klimatyzacja"],
"condition": "bardzo dobry"
},
"models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}
```
**Response:**
```json
{
"domain": "cars",
"results": [
{
"model": "bielik-1.5b",
"output": "Generated text from Bielik...",
"time": 2.3,
"type": "local",
"error": null
},
{
"model": "pllum-12b",
"output": "Generated text from PLLuM...",
"time": 1.1,
"type": "inference_api",
"error": null
}
],
"total_time": 5.67
}
```
---
### `POST /infill`
Batch gap-filling for ads using a single model. Accepts texts with `[GAP:n]` markers or `___` and returns filled text with per-gap choices and alternatives.
**Gap Notation:**
- `[GAP:1]`, `[GAP:2]`, ... → Explicit numbered gaps (preferred)
- `___` → Auto-numbered in scan order
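The auto-numbering behavior can be illustrated by normalizing `___` placeholders into explicit `[GAP:n]` markers in scan order. This is a sketch of the documented notation rules, not the service's internal implementation:

```python
import re

# Convert auto-numbered "___" gaps into explicit [GAP:n] markers,
# continuing after any numbered markers already present in the text.
def normalize_gaps(text: str) -> str:
    existing = [int(n) for n in re.findall(r"\[GAP:(\d+)\]", text)]
    counter = max(existing, default=0)

    def repl(_m: re.Match) -> str:
        nonlocal counter
        counter += 1
        return f"[GAP:{counter}]"

    return re.sub(r"___", repl, text)

print(normalize_gaps("Auto ma ___ km przebiegu i ___ lakier"))
# Auto ma [GAP:1] km przebiegu i [GAP:2] lakier
```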
**Request:**
```json
{
"domain": "cars",
"items": [
{
"id": "ad1",
"text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
},
{
"id": "ad2",
"text_with_gaps": "Auto ma ___ km przebiegu i ___ lakier"
}
],
"model": "bielik-1.5b",
"options": {
"top_n_per_gap": 3,
"language": "pl",
"temperature": 0.6
}
}
```
**Response:**
```json
{
"model": "bielik-1.5b",
"results": [
{
"id": "ad1",
"status": "ok",
"filled_text": "Sprzedam eleganckie BMW w doskonałym stanie technicznym",
"gaps": [
{
"index": 1,
"marker": "[GAP:1]",
"choice": "eleganckie",
"alternatives": ["piękne", "zadbane"]
},
{
"index": 2,
"marker": "[GAP:2]",
"choice": "doskonałym",
"alternatives": ["bardzo dobrym", "idealnym"]
}
],
"error": null
}
],
"total_time": 3.45,
"processed_count": 2,
"error_count": 0
}
```
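Because the response returns per-gap `alternatives` alongside the `choice`, a client can re-fill the original template with a different word per gap, e.g. to preview an alternative. A small sketch (the `fill` helper is ours, not part of the API):

```python
import re

# Re-fill a gapped template with one word per gap index,
# e.g. to swap in an alternative from the response above.
def fill(template: str, choices: dict[int, str]) -> str:
    return re.sub(r"\[GAP:(\d+)\]", lambda m: choices[int(m.group(1))], template)

template = "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
print(fill(template, {1: "zadbane", 2: "bardzo dobrym"}))
# Sprzedam zadbane BMW w bardzo dobrym stanie technicznym
```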
**Options:**
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `gap_notation` | string | `"auto"` | `"auto"`, `"[GAP:n]"`, or `"___"` |
| `top_n_per_gap` | int | `3` | Alternatives per gap (1-5) |
| `language` | string | `"pl"` | Output language |
| `temperature` | float | `0.6` | Generation temperature (0-1) |
| `max_new_tokens` | int | `256` | Max tokens to generate |
---
### `POST /compare-infill`
Multi-model batch gap-filling comparison for A/B testing.
**Request:**
```json
{
"domain": "cars",
"items": [
{
"id": "ad1",
"text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie"
}
],
"models": ["bielik-1.5b", "qwen2.5-3b", "pllum-12b"],
"options": {
"top_n_per_gap": 3
}
}
```
**Response:**
```json
{
"domain": "cars",
"models": [
{
"model": "bielik-1.5b",
"type": "local",
"results": [...],
"time": 2.1,
"error_count": 0
},
{
"model": "qwen2.5-3b",
"type": "local",
"results": [...],
"time": 1.8,
"error_count": 0
}
],
"total_time": 5.2
}
```
---
## Domains
Currently supported domains:
| Domain | Schema Fields |
|--------|---------------|
| `cars` | `make`, `model`, `year`, `mileage`, `features[]`, `condition` |
---
## Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| `HF_TOKEN` | HuggingFace API token for Inference API | Yes (for API models) |
| `LOCAL_MODEL_PATH` | Path to pre-downloaded local model | No (default: `/app/pretrain_model`) |
| `FRONTEND_URL` | Frontend URL for CORS | No |
## Running Locally
```bash
# Install dependencies
pip install -r requirements.txt
# Run server
uvicorn app.main:app --reload --port 8000
```
## Docker
```powershell
# Build and run (PowerShell helper script)
./start_container.ps1
```
The API is then available at `http://localhost:8000`, with interactive docs at `http://localhost:8000/docs`.
## Live Demo
Deployed on HuggingFace Spaces:
**URL:** `https://studzinsky-bielik-app-service.hf.space`
**Quick Test:**
```bash
# Health check
curl https://studzinsky-bielik-app-service.hf.space/health
# List models
curl https://studzinsky-bielik-app-service.hf.space/models
```