---
title: Bielik App Service
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Bielik App Service

Multi-model LLM service for description enhancement, batch gap-filling, and A/B testing.
## Overview

This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports:

- **Description Enhancement**: Generate marketing descriptions from structured data
- **Batch Infill**: Fill gaps (`[GAP:n]` or `___`) in ad texts with natural words
- **Multi-Model Comparison**: Compare outputs across different models for A/B testing
## Models
| Model | Size | Polish Support | Type |
|---|---|---|---|
| Bielik-1.5B | 1.5B | Excellent | Local |
| Qwen2.5-3B | 3B | Good | Local |
| Gemma-2-2B | 2B | Medium | Local |
| PLLuM-12B | 12B | Excellent | API |
## API Endpoints

### Health & Info

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Welcome message |
| GET | `/health` | API health check and model status |
| GET | `/models` | List all available models |
### Model Management (Lazy Loading)

| Method | Endpoint | Description |
|---|---|---|
| POST | `/models/{name}/load` | Load a model into memory |
| POST | `/models/{name}/unload` | Unload a model from memory |
### Description Generation

| Method | Endpoint | Description |
|---|---|---|
| POST | `/enhance-description` | Generate description with single model |
| POST | `/compare` | Compare outputs from multiple models |
### Batch Infill (Gap-Filling)

| Method | Endpoint | Description |
|---|---|---|
| POST | `/infill` | Batch gap-filling with single model |
| POST | `/compare-infill` | Compare gap-filling across multiple models |
## Lazy Loading
Models are not loaded at startup to conserve memory. Instead:
- Models are loaded on first request (lazy loading)
- Only one local model is loaded at a time
- Switching to a different local model automatically unloads the previous one
- API models (PLLuM) don't affect local model memory
### Example: Load/Unload Flow

1. Request with `bielik-1.5b` → Loads Bielik (first use)
2. Request with `qwen2.5-3b` → Unloads Bielik, loads Qwen
3. Request with `pllum-12b` → Qwen stays loaded (API model doesn't affect local)
4. `POST /models/qwen2.5-3b/unload` → Manually free memory
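The switching logic above can be sketched as a small registry (a minimal illustration of the single-resident-model policy, not the service's actual implementation; class and method names are assumptions):

```python
class LazyModelRegistry:
    """Tracks which local model is resident; API models bypass local memory."""

    def __init__(self, local_models, api_models):
        self.local_models = set(local_models)
        self.api_models = set(api_models)
        self.active_local = None  # at most one local model loaded at a time

    def ensure_loaded(self, name):
        if name in self.api_models:
            return name  # remote inference: local memory untouched
        if name not in self.local_models:
            raise KeyError(f"unknown model: {name}")
        if self.active_local != name:
            # unload the previous local model, then load the requested one
            self.active_local = name
        return name

    def unload(self, name):
        if self.active_local == name:
            self.active_local = None


registry = LazyModelRegistry(
    local_models=["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b"],
    api_models=["pllum-12b"],
)
registry.ensure_loaded("bielik-1.5b")   # step 1: loads Bielik
registry.ensure_loaded("qwen2.5-3b")    # step 2: swaps Bielik for Qwen
registry.ensure_loaded("pllum-12b")     # step 3: Qwen stays resident
assert registry.active_local == "qwen2.5-3b"
registry.unload("qwen2.5-3b")           # step 4: free memory manually
```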
## Endpoint Details
### GET /health

Check API status and loaded models.

**Response:**

```json
{
  "status": "ok",
  "available_models": 4,
  "loaded_models": ["bielik-1.5b"],
  "active_local_model": "bielik-1.5b"
}
```
### GET /models

List all available models with their load status.

**Response:**

```json
[
  {
    "name": "bielik-1.5b",
    "model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
    "type": "local",
    "polish_support": "excellent",
    "size": "1.5B",
    "loaded": true,
    "active": true
  },
  {
    "name": "qwen2.5-3b",
    "model_id": "Qwen/Qwen2.5-3B-Instruct",
    "type": "local",
    "polish_support": "good",
    "size": "3B",
    "loaded": false,
    "active": false
  }
]
```
### POST /models/{name}/load

Explicitly load a model. For local models, unloads the previous one first.

**Response:**

```json
{
  "status": "loaded",
  "model": {
    "name": "bielik-1.5b",
    "loaded": true,
    "active": true
  }
}
```
### POST /models/{name}/unload

Explicitly unload a model to free memory.

**Response:**

```json
{
  "status": "unloaded",
  "model": "bielik-1.5b"
}
```
### POST /enhance-description

Generate enhanced description using a single model.

**Request:**

```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "model": "bielik-1.5b"
}
```

**Response:**

```json
{
  "description": "Generated description text...",
  "model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
  "generation_time": 2.34,
  "user_email": "anonymous"
}
```
### POST /compare

Compare outputs from multiple models for the same input.

**Request:**

```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}
```

**Response:**

```json
{
  "domain": "cars",
  "results": [
    {
      "model": "bielik-1.5b",
      "output": "Generated text from Bielik...",
      "time": 2.3,
      "type": "local",
      "error": null
    },
    {
      "model": "pllum-12b",
      "output": "Generated text from PLLuM...",
      "time": 1.1,
      "type": "inference_api",
      "error": null
    }
  ],
  "total_time": 5.67
}
```
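For A/B testing, a client might rank the comparison results, e.g. by generation time among the models that succeeded. A sketch (field names follow the response shape above; the helper function is hypothetical):

```python
def fastest_success(compare_response):
    """Return the name of the fastest model whose generation succeeded."""
    ok = [r for r in compare_response["results"] if r["error"] is None]
    if not ok:
        return None
    return min(ok, key=lambda r: r["time"])["model"]


response = {
    "domain": "cars",
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local", "error": None},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api", "error": None},
        {"model": "gemma-2-2b", "output": None, "time": 0.0, "type": "local", "error": "load failed"},
    ],
    "total_time": 5.67,
}
assert fastest_success(response) == "pllum-12b"  # failed models are skipped
```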
### POST /infill

Batch gap-filling for ads using a single model. Accepts texts with `[GAP:n]` markers or `___` and returns filled text with per-gap choices and alternatives.

**Gap Notation:**

- `[GAP:1]`, `[GAP:2]`, ... → Explicit numbered gaps (preferred)
- `___` → Auto-numbered in scan order
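Auto-numbering `___` markers in scan order amounts to rewriting them into explicit numbered gaps; a possible normalization looks like this (an illustrative sketch, not the service's actual code):

```python
import re


def normalize_gaps(text):
    """Replace each ___ marker with [GAP:n], numbering gaps left to right."""
    counter = 0

    def number(_match):
        nonlocal counter
        counter += 1
        return f"[GAP:{counter}]"

    # treat any run of three or more underscores as one gap
    return re.sub(r"_{3,}", number, text)


print(normalize_gaps("Auto ma ___ km przebiegu i ___ lakier"))
# → Auto ma [GAP:1] km przebiegu i [GAP:2] lakier
```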
**Request:**

```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
    },
    {
      "id": "ad2",
      "text_with_gaps": "Auto ma ___ km przebiegu i ___ lakier"
    }
  ],
  "model": "bielik-1.5b",
  "options": {
    "top_n_per_gap": 3,
    "language": "pl",
    "temperature": 0.6
  }
}
```

**Response:**

```json
{
  "model": "bielik-1.5b",
  "results": [
    {
      "id": "ad1",
      "status": "ok",
      "filled_text": "Sprzedam eleganckie BMW w doskonałym stanie technicznym",
      "gaps": [
        {
          "index": 1,
          "marker": "[GAP:1]",
          "choice": "eleganckie",
          "alternatives": ["piękne", "zadbane"]
        },
        {
          "index": 2,
          "marker": "[GAP:2]",
          "choice": "doskonałym",
          "alternatives": ["bardzo dobrym", "idealnym"]
        }
      ],
      "error": null
    }
  ],
  "total_time": 3.45,
  "processed_count": 2,
  "error_count": 0
}
```
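Because the response reports each gap's marker, choice, and alternatives separately, a client can rebuild `filled_text` itself, or swap in an alternative for any gap. A sketch based on the response shape above (the helper and its `overrides` parameter are illustrative, not part of the API):

```python
def apply_choices(text_with_gaps, gaps, overrides=None):
    """Substitute each [GAP:n] marker with its chosen (or overridden) word."""
    overrides = overrides or {}
    for gap in gaps:
        word = overrides.get(gap["index"], gap["choice"])
        text_with_gaps = text_with_gaps.replace(gap["marker"], word, 1)
    return text_with_gaps


gaps = [
    {"index": 1, "marker": "[GAP:1]", "choice": "eleganckie", "alternatives": ["piękne", "zadbane"]},
    {"index": 2, "marker": "[GAP:2]", "choice": "doskonałym", "alternatives": ["bardzo dobrym", "idealnym"]},
]
text = "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
assert apply_choices(text, gaps) == "Sprzedam eleganckie BMW w doskonałym stanie technicznym"

# Swap in an alternative for gap 1:
assert apply_choices(text, gaps, {1: "zadbane"}) == "Sprzedam zadbane BMW w doskonałym stanie technicznym"
```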
**Options:**

| Field | Type | Default | Description |
|---|---|---|---|
| `gap_notation` | string | `"auto"` | `"auto"`, `"[GAP:n]"`, or `"___"` |
| `top_n_per_gap` | int | `3` | Alternatives per gap (1-5) |
| `language` | string | `"pl"` | Output language |
| `temperature` | float | `0.6` | Generation temperature (0-1) |
| `max_new_tokens` | int | `256` | Max tokens to generate |
### POST /compare-infill

Multi-model batch gap-filling comparison for A/B testing.

**Request:**

```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie"
    }
  ],
  "models": ["bielik-1.5b", "qwen2.5-3b", "pllum-12b"],
  "options": {
    "top_n_per_gap": 3
  }
}
```

**Response:**

```json
{
  "domain": "cars",
  "models": [
    {
      "model": "bielik-1.5b",
      "type": "local",
      "results": [...],
      "time": 2.1,
      "error_count": 0
    },
    {
      "model": "qwen2.5-3b",
      "type": "local",
      "results": [...],
      "time": 1.8,
      "error_count": 0
    }
  ],
  "total_time": 5.2
}
```
## Domains

Currently supported domains:

| Domain | Schema Fields |
|---|---|
| `cars` | `make`, `model`, `year`, `mileage`, `features[]`, `condition` |
## Environment Variables

| Variable | Description | Required |
|---|---|---|
| `HF_TOKEN` | HuggingFace API token for Inference API | Yes (for API models) |
| `LOCAL_MODEL_PATH` | Path to pre-downloaded local model | No (default: `/app/pretrain_model`) |
| `FRONTEND_URL` | Frontend URL for CORS | No |
## Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Run server
uvicorn app.main:app --reload --port 8000
```
## Docker

```powershell
# Build and run
./start_container.ps1
```

The API is available at http://localhost:8000, with interactive docs at http://localhost:8000/docs.
## Live Demo

Deployed on HuggingFace Spaces:

**URL:** https://studzinsky-bielik-app-service.hf.space

**Quick Test:**

```bash
# Health check
curl https://studzinsky-bielik-app-service.hf.space/health

# List models
curl https://studzinsky-bielik-app-service.hf.space/models
```