---
title: Bielik App Service
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Bielik App Service

Multi-model LLM service for description enhancement, batch gap-filling, and A/B testing.

## Overview

This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports:

- **Description Enhancement**: Generate marketing descriptions from structured data
- **Batch Infill**: Fill gaps (`[GAP:n]` or `___`) in ad texts with natural words
- **Multi-Model Comparison**: Compare outputs across different models for A/B testing

## Models

| Model | Size | Polish Support | Type |
|---|---|---|---|
| Bielik-1.5B | 1.5B | Excellent | Local |
| Qwen2.5-3B | 3B | Good | Local |
| Gemma-2-2B | 2B | Medium | Local |
| PLLuM-12B | 12B | Excellent | API |

## API Endpoints

### Health & Info

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Welcome message |
| GET | `/health` | API health check and model status |
| GET | `/models` | List all available models |

### Model Management (Lazy Loading)

| Method | Endpoint | Description |
|---|---|---|
| POST | `/models/{name}/load` | Load a model into memory |
| POST | `/models/{name}/unload` | Unload a model from memory |

### Description Generation

| Method | Endpoint | Description |
|---|---|---|
| POST | `/enhance-description` | Generate a description with a single model |
| POST | `/compare` | Compare outputs from multiple models |

### Batch Infill (Gap-Filling)

| Method | Endpoint | Description |
|---|---|---|
| POST | `/infill` | Batch gap-filling with a single model |
| POST | `/compare-infill` | Compare gap-filling across multiple models |

## Lazy Loading

Models are not loaded at startup, to conserve memory. Instead:

- Models are loaded on first request (lazy loading)
- Only one local model is loaded at a time
- Switching to a different local model automatically unloads the previous one
- API models (PLLuM) don't affect local model memory

### Example: Load/Unload Flow

1. Request with `bielik-1.5b` → loads Bielik (first use)
2. Request with `qwen2.5-3b` → unloads Bielik, loads Qwen
3. Request with `pllum-12b` → Qwen stays loaded (API models don't affect local memory)
4. `POST /models/qwen2.5-3b/unload` → manually frees the memory
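
The single-active-local-model policy above can be sketched in a few lines. This is an illustrative simplification, not the service's actual internals — the `ModelManager` class and its methods are hypothetical, and real code would load and free model weights where this sketch stores a placeholder string:

```python
# Hypothetical sketch of the lazy-loading policy described above.
class ModelManager:
    def __init__(self):
        self.loaded = {}          # model name -> handle (placeholder here)
        self.active_local = None  # at most one local model at a time

    def get(self, name, model_type):
        """Return a model, lazily loading it on first use."""
        if name not in self.loaded:
            if model_type == "local":
                # Switching local models: unload the previous one first.
                if self.active_local and self.active_local != name:
                    self.unload(self.active_local)
                self.active_local = name
            self.loaded[name] = f"<{name} handle>"  # real code loads weights here
        return self.loaded[name]

    def unload(self, name):
        """Free a model's memory; clears the active slot if it was local."""
        self.loaded.pop(name, None)
        if self.active_local == name:
            self.active_local = None


mgr = ModelManager()
mgr.get("bielik-1.5b", "local")        # loads Bielik
mgr.get("qwen2.5-3b", "local")         # unloads Bielik, loads Qwen
mgr.get("pllum-12b", "inference_api")  # API model: Qwen stays loaded
```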

## Endpoint Details

### `GET /health`

Check the API status and the currently loaded models.

Response:

```json
{
  "status": "ok",
  "available_models": 4,
  "loaded_models": ["bielik-1.5b"],
  "active_local_model": "bielik-1.5b"
}
```

### `GET /models`

List all available models with their load status.

Response:

```json
[
  {
    "name": "bielik-1.5b",
    "model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
    "type": "local",
    "polish_support": "excellent",
    "size": "1.5B",
    "loaded": true,
    "active": true
  },
  {
    "name": "qwen2.5-3b",
    "model_id": "Qwen/Qwen2.5-3B-Instruct",
    "type": "local",
    "polish_support": "good",
    "size": "3B",
    "loaded": false,
    "active": false
  }
]
```

### `POST /models/{name}/load`

Explicitly load a model. For local models, the previously active model is unloaded first.

Response:

```json
{
  "status": "loaded",
  "model": {
    "name": "bielik-1.5b",
    "loaded": true,
    "active": true
  }
}
```

### `POST /models/{name}/unload`

Explicitly unload a model to free memory.

Response:

```json
{
  "status": "unloaded",
  "model": "bielik-1.5b"
}
```

### `POST /enhance-description`

Generate an enhanced description using a single model.

Request:

```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "model": "bielik-1.5b"
}
```

Response:

```json
{
  "description": "Generated description text...",
  "model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
  "generation_time": 2.34,
  "user_email": "anonymous"
}
```
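
A minimal Python client for this endpoint, using only the standard library, might look like the sketch below. The base URL and timeout are assumptions; the payload and response fields follow the examples above:

```python
# Minimal client sketch for POST /enhance-description (stdlib only).
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # or the deployed Space URL

def build_enhance_payload(domain, data, model="bielik-1.5b"):
    """Assemble the request body in the shape shown above."""
    return {"domain": domain, "data": data, "model": model}

def enhance_description(domain, data, model="bielik-1.5b", timeout=120):
    # Generation can take a while on CPU, so allow a generous timeout.
    req = urllib.request.Request(
        f"{BASE_URL}/enhance-description",
        data=json.dumps(build_enhance_payload(domain, data, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    car = {"make": "BMW", "model": "320i", "year": 2020, "mileage": 45000,
           "features": ["nawigacja", "klimatyzacja"], "condition": "bardzo dobry"}
    result = enhance_description("cars", car)
    print(result["description"])
```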

### `POST /compare`

Compare outputs from multiple models for the same input.

Request:

```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}
```

Response:

```json
{
  "domain": "cars",
  "results": [
    {
      "model": "bielik-1.5b",
      "output": "Generated text from Bielik...",
      "time": 2.3,
      "type": "local",
      "error": null
    },
    {
      "model": "pllum-12b",
      "output": "Generated text from PLLuM...",
      "time": 1.1,
      "type": "inference_api",
      "error": null
    }
  ],
  "total_time": 5.67
}
```
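
When running A/B comparisons, a small client-side helper that drops failed results and ranks the rest by latency can be useful. This is an illustrative sketch against the response shape above (`rank_results` is not part of the service):

```python
def rank_results(compare_response):
    """Return successful /compare results sorted by generation time, fastest first."""
    ok = [r for r in compare_response["results"] if r.get("error") is None]
    return sorted(ok, key=lambda r: r["time"])

resp = {
    "domain": "cars",
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local", "error": None},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api", "error": None},
        {"model": "gemma-2-2b", "output": None, "time": 0.0, "type": "local", "error": "out of memory"},
    ],
    "total_time": 5.67,
}
print([r["model"] for r in rank_results(resp)])  # → ['pllum-12b', 'bielik-1.5b']
```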

### `POST /infill`

Batch gap-filling for ads using a single model. Accepts texts containing `[GAP:n]` markers or `___` placeholders and returns the filled text along with per-gap choices and alternatives.

Gap notation:

- `[GAP:1]`, `[GAP:2]`, ... → explicit numbered gaps (preferred)
- `___` → auto-numbered in scan order
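
If you want to control the numbering yourself, the `___` notation can be normalized client-side into explicit markers before sending. The helper below is illustrative only (not part of the service); it continues numbering after any `[GAP:n]` markers already present:

```python
import re

def normalize_gaps(text):
    """Replace each `___` run with an explicit [GAP:n] marker, numbered in scan order."""
    existing = [int(m) for m in re.findall(r"\[GAP:(\d+)\]", text)]
    counter = max(existing, default=0)  # continue after existing markers

    def replace(_match):
        nonlocal counter
        counter += 1
        return f"[GAP:{counter}]"

    return re.sub(r"_{3,}", replace, text)

print(normalize_gaps("Auto ma ___ km przebiegu i ___ lakier"))
# → "Auto ma [GAP:1] km przebiegu i [GAP:2] lakier"
```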

Request:

```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
    },
    {
      "id": "ad2",
      "text_with_gaps": "Auto ma ___ km przebiegu i ___ lakier"
    }
  ],
  "model": "bielik-1.5b",
  "options": {
    "top_n_per_gap": 3,
    "language": "pl",
    "temperature": 0.6
  }
}
```

Response:

```json
{
  "model": "bielik-1.5b",
  "results": [
    {
      "id": "ad1",
      "status": "ok",
      "filled_text": "Sprzedam eleganckie BMW w doskonałym stanie technicznym",
      "gaps": [
        {
          "index": 1,
          "marker": "[GAP:1]",
          "choice": "eleganckie",
          "alternatives": ["piękne", "zadbane"]
        },
        {
          "index": 2,
          "marker": "[GAP:2]",
          "choice": "doskonałym",
          "alternatives": ["bardzo dobrym", "idealnym"]
        }
      ],
      "error": null
    }
  ],
  "total_time": 3.45,
  "processed_count": 2,
  "error_count": 0
}
```
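
Because the response returns each gap's marker, chosen word, and alternatives separately, a client can rebuild the filled text with any alternative swapped in. The helper below is an illustrative sketch, not provided by the service:

```python
def apply_choices(text_with_gaps, gaps, overrides=None):
    """Rebuild the filled text, optionally swapping in alternatives per gap index."""
    overrides = overrides or {}
    out = text_with_gaps
    for gap in gaps:
        word = overrides.get(gap["index"], gap["choice"])
        out = out.replace(gap["marker"], word, 1)  # fill one marker at a time
    return out

gaps = [
    {"index": 1, "marker": "[GAP:1]", "choice": "eleganckie", "alternatives": ["piękne", "zadbane"]},
    {"index": 2, "marker": "[GAP:2]", "choice": "doskonałym", "alternatives": ["bardzo dobrym", "idealnym"]},
]
template = "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
print(apply_choices(template, gaps))                   # the model's own picks
print(apply_choices(template, gaps, {2: "idealnym"}))  # swap gap 2 for an alternative
```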

Options:

| Field | Type | Default | Description |
|---|---|---|---|
| `gap_notation` | string | `"auto"` | `"auto"`, `"[GAP:n]"`, or `"___"` |
| `top_n_per_gap` | int | 3 | Alternatives per gap (1–5) |
| `language` | string | `"pl"` | Output language |
| `temperature` | float | 0.6 | Generation temperature (0–1) |
| `max_new_tokens` | int | 256 | Maximum tokens to generate |

### `POST /compare-infill`

Multi-model batch gap-filling comparison for A/B testing.

Request:

```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie"
    }
  ],
  "models": ["bielik-1.5b", "qwen2.5-3b", "pllum-12b"],
  "options": {
    "top_n_per_gap": 3
  }
}
```

Response:

```json
{
  "domain": "cars",
  "models": [
    {
      "model": "bielik-1.5b",
      "type": "local",
      "results": [...],
      "time": 2.1,
      "error_count": 0
    },
    {
      "model": "qwen2.5-3b",
      "type": "local",
      "results": [...],
      "time": 1.8,
      "error_count": 0
    }
  ],
  "total_time": 5.2
}
```

## Domains

Currently supported domains:

| Domain | Schema Fields |
|---|---|
| cars | `make`, `model`, `year`, `mileage`, `features[]`, `condition` |

## Environment Variables

| Variable | Description | Required |
|---|---|---|
| `HF_TOKEN` | HuggingFace API token for the Inference API | Yes (for API models) |
| `LOCAL_MODEL_PATH` | Path to a pre-downloaded local model | No (default: `/app/pretrain_model`) |
| `FRONTEND_URL` | Frontend URL for CORS | No |

## Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Run the server
uvicorn app.main:app --reload --port 8000
```

## Docker

```powershell
# Build and run
./start_container.ps1
```

The API is then available at http://localhost:8000, with interactive docs at http://localhost:8000/docs.

## Live Demo

Deployed on HuggingFace Spaces:

URL: https://studzinsky-bielik-app-service.hf.space

Quick test:

```bash
# Health check
curl https://studzinsky-bielik-app-service.hf.space/health

# List models
curl https://studzinsky-bielik-app-service.hf.space/models
```