---
title: Bielik App Service
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# Bielik App Service

Multi-model LLM service for description enhancement, batch gap-filling, and A/B testing.
## Overview

This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports:

- **Description Enhancement**: Generate marketing descriptions from structured data
- **Batch Infill**: Fill gaps (`[GAP:n]` or `___`) in ad texts with natural words
- **Multi-Model Comparison**: Compare outputs across different models for A/B testing
## Models
| Model | Size | Polish Support | Type |
|---|---|---|---|
| Bielik-1.5B | 1.5B | Excellent | Local |
| Qwen2.5-3B | 3B | Good | Local |
| Gemma-2-2B | 2B | Medium | Local |
| PLLuM-12B | 12B | Excellent | API |
## API Endpoints

### Health & Info

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Welcome message |
| GET | `/health` | API health check and model status |
| GET | `/models` | List all available models |
### Model Management (Lazy Loading)

| Method | Endpoint | Description |
|---|---|---|
| POST | `/models/{name}/load` | Load a model into memory |
| POST | `/models/{name}/unload` | Unload a model from memory |
### Description Generation

| Method | Endpoint | Description |
|---|---|---|
| POST | `/enhance-description` | Generate description with single model |
| POST | `/compare` | Compare outputs from multiple models |
### Batch Infill (Gap-Filling)

| Method | Endpoint | Description |
|---|---|---|
| POST | `/infill` | Batch gap-filling with single model |
| POST | `/compare-infill` | Compare gap-filling across multiple models |
## Lazy Loading
Models are not loaded at startup to conserve memory. Instead:
- Models are loaded on first request (lazy loading)
- Only one local model is loaded at a time
- Switching to a different local model automatically unloads the previous one
- API models (PLLuM) don't affect local model memory
### Example: Load/Unload Flow

1. Request with `bielik-1.5b` → Loads Bielik (first use)
2. Request with `qwen2.5-3b` → Unloads Bielik, loads Qwen
3. Request with `pllum-12b` → Qwen stays loaded (API model doesn't affect local)
4. `POST /models/qwen2.5-3b/unload` → Manually free memory
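The switching logic above can be sketched as a small registry (a minimal illustration of the single-resident-model policy, not the service's actual implementation; class and method names are assumptions):

```python
class LazyModelRegistry:
    """Tracks which local model is resident; API models bypass local memory."""

    def __init__(self, local_models, api_models):
        self.local_models = set(local_models)
        self.api_models = set(api_models)
        self.active_local = None  # at most one local model loaded at a time

    def ensure_loaded(self, name):
        if name in self.api_models:
            return name  # remote inference: local memory untouched
        if name not in self.local_models:
            raise KeyError(f"unknown model: {name}")
        if self.active_local != name:
            # unload the previous local model, then load the requested one
            self.active_local = name
        return name

    def unload(self, name):
        if self.active_local == name:
            self.active_local = None


registry = LazyModelRegistry(
    local_models=["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b"],
    api_models=["pllum-12b"],
)
registry.ensure_loaded("bielik-1.5b")   # step 1: loads Bielik
registry.ensure_loaded("qwen2.5-3b")    # step 2: swaps Bielik for Qwen
registry.ensure_loaded("pllum-12b")     # step 3: Qwen stays resident
assert registry.active_local == "qwen2.5-3b"
registry.unload("qwen2.5-3b")           # step 4: free memory manually
```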
## Endpoint Details
### GET /health

Check API status and loaded models.

**Response:**

```json
{
  "status": "ok",
  "available_models": 4,
  "loaded_models": ["bielik-1.5b"],
  "active_local_model": "bielik-1.5b"
}
```
### GET /models

List all available models with their load status.

**Response:**

```json
[
  {
    "name": "bielik-1.5b",
    "model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
    "type": "local",
    "polish_support": "excellent",
    "size": "1.5B",
    "loaded": true,
    "active": true
  },
  {
    "name": "qwen2.5-3b",
    "model_id": "Qwen/Qwen2.5-3B-Instruct",
    "type": "local",
    "polish_support": "good",
    "size": "3B",
    "loaded": false,
    "active": false
  }
]
```
### POST /models/{name}/load

Explicitly load a model. For local models, unloads the previous one first.

**Response:**

```json
{
  "status": "loaded",
  "model": {
    "name": "bielik-1.5b",
    "loaded": true,
    "active": true
  }
}
```
### POST /models/{name}/unload

Explicitly unload a model to free memory.

**Response:**

```json
{
  "status": "unloaded",
  "model": "bielik-1.5b"
}
```
### POST /enhance-description

Generate enhanced description using a single model.

**Request:**

```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "model": "bielik-1.5b"
}
```

**Response:**

```json
{
  "description": "Generated description text...",
  "model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
  "generation_time": 2.34,
  "user_email": "anonymous"
}
```
### POST /compare

Compare outputs from multiple models for the same input.

**Request:**

```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}
```

**Response:**

```json
{
  "domain": "cars",
  "results": [
    {
      "model": "bielik-1.5b",
      "output": "Generated text from Bielik...",
      "time": 2.3,
      "type": "local",
      "error": null
    },
    {
      "model": "pllum-12b",
      "output": "Generated text from PLLuM...",
      "time": 1.1,
      "type": "inference_api",
      "error": null
    }
  ],
  "total_time": 5.67
}
```
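For A/B testing, a client might rank the comparison results, e.g. by generation time among the models that succeeded. A sketch (field names follow the response shape above; the helper function is hypothetical):

```python
def fastest_success(compare_response):
    """Return the name of the fastest model whose generation succeeded."""
    ok = [r for r in compare_response["results"] if r["error"] is None]
    if not ok:
        return None
    return min(ok, key=lambda r: r["time"])["model"]


response = {
    "domain": "cars",
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local", "error": None},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api", "error": None},
        {"model": "gemma-2-2b", "output": None, "time": 0.0, "type": "local", "error": "load failed"},
    ],
    "total_time": 5.67,
}
assert fastest_success(response) == "pllum-12b"  # failed models are skipped
```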
### POST /infill

Batch gap-filling for ads using a single model. Accepts texts with `[GAP:n]` markers or `___` and returns filled text with per-gap choices and alternatives.

**Gap Notation:**

- `[GAP:1]`, `[GAP:2]`, ... → Explicit numbered gaps (preferred)
- `___` → Auto-numbered in scan order
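Auto-numbering `___` markers in scan order amounts to rewriting them into explicit numbered gaps; a possible normalization looks like this (an illustrative sketch, not the service's actual code):

```python
import re


def normalize_gaps(text):
    """Replace each ___ marker with [GAP:n], numbering gaps left to right."""
    counter = 0

    def number(_match):
        nonlocal counter
        counter += 1
        return f"[GAP:{counter}]"

    # treat any run of three or more underscores as one gap
    return re.sub(r"_{3,}", number, text)


print(normalize_gaps("Auto ma ___ km przebiegu i ___ lakier"))
# → Auto ma [GAP:1] km przebiegu i [GAP:2] lakier
```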
**Request:**

```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
    },
    {
      "id": "ad2",
      "text_with_gaps": "Auto ma ___ km przebiegu i ___ lakier"
    }
  ],
  "model": "bielik-1.5b",
  "options": {
    "top_n_per_gap": 3,
    "language": "pl",
    "temperature": 0.6
  }
}
```

**Response:**

```json
{
  "model": "bielik-1.5b",
  "results": [
    {
      "id": "ad1",
      "status": "ok",
      "filled_text": "Sprzedam eleganckie BMW w doskonałym stanie technicznym",
      "gaps": [
        {
          "index": 1,
          "marker": "[GAP:1]",
          "choice": "eleganckie",
          "alternatives": ["piękne", "zadbane"]
        },
        {
          "index": 2,
          "marker": "[GAP:2]",
          "choice": "doskonałym",
          "alternatives": ["bardzo dobrym", "idealnym"]
        }
      ],
      "error": null
    }
  ],
  "total_time": 3.45,
  "processed_count": 2,
  "error_count": 0
}
```
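Because the response reports each gap's marker, choice, and alternatives separately, a client can rebuild `filled_text` itself, or swap in an alternative for any gap. A sketch based on the response shape above (the helper and its `overrides` parameter are illustrative, not part of the API):

```python
def apply_choices(text_with_gaps, gaps, overrides=None):
    """Substitute each [GAP:n] marker with its chosen (or overridden) word."""
    overrides = overrides or {}
    for gap in gaps:
        word = overrides.get(gap["index"], gap["choice"])
        text_with_gaps = text_with_gaps.replace(gap["marker"], word, 1)
    return text_with_gaps


gaps = [
    {"index": 1, "marker": "[GAP:1]", "choice": "eleganckie", "alternatives": ["piękne", "zadbane"]},
    {"index": 2, "marker": "[GAP:2]", "choice": "doskonałym", "alternatives": ["bardzo dobrym", "idealnym"]},
]
text = "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
assert apply_choices(text, gaps) == "Sprzedam eleganckie BMW w doskonałym stanie technicznym"

# Swap in an alternative for gap 1:
assert apply_choices(text, gaps, {1: "zadbane"}) == "Sprzedam zadbane BMW w doskonałym stanie technicznym"
```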
**Options:**

| Field | Type | Default | Description |
|---|---|---|---|
| `gap_notation` | string | `"auto"` | `"auto"`, `"[GAP:n]"`, or `"___"` |
| `top_n_per_gap` | int | `3` | Alternatives per gap (1-5) |
| `language` | string | `"pl"` | Output language |
| `temperature` | float | `0.6` | Generation temperature (0-1) |
| `max_new_tokens` | int | `256` | Max tokens to generate |
### POST /compare-infill

Multi-model batch gap-filling comparison for A/B testing.

**Request:**

```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie"
    }
  ],
  "models": ["bielik-1.5b", "qwen2.5-3b", "pllum-12b"],
  "options": {
    "top_n_per_gap": 3
  }
}
```

**Response:**

```json
{
  "domain": "cars",
  "models": [
    {
      "model": "bielik-1.5b",
      "type": "local",
      "results": [...],
      "time": 2.1,
      "error_count": 0
    },
    {
      "model": "qwen2.5-3b",
      "type": "local",
      "results": [...],
      "time": 1.8,
      "error_count": 0
    }
  ],
  "total_time": 5.2
}
```
## Domains

Currently supported domains:

| Domain | Schema Fields |
|---|---|
| `cars` | `make`, `model`, `year`, `mileage`, `features[]`, `condition` |
## Environment Variables

| Variable | Description | Required |
|---|---|---|
| `HF_TOKEN` | HuggingFace API token for Inference API | Yes (for API models) |
| `LOCAL_MODEL_PATH` | Path to pre-downloaded local model | No (default: `/app/pretrain_model`) |
| `FRONTEND_URL` | Frontend URL for CORS | No |
## Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Run server
uvicorn app.main:app --reload --port 8000
```
## Docker

```powershell
# Build and run
./start_container.ps1
```

The API is available at http://localhost:8000, with interactive docs at http://localhost:8000/docs.
## Live Demo

Deployed on HuggingFace Spaces:

**URL:** https://studzinsky-bielik-app-service.hf.space

**Quick Test:**

```bash
# Health check
curl https://studzinsky-bielik-app-service.hf.space/health

# List models
curl https://studzinsky-bielik-app-service.hf.space/models
```