---
title: Bielik App Service
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Bielik App Service

Multi-model LLM service for description enhancement, batch gap-filling, and A/B testing.

## Overview

This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports:
- **Description Enhancement**: Generate marketing descriptions from structured data
- **Batch Infill**: Fill gaps (`[GAP:n]` or `___`) in ad texts with natural words
- **Multi-Model Comparison**: Compare outputs across different models for A/B testing

## Models

| Model | Size | Polish Support | Type |
|-------|------|----------------|------|
| Bielik-1.5B | 1.5B | Excellent | Local |
| Qwen2.5-3B | 3B | Good | Local |
| Gemma-2-2B | 2B | Medium | Local |
| PLLuM-12B | 12B | Excellent | API |

## API Endpoints

### Health & Info

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Welcome message |
| `GET` | `/health` | API health check and model status |
| `GET` | `/models` | List all available models |

### Model Management (Lazy Loading)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/models/{name}/load` | Load a model into memory |
| `POST` | `/models/{name}/unload` | Unload a model from memory |

### Description Generation

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/enhance-description` | Generate description with single model |
| `POST` | `/compare` | Compare outputs from multiple models |

### Batch Infill (Gap-Filling)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/infill` | Batch gap-filling with single model |
| `POST` | `/compare-infill` | Compare gap-filling across multiple models |

---

## Lazy Loading

Models are **not loaded at startup** to conserve memory. Instead:
- Models are loaded **on first request** (lazy loading)
- Only **one local model** is loaded at a time
- Switching to a different local model **automatically unloads** the previous one
- API models (PLLuM) don't affect local model memory

### Example: Load/Unload Flow
```
1. Request with bielik-1.5b → Loads Bielik (first use)
2. Request with qwen2.5-3b → Unloads Bielik, loads Qwen
3. Request with pllum-12b → Qwen stays loaded (API model doesn't affect local)
4. POST /models/qwen2.5-3b/unload → Manually free memory
```
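
The flow above can be sketched as a single-slot cache. This is only an illustration of the behaviour described here, not the service's actual code: the class name `SingleSlotModelCache` and the stand-in loader functions are hypothetical.

```python
from typing import Callable, Dict, Optional


class SingleSlotModelCache:
    """Keeps at most one local model in memory (hypothetical helper)."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]) -> None:
        self._loaders = loaders  # model name -> function that builds the model
        self.active_name: Optional[str] = None
        self._active_model: Optional[object] = None

    def get(self, name: str) -> object:
        """Return the requested model, loading it on first use."""
        if name not in self._loaders:
            raise KeyError(f"unknown local model: {name}")
        if self.active_name != name:
            self.unload()  # drop the previous local model first
            self._active_model = self._loaders[name]()
            self.active_name = name
        return self._active_model

    def unload(self) -> None:
        """Release the reference to the active model, if any."""
        self._active_model = None
        self.active_name = None


# Stand-in loaders; a real service would load model weights here.
cache = SingleSlotModelCache({
    "bielik-1.5b": lambda: "bielik-weights",
    "qwen2.5-3b": lambda: "qwen-weights",
})
cache.get("bielik-1.5b")  # loads Bielik on first use
cache.get("qwen2.5-3b")   # unloads Bielik, then loads Qwen
```

API models such as PLLuM would bypass this cache entirely, which is why they leave the active local model untouched.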

---

## Endpoint Details

### `GET /health`

Check API status and loaded models.

**Response:**
```json
{
  "status": "ok",
  "available_models": 4,
  "loaded_models": ["bielik-1.5b"],
  "active_local_model": "bielik-1.5b"
}
```

---

### `GET /models`

List all available models with their load status.

**Response:**
```json
[
  {
    "name": "bielik-1.5b",
    "model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
    "type": "local",
    "polish_support": "excellent",
    "size": "1.5B",
    "loaded": true,
    "active": true
  },
  {
    "name": "qwen2.5-3b",
    "model_id": "Qwen/Qwen2.5-3B-Instruct",
    "type": "local",
    "polish_support": "good",
    "size": "3B",
    "loaded": false,
    "active": false
  }
]
```

---

### `POST /models/{name}/load`

Explicitly load a model. Loading a local model first unloads whichever local model is currently active.

**Response:**
```json
{
  "status": "loaded",
  "model": {
    "name": "bielik-1.5b",
    "loaded": true,
    "active": true
  }
}
```

---

### `POST /models/{name}/unload`

Explicitly unload a model to free memory.

**Response:**
```json
{
  "status": "unloaded",
  "model": "bielik-1.5b"
}
```

---

### `POST /enhance-description`

Generate enhanced description using a single model.

**Request:**
```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "model": "bielik-1.5b"
}
```

**Response:**
```json
{
  "description": "Generated description text...",
  "model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
  "generation_time": 2.34,
  "user_email": "anonymous"
}
```
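
A minimal client sketch for this endpoint, using only the Python standard library. The base URL assumes the local server from "Running Locally", and the helper names `build_enhance_request` and `enhance` are illustrative, not part of the service.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local server on port 8000


def build_enhance_request(data: dict, model: str, domain: str = "cars") -> urllib.request.Request:
    """Build (but do not send) a POST request for /enhance-description."""
    body = json.dumps({"domain": domain, "data": data, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/enhance-description",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enhance(data: dict, model: str) -> str:
    """Send the request and return the generated description."""
    # Generous timeout: the first call may trigger lazy loading of the model.
    with urllib.request.urlopen(build_enhance_request(data, model), timeout=120) as resp:
        return json.load(resp)["description"]


request = build_enhance_request(
    {"make": "BMW", "model": "320i", "year": 2020}, "bielik-1.5b"
)
```

Calling `enhance(...)` requires a running server; building the request separately makes the payload easy to inspect beforehand.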

---

### `POST /compare`

Compare outputs from multiple models for the same input.

**Request:**
```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "models": ["bielik-1.5b", "qwen2.5-3b", "gemma-2-2b", "pllum-12b"]
}
```

**Response** (abbreviated to two of the four requested models):
```json
{
  "domain": "cars",
  "results": [
    {
      "model": "bielik-1.5b",
      "output": "Generated text from Bielik...",
      "time": 2.3,
      "type": "local",
      "error": null
    },
    {
      "model": "pllum-12b",
      "output": "Generated text from PLLuM...",
      "time": 1.1,
      "type": "inference_api",
      "error": null
    }
  ],
  "total_time": 5.67
}
```
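
One way to consume this response in an A/B harness is to pick the fastest model that returned without an error. The sketch below relies only on the fields shown above; the function name `fastest_successful` is hypothetical.

```python
from typing import Optional


def fastest_successful(compare_response: dict) -> Optional[dict]:
    """Return the result with the lowest time among those without an error."""
    ok = [r for r in compare_response["results"] if r.get("error") is None]
    return min(ok, key=lambda r: r["time"]) if ok else None


sample = {
    "domain": "cars",
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local", "error": None},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api", "error": None},
    ],
    "total_time": 5.67,
}
best = fastest_successful(sample)
print(best["model"])  # pllum-12b
```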

---

### `POST /infill`

Batch gap-filling for ads using a single model. Accepts texts with `[GAP:n]` markers or `___` and returns filled text with per-gap choices and alternatives.

**Gap Notation:**
- `[GAP:1]`, `[GAP:2]`, ... → Explicit numbered gaps (preferred)
- `___` → Auto-numbered in scan order
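
To illustrate what "auto-numbered in scan order" means, the sketch below converts `___` blanks into explicit `[GAP:n]` markers on the client side. It is an assumption about equivalent behaviour, not the service's own parser; in particular, treating any run of three or more underscores as one gap is a choice made here.

```python
import re

# Assumption: a run of three or more underscores counts as a single gap.
GAP_PATTERN = re.compile(r"_{3,}")


def number_blank_gaps(text: str) -> str:
    """Replace each underscore run with [GAP:n], numbered left to right."""
    counter = 0

    def replace(_match: re.Match) -> str:
        nonlocal counter
        counter += 1
        return f"[GAP:{counter}]"

    return GAP_PATTERN.sub(replace, text)


print(number_blank_gaps("Auto ma ___ km przebiegu i ___ lakier"))
# Auto ma [GAP:1] km przebiegu i [GAP:2] lakier
```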

**Request:**
```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
    },
    {
      "id": "ad2",
      "text_with_gaps": "Auto ma ___ km przebiegu i ___ lakier"
    }
  ],
  "model": "bielik-1.5b",
  "options": {
    "top_n_per_gap": 3,
    "language": "pl",
    "temperature": 0.6
  }
}
```

**Response** (abbreviated to the first item):
```json
{
  "model": "bielik-1.5b",
  "results": [
    {
      "id": "ad1",
      "status": "ok",
      "filled_text": "Sprzedam eleganckie BMW w doskonałym stanie technicznym",
      "gaps": [
        {
          "index": 1,
          "marker": "[GAP:1]",
          "choice": "eleganckie",
          "alternatives": ["piękne", "zadbane"]
        },
        {
          "index": 2,
          "marker": "[GAP:2]",
          "choice": "doskonałym",
          "alternatives": ["bardzo dobrym", "idealnym"]
        }
      ],
      "error": null
    }
  ],
  "total_time": 3.45,
  "processed_count": 2,
  "error_count": 0
}
```
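
Because each gap carries its `marker`, `choice`, and `alternatives`, a client can rebuild the filled text or render a variant from the alternatives. The helper below is a hypothetical sketch based on the fields shown above. It assumes each `marker` appears verbatim in the original `text_with_gaps`, which the response for `___` inputs may not guarantee.

```python
from typing import List, Optional


def apply_choices(text_with_gaps: str, gaps: List[dict], alternative: Optional[int] = None) -> str:
    """Substitute each marker with its choice, or with the alternative at the given index."""
    filled = text_with_gaps
    for gap in gaps:
        word = gap["choice"]
        if alternative is not None and alternative < len(gap["alternatives"]):
            word = gap["alternatives"][alternative]
        # Replace only the first occurrence so repeated markers are filled in order.
        filled = filled.replace(gap["marker"], word, 1)
    return filled


text = "Sprzedam [GAP:1] BMW w [GAP:2] stanie technicznym"
gaps = [
    {"index": 1, "marker": "[GAP:1]", "choice": "eleganckie", "alternatives": ["piękne", "zadbane"]},
    {"index": 2, "marker": "[GAP:2]", "choice": "doskonałym", "alternatives": ["bardzo dobrym", "idealnym"]},
]
print(apply_choices(text, gaps))
# Sprzedam eleganckie BMW w doskonałym stanie technicznym
print(apply_choices(text, gaps, alternative=0))
# Sprzedam piękne BMW w bardzo dobrym stanie technicznym
```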

**Options:**
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `gap_notation` | string | `"auto"` | `"auto"`, `"[GAP:n]"`, or `"___"` |
| `top_n_per_gap` | int | `3` | Alternatives per gap (1-5) |
| `language` | string | `"pl"` | Output language |
| `temperature` | float | `0.6` | Generation temperature (0-1) |
| `max_new_tokens` | int | `256` | Max tokens to generate |
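
The defaults and ranges in this table can be mirrored on the client to catch bad values before sending a request. The dataclass below is a sketch under that assumption. The name `InfillOptions` is hypothetical, and the server's own validation may differ.

```python
from dataclasses import asdict, dataclass


@dataclass
class InfillOptions:
    """Client-side mirror of the documented options and their defaults."""

    gap_notation: str = "auto"
    top_n_per_gap: int = 3
    language: str = "pl"
    temperature: float = 0.6
    max_new_tokens: int = 256

    def __post_init__(self) -> None:
        # Ranges taken from the table above; anything else is left unchecked.
        if self.gap_notation not in ("auto", "[GAP:n]", "___"):
            raise ValueError(f"unsupported gap_notation: {self.gap_notation!r}")
        if not 1 <= self.top_n_per_gap <= 5:
            raise ValueError("top_n_per_gap must be between 1 and 5")
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")


# Dictionary suitable for the "options" field of an /infill request.
options_payload = asdict(InfillOptions(top_n_per_gap=2))
```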

---

### `POST /compare-infill`

Multi-model batch gap-filling comparison for A/B testing.

**Request:**
```json
{
  "domain": "cars",
  "items": [
    {
      "id": "ad1",
      "text_with_gaps": "Sprzedam [GAP:1] BMW w [GAP:2] stanie"
    }
  ],
  "models": ["bielik-1.5b", "qwen2.5-3b", "pllum-12b"],
  "options": {
    "top_n_per_gap": 3
  }
}
```

**Response** (abbreviated):
```json
{
  "domain": "cars",
  "models": [
    {
      "model": "bielik-1.5b",
      "type": "local",
      "results": [...],
      "time": 2.1,
      "error_count": 0
    },
    {
      "model": "qwen2.5-3b",
      "type": "local",
      "results": [...],
      "time": 1.8,
      "error_count": 0
    }
  ],
  "total_time": 5.2
}
```

---

## Domains

Currently supported domains:

| Domain | Schema Fields |
|--------|---------------|
| `cars` | `make`, `model`, `year`, `mileage`, `features[]`, `condition` |
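
A client can check a `cars` payload against this field list before calling the API. The sketch below is illustrative only. The types are inferred from the request examples in this README, and whether every field is actually required is an assumption.

```python
from typing import List

# Field types inferred from the example requests (assumption, not a published schema).
CARS_FIELDS = {
    "make": str,
    "model": str,
    "year": int,
    "mileage": int,
    "features": list,
    "condition": str,
}


def check_cars_payload(data: dict) -> List[str]:
    """Return a list of problems; an empty list means the payload matches the field list."""
    problems = []
    for field, expected in CARS_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    return problems


example = {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry",
}
print(check_cars_payload(example))  # []
```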

---

## Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `HF_TOKEN` | Hugging Face API token for the Inference API | Yes (for API models) |
| `LOCAL_MODEL_PATH` | Path to pre-downloaded local model | No (default: `/app/pretrain_model`) |
| `FRONTEND_URL` | Frontend URL for CORS | No |
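
As a sketch of how these variables might be read at startup, the snippet below collects them into a small settings object with the documented default. The `Settings` class and `load_settings` function are hypothetical names, not taken from the service code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]      # needed only for API models such as PLLuM
    local_model_path: str
    frontend_url: Optional[str]  # used for CORS when set


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented default path."""
    return Settings(
        hf_token=env.get("HF_TOKEN"),
        local_model_path=env.get("LOCAL_MODEL_PATH", "/app/pretrain_model"),
        frontend_url=env.get("FRONTEND_URL"),
    )


settings = load_settings({"HF_TOKEN": "hf_example"})  # placeholder token value
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.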

## Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Run server
uvicorn app.main:app --reload --port 8000
```

## Docker

```powershell
# Build and run (PowerShell script)
./start_container.ps1
```

The API is available at `http://localhost:8000`, and the docs at `http://localhost:8000/docs`.

## Live Demo

Deployed on Hugging Face Spaces:

**URL:** `https://studzinsky-bielik-app-service.hf.space`

**Quick Test:**
```bash
# Health check
curl https://studzinsky-bielik-app-service.hf.space/health

# List models
curl https://studzinsky-bielik-app-service.hf.space/models
```