Spaces: Running

Patryk Studzinski committed · Commit a7fd202 · 1 Parent(s): a1c0774
first-imrpvement-commit
Browse files:
- PROJECT_CONTEXT.md +0 -107
- README.md +141 -276
- answer.md +0 -213
- app/main.py +115 -57
- app/models/__init__.py +16 -0
- app/models/base_llm.py +54 -0
- app/models/huggingface_inference_api.py +93 -0
- app/models/huggingface_local.py +133 -0
- app/models/registry.py +124 -0
- app/schemas/schemas.py +31 -0
- llm_app_rework.md +141 -0
- requirements.txt +2 -1
PROJECT_CONTEXT.md
DELETED
@@ -1,107 +0,0 @@

# GPT4All Service - Project Context

## Project Overview

This is a **Polish Car Description Enhancement Service** built as a FastAPI microservice that uses a Hugging Face large language model to generate enhanced marketing descriptions for cars in Polish.

## Core Functionality

The service takes basic car information (make, model, year, mileage, features, condition) and generates compelling, marketing-friendly descriptions in Polish using the `speakleash/Bielik-1.5B-v3.0-Instruct` model, a Polish language model from the Bielik series.

## Project Structure

```
gpt4all-service/
├── app/
│   ├── main.py                    # FastAPI application with endpoints
│   ├── models/
│   │   └── huggingface_service.py # Core LLM service wrapper
│   └── schemas/
│       └── schemas.py             # Pydantic data models
├── Dockerfile                     # Multi-stage Docker build
├── download_model.py              # Model download script for Docker
├── requirements.txt               # Python dependencies
├── start_container.ps1            # PowerShell startup script
├── start_container.sh             # Bash startup script
└── README.md                      # Comprehensive documentation
```

## Technical Architecture

### 1. FastAPI Application (`app/main.py`)
- **Framework**: FastAPI with CORS middleware
- **Main Endpoint**: `POST /enhance-description` - takes car data, returns an enhanced description
- **Health Check**: `GET /health` - service status and model initialization check
- **CORS**: Configured for a frontend on `http://localhost:5173` (likely a React/Vue dev server)

### 2. LLM Service (`app/models/huggingface_service.py`)
- **Purpose**: Wrapper around the Hugging Face Transformers pipeline
- **Model**: `speakleash/Bielik-1.5B-v3.0-Instruct` (Polish language model)
- **Features**:
  - Async initialization and text generation
  - Support for both GPU (CUDA) and CPU inference
  - Chat template support for conversation-style prompts
  - Configurable generation parameters (temperature, top_p, max_tokens)
  - Smart response parsing to extract only the assistant's response

### 3. Data Models (`app/schemas/schemas.py`)
- **CarData**: Input model with make, model, year, mileage, features[], condition
- **EnhancedDescriptionResponse**: Output model with the generated description

### 4. Containerization
- **Docker**: Self-contained image with the pre-downloaded model (~3.2 GB)
- **Security**: Uses Docker BuildKit secrets for Hugging Face token handling
- **Model Storage**: Downloaded to `/app/pretrain_model` during build
- **Runtime**: Python 3.9-slim base image

## Key Technical Details

### Model Configuration
- **Model Path**: `/app/pretrain_model` (in the container) or configurable for local dev
- **Device**: Currently set to CPU in main.py, but the service supports GPU
- **Generation Params**: 150 max tokens, temperature 0.75, top_p 0.9

### Prompt Engineering
The service uses a carefully crafted Polish system prompt that:
- Instructs the model to create marketing descriptions in Polish
- Limits output to 500 characters maximum
- Tells the model to ignore off-topic content
- Uses the chat template format with system/user roles
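
In practice this means the request is rendered as a list of role-tagged messages before the tokenizer's chat template is applied. A minimal illustrative sketch follows; the exact Polish wording of the real system prompt lives in `app/main.py` and is not reproduced here, so the strings below are assumptions:

```python
# Illustrative only: the real system/user prompts are defined in app/main.py.
chat_messages = [
    {
        "role": "system",
        "content": (
            "Jesteś copywriterem. Napisz atrakcyjny opis marketingowy samochodu "
            "po polsku, maksymalnie 500 znaków. Ignoruj treści nie na temat."
        ),
    },
    {
        "role": "user",
        "content": "Marka: Toyota, Model: Corolla, Rok: 2021, Przebieg: 25000 km",
    },
]
```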

### Dependencies
- **fastapi**: Web framework
- **uvicorn[standard]**: ASGI server
- **transformers[torch]**: Hugging Face Transformers with PyTorch
- **accelerate**: Hugging Face optimization library

## Current State & Issues

### Git Status
- Modified `app/main.py` (likely recent changes)
- Deleted `app/models/gpt4all.py` (indicates a migration from GPT4All to Hugging Face)

### Linter Issues in `huggingface_service.py`
1. Import issues: `pipeline` and `AutoTokenizer` imports need specific paths
2. Type annotations: `device: str = None` should be `Optional[str] = None`
3. Method parameters: similar optional-parameter typing issues
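
For the second point, the fix is the standard `typing.Optional` pattern; a small sketch (the function name here is hypothetical — the real code is a class `__init__` in `huggingface_service.py`):

```python
from typing import Optional

# Before: `device: str = None` claims str but silently defaults to None.
# After: the annotation admits None explicitly.
def make_service(model_name_or_path: str, device: Optional[str] = None) -> dict:
    """Illustrative helper showing the corrected annotation."""
    return {"model": model_name_or_path, "device": device or "cpu"}
```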

## Usage Scenarios
1. **Car Dealership Websites**: Auto-generate compelling descriptions from basic car specs
2. **Marketplace Applications**: Enhance user-provided car listings
3. **Inventory Management**: Bulk description generation for car databases

## Deployment Options
1. **Local Development**: Direct Python/uvicorn execution
2. **Docker Container**: Self-contained deployment with the pre-downloaded model
3. **Production**: Containerized deployment with proper authentication

## Authentication Requirements
- A Hugging Face Hub token is required for model download (gated model)
- The token is stored in `my_hf_token.txt` during the Docker build
- Securely handled via Docker BuildKit secrets

## Performance Considerations
- Model size: ~3.2 GB (significant memory footprint)
- CPU inference: slower but more accessible
- GPU inference: faster but requires a CUDA setup
- Async design: non-blocking text generation

This service represents a specialized AI application for the Polish automotive market, focusing on generating marketing content using state-of-the-art Polish language models.
README.md
CHANGED
@@ -1,314 +1,179 @@

Removed (old version):

---
title: AI Car Description Enhancer
emoji: 🚗✨
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
---

<div align="center">

# AI Car Description Enhancer "Bielik"

</div>

## Contents

1. [Features](#features)
2. [Prerequisites](#prerequisites)
3. [Project Structure](#project-structure)
4. [Installation (Local Development)](#installation-local-development)
5. [Usage (Local Development)](#usage-local-development)
6. [Docker Usage](#docker-usage)
7. [Quick Start with PowerShell (`start_container.ps1`)](#quick-start-with-powershell-start_containerps1)
8. [API Endpoints](#api-endpoints)
   - [Health Check](#health-check)
   - [Enhance Description](#enhance-description)
9. [Core Service (`app/models/huggingface_service.py`)](#core-service-appmodelshuggingface_servicepy)
10. [Configuration](#configuration)
11. [Schemas (`app/schemas/schemas.py`)](#schemas-appschemasschemaspy)
    - [CarData](#cardata)
    - [EnhancedDescriptionResponse](#enhanceddescriptionresponse)
12. [Contributing](#contributing)
13. [License](#license)

## Features

- Utilizes the `speakleash/Bielik-1.5B-v3.0-Instruct` model via the Hugging Face `transformers` library.
- Health check endpoint.
- Docker support for easy deployment, with the model included in the image.
- Includes a `start_container.sh` script for convenient container startup.

## Prerequisites

- `pip` (Python package installer)
- Docker (for containerized deployment; Docker BuildKit enabled is recommended for secrets)
- Git (for cloning the repository)
- A Hugging Face Hub account and an access token (with `read` permissions) if the chosen model is gated (see the Docker Usage section).
- For using `start_container.sh`: a bash-compatible shell (such as those on Linux, macOS, or Git Bash on Windows).

## Project Structure

```
.
├── app/
│   ├── __init__.py
│   ├── main.py                    # FastAPI application, endpoints
│   ├── models/
│   │   ├── __init__.py
│   │   └── huggingface_service.py # Service for interacting with the LLM
│   └── schemas/
│       ├── __init__.py
│       └── schemas.py             # Pydantic schemas for request/response
├── .gitignore
├── Dockerfile
├── download_model.py              # Script to download model during Docker build
├── my_hf_token.txt                # (Should be created locally) For storing HF token
├── requirements.txt
├── start_container.sh             # Helper script to run the Docker container
└── README.md
```

## Installation (Local Development)

1. **Clone the repository:**
   ```bash
   git clone https://github.com/studzin-sky/llm-description-enhancer.git
   cd llm-description-enhancer
   ```

2. **Create and activate a virtual environment:**
   (Recommended to keep dependencies isolated)
   ```bash
   python -m venv venv
   ```
   * On macOS/Linux:
     ```bash
     source venv/bin/activate
     ```
   * On Windows (PowerShell):
     ```powershell
     .\venv\Scripts\Activate.ps1
     ```
   * On Windows (Command Prompt):
     ```cmd
     venv\Scripts\activate.bat
     ```

3. **Install the required dependencies:**
   Ensure your `requirements.txt` includes `fastapi`, `uvicorn[standard]`, `transformers[torch]`, `torch`, `accelerate`, and `huggingface_hub`.
   ```bash
   pip install -r requirements.txt
   ```
   *Note: The first time you run the application locally (or if the model cache is empty), the Hugging Face model (~3.2 GB) will be downloaded, which can take some time. **If the model (`speakleash/Bielik-1.5B-v3.0-Instruct`, or whichever is configured) is gated or requires authentication, you may need to log in with `huggingface-cli login` before running the application locally.** After logging in, your token is cached by the `huggingface_hub` library.*

## Usage (Local Development)

1. **Start the FastAPI server:**
   From the project root directory:
   ```bash
   uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
   ```
   * `--reload` enables auto-reloading for development.
   * `--host 0.0.0.0` makes the server accessible on your network.

2. **Access the application:**
   * Health Check: http://127.0.0.1:8000/health
   * API Documentation (Swagger UI): http://127.0.0.1:8000/docs
   * Enhance Description: `POST` requests to http://127.0.0.1:8000/enhance-description

## Docker Usage

The included `Dockerfile` builds an image with the application and the pre-downloaded Hugging Face model, making it self-contained. Downloading gated models during the build requires a Hugging Face Hub token.

1. **Prepare a Hugging Face Hub token (for gated models):**
   The `speakleash/Bielik-1.5B-v3.0-Instruct` model may require authentication to download.
   * **Get a token:**
     1. Go to your Hugging Face account settings: https://huggingface.co/settings/tokens
     2. Create a new token (e.g., named "docker-bielik-access") with `read` permissions.
     3. Copy the generated token (it will start with `hf_`).
   * **Create the token file:**
     1. In your project's root directory (next to your `Dockerfile`), create a file named `my_hf_token.txt`.
     2. Paste **only the token string** (e.g., `hf_YourActualTokenValueHere`) into this file. Do not add any other text or variable names.

2. **Build the Docker image:**
   From the project root directory, run:
   ```bash
   DOCKER_BUILDKIT=1 docker build --secret id=huggingface_token,src=my_hf_token.txt -t llm-description-enhancer .
   ```
   * `DOCKER_BUILDKIT=1`: enables BuildKit, which is required for using `--secret`.
   * `--secret id=huggingface_token,src=my_hf_token.txt`: securely provides the content of `my_hf_token.txt` to the build process. The `id=huggingface_token` must match the ID used in the `RUN --mount` directive in your `Dockerfile`.
   * *(This step will take a while, especially the first time, as it downloads the LLM using your token.)*
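
The commit does not reproduce `download_model.py` itself; a minimal sketch of how such a build-time script could consume the BuildKit secret (the mount path follows BuildKit's `/run/secrets/<id>` convention for the `id=huggingface_token` above; everything else is an assumption):

```python
# download_model.py -- hypothetical sketch; the real script is not shown in this commit.
from pathlib import Path

from huggingface_hub import snapshot_download

# BuildKit mounts secrets under /run/secrets/<id> inside the RUN step, e.g.:
#   RUN --mount=type=secret,id=huggingface_token python download_model.py
SECRET_PATH = Path("/run/secrets/huggingface_token")

token = SECRET_PATH.read_text().strip() if SECRET_PATH.exists() else None

# Download the gated model into the path the service loads from at runtime.
snapshot_download(
    repo_id="speakleash/Bielik-1.5B-v3.0-Instruct",
    local_dir="/app/pretrain_model",
    token=token,
)
```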

3. **Run the Docker container using the helper script (`start_container.sh`):**
   The helper script `start_container.sh` included in the repository simplifies starting the Docker container. It typically stops and removes any pre-existing container with the configured name and then starts a new one.

   * **Ensure the script is executable:**
     After cloning the repository, or if the execute permission isn't set, you may need to make the script executable (on Linux, macOS, or Git Bash on Windows):
     ```bash
     chmod +x start_container.sh
     ```

   * **Run the script:**
     From the project root directory:
     ```bash
     ./start_container.sh
     ```

   * **Expected outcome (depends on your script's content):**
     The script will likely:
     * Output messages indicating it is managing the container.
     * Start the container (possibly in detached mode).
     * Inform you that the service is available at `http://127.0.0.1:8000`.
     * Provide commands to view logs or stop the container if it is running in detached mode (e.g., `docker logs <container_name> -f` and `docker stop <container_name>`).

   *(Alternatively, you can run the container manually: `docker run --rm -p 8000:8000 llm-description-enhancer`)*

4. **Test the containerized application:**
   Once the container is running (via the script or manually), send requests to `http://127.0.0.1:8000` as described in the API Endpoints section.

## Quick Start with PowerShell (`start_container.ps1`)

For Windows users, you can automate the Docker build and run process with the provided PowerShell script. The script will:
- Build the Docker image using your Hugging Face token (from `my_hf_token.txt`)
- Stop and remove any existing container named `bielik_app_instance`
- Start a new container and map port 8000

**Steps:**

1. Ensure your Hugging Face token is saved in `my_hf_token.txt` in the project root (see above for details).
2. Open PowerShell in the project directory.
3. (Optional, but recommended) Temporarily allow running unsigned scripts for this session:
   ```powershell
   Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process
   ```
4. Run the script:
   ```powershell
   .\start_container.ps1
   ```

The script will build the image and start the container. Your FastAPI service will be available at http://127.0.0.1:8000.

You can view logs with:
```powershell
docker logs bielik_app_instance -f
```
To stop the container:
```powershell
docker stop bielik_app_instance
```

---

## API Endpoints

### Health Check (`GET /health`)

- **Method:** `GET`
- **Description:** Returns the status of the application and model initialization.
- **Example Response:**
  ```json
  {
    "status": "ok",
    "model_initialized": true,
    "model_path": "/app/pretrain_model"
  }
  ```

### Enhance Description (`POST /enhance-description`)

- **Request (`application/json`):** a JSON object with the [CarData](#cardata) fields:
  ```json
  {
    "make": "…",
    "model": "…",
    "year": …,
    "mileage": …,
    "features": ["…"],
    "condition": "…"
  }
  ```
- **Response (`application/json`):**
  ```json
  {
    "description": "…"
  }
  ```
- **Example `curl` request:**
  ```bash
  curl -X POST "http://127.0.0.1:8000/enhance-description" \
    -H "Content-Type: application/json" \
    -d '{
      "make": "Toyota",
      "model": "Corolla",
      "year": 2021,
      "mileage": 25000,
      "features": ["Kamera cofania", "Apple CarPlay", "Android Auto", "System bezkluczykowy"],
      "condition": "Bardzo dobry"
    }'
  ```

## Core Service (`app/models/huggingface_service.py`)

The `HuggingFaceTextGenerationService` class handles the interaction with the large language model.

- **Key Methods:**
  - `async initialize()`: Loads the pre-trained model and tokenizer from the path specified at service instantiation (e.g., `/app/pretrain_model` in Docker, or from the Hugging Face cache locally).
  - `async generate_text(chat_template_messages: list, max_new_tokens: int, ...)`: Generates text from a structured chat prompt, applying the appropriate chat template and parsing the model's output to return only the assistant's response.

## Configuration

- **Model Used:** `speakleash/Bielik-1.5B-v3.0-Instruct`. This is baked into `/app/pretrain_model` in the Docker image. For local development, it is downloaded to the Hugging Face cache.
- **Language:** The primary focus is on generating descriptions in **Polish**.
- **Prompt Engineering:** The system and user prompts in `app/main.py` are crafted to guide the model towards generating concise and relevant marketing descriptions.

## Schemas (`app/schemas/schemas.py`)

### CarData

- `make`: `str`
- `model`: `str`
- `year`: `int`
- `mileage`: `int`
- `features`: `list[str]`
- `condition`: `str`

### EnhancedDescriptionResponse

- `description`: `str`

## Contributing

## License
Added (new version):

# Bielik App Service

Multi-model LLM service for description enhancement and A/B testing.

## Overview

This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports comparing outputs across different models to evaluate quality, speed, and Polish language support.

## Models

| Model | Size | Polish Support | Type |
|-------|------|----------------|------|
| Bielik-1.5B | 1.5B | Excellent | Local |
| PLLuM-12B | 12B | Excellent | API |
| Mistral-Small-3 | 24B | Good | API |
| Gemma-2-9B | 9B | Medium | API |

## API Endpoints

### Health & Info

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Welcome message |
| `GET` | `/health` | API health check and model status |
| `GET` | `/models` | List all available models |

### Generation

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/enhance-description` | Generate a description with a single model |
| `POST` | `/compare` | Compare outputs from multiple models |

---

## Endpoint Details

### `GET /health`

Check API status and model initialization.

**Response:**
```json
{
  "status": "ok",
  "local_models_initialized": true,
  "available_models": 4
}
```

---

### `GET /models`

List all available models with their details.

**Response:**
```json
[
  {
    "name": "bielik-1.5b",
    "model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
    "type": "local",
    "polish_support": "excellent",
    "size": "1.5B",
    "initialized": true
  }
]
```

---

### `POST /enhance-description`

Generate an enhanced description using a single model.

**Request:**
```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "model": "bielik-1.5b"
}
```

**Response:**
```json
{
  "description": "Generated description text...",
  "model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
  "generation_time": 2.34,
  "user_email": "anonymous"
}
```

---
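
For reference, the same request from Python (a sketch using the `requests` library; the base URL assumes the local dev server from the Running Locally section):

```python
import requests

# Assumes the service is running locally on port 8000.
payload = {
    "domain": "cars",
    "data": {
        "make": "BMW",
        "model": "320i",
        "year": 2020,
        "mileage": 45000,
        "features": ["nawigacja", "klimatyzacja"],
        "condition": "bardzo dobry",
    },
    "model": "bielik-1.5b",
}

resp = requests.post("http://localhost:8000/enhance-description", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["description"])
```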

### `POST /compare`

Compare outputs from multiple models for the same input.

**Request:**
```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "models": ["bielik-1.5b", "pllum-12b", "gemma-2-9b"]
}
```

**Response:**
```json
{
  "domain": "cars",
  "results": [
    {
      "model": "bielik-1.5b",
      "output": "Generated text from Bielik...",
      "time": 2.3,
      "type": "local",
      "error": null
    },
    {
      "model": "pllum-12b",
      "output": "Generated text from PLLuM...",
      "time": 1.1,
      "type": "inference_api",
      "error": null
    }
  ],
  "total_time": 5.67
}
```
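
A quick way to eyeball an A/B comparison from Python (a sketch; assumes the same local server and the documented response shape):

```python
import requests

payload = {
    "domain": "cars",
    "data": {
        "make": "BMW",
        "model": "320i",
        "year": 2020,
        "mileage": 45000,
        "features": ["nawigacja", "klimatyzacja"],
        "condition": "bardzo dobry",
    },
    "models": ["bielik-1.5b", "pllum-12b"],
}

resp = requests.post("http://localhost:8000/compare", json=payload, timeout=300)
resp.raise_for_status()

# Print each model's output next to its latency (or error) for a side-by-side look.
for result in resp.json()["results"]:
    status = result["error"] or f"{result['time']}s"
    print(f"[{result['model']} | {status}] {result['output'][:120]}")
```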

---

## Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `HF_TOKEN` | HuggingFace API token for the Inference API | Yes (for API models) |
| `LOCAL_MODEL_PATH` | Path to the pre-downloaded local model | No (default: `/app/pretrain_model`) |
| `FRONTEND_URL` | Frontend URL for CORS | No |

## Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Run server
uvicorn app.main:app --reload --port 8000
```

## Docker

```powershell
# Build and run
.\start_container.ps1
```

API available at `http://localhost:8000`

Docs at `http://localhost:8000/docs`
answer.md
DELETED
@@ -1,213 +0,0 @@

Great choice! Creating a separate service for GPT4All is a clean and scalable approach. Let's break this down step by step, from setting up the project structure to deploying the service.

---

### **1. Project Structure**
Here's a suggested folder structure for your GPT4All service:

```
gpt4all-service/
├── app/
│   ├── __init__.py
│   ├── main.py            # FastAPI/Flask app entry point
│   ├── models/            # GPT4All model loading and inference logic
│   │   └── gpt4all.py
│   └── schemas/           # Pydantic models for request/response validation
│       └── schemas.py
├── tests/                 # Unit and integration tests
│   └── test_api.py
├── requirements.txt       # Python dependencies
├── Dockerfile             # For containerization
├── README.md              # Project documentation
└── .env                   # Environment variables (optional)
```

---

### **2. Setting Up the Project**
1. **Create the Project Folder**:
   ```bash
   mkdir gpt4all-service
   cd gpt4all-service
   ```

2. **Initialize a Virtual Environment**:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install Dependencies**:
   Create a `requirements.txt` file:
   ```plaintext
   fastapi
   uvicorn
   gpt4all
   pydantic
   python-dotenv
   ```

   Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```

---

### **3. Build the GPT4All Service**
#### **Step 1: Create the Model Loading Logic**
- Create `app/models/gpt4all.py`:
  ```python
  from gpt4all import GPT4All

  class GPT4AllService:
      def __init__(self, model_path: str):
          self.model = GPT4All(model_path)

      def generate_description(self, prompt: str) -> str:
          response = self.model.generate(prompt, max_tokens=300)
          return response
  ```

#### **Step 2: Define Request/Response Schemas**
- Create `app/schemas/schemas.py`:
  ```python
  from pydantic import BaseModel

  class CarData(BaseModel):
      make: str
      model: str
      year: int
      mileage: int
      features: list[str]
      condition: str

  class EnhancedDescriptionResponse(BaseModel):
      description: str
  ```

#### **Step 3: Create the FastAPI App**
- Create `app/main.py`:
  ```python
  from fastapi import FastAPI, HTTPException
  from app.models.gpt4all import GPT4AllService
  from app.schemas.schemas import CarData, EnhancedDescriptionResponse

  app = FastAPI()

  # Initialize GPT4All service
  gpt4all_service = GPT4AllService("ggml-model-gpt4all-falcon-q4_0.bin")

  @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
  async def enhance_description(car_data: CarData):
      try:
          # Create a prompt from car data
          prompt = f"""
          Enhance this car description for an auction portal:
          - Make: {car_data.make}
          - Model: {car_data.model}
          - Year: {car_data.year}
          - Mileage: {car_data.mileage}
          - Features: {', '.join(car_data.features)}
          - Condition: {car_data.condition}
          """
          # Generate description
          description = gpt4all_service.generate_description(prompt)
          return {"description": description}
      except Exception as e:
          raise HTTPException(status_code=500, detail=str(e))
  ```

---

### **4. Run the Service**
1. **Start the Service**:
   ```bash
   uvicorn app.main:app --reload --port 8000
   ```

2. **Test the API**:
   Use `curl` or Postman to send a POST request:
   ```bash
   curl -X POST "http://localhost:8000/enhance-description" \
     -H "Content-Type: application/json" \
     -d '{
       "make": "Toyota",
       "model": "Camry",
       "year": 2020,
       "mileage": 45000,
       "features": ["sunroof", "leather seats", "lane assist"],
       "condition": "excellent"
     }'
   ```

   **Expected Response**:
   ```json
   {
     "description": "This 2020 Toyota Camry is a well-maintained vehicle with only 45,000 miles on the odometer. It comes equipped with a sunroof, luxurious leather seats, and advanced lane assist technology. Perfect for families or commuters, this car is in excellent condition and ready to hit the road. Don't miss out on this fantastic deal!"
   }
   ```

---

### **5. Add Logging and Error Handling**
- Add logging to track requests and errors:
  ```python
  import logging

  logging.basicConfig(level=logging.INFO)
  logger = logging.getLogger(__name__)

  @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
  async def enhance_description(car_data: CarData):
      try:
          logger.info(f"Received request for car: {car_data.make} {car_data.model}")
          prompt = f"Enhance this car description: {car_data}"
          description = gpt4all_service.generate_description(prompt)
          return {"description": description}
      except Exception as e:
          logger.error(f"Error generating description: {e}")
          raise HTTPException(status_code=500, detail=str(e))
  ```

---

### **6. Containerize the Service (Optional)**
Create a `Dockerfile` to containerize the service:
```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory
WORKDIR /app

# Copy the requirements file
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run the Docker container:
```bash
docker build -t gpt4all-service .
docker run -p 8000:8000 gpt4all-service
```

---

### **7. Next Steps**
- Add unit tests in the `tests/` folder.
- Add environment variables for configuration (e.g., model path, port).
- Integrate with your Flask backend by calling this service via HTTP.

Let me know if you need help with any specific part (e.g., testing, deployment, or advanced features)!
app/main.py
CHANGED

Old side (removed lines marked `-`; fragments truncated by the diff viewer are kept as-is and marked with `…`):

@@ -1,20 +1,26 @@
 import os
 import time
 import importlib
 from fastapi import FastAPI, HTTPException, Depends, Body
-from typing import Optional
 from pydantic import ValidationError

-from app.models.…
 from fastapi.middleware.cors import CORSMiddleware
-from app.schemas.schemas import …
 from app.auth.placeholder_auth import get_authenticated_user
-# MCP imports removed

 app = FastAPI(
-    title="…
-    description="AI-powered service for enhancing descriptions…
-    version="…
 )

 # CORS configuration

@@ -30,21 +36,15 @@
     allow_headers=["*"],
 )

-# Global service initialization
-MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
-hf_service = HuggingFaceTextGenerationService(
-    model_name_or_path=MODEL_PATH_IN_CONTAINER,
-    device="cpu"
-)
-
 @app.on_event("startup")
 async def startup_event():
     try:
-        await …
-        print(…
     except Exception as e:
-        print(f"…
         raise

 # --- Helper function to load domain logic ---

@@ -59,95 +59,153 @@

 @app.get("/")
 async def read_root():
-    return {"message": "Welcome to the …

 @app.get("/health")
 async def health_check():
     return {
         "status": "ok",
-        "…
     }

 @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
 async def enhance_description(
     domain: str = Body(..., embed=True),
     data: dict = Body(..., embed=True),
     user: Optional[dict] = Depends(get_authenticated_user)
 ):
     """
-    Generate an enhanced description …
     - **domain**: The name of the domain (e.g., 'cars').
     - **data**: A dictionary with the data for the description.
     """
     start_time = time.time()

-    # …
     domain_config = get_domain_config(domain)
     DomainSchema = domain_config["schema"]
     create_prompt = domain_config["create_prompt"]
-    # mcp_rules removed

-    # …
     try:
         validated_data = DomainSchema(**data)
     except ValidationError as e:
         raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")

-    # …
     chat_messages = create_prompt(validated_data)

-    # …
     try:
-        …
-        …
         max_new_tokens=150,
         temperature=0.75,
         top_p=0.9,
     )
     except Exception as e:
-        print(f"…
-        raise HTTPException(status_code=500, detail=f"…

-    # --- 5. MCP Guardrails & Post-processing removed ---
-    # if not guardrails.check_compliance(generated_description, mcp_rules.get("guardrails", {})):
-    #     raise HTTPException(status_code=400, detail="Generated description failed compliance checks.")

-    # final_description = postprocessor.format_output(generated_description, mcp_rules.get("postprocessor", {}))
-    final_description = generated_description  # No post-processing here

     generation_time = time.time() - start_time
     user_email = user['email'] if user else "anonymous"

     return EnhancedDescriptionResponse(
-        description=…
-        model_used="…
         generation_time=round(generation_time, 2),
         user_email=user_email
     )

-@app.post("/…
-async def …
-    temperature: float = 0.75,
-    top_p: float = 0.9
 ):
     """
     """
     try:
-        …
-        temperature=temperature,
-        top_p=top_p,
-    )
-    return {"generated_text": generated_text}
-    except Exception as e:
-        print(f"Unexpected error during raw text generation: {e}")
-        raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")

 @app.get("/user/me")
 async def get_user_info(user: dict = Depends(get_authenticated_user)):
New side (added lines marked `+`):

 import os
 import time
+import asyncio
 import importlib
 from fastapi import FastAPI, HTTPException, Depends, Body
+from typing import Optional, List
 from pydantic import ValidationError

+from app.models.registry import registry, MODEL_CONFIG
 from fastapi.middleware.cors import CORSMiddleware
+from app.schemas.schemas import (
+    EnhancedDescriptionResponse,
+    CompareRequest,
+    CompareResponse,
+    ModelResult,
+    ModelInfo,
+)
 from app.auth.placeholder_auth import get_authenticated_user

 app = FastAPI(
+    title="Multi-Model Description Enhancer",
+    description="AI-powered service for enhancing descriptions using multiple LLMs for A/B testing",
+    version="3.0.0"
 )

 # CORS configuration
     allow_headers=["*"],
 )

 @app.on_event("startup")
 async def startup_event():
+    """Initialize local models at startup."""
+    print("Starting up and initializing local models...")
     try:
+        await registry.initialize_local_models()
+        print("Local models initialized successfully.")
     except Exception as e:
+        print(f"Error during model initialization: {e}")
         raise

 # --- Helper function to load domain logic ---

 @app.get("/")
 async def read_root():
+    return {"message": "Welcome to the Multi-Model Description Enhancer API! Go to /docs for documentation."}

 @app.get("/health")
 async def health_check():
+    """Check API health and model status."""
+    models = registry.list_models()
+    local_initialized = any(m["initialized"] for m in models if m["type"] == "local")
     return {
         "status": "ok",
+        "local_models_initialized": local_initialized,
+        "available_models": len(models),
     }

+@app.get("/models", response_model=List[ModelInfo])
+async def list_models():
+    """List all available models."""
+    return registry.list_models()
+
 @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
 async def enhance_description(
     domain: str = Body(..., embed=True),
     data: dict = Body(..., embed=True),
+    model: str = Body("bielik-1.5b", embed=True),
     user: Optional[dict] = Depends(get_authenticated_user)
 ):
     """
+    Generate an enhanced description using a single model.
     - **domain**: The name of the domain (e.g., 'cars').
     - **data**: A dictionary with the data for the description.
+    - **model**: Model to use (default: bielik-1.5b)
     """
     start_time = time.time()

+    # Validate model
+    if model not in registry.get_available_model_names():
+        raise HTTPException(status_code=400, detail=f"Unknown model: {model}")
+
+    # Load Domain Configuration
     domain_config = get_domain_config(domain)
     DomainSchema = domain_config["schema"]
     create_prompt = domain_config["create_prompt"]

+    # Validate Input Data
     try:
         validated_data = DomainSchema(**data)
     except ValidationError as e:
         raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")

+    # Prompt Construction
     chat_messages = create_prompt(validated_data)

+    # Text Generation
     try:
+        llm = await registry.get_model(model)
+        generated_description = await llm.generate(
+            chat_messages=chat_messages,
             max_new_tokens=150,
             temperature=0.75,
             top_p=0.9,
         )
     except Exception as e:
+        print(f"Error during text generation with {model}: {e}")
+        raise HTTPException(status_code=500, detail=f"Generation error: {str(e)}")

     generation_time = time.time() - start_time
     user_email = user['email'] if user else "anonymous"

     return EnhancedDescriptionResponse(
+        description=generated_description,
+        model_used=MODEL_CONFIG[model]["id"],
         generation_time=round(generation_time, 2),
         user_email=user_email
     )

+@app.post("/compare", response_model=CompareResponse)
+async def compare_models(
+    request: CompareRequest,
+    user: Optional[dict] = Depends(get_authenticated_user)
 ):
     """
+    Compare outputs from multiple models for the same input.
+    Returns results from all specified models (or all available if not specified).
     """
+    total_start = time.time()
+
+    # Get models to compare
+    available_models = registry.get_available_model_names()
+    models_to_use = request.models if request.models else available_models
+
+    # Validate requested models
+    for model in models_to_use:
+        if model not in available_models:
+            raise HTTPException(status_code=400, detail=f"Unknown model: {model}")
+
+    # Load Domain Configuration
+    domain_config = get_domain_config(request.domain)
+    DomainSchema = domain_config["schema"]
+    create_prompt = domain_config["create_prompt"]
+
+    # Validate Input Data
     try:
+        validated_data = DomainSchema(**request.data)
+    except ValidationError as e:
+        raise HTTPException(status_code=422, detail=f"Invalid data: {e}")

+    # Prompt Construction
+    chat_messages = create_prompt(validated_data)
+
+    # Generate with each model
+    results = []
+
+    async def generate_with_model(model_name: str) -> ModelResult:
+        start_time = time.time()
+        try:
+            llm = await registry.get_model(model_name)
+            output = await llm.generate(
+                chat_messages=chat_messages,
+                max_new_tokens=150,
+                temperature=0.75,
+                top_p=0.9,
+            )
+            return ModelResult(
+                model=model_name,
+                output=output,
+                time=round(time.time() - start_time, 2),
+                type=MODEL_CONFIG[model_name]["type"],
+                error=None
+            )
+        except Exception as e:
+            return ModelResult(
+                model=model_name,
+                output="",
+                time=round(time.time() - start_time, 2),
+                type=MODEL_CONFIG[model_name]["type"],
+                error=str(e)
+            )
+
+    # Run all models (sequentially to avoid memory issues)
+    for model_name in models_to_use:
+        result = await generate_with_model(model_name)
+        results.append(result)
+
+    return CompareResponse(
+        domain=request.domain,
+        results=results,
+        total_time=round(time.time() - total_start, 2)
+    )

 @app.get("/user/me")
 async def get_user_info(user: dict = Depends(get_authenticated_user)):
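
`app/models/registry.py` (+124 lines) is part of this commit but its body is not reproduced above. Based purely on how `app/main.py` calls it, a minimal hypothetical sketch of its surface could look like the following; the `bielik-1.5b` entry and the `id`/`type` config keys are taken from the code above, the PLLuM repo id and everything else are assumptions:

```python
# Hypothetical sketch of app/models/registry.py, inferred from its call sites
# in app/main.py; the real implementation in this commit may differ.
from typing import Dict, List

from app.models.base_llm import BaseLLM
from app.models.huggingface_local import HuggingFaceLocal
from app.models.huggingface_inference_api import HuggingFaceInferenceAPI

MODEL_CONFIG: Dict[str, dict] = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},  # repo id assumed
}

class ModelRegistry:
    def __init__(self) -> None:
        self._models: Dict[str, BaseLLM] = {}
        for name, cfg in MODEL_CONFIG.items():
            cls = HuggingFaceLocal if cfg["type"] == "local" else HuggingFaceInferenceAPI
            self._models[name] = cls(name=name, model_id=cfg["id"])

    def get_available_model_names(self) -> List[str]:
        return list(self._models)

    def list_models(self) -> List[dict]:
        return [m.get_info() for m in self._models.values()]

    async def initialize_local_models(self) -> None:
        # Only local models are loaded eagerly at startup.
        for m in self._models.values():
            if m.get_info()["type"] == "local":
                await m.initialize()

    async def get_model(self, name: str) -> BaseLLM:
        model = self._models[name]
        if not model.is_initialized:
            await model.initialize()  # lazy init for API-backed models (assumption)
        return model

registry = ModelRegistry()
```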
app/models/__init__.py
ADDED
@@ -0,0 +1,16 @@

```python
"""
Models module - LLM implementations and registry.
"""

from app.models.base_llm import BaseLLM
from app.models.huggingface_local import HuggingFaceLocal
from app.models.huggingface_inference_api import HuggingFaceInferenceAPI
from app.models.registry import registry, MODEL_CONFIG

__all__ = [
    "BaseLLM",
    "HuggingFaceLocal",
    "HuggingFaceInferenceAPI",
    "registry",
    "MODEL_CONFIG",
]
```
app/models/base_llm.py
ADDED
@@ -0,0 +1,54 @@

```python
"""
Abstract base class for all LLM implementations.
"""

from abc import ABC, abstractmethod
from typing import Optional, List, Dict, Any


class BaseLLM(ABC):
    """Abstract interface for LLM models."""

    def __init__(self, name: str, model_id: str):
        self.name = name
        self.model_id = model_id
        self._initialized = False

    @property
    def is_initialized(self) -> bool:
        return self._initialized

    @abstractmethod
    async def initialize(self) -> None:
        """Initialize the model. Must be called before generate()."""
        pass

    @abstractmethod
    async def generate(
        self,
        prompt: str = None,
        chat_messages: List[Dict[str, str]] = None,
        max_new_tokens: int = 150,
        temperature: float = 0.7,
        top_p: float = 0.9,
        **kwargs
    ) -> str:
        """
        Generate text from prompt or chat messages.

        Args:
            prompt: Raw text prompt
            chat_messages: List of {"role": "...", "content": "..."} messages
            max_new_tokens: Maximum tokens to generate
            temperature: Sampling temperature
            top_p: Nucleus sampling parameter

        Returns:
            Generated text string
        """
        pass

    @abstractmethod
    def get_info(self) -> Dict[str, Any]:
        """Return model information for /models endpoint."""
        pass
```
app/models/huggingface_inference_api.py
ADDED
@@ -0,0 +1,93 @@

```python
"""
HuggingFace Inference API client for remote model access.
"""

import os
from typing import List, Dict, Any, Optional
from huggingface_hub import InferenceClient

from app.models.base_llm import BaseLLM


class HuggingFaceInferenceAPI(BaseLLM):
    """
    Remote model access via HuggingFace Inference API.
    Best for larger models (7B+) that don't fit in local RAM.
    """

    def __init__(self, name: str, model_id: str, token: str = None):
        super().__init__(name, model_id)
        self.token = token or os.getenv("HF_TOKEN")
        self.client: Optional[InferenceClient] = None

    async def initialize(self) -> None:
        """Initialize the Inference API client."""
        if self._initialized:
            return

        try:
            print(f"[{self.name}] Initializing Inference API for: {self.model_id}")

            self.client = InferenceClient(
                model=self.model_id,
                token=self.token
            )

            self._initialized = True
            print(f"[{self.name}] Inference API ready")

        except Exception as e:
            print(f"[{self.name}] Failed to initialize: {e}")
            raise

    async def generate(
        self,
        prompt: str = None,
        chat_messages: List[Dict[str, str]] = None,
        max_new_tokens: int = 150,
        temperature: float = 0.7,
        top_p: float = 0.9,
        **kwargs
    ) -> str:
        """Generate text using HuggingFace Inference API."""

        if not self._initialized or not self.client:
            raise RuntimeError(f"[{self.name}] Client not initialized")

        try:
            # Use chat completion if chat_messages provided
            if chat_messages:
                response = self.client.chat_completion(
                    messages=chat_messages,
                    max_tokens=max_new_tokens,
                    temperature=temperature,
                    top_p=top_p,
                )
                return response.choices[0].message.content.strip()

            # Otherwise use text generation
            elif prompt:
                response = self.client.text_generation(
                    prompt=prompt,
                    max_new_tokens=max_new_tokens,
                    temperature=temperature,
                    top_p=top_p,
                    do_sample=True,
                )
                return response.strip()

            else:
                raise ValueError("Either prompt or chat_messages required")

        except Exception as e:
            print(f"[{self.name}] Generation error: {e}")
            raise

    def get_info(self) -> Dict[str, Any]:
        """Return model info."""
        return {
            "name": self.name,
            "model_id": self.model_id,
            "type": "inference_api",
            "initialized": self._initialized,
        }
```
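
A quick way to exercise this class on its own (a sketch; assumes `HF_TOKEN` is set in the environment, and the PLLuM repo id is assumed rather than taken from this diff):

```python
import asyncio

from app.models.huggingface_inference_api import HuggingFaceInferenceAPI

async def main() -> None:
    # Repo id assumed for illustration; substitute any Inference API-served model.
    llm = HuggingFaceInferenceAPI(name="pllum-12b", model_id="CYFRAGOVPL/PLLuM-12B-instruct")
    await llm.initialize()
    text = await llm.generate(
        chat_messages=[{"role": "user", "content": "Napisz krótki opis BMW 320i."}],
        max_new_tokens=100,
    )
    print(text)

asyncio.run(main())
```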
app/models/huggingface_local.py
ADDED
@@ -0,0 +1,133 @@
"""
Local HuggingFace model implementation using transformers pipeline.
"""

from typing import List, Dict, Any, Optional
from transformers import pipeline, AutoTokenizer
import torch
import asyncio

from app.models.base_llm import BaseLLM


class HuggingFaceLocal(BaseLLM):
    """
    Local HuggingFace model loaded into container memory.
    Best for smaller models (< 3B parameters) that fit in RAM.
    """

    def __init__(self, name: str, model_id: str, device: str = "cpu"):
        super().__init__(name, model_id)
        self.device = device
        self.pipeline = None
        self.tokenizer = None

        # Determine device index
        if device == "cuda" and torch.cuda.is_available():
            self.device_index = 0
        else:
            self.device_index = -1  # CPU

    async def initialize(self) -> None:
        """Load model into memory."""
        if self._initialized:
            return

        try:
            print(f"[{self.name}] Loading local model: {self.model_id}")

            self.tokenizer = await asyncio.to_thread(
                AutoTokenizer.from_pretrained,
                self.model_id,
                trust_remote_code=True
            )

            self.pipeline = await asyncio.to_thread(
                pipeline,
                "text-generation",
                model=self.model_id,
                tokenizer=self.tokenizer,
                device=self.device_index,
                torch_dtype=torch.float32,
                trust_remote_code=True,
            )

            self._initialized = True
            print(f"[{self.name}] Model loaded successfully")

        except Exception as e:
            print(f"[{self.name}] Failed to load model: {e}")
            raise

    async def generate(
        self,
        prompt: str = None,
        chat_messages: List[Dict[str, str]] = None,
        max_new_tokens: int = 150,
        temperature: float = 0.7,
        top_p: float = 0.9,
        **kwargs
    ) -> str:
        """Generate text using local pipeline."""

        if not self._initialized:
            raise RuntimeError(f"[{self.name}] Model not initialized")

        # Format prompt
        if chat_messages:
            try:
                formatted_prompt = self.tokenizer.apply_chat_template(
                    chat_messages,
                    tokenize=False,
                    add_generation_prompt=True
                )
            except Exception:
                # Fallback to raw prompt
                if prompt:
                    formatted_prompt = prompt
                else:
                    raise ValueError("Cannot generate without valid prompt")
        elif prompt:
            formatted_prompt = prompt
        else:
            raise ValueError("Either prompt or chat_messages required")

        # Generate
        outputs = await asyncio.to_thread(
            self.pipeline,
            formatted_prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            eos_token_id=self.tokenizer.eos_token_id,
            pad_token_id=self.tokenizer.eos_token_id if self.tokenizer.pad_token_id is None else self.tokenizer.pad_token_id,
        )

        # Extract response
        if outputs and isinstance(outputs, list) and "generated_text" in outputs[0]:
            full_text = outputs[0]["generated_text"]

            # Remove prompt from output
            if full_text.startswith(formatted_prompt):
                response = full_text[len(formatted_prompt):]
            else:
                response = full_text

            # Clean up special tokens
            if response.endswith("<|im_end|>"):
                response = response[:-len("<|im_end|>")]

            return response.strip()

        return ""

    def get_info(self) -> Dict[str, Any]:
        """Return model info."""
        return {
            "name": self.name,
            "model_id": self.model_id,
            "type": "local",
            "initialized": self._initialized,
            "device": self.device
        }
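The `apply_chat_template` call above is what turns role/content messages into the model's expected prompt format. A standalone sketch to inspect the result (output tags are model-specific; the ChatML-style `<|im_end|>` cleanup above suggests that family):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("speakleash/Bielik-1.5B-v3.0-Instruct")
messages = [{"role": "user", "content": "Opisz ten samochod w jednym zdaniu."}]

# tokenize=False returns the formatted string instead of token IDs
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```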
app/models/registry.py
ADDED
@@ -0,0 +1,124 @@
"""
Model Registry - Central configuration and factory for all LLM models.
"""

import os
from typing import Dict, List, Any, Optional

from app.models.base_llm import BaseLLM
from app.models.huggingface_local import HuggingFaceLocal
from app.models.huggingface_inference_api import HuggingFaceInferenceAPI


# Model configuration
MODEL_CONFIG = {
    "bielik-1.5b": {
        "id": "speakleash/Bielik-1.5B-v3.0-Instruct",
        "type": "local",
        "polish_support": "excellent",
        "size": "1.5B",
    },
    "pllum-12b": {
        "id": "CYFRAGOVPL/PLLuM-12B-instruct",
        "type": "inference_api",
        "polish_support": "excellent",
        "size": "12B",
    },
    "mistral-small-3": {
        "id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
        "type": "inference_api",
        "polish_support": "good",
        "size": "24B",
    },
    "gemma-2-9b": {
        "id": "google/gemma-2-9b-it",
        "type": "inference_api",
        "polish_support": "medium",
        "size": "9B",
    },
}

# For local model override (when model is pre-downloaded in container)
LOCAL_MODEL_PATH = os.getenv("LOCAL_MODEL_PATH", "/app/pretrain_model")


class ModelRegistry:
    """
    Central registry for managing all LLM models.
    Handles model instantiation, initialization, and access.
    """

    def __init__(self):
        self._models: Dict[str, BaseLLM] = {}
        self._config = MODEL_CONFIG.copy()

    def _create_model(self, name: str) -> BaseLLM:
        """Factory method to create model instance."""

        if name not in self._config:
            raise ValueError(f"Unknown model: {name}")

        config = self._config[name]
        model_type = config["type"]
        model_id = config["id"]

        # Special case: local Bielik uses pre-downloaded path
        if name == "bielik-1.5b" and os.path.exists(LOCAL_MODEL_PATH):
            model_id = LOCAL_MODEL_PATH

        if model_type == "local":
            return HuggingFaceLocal(
                name=name,
                model_id=model_id,
                device="cpu"
            )
        elif model_type == "inference_api":
            return HuggingFaceInferenceAPI(
                name=name,
                model_id=model_id
            )
        else:
            raise ValueError(f"Unknown model type: {model_type}")

    async def get_model(self, name: str) -> BaseLLM:
        """Get or create and initialize a model."""

        if name not in self._models:
            model = self._create_model(name)
            await model.initialize()
            self._models[name] = model

        return self._models[name]

    async def initialize_model(self, name: str) -> None:
        """Pre-initialize a specific model."""
        await self.get_model(name)

    async def initialize_local_models(self) -> None:
        """Initialize all local models at startup."""
        for name, config in self._config.items():
            if config["type"] == "local":
                await self.initialize_model(name)

    def list_models(self) -> List[Dict[str, Any]]:
        """List all available models with their info."""
        models = []
        for name, config in self._config.items():
            model_info = {
                "name": name,
                "model_id": config["id"],
                "type": config["type"],
                "polish_support": config["polish_support"],
                "size": config["size"],
                "initialized": name in self._models and self._models[name].is_initialized,
            }
            models.append(model_info)
        return models

    def get_available_model_names(self) -> List[str]:
        """Get list of available model names."""
        return list(self._config.keys())


# Global registry instance
registry = ModelRegistry()
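A usage sketch for the registry (illustrative; `get_model` lazily creates, initializes, and caches each backend on first access):

```python
from app.models.registry import registry

async def demo():
    # First access creates and initializes the backend, then caches it
    model = await registry.get_model("bielik-1.5b")
    print(model.get_info())

    # Listing does not initialize anything
    for info in registry.list_models():
        print(info["name"], info["type"], info["initialized"])
```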
app/schemas/schemas.py
CHANGED
@@ -1,4 +1,6 @@
 from pydantic import BaseModel
+from typing import List, Optional, Dict, Any
+
 
 class EnhancedDescriptionResponse(BaseModel):
     description: str
@@ -6,3 +8,32 @@ class EnhancedDescriptionResponse(BaseModel):
     generation_time: float
     user_email: str
 
+
+class ModelInfo(BaseModel):
+    name: str
+    model_id: str
+    type: str
+    polish_support: str
+    size: str
+    initialized: bool
+
+
+class CompareRequest(BaseModel):
+    domain: str
+    data: Dict[str, Any]
+    models: Optional[List[str]] = None  # If None, use all models
+
+
+class ModelResult(BaseModel):
+    model: str
+    output: str
+    time: float
+    type: str
+    error: Optional[str] = None
+
+
+class CompareResponse(BaseModel):
+    domain: str
+    results: List[ModelResult]
+    total_time: float
+
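A quick sanity check of the new request schema (illustrative values):

```python
from app.schemas.schemas import CompareRequest

req = CompareRequest(domain="cars", data={"make": "Toyota", "model": "Corolla", "year": 2019})
assert req.models is None  # None means "run all registered models"
print(req)
```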
llm_app_rework.md
ADDED
@@ -0,0 +1,141 @@
# LLM App Rework Plan

## Goal
Transform single-model app → multi-model comparison platform for A/B testing open-source LLMs on car descriptions.

---

## Current State
- Single model: Bielik-1.5B (local HuggingFace)
- Single domain: cars
- No comparison capability

## Target State
- Multiple open-source LLMs via HuggingFace
- Same prompt → multiple outputs → compare results
- Support compression/decompression testing

---

## Architecture Changes

### 1. Model Registry
```
app/models/
├── registry.py            # Model registry + factory
├── base_llm.py            # Abstract base class
└── huggingface_local.py   # Refactored current service
```

### 2. Base LLM Interface
```python
class BaseLLM(ABC):
    name: str
    model_id: str
    async def generate(prompt, **params) -> str
    async def initialize() -> None
    def is_initialized() -> bool
```
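The interface above is shorthand, not valid Python. A minimal runnable sketch of what `base_llm.py` could look like (method bodies and the `get_info` hook are assumptions consistent with the rest of this plan):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class BaseLLM(ABC):
    """Common contract for local and Inference API backends."""

    def __init__(self, name: str, model_id: str):
        self.name = name
        self.model_id = model_id
        self._initialized = False

    @property
    def is_initialized(self) -> bool:
        return self._initialized

    @abstractmethod
    async def initialize(self) -> None:
        """Load the model or connect to the remote endpoint."""

    @abstractmethod
    async def generate(self, prompt: str = None, **params) -> str:
        """Return generated text for a prompt or chat messages."""

    @abstractmethod
    def get_info(self) -> Dict[str, Any]:
        """Return metadata (name, type, initialized) for listings."""
```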

### 3. Model Registry
```python
MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
    "mistral-small-3": {"id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "type": "inference_api"},
    "gemma-2-9b": {"id": "google/gemma-2-9b-it", "type": "inference_api"},
}
```

### 4. Two Model Types

| Type | Description | Use Case |
|------|-------------|----------|
| `local` | Loaded in container memory | Bielik-1.5B (small, fits in RAM) |
| `inference_api` | HuggingFace Inference API | Larger models (7B+) via API |

### 5. New Endpoints

| Endpoint | Purpose |
|----------|---------|
| `POST /enhance` | Single model (existing) |
| `POST /compare` | Multiple models, return all outputs (see the sketch after this section) |
| `GET /models` | List available models |

### 6. Compare Request/Response
```python
# Request
{
    "domain": "cars",
    "data": {...},
    "models": ["bielik-1.5b", "pllum-12b", "mistral-small-3", "gemma-2-9b"]
}

# Response
{
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"},
        {"model": "mistral-small-3", "output": "...", "time": 0.9, "type": "inference_api"},
        {"model": "gemma-2-9b", "output": "...", "time": 1.0, "type": "inference_api"}
    ]
}
```
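A minimal sketch of the `/compare` handler wired to the registry (the handler body is an assumption; prompt construction from `data` is simplified, and per-model errors are captured rather than failing the whole request):

```python
import time

from fastapi import FastAPI

from app.models.registry import registry
from app.schemas.schemas import CompareRequest, CompareResponse, ModelResult

app = FastAPI()

@app.post("/compare", response_model=CompareResponse)
async def compare(request: CompareRequest) -> CompareResponse:
    names = request.models or registry.get_available_model_names()
    start = time.perf_counter()
    results = []
    for name in names:
        t0 = time.perf_counter()
        try:
            model = await registry.get_model(name)
            output = await model.generate(
                chat_messages=[{"role": "user", "content": str(request.data)}],
            )
            results.append(ModelResult(
                model=name,
                output=output,
                time=time.perf_counter() - t0,
                type=model.get_info()["type"],
            ))
        except Exception as exc:
            # Record the failure so other models' results still come back
            results.append(ModelResult(
                model=name,
                output="",
                time=time.perf_counter() - t0,
                type="unknown",
                error=str(exc),
            ))
    return CompareResponse(
        domain=request.domain,
        results=results,
        total_time=time.perf_counter() - start,
    )
```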

---

## Implementation Steps

1. **Create base_llm.py** - abstract interface
2. **Create huggingface_inference_api.py** - HF Inference API client
3. **Refactor huggingface_service.py** → HuggingFaceLocal (implements BaseLLM)
4. **Create registry.py** - model factory + config
5. **Add /compare endpoint** in main.py
6. **Add /models endpoint** - list available
7. **Update schemas** - CompareRequest, CompareResponse

---

## HuggingFace Inference API
```python
from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)
response = client.text_generation(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    prompt=formatted_prompt,
    max_new_tokens=150
)
```
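The snippet above hand-formats the prompt. For chat-tuned models, `InferenceClient.chat_completion` does the formatting server-side; a sketch of that variant (same illustrative model ID):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)
response = client.chat_completion(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "Napisz opis samochodu."}],
    max_tokens=150,
)
print(response.choices[0].message.content)
```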

---

## Env Vars (HuggingFace Secrets)
```
HF_TOKEN=hf_...  # For Inference API access
```

---

## Models (Approved)

| Model | Size | Polish Support | HuggingFace ID |
|-------|------|----------------|----------------|
| Bielik-1.5B | 1.5B | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
| PLLuM-12B | 12B | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |
| Mistral-Small-3 | 24B | Good | mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
| Gemma-2-9B | 9B | Medium | google/gemma-2-9b-it |

---

## Priority
1. HuggingFace Inference API integration
2. /compare endpoint
3. /models endpoint

---

## Notes
- All models = open source via HuggingFace
- Local model = Bielik-1.5B (already works)
- Larger models = HF Inference API (no local GPU needed)
- HF_TOKEN needed for gated models (Gemma, etc.)
requirements.txt
CHANGED
@@ -1,4 +1,5 @@
 fastapi
 uvicorn[standard]
 transformers[torch]
-accelerate
+accelerate
+huggingface_hub