Patryk Studzinski committed on
Commit a7fd202 · 1 Parent(s): a1c0774

first-improvement-commit

PROJECT_CONTEXT.md DELETED
@@ -1,107 +0,0 @@
- # GPT4All Service - Project Context
-
- ## Project Overview
- This is a **Polish Car Description Enhancement Service** built as a FastAPI microservice that uses a Hugging Face Large Language Model to generate enhanced marketing descriptions for cars in Polish.
-
- ## Core Functionality
- The service takes basic car information (make, model, year, mileage, features, condition) and generates compelling, marketing-friendly descriptions in Polish using the `speakleash/Bielik-1.5B-v3.0-Instruct` model, a Polish language model from the Bielik series.
-
- ## Project Structure
-
- ```
- gpt4all-service/
- ├── app/
- │   ├── main.py                    # FastAPI application with endpoints
- │   ├── models/
- │   │   └── huggingface_service.py # Core LLM service wrapper
- │   └── schemas/
- │       └── schemas.py             # Pydantic data models
- ├── Dockerfile                     # Multi-stage Docker build
- ├── download_model.py              # Model download script for Docker
- ├── requirements.txt               # Python dependencies
- ├── start_container.ps1            # PowerShell startup script
- ├── start_container.sh             # Bash startup script
- └── README.md                      # Comprehensive documentation
- ```
-
- ## Technical Architecture
-
- ### 1. FastAPI Application (`app/main.py`)
- - **Framework**: FastAPI with CORS middleware
- - **Main Endpoint**: `POST /enhance-description` - takes car data, returns an enhanced description
- - **Health Check**: `GET /health` - service status and model initialization check
- - **CORS**: Configured for a frontend on `http://localhost:5173` (likely a React/Vue dev server)
-
- ### 2. LLM Service (`app/models/huggingface_service.py`)
- - **Purpose**: Wrapper around the Hugging Face Transformers pipeline
- - **Model**: `speakleash/Bielik-1.5B-v3.0-Instruct` (Polish language model)
- - **Features**:
-   - Async initialization and text generation
-   - Support for both GPU (CUDA) and CPU inference
-   - Chat template support for conversation-style prompts
-   - Configurable generation parameters (temperature, top_p, max_tokens)
-   - Smart response parsing to extract only the assistant's response
-
- ### 3. Data Models (`app/schemas/schemas.py`)
- - **CarData**: Input model with make, model, year, mileage, features[], condition
- - **EnhancedDescriptionResponse**: Output model with the generated description
-
- ### 4. Containerization
- - **Docker**: Self-contained image with a pre-downloaded model (~3.2 GB)
- - **Security**: Uses Docker BuildKit secrets for Hugging Face token handling
- - **Model Storage**: Downloaded to `/app/pretrain_model` during build
- - **Runtime**: Python 3.9-slim base image
-
- ## Key Technical Details
-
- ### Model Configuration
- - **Model Path**: `/app/pretrain_model` (in container) or configurable for local dev
- - **Device**: Currently set to CPU in main.py, but the service supports GPU
- - **Generation Params**: 150 max tokens, temperature 0.75, top_p 0.9
-
- ### Prompt Engineering
- The service uses a carefully crafted Polish system prompt that:
- - Instructs the model to create marketing descriptions in Polish
- - Limits output to 500 characters maximum
- - Tells the model to ignore off-topic content
- - Uses the chat template format with system/user roles
-
- ### Dependencies
- - **fastapi**: Web framework
- - **uvicorn[standard]**: ASGI server
- - **transformers[torch]**: Hugging Face transformers with PyTorch
- - **accelerate**: Hugging Face optimization library
-
- ## Current State & Issues
-
- ### Git Status
- - Modified `app/main.py` (likely recent changes)
- - Deleted `app/models/gpt4all.py` (indicates migration from GPT4All to Hugging Face)
-
- ### Linter Issues in `huggingface_service.py`
- 1. Import issues: `pipeline` and `AutoTokenizer` imports need specific paths
- 2. Type annotations: `device: str = None` should be `Optional[str] = None`
- 3. Method parameters: similar optional-parameter typing issues
-
- ## Usage Scenarios
- 1. **Car Dealership Websites**: Auto-generate compelling descriptions from basic car specs
- 2. **Marketplace Applications**: Enhance user-provided car listings
- 3. **Inventory Management**: Bulk description generation for car databases
-
- ## Deployment Options
- 1. **Local Development**: Direct Python/uvicorn execution
- 2. **Docker Container**: Self-contained deployment with a pre-downloaded model
- 3. **Production**: Containerized deployment with proper authentication
-
- ## Authentication Requirements
- - Hugging Face Hub token required for model download (gated model)
- - Token stored in `my_hf_token.txt` during the Docker build
- - Securely handled via Docker BuildKit secrets
-
- ## Performance Considerations
- - Model size: ~3.2 GB (significant memory footprint)
- - CPU inference: slower but more accessible
- - GPU inference: faster but requires CUDA setup
- - Async design: non-blocking text generation
-
- This service represents a specialized AI application for the Polish automotive market, focusing on generating marketing content using state-of-the-art Polish language models.
README.md CHANGED
@@ -1,314 +1,179 @@
- ---
- title: AI Car Description Enhancer
- emoji: 🚗✨
- colorFrom: red
- colorTo: yellow
- sdk: docker
- pinned: false
- ---
-
- <div align="center">
-
- # AI Car Description Enhancer "Bielik"
-
- **Turbocharge your automotive listings! This app, powered by the Bielik Polish language model, transforms dry vehicle data into compelling, ready-to-publish marketing descriptions.**
-
- </div>
-
- ---
-
- ## Contents
-
- 1. [Features](#features)
- 2. [Prerequisites](#prerequisites)
- 3. [Project Structure](#project-structure)
- 4. [Installation (Local Development)](#installation-local-development)
- 5. [Usage (Local Development)](#usage-local-development)
- 6. [Docker Usage](#docker-usage)
- 7. [Quick Start with PowerShell (`start_container.ps1`)](#quick-start-with-powershell-start_containerps1)
- 8. [API Endpoints](#api-endpoints)
-    - [Health Check](#health-check)
-    - [Enhance Description](#enhance-description)
- 9. [Core Service (`app/models/huggingface_service.py`)](#core-service-appmodelshuggingface_servicepy)
- 10. [Configuration](#configuration)
- 11. [Schemas (`app/schemas/schemas.py`)](#schemas-appschemasschemaspy)
-     - [CarData](#cardata)
-     - [EnhancedDescriptionResponse](#enhanceddescriptionresponse)
- 12. [Contributing](#contributing)
- 13. [License](#license)
-
- ---
-
- # LLM Car Description Enhancer (Polish)
-
- This repository contains a FastAPI application that utilizes a Hugging Face Transformers Large Language Model (specifically, `speakleash/Bielik-1.5B-v3.0-Instruct` or a similar model from the Bielik series) to generate enhanced marketing descriptions for cars, primarily in Polish.
-
- The application is designed to be run locally for development or containerized using Docker for deployment. The LLM is baked into the Docker image for self-contained and efficient execution, which may require Hugging Face Hub authentication during the build process if the model is gated.
-
- ## Features
-
- - Generate enhanced marketing descriptions for cars in Polish.
- - Utilizes the `speakleash/Bielik-1.5B-v3.0-Instruct` model via the Hugging Face `transformers` library.
- - Health check endpoint.
- - Docker support for easy deployment, with the model included in the image.
- - Includes a `start_container.sh` script for convenient container startup.
-
- ## Prerequisites
-
- - Python 3.9 or higher
- - `pip` (Python package installer)
- - Docker (for containerized deployment; Docker BuildKit enabled is recommended for secrets)
- - Git (for cloning the repository)
- - A Hugging Face Hub account and an access token (with `read` permissions) if the chosen model is gated (see the Docker Usage section).
- - For using `start_container.sh`: a bash-compatible shell (such as those on Linux, macOS, or Git Bash on Windows).
-
- ## Project Structure
-
- A typical layout for this project would be:
-
- ```text
- .
- ├── app/
- │   ├── __init__.py
- │   ├── main.py                    # FastAPI application, endpoints
- │   ├── models/
- │   │   ├── __init__.py
- │   │   └── huggingface_service.py # Service for interacting with the LLM
- │   └── schemas/
- │       ├── __init__.py
- │       └── schemas.py             # Pydantic schemas for request/response
- ├── .gitignore
- ├── Dockerfile
- ├── download_model.py              # Script to download the model during the Docker build
- ├── my_hf_token.txt                # (Should be created locally) For storing the HF token
- ├── requirements.txt
- ├── start_container.sh             # Helper script to run the Docker container
- └── README.md
- ```
-
- ## Installation (Local Development)
-
- 1. **Clone the repository:**
-    ```bash
-    git clone https://github.com/studzin-sky/llm-description-enhancer.git
-    cd llm-description-enhancer
-    ```
-
- 2. **Create and activate a virtual environment:**
-    (Recommended to keep dependencies isolated)
-    ```bash
-    python -m venv venv
-    ```
-    * On macOS/Linux:
-      ```bash
-      source venv/bin/activate
-      ```
-    * On Windows (PowerShell):
-      ```powershell
-      .\venv\Scripts\Activate.ps1
-      ```
-    * On Windows (Command Prompt):
-      ```cmd
-      venv\Scripts\activate.bat
-      ```
-
- 3. **Install the required dependencies:**
-    Ensure your `requirements.txt` includes `fastapi`, `uvicorn[standard]`, `transformers[torch]`, `torch`, `accelerate`, and `huggingface_hub`.
-    ```bash
-    pip install -r requirements.txt
-    ```
-    *Note: The first time you run the application locally (or if the model cache is empty), the Hugging Face model (~3.2 GB) will be downloaded. This might take some time. **If the model (`speakleash/Bielik-1.5B-v3.0-Instruct` or the one configured) is gated or requires authentication, you may need to log in using `huggingface-cli login` in your terminal before running the application locally.** After logging in, your token will be cached by the `huggingface_hub` library.*
-
- ## Usage (Local Development)
-
- 1. **Start the FastAPI server:**
-    From the project root directory:
-    ```bash
-    uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
-    ```
-    * `--reload` enables auto-reloading for development.
-    * `--host 0.0.0.0` makes the server accessible on your network.
-
- 2. **Access the application:**
-    * Health Check: [http://127.0.0.1:8000/health](http://127.0.0.1:8000/health)
-    * API Documentation (Swagger UI): [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
-    * Enhance Description: `POST` requests to [http://127.0.0.1:8000/enhance-description](http://127.0.0.1:8000/enhance-description)
-
- ## Docker Usage
-
- The included `Dockerfile` builds an image with the application and the pre-downloaded Hugging Face model, making it self-contained. Downloading gated models during the build process requires a Hugging Face Hub token.
-
- 1. **Prepare a Hugging Face Hub Token (for Gated Models):**
-    The `speakleash/Bielik-1.5B-v3.0-Instruct` model may require authentication to download.
-    * **Get a Token:**
-      1. Go to your Hugging Face account settings: [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
-      2. Create a new token (e.g., named "docker-bielik-access") with `read` permissions.
-      3. Copy the generated token (it will start with `hf_`).
-    * **Create a Token File:**
-      1. In your project's root directory (next to your `Dockerfile`), create a file named `my_hf_token.txt`.
-      2. Paste **only the token string** (e.g., `hf_YourActualTokenValueHere`) into this file. Do not add any other text or variable names.
-
- 2. **Build the Docker image:**
-    From the project root directory, run:
-    ```bash
-    DOCKER_BUILDKIT=1 docker build --secret id=huggingface_token,src=my_hf_token.txt -t llm-description-enhancer .
-    ```
-    * `DOCKER_BUILDKIT=1`: Enables BuildKit, which is required for using `--secret`.
-    * `--secret id=huggingface_token,src=my_hf_token.txt`: Securely provides the content of `my_hf_token.txt` to the build process. The `id=huggingface_token` must match the ID used in the `RUN --mount` directive in your `Dockerfile`.
-    * *(This step will take a while, especially the first time, as it downloads the LLM using your token.)*
-
- 3. **Run the Docker container using the helper script (`start_container.sh`):**
-    A helper script `start_container.sh` is included in the repository to simplify starting the Docker container. It typically stops and removes any pre-existing container with the same configured name and then starts a new one.
-
-    * **Ensure the script is executable:**
-      After cloning the repository, or if the execute permission isn't set, you might need to make the script executable (on Linux, macOS, or Git Bash on Windows):
-      ```bash
-      chmod +x start_container.sh
-      ```
-
-    * **Run the script:**
-      From the project root directory:
-      ```bash
-      ./start_container.sh
-      ```
-
-    * **Expected outcome (depends on your script's content):**
-      The script will likely:
-      * Output messages indicating it is managing the container.
-      * Start the container (possibly in detached mode).
-      * Inform you that the service is available at `http://127.0.0.1:8000`.
-      * Provide commands to view logs or stop the container if it is running in detached mode (e.g., `docker logs <container_name> -f` and `docker stop <container_name>`).
-
-    *(Alternatively, you can run the container manually: `docker run --rm -p 8000:8000 llm-description-enhancer`)*
-
- 4. **Test the containerized application:**
-    Once the container is running (via the script or manually), send requests to `http://127.0.0.1:8000` as described in the API Endpoints section.
-
- ## Quick Start with PowerShell (`start_container.ps1`)
-
- For Windows users, you can automate the Docker build and run process using the provided PowerShell script. This script will:
- - Build the Docker image using your Hugging Face token (from `my_hf_token.txt`)
- - Stop and remove any existing container named `bielik_app_instance`
- - Start a new container and map port 8000
-
- **Steps:**
-
- 1. Ensure your Hugging Face token is saved in `my_hf_token.txt` in the project root (see above for details).
- 2. Open PowerShell in the project directory.
- 3. (Optional, but recommended) Temporarily allow running unsigned scripts for this session:
-    ```powershell
-    Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process
-    ```
- 4. Run the script:
-    ```powershell
-    .\start_container.ps1
-    ```
-
- The script will build the image and start the container. Your FastAPI service will be available at [http://127.0.0.1:8000](http://127.0.0.1:8000).
-
- You can view logs with:
- ```powershell
- docker logs bielik_app_instance -f
- ```
- To stop the container:
- ```powershell
- docker stop bielik_app_instance
- ```
-
- If you encounter a security error about script signing, see the [Microsoft documentation on execution policies](https://go.microsoft.com/fwlink/?LinkID=135170).
-
- ---
-
- ## API Endpoints
-
- ### Health Check
-
- - **Endpoint:** `/health`
- - **Method:** `GET`
- - **Description:** Returns the status of the application and model initialization.
- - **Example Response:**
-   ```json
-   {
-     "status": "ok",
-     "model_initialized": true,
-     "model_path": "/app/pretrain_model"
-   }
-   ```
-
- ### Enhance Description
-
- - **Endpoint:** `/enhance-description`
- - **Method:** `POST`
- - **Description:** Generates an enhanced marketing description for a car in Polish.
- - **Request Body (`application/json`):**
-   ```json
-   {
-     "make": "Volkswagen",
-     "model": "Golf",
-     "year": 2022,
-     "mileage": 15000,
-     "features": ["Klimatyzacja automatyczna", "System nawigacji", "Czujniki parkowania"],
-     "condition": "Bardzo dobry"
-   }
-   ```
- - **Response (`application/json`):**
-   ```json
-   {
-     "description": "Wygenerowany przez AI opis samochodu..."
-   }
-   ```
- - **Example cURL request (for Git Bash / bash-like shells):**
-   ```bash
-   curl -X POST "http://127.0.0.1:8000/enhance-description" \
-     -H "Content-Type: application/json" \
-     -d '{
-       "make": "Toyota",
-       "model": "Corolla",
-       "year": 2021,
-       "mileage": 25000,
-       "features": ["Kamera cofania", "Apple CarPlay", "Android Auto", "System bezkluczykowy"],
-       "condition": "Bardzo dobry"
-     }'
-   ```
-
- ## Core Service (`app/models/huggingface_service.py`)
-
- The `HuggingFaceTextGenerationService` class handles the interaction with the Large Language Model.
-
- - **Key Methods:**
-   - `async initialize()`: Loads the pre-trained model and tokenizer from the path specified during service instantiation (e.g., `/app/pretrain_model` in Docker, or from the Hugging Face cache locally).
-   - `async generate_text(chat_template_messages: list, max_new_tokens: int, ...)`: Generates text based on a structured chat prompt, applying the appropriate chat template and parsing the model's output to return only the assistant's response.
-
- ## Configuration
-
- - **Model Used:** `speakleash/Bielik-1.5B-v3.0-Instruct`. This is baked into `/app/pretrain_model` in the Docker image. For local development, it is downloaded to the Hugging Face cache.
- - **Language:** The primary focus is on generating descriptions in **Polish**.
- - **Prompt Engineering:** The system and user prompts in `app/main.py` are crafted to guide the model towards generating concise and relevant marketing descriptions.
-
- ## Schemas (`app/schemas/schemas.py`)
-
- Pydantic models are used for request and response validation.
-
- ### `CarData`
-
- - **Fields:**
-   - `make`: `str`
-   - `model`: `str`
-   - `year`: `int`
-   - `mileage`: `int`
-   - `features`: `list[str]`
-   - `condition`: `str`
-
- ### `EnhancedDescriptionResponse`
-
- - **Fields:**
-   - `description`: `str`
-
- ## Contributing
-
- Contributions are welcome! Please open an issue or submit a pull request for any changes.
-
- ## License
-
- This project is licensed under the MIT License.
+ # Bielik App Service
+
+ Multi-model LLM service for description enhancement and A/B testing.
+
+ ## Overview
+
+ This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports comparing outputs across different models to evaluate quality, speed, and Polish language support.
+
+ ## Models
+
+ | Model | Size | Polish Support | Type |
+ |-------|------|----------------|------|
+ | Bielik-1.5B | 1.5B | Excellent | Local |
+ | PLLuM-12B | 12B | Excellent | API |
+ | Mistral-Small-3 | 24B | Good | API |
+ | Gemma-2-9B | 9B | Medium | API |
+
+ ## API Endpoints
+
+ ### Health & Info
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | `GET` | `/` | Welcome message |
+ | `GET` | `/health` | API health check and model status |
+ | `GET` | `/models` | List all available models |
+
+ ### Generation
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | `POST` | `/enhance-description` | Generate a description with a single model |
+ | `POST` | `/compare` | Compare outputs from multiple models |
+
+ ---
+
+ ## Endpoint Details
+
+ ### `GET /health`
+
+ Check API status and model initialization.
+
+ **Response:**
+ ```json
+ {
+   "status": "ok",
+   "local_models_initialized": true,
+   "available_models": 4
+ }
+ ```
+
+ ---
+
+ ### `GET /models`
+
+ List all available models with their details.
+
+ **Response:**
+ ```json
+ [
+   {
+     "name": "bielik-1.5b",
+     "model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
+     "type": "local",
+     "polish_support": "excellent",
+     "size": "1.5B",
+     "initialized": true
+   }
+ ]
+ ```
+
+ ---
+
+ ### `POST /enhance-description`
+
+ Generate an enhanced description using a single model.
+
+ **Request:**
+ ```json
+ {
+   "domain": "cars",
+   "data": {
+     "make": "BMW",
+     "model": "320i",
+     "year": 2020,
+     "mileage": 45000,
+     "features": ["nawigacja", "klimatyzacja"],
+     "condition": "bardzo dobry"
+   },
+   "model": "bielik-1.5b"
+ }
+ ```
+
+ **Response:**
+ ```json
+ {
+   "description": "Generated description text...",
+   "model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
+   "generation_time": 2.34,
+   "user_email": "anonymous"
+ }
+ ```
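+
+ For quick testing from a shell, a request along these lines should work (a sketch assuming the service is listening on port 8000):
+
+ ```bash
+ curl -X POST "http://localhost:8000/enhance-description" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "domain": "cars",
+     "data": {
+       "make": "BMW",
+       "model": "320i",
+       "year": 2020,
+       "mileage": 45000,
+       "features": ["nawigacja", "klimatyzacja"],
+       "condition": "bardzo dobry"
+     },
+     "model": "bielik-1.5b"
+   }'
+ ```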
+
+ ---
+
+ ### `POST /compare`
+
+ Compare outputs from multiple models for the same input.
+
+ **Request:**
+ ```json
+ {
+   "domain": "cars",
+   "data": {
+     "make": "BMW",
+     "model": "320i",
+     "year": 2020,
+     "mileage": 45000,
+     "features": ["nawigacja", "klimatyzacja"],
+     "condition": "bardzo dobry"
+   },
+   "models": ["bielik-1.5b", "pllum-12b", "gemma-2-9b"]
+ }
+ ```
+
+ **Response:**
+ ```json
+ {
+   "domain": "cars",
+   "results": [
+     {
+       "model": "bielik-1.5b",
+       "output": "Generated text from Bielik...",
+       "time": 2.3,
+       "type": "local",
+       "error": null
+     },
+     {
+       "model": "pllum-12b",
+       "output": "Generated text from PLLuM...",
+       "time": 1.1,
+       "type": "inference_api",
+       "error": null
+     }
+   ],
+   "total_time": 5.67
+ }
+ ```
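+
+ A shell sketch for the same comparison follows; since `models` is optional, omitting it runs the comparison against every available model:
+
+ ```bash
+ curl -X POST "http://localhost:8000/compare" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "domain": "cars",
+     "data": {
+       "make": "BMW",
+       "model": "320i",
+       "year": 2020,
+       "mileage": 45000,
+       "features": ["nawigacja", "klimatyzacja"],
+       "condition": "bardzo dobry"
+     },
+     "models": ["bielik-1.5b", "pllum-12b"]
+   }'
+ ```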
+
+ ---
+
+ ## Environment Variables
+
+ | Variable | Description | Required |
+ |----------|-------------|----------|
+ | `HF_TOKEN` | HuggingFace API token for the Inference API | Yes (for API models) |
+ | `LOCAL_MODEL_PATH` | Path to the pre-downloaded local model | No (default: `/app/pretrain_model`) |
+ | `FRONTEND_URL` | Frontend URL for CORS | No |
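+
+ For local runs these can be exported before starting the server; a sketch (the token value is a placeholder):
+
+ ```bash
+ export HF_TOKEN=hf_...                       # needed for the Inference API models
+ export LOCAL_MODEL_PATH=/app/pretrain_model  # optional override
+ ```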
+
+ ## Running Locally
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run the server
+ uvicorn app.main:app --reload --port 8000
+ ```
+
+ ## Docker
+
+ ```powershell
+ # Build and run
+ .\start_container.ps1
+ ```
+
+ The API is available at `http://localhost:8000`, with interactive docs at `http://localhost:8000/docs`.
answer.md DELETED
@@ -1,213 +0,0 @@
- Great choice! Creating a separate service for GPT4All is a clean and scalable approach. Let’s break this down step by step, starting from setting up the project structure to deploying the service.
-
- ---
-
- ### **1. Project Structure**
- Here’s a suggested folder structure for your GPT4All service:
-
- ```
- gpt4all-service/
- ├── app/
- │   ├── __init__.py
- │   ├── main.py          # FastAPI/Flask app entry point
- │   ├── models/          # GPT4All model loading and inference logic
- │   │   └── gpt4all.py
- │   └── schemas/         # Pydantic models for request/response validation
- │       └── schemas.py
- ├── tests/               # Unit and integration tests
- │   └── test_api.py
- ├── requirements.txt     # Python dependencies
- ├── Dockerfile           # For containerization
- ├── README.md            # Project documentation
- └── .env                 # Environment variables (optional)
- ```
-
- ---
-
- ### **2. Setting Up the Project**
- 1. **Create the Project Folder**:
-    ```bash
-    mkdir gpt4all-service
-    cd gpt4all-service
-    ```
-
- 2. **Initialize a Virtual Environment**:
-    ```bash
-    python -m venv venv
-    source venv/bin/activate  # On Windows: venv\Scripts\activate
-    ```
-
- 3. **Install Dependencies**:
-    Create a `requirements.txt` file:
-    ```plaintext
-    fastapi
-    uvicorn
-    gpt4all
-    pydantic
-    python-dotenv
-    ```
-
-    Install the dependencies:
-    ```bash
-    pip install -r requirements.txt
-    ```
-
- ---
-
- ### **3. Build the GPT4All Service**
- #### **Step 1: Create the Model Loading Logic**
- - Create `app/models/gpt4all.py`:
-   ```python
-   from gpt4all import GPT4All
-
-   class GPT4AllService:
-       def __init__(self, model_path: str):
-           self.model = GPT4All(model_path)
-
-       def generate_description(self, prompt: str) -> str:
-           response = self.model.generate(prompt, max_tokens=300)
-           return response
-   ```
-
- #### **Step 2: Define Request/Response Schemas**
- - Create `app/schemas/schemas.py`:
-   ```python
-   from pydantic import BaseModel
-
-   class CarData(BaseModel):
-       make: str
-       model: str
-       year: int
-       mileage: int
-       features: list[str]
-       condition: str
-
-   class EnhancedDescriptionResponse(BaseModel):
-       description: str
-   ```
-
- #### **Step 3: Create the FastAPI App**
- - Create `app/main.py`:
-   ```python
-   from fastapi import FastAPI, HTTPException
-   from app.models.gpt4all import GPT4AllService
-   from app.schemas.schemas import CarData, EnhancedDescriptionResponse
-
-   app = FastAPI()
-
-   # Initialize GPT4All service
-   gpt4all_service = GPT4AllService("ggml-model-gpt4all-falcon-q4_0.bin")
-
-   @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
-   async def enhance_description(car_data: CarData):
-       try:
-           # Create a prompt from car data
-           prompt = f"""
-           Enhance this car description for an auction portal:
-           - Make: {car_data.make}
-           - Model: {car_data.model}
-           - Year: {car_data.year}
-           - Mileage: {car_data.mileage}
-           - Features: {', '.join(car_data.features)}
-           - Condition: {car_data.condition}
-           """
-           # Generate description
-           description = gpt4all_service.generate_description(prompt)
-           return {"description": description}
-       except Exception as e:
-           raise HTTPException(status_code=500, detail=str(e))
-   ```
-
- ---
-
- ### **4. Run the Service**
- 1. **Start the Service**:
-    ```bash
-    uvicorn app.main:app --reload --port 8000
-    ```
-
- 2. **Test the API**:
-    Use `curl` or Postman to send a POST request:
-    ```bash
-    curl -X POST "http://localhost:8000/enhance-description" \
-      -H "Content-Type: application/json" \
-      -d '{
-        "make": "Toyota",
-        "model": "Camry",
-        "year": 2020,
-        "mileage": 45000,
-        "features": ["sunroof", "leather seats", "lane assist"],
-        "condition": "excellent"
-      }'
-    ```
-
-    **Expected Response**:
-    ```json
-    {
-      "description": "This 2020 Toyota Camry is a well-maintained vehicle with only 45,000 miles on the odometer. It comes equipped with a sunroof, luxurious leather seats, and advanced lane assist technology. Perfect for families or commuters, this car is in excellent condition and ready to hit the road. Don’t miss out on this fantastic deal!"
-    }
-    ```
-
- ---
-
- ### **5. Add Logging and Error Handling**
- - Add logging to track requests and errors:
-   ```python
-   import logging
-
-   logging.basicConfig(level=logging.INFO)
-   logger = logging.getLogger(__name__)
-
-   @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
-   async def enhance_description(car_data: CarData):
-       try:
-           logger.info(f"Received request for car: {car_data.make} {car_data.model}")
-           prompt = f"Enhance this car description: {car_data}"
-           description = gpt4all_service.generate_description(prompt)
-           return {"description": description}
-       except Exception as e:
-           logger.error(f"Error generating description: {e}")
-           raise HTTPException(status_code=500, detail=str(e))
-   ```
-
- ---
-
- ### **6. Containerize the Service (Optional)**
- Create a `Dockerfile` to containerize the service:
- ```dockerfile
- # Use an official Python runtime as a parent image
- FROM python:3.9-slim
-
- # Set the working directory
- WORKDIR /app
-
- # Copy the requirements file
- COPY requirements.txt .
-
- # Install dependencies
- RUN pip install --no-cache-dir -r requirements.txt
-
- # Copy the application code
- COPY . .
-
- # Expose the port the app runs on
- EXPOSE 8000
-
- # Run the application
- CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
- ```
-
- Build and run the Docker container:
- ```bash
- docker build -t gpt4all-service .
- docker run -p 8000:8000 gpt4all-service
- ```
-
- ---
-
- ### **7. Next Steps**
- - Add unit tests in the `tests/` folder.
- - Add environment variables for configuration (e.g., model path, port).
- - Integrate with your Flask backend by calling this service via HTTP.
-
- Let me know if you need help with any specific part (e.g., testing, deployment, or advanced features)!
app/main.py CHANGED
@@ -1,20 +1,26 @@
  import os
  import time
+ import asyncio
  import importlib
  from fastapi import FastAPI, HTTPException, Depends, Body
- from typing import Optional
+ from typing import Optional, List
  from pydantic import ValidationError

- from app.models.huggingface_service import HuggingFaceTextGenerationService
+ from app.models.registry import registry, MODEL_CONFIG
  from fastapi.middleware.cors import CORSMiddleware
- from app.schemas.schemas import EnhancedDescriptionResponse
+ from app.schemas.schemas import (
+     EnhancedDescriptionResponse,
+     CompareRequest,
+     CompareResponse,
+     ModelResult,
+     ModelInfo,
+ )
  from app.auth.placeholder_auth import get_authenticated_user
- # MCP imports removed

  app = FastAPI(
-     title="Modular Car Description Enhancer",
-     description="AI-powered service for enhancing descriptions for multiple domains with Auth0 JWT authentication",
-     version="2.0.0"
+     title="Multi-Model Description Enhancer",
+     description="AI-powered service for enhancing descriptions using multiple LLMs for A/B testing",
+     version="3.0.0"
  )

  # CORS configuration
@@ -30,21 +36,15 @@ app.add_middleware(
      allow_headers=["*"],
  )

- # Global service initialization
- MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
- hf_service = HuggingFaceTextGenerationService(
-     model_name_or_path=MODEL_PATH_IN_CONTAINER,
-     device="cpu"
- )
-
  @app.on_event("startup")
  async def startup_event():
-     print("Starting up and initializing HuggingFace service...")
+     """Initialize local models at startup."""
+     print("Starting up and initializing local models...")
      try:
-         await hf_service.initialize()
-         print(f"HuggingFace service initialized successfully from {MODEL_PATH_IN_CONTAINER}.")
+         await registry.initialize_local_models()
+         print("Local models initialized successfully.")
      except Exception as e:
-         print(f"An unexpected error occurred during HuggingFace service initialization: {e}")
+         print(f"Error during model initialization: {e}")
          raise

  # --- Helper function to load domain logic ---
@@ -59,95 +59,153 @@ def get_domain_config(domain: str):

  @app.get("/")
  async def read_root():
-     return {"message": "Welcome to the Modular Description Enhancer API! Go to /docs for documentation."}
+     return {"message": "Welcome to the Multi-Model Description Enhancer API! Go to /docs for documentation."}

  @app.get("/health")
  async def health_check():
+     """Check API health and model status."""
+     models = registry.list_models()
+     local_initialized = any(m["initialized"] for m in models if m["type"] == "local")
      return {
          "status": "ok",
-         "model_initialized": hf_service.pipeline is not None,
+         "local_models_initialized": local_initialized,
+         "available_models": len(models),
      }

+ @app.get("/models", response_model=List[ModelInfo])
+ async def list_models():
+     """List all available models."""
+     return registry.list_models()
+
  @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
  async def enhance_description(
      domain: str = Body(..., embed=True),
      data: dict = Body(..., embed=True),
+     model: str = Body("bielik-1.5b", embed=True),
      user: Optional[dict] = Depends(get_authenticated_user)
  ):
      """
-     Generate an enhanced description for a given domain and data.
+     Generate an enhanced description using a single model.
      - **domain**: The name of the domain (e.g., 'cars').
      - **data**: A dictionary with the data for the description.
+     - **model**: Model to use (default: bielik-1.5b)
      """
      start_time = time.time()

-     # --- 1. Load Domain Configuration ---
+     # Validate model
+     if model not in registry.get_available_model_names():
+         raise HTTPException(status_code=400, detail=f"Unknown model: {model}")
+
+     # Load Domain Configuration
      domain_config = get_domain_config(domain)
      DomainSchema = domain_config["schema"]
      create_prompt = domain_config["create_prompt"]
-     # mcp_rules removed

-     # --- 2. Validate Input Data ---
+     # Validate Input Data
      try:
          validated_data = DomainSchema(**data)
      except ValidationError as e:
          raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")

-     # --- 3. Prompt Construction ---
+     # Prompt Construction
      chat_messages = create_prompt(validated_data)

-     # --- 4. Text Generation ---
+     # Text Generation
      try:
-         generated_description = await hf_service.generate_text(
-             chat_template_messages=chat_messages,
+         llm = await registry.get_model(model)
+         generated_description = await llm.generate(
+             chat_messages=chat_messages,
              max_new_tokens=150,
              temperature=0.75,
              top_p=0.9,
          )
      except Exception as e:
-         print(f"Unexpected error during text generation: {e}")
-         raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")
-
-     # --- 5. MCP Guardrails & Post-processing removed ---
-     # if not guardrails.check_compliance(generated_description, mcp_rules.get("guardrails", {})):
-     #     raise HTTPException(status_code=400, detail="Generated description failed compliance checks.")
-
-     # final_description = postprocessor.format_output(generated_description, mcp_rules.get("postprocessor", {}))
-     final_description = generated_description  # No post-processing here
+         print(f"Error during text generation with {model}: {e}")
+         raise HTTPException(status_code=500, detail=f"Generation error: {str(e)}")

      generation_time = time.time() - start_time
      user_email = user['email'] if user else "anonymous"

      return EnhancedDescriptionResponse(
-         description=final_description,
-         model_used="speakleash/Bielik-1.5B-v3.0-Instruct",
+         description=generated_description,
+         model_used=MODEL_CONFIG[model]["id"],
          generation_time=round(generation_time, 2),
          user_email=user_email
      )

- @app.post("/generate")
- async def generate_text_only(
-     chat_template_messages: str = Body(..., embed=True),
-     max_new_tokens: int = 150,
-     temperature: float = 0.75,
-     top_p: float = 0.9
+ @app.post("/compare", response_model=CompareResponse)
+ async def compare_models(
+     request: CompareRequest,
+     user: Optional[dict] = Depends(get_authenticated_user)
  ):
      """
-     Generates raw text based on provided chat template messages.
-     This endpoint is intended for internal use by the MCP service.
+     Compare outputs from multiple models for the same input.
+     Returns results from all specified models (or all available if not specified).
      """
+     total_start = time.time()
+
+     # Get models to compare
+     available_models = registry.get_available_model_names()
+     models_to_use = request.models if request.models else available_models
+
+     # Validate requested models
+     for model in models_to_use:
+         if model not in available_models:
+             raise HTTPException(status_code=400, detail=f"Unknown model: {model}")
+
+     # Load Domain Configuration
+     domain_config = get_domain_config(request.domain)
+     DomainSchema = domain_config["schema"]
+     create_prompt = domain_config["create_prompt"]
+
+     # Validate Input Data
      try:
-         generated_text = await hf_service.generate_text(
-             chat_template_messages=chat_template_messages,
-             max_new_tokens=max_new_tokens,
-             temperature=temperature,
-             top_p=top_p,
-         )
-         return {"generated_text": generated_text}
-     except Exception as e:
-         print(f"Unexpected error during raw text generation: {e}")
-         raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")
+         validated_data = DomainSchema(**request.data)
+     except ValidationError as e:
+         raise HTTPException(status_code=422, detail=f"Invalid data: {e}")
+
+     # Prompt Construction
+     chat_messages = create_prompt(validated_data)
+
+     # Generate with each model
+     results = []
+
+     async def generate_with_model(model_name: str) -> ModelResult:
+         start_time = time.time()
+         try:
+             llm = await registry.get_model(model_name)
+             output = await llm.generate(
+                 chat_messages=chat_messages,
+                 max_new_tokens=150,
+                 temperature=0.75,
+                 top_p=0.9,
+             )
+             return ModelResult(
+                 model=model_name,
+                 output=output,
+                 time=round(time.time() - start_time, 2),
+                 type=MODEL_CONFIG[model_name]["type"],
+                 error=None
+             )
+         except Exception as e:
+             return ModelResult(
+                 model=model_name,
+                 output="",
+                 time=round(time.time() - start_time, 2),
+                 type=MODEL_CONFIG[model_name]["type"],
+                 error=str(e)
+             )
+
+     # Run all models (sequentially to avoid memory issues)
+     for model_name in models_to_use:
+         result = await generate_with_model(model_name)
+         results.append(result)
+
+     return CompareResponse(
+         domain=request.domain,
+         results=results,
+         total_time=round(time.time() - total_start, 2)
+     )

  @app.get("/user/me")
  async def get_user_info(user: dict = Depends(get_authenticated_user)):
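
The `/compare` handler above deliberately calls the models one at a time (see the "sequentially to avoid memory issues" comment). For the `inference_api` models, which are network-bound rather than memory-bound, the same helper could be fanned out concurrently instead; a minimal sketch, assuming `generate_with_model` and `models_to_use` as defined in the handler (`asyncio` is already imported at the top of the new `app/main.py`):

```python
# Fan out all model calls at once. gather() preserves input order, and
# generate_with_model already converts failures into ModelResult errors,
# so no extra exception handling is needed here.
results = await asyncio.gather(
    *(generate_with_model(name) for name in models_to_use)
)
```

The win would come mostly from the API-backed models; running the single local CPU model concurrently with itself would not speed anything up.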
app/models/__init__.py ADDED
@@ -0,0 +1,16 @@
+ """
+ Models module - LLM implementations and registry.
+ """
+
+ from app.models.base_llm import BaseLLM
+ from app.models.huggingface_local import HuggingFaceLocal
+ from app.models.huggingface_inference_api import HuggingFaceInferenceAPI
+ from app.models.registry import registry, MODEL_CONFIG
+
+ __all__ = [
+     "BaseLLM",
+     "HuggingFaceLocal",
+     "HuggingFaceInferenceAPI",
+     "registry",
+     "MODEL_CONFIG",
+ ]
app/models/base_llm.py ADDED
@@ -0,0 +1,54 @@
+ """
+ Abstract base class for all LLM implementations.
+ """
+
+ from abc import ABC, abstractmethod
+ from typing import Optional, List, Dict, Any
+
+
+ class BaseLLM(ABC):
+     """Abstract interface for LLM models."""
+
+     def __init__(self, name: str, model_id: str):
+         self.name = name
+         self.model_id = model_id
+         self._initialized = False
+
+     @property
+     def is_initialized(self) -> bool:
+         return self._initialized
+
+     @abstractmethod
+     async def initialize(self) -> None:
+         """Initialize the model. Must be called before generate()."""
+         pass
+
+     @abstractmethod
+     async def generate(
+         self,
+         prompt: str = None,
+         chat_messages: List[Dict[str, str]] = None,
+         max_new_tokens: int = 150,
+         temperature: float = 0.7,
+         top_p: float = 0.9,
+         **kwargs
+     ) -> str:
+         """
+         Generate text from a prompt or chat messages.
+
+         Args:
+             prompt: Raw text prompt
+             chat_messages: List of {"role": "...", "content": "..."} messages
+             max_new_tokens: Maximum tokens to generate
+             temperature: Sampling temperature
+             top_p: Nucleus sampling parameter
+
+         Returns:
+             Generated text string
+         """
+         pass
+
+     @abstractmethod
+     def get_info(self) -> Dict[str, Any]:
+         """Return model information for the /models endpoint."""
+         pass
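
New backends plug into the service by subclassing `BaseLLM` and implementing the three abstract methods above. A minimal sketch of a hypothetical stub backend (the class name and canned behavior are illustrative only; something like this is mainly useful in tests):

```python
from typing import Any, Dict, List

from app.models.base_llm import BaseLLM


class EchoLLM(BaseLLM):
    """Hypothetical stub backend that echoes its input instead of calling a model."""

    async def initialize(self) -> None:
        # Nothing to load; just flip the flag tracked by the base class.
        self._initialized = True

    async def generate(
        self,
        prompt: str = None,
        chat_messages: List[Dict[str, str]] = None,
        max_new_tokens: int = 150,
        temperature: float = 0.7,
        top_p: float = 0.9,
        **kwargs,
    ) -> str:
        if not self._initialized:
            raise RuntimeError(f"[{self.name}] Model not initialized")
        # Echo the last user message (or the raw prompt) back to the caller.
        if chat_messages:
            return chat_messages[-1]["content"]
        if prompt:
            return prompt
        raise ValueError("Either prompt or chat_messages required")

    def get_info(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "model_id": self.model_id,
            "type": "stub",
            "initialized": self._initialized,
        }
```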
app/models/huggingface_inference_api.py ADDED
@@ -0,0 +1,93 @@
+ """
+ HuggingFace Inference API client for remote model access.
+ """
+
+ import os
+ from typing import List, Dict, Any, Optional
+ from huggingface_hub import InferenceClient
+
+ from app.models.base_llm import BaseLLM
+
+
+ class HuggingFaceInferenceAPI(BaseLLM):
+     """
+     Remote model access via the HuggingFace Inference API.
+     Best for larger models (7B+) that don't fit in local RAM.
+     """
+
+     def __init__(self, name: str, model_id: str, token: str = None):
+         super().__init__(name, model_id)
+         self.token = token or os.getenv("HF_TOKEN")
+         self.client: Optional[InferenceClient] = None
+
+     async def initialize(self) -> None:
+         """Initialize the Inference API client."""
+         if self._initialized:
+             return
+
+         try:
+             print(f"[{self.name}] Initializing Inference API for: {self.model_id}")
+
+             self.client = InferenceClient(
+                 model=self.model_id,
+                 token=self.token
+             )
+
+             self._initialized = True
+             print(f"[{self.name}] Inference API ready")
+
+         except Exception as e:
+             print(f"[{self.name}] Failed to initialize: {e}")
+             raise
+
+     async def generate(
+         self,
+         prompt: str = None,
+         chat_messages: List[Dict[str, str]] = None,
+         max_new_tokens: int = 150,
+         temperature: float = 0.7,
+         top_p: float = 0.9,
+         **kwargs
+     ) -> str:
+         """Generate text using the HuggingFace Inference API."""
+
+         if not self._initialized or not self.client:
+             raise RuntimeError(f"[{self.name}] Client not initialized")
+
+         try:
+             # Use chat completion if chat_messages are provided
+             if chat_messages:
+                 response = self.client.chat_completion(
+                     messages=chat_messages,
+                     max_tokens=max_new_tokens,
+                     temperature=temperature,
+                     top_p=top_p,
+                 )
+                 return response.choices[0].message.content.strip()
+
+             # Otherwise use plain text generation
+             elif prompt:
+                 response = self.client.text_generation(
+                     prompt=prompt,
+                     max_new_tokens=max_new_tokens,
+                     temperature=temperature,
+                     top_p=top_p,
+                     do_sample=True,
+                 )
+                 return response.strip()
+
+             else:
+                 raise ValueError("Either prompt or chat_messages required")
+
+         except Exception as e:
+             print(f"[{self.name}] Generation error: {e}")
+             raise
+
+     def get_info(self) -> Dict[str, Any]:
+         """Return model info."""
+         return {
+             "name": self.name,
+             "model_id": self.model_id,
+             "type": "inference_api",
+             "initialized": self._initialized,
+         }
app/models/huggingface_local.py ADDED
@@ -0,0 +1,133 @@
+ """
+ Local HuggingFace model implementation using the transformers pipeline.
+ """
+
+ from typing import List, Dict, Any, Optional
+ from transformers import pipeline, AutoTokenizer
+ import torch
+ import asyncio
+
+ from app.models.base_llm import BaseLLM
+
+
+ class HuggingFaceLocal(BaseLLM):
+     """
+     Local HuggingFace model loaded into container memory.
+     Best for smaller models (< 3B parameters) that fit in RAM.
+     """
+
+     def __init__(self, name: str, model_id: str, device: str = "cpu"):
+         super().__init__(name, model_id)
+         self.device = device
+         self.pipeline = None
+         self.tokenizer = None
+
+         # Determine device index
+         if device == "cuda" and torch.cuda.is_available():
+             self.device_index = 0
+         else:
+             self.device_index = -1  # CPU
+
+     async def initialize(self) -> None:
+         """Load the model into memory."""
+         if self._initialized:
+             return
+
+         try:
+             print(f"[{self.name}] Loading local model: {self.model_id}")
+
+             self.tokenizer = await asyncio.to_thread(
+                 AutoTokenizer.from_pretrained,
+                 self.model_id,
+                 trust_remote_code=True
+             )
+
+             self.pipeline = await asyncio.to_thread(
+                 pipeline,
+                 "text-generation",
+                 model=self.model_id,
+                 tokenizer=self.tokenizer,
+                 device=self.device_index,
+                 torch_dtype=torch.float32,
+                 trust_remote_code=True,
+             )
+
+             self._initialized = True
+             print(f"[{self.name}] Model loaded successfully")
+
+         except Exception as e:
+             print(f"[{self.name}] Failed to load model: {e}")
+             raise
+
+     async def generate(
+         self,
+         prompt: str = None,
+         chat_messages: List[Dict[str, str]] = None,
+         max_new_tokens: int = 150,
+         temperature: float = 0.7,
+         top_p: float = 0.9,
+         **kwargs
+     ) -> str:
+         """Generate text using the local pipeline."""
+
+         if not self._initialized:
+             raise RuntimeError(f"[{self.name}] Model not initialized")
+
+         # Format the prompt
+         if chat_messages:
+             try:
+                 formatted_prompt = self.tokenizer.apply_chat_template(
+                     chat_messages,
+                     tokenize=False,
+                     add_generation_prompt=True
+                 )
+             except Exception:
+                 # Fall back to the raw prompt
+                 if prompt:
+                     formatted_prompt = prompt
+                 else:
+                     raise ValueError("Cannot generate without a valid prompt")
+         elif prompt:
+             formatted_prompt = prompt
+         else:
+             raise ValueError("Either prompt or chat_messages required")
+
+         # Generate
+         outputs = await asyncio.to_thread(
+             self.pipeline,
+             formatted_prompt,
+             max_new_tokens=max_new_tokens,
+             do_sample=True,
+             temperature=temperature,
+             top_p=top_p,
+             eos_token_id=self.tokenizer.eos_token_id,
+             pad_token_id=self.tokenizer.eos_token_id if self.tokenizer.pad_token_id is None else self.tokenizer.pad_token_id,
+         )
+
+         # Extract the response
+         if outputs and isinstance(outputs, list) and "generated_text" in outputs[0]:
+             full_text = outputs[0]["generated_text"]
+
+             # Remove the prompt from the output
+             if full_text.startswith(formatted_prompt):
+                 response = full_text[len(formatted_prompt):]
+             else:
+                 response = full_text
+
+             # Clean up special tokens
+             if response.endswith("<|im_end|>"):
+                 response = response[:-len("<|im_end|>")]
+
+             return response.strip()
+
+         return ""
+
+     def get_info(self) -> Dict[str, Any]:
+         """Return model info."""
+         return {
+             "name": self.name,
+             "model_id": self.model_id,
+             "type": "local",
+             "initialized": self._initialized,
+             "device": self.device
+         }
app/models/registry.py ADDED
@@ -0,0 +1,124 @@
+ """
+ Model Registry - Central configuration and factory for all LLM models.
+ """
+
+ import os
+ from typing import Dict, List, Any, Optional
+
+ from app.models.base_llm import BaseLLM
+ from app.models.huggingface_local import HuggingFaceLocal
+ from app.models.huggingface_inference_api import HuggingFaceInferenceAPI
+
+
+ # Model configuration
+ MODEL_CONFIG = {
+     "bielik-1.5b": {
+         "id": "speakleash/Bielik-1.5B-v3.0-Instruct",
+         "type": "local",
+         "polish_support": "excellent",
+         "size": "1.5B",
+     },
+     "pllum-12b": {
+         "id": "CYFRAGOVPL/PLLuM-12B-instruct",
+         "type": "inference_api",
+         "polish_support": "excellent",
+         "size": "12B",
+     },
+     "mistral-small-3": {
+         "id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
+         "type": "inference_api",
+         "polish_support": "good",
+         "size": "24B",
+     },
+     "gemma-2-9b": {
+         "id": "google/gemma-2-9b-it",
+         "type": "inference_api",
+         "polish_support": "medium",
+         "size": "9B",
+     },
+ }
+
+ # For local model override (when the model is pre-downloaded in the container)
+ LOCAL_MODEL_PATH = os.getenv("LOCAL_MODEL_PATH", "/app/pretrain_model")
+
+
+ class ModelRegistry:
+     """
+     Central registry for managing all LLM models.
+     Handles model instantiation, initialization, and access.
+     """
+
+     def __init__(self):
+         self._models: Dict[str, BaseLLM] = {}
+         self._config = MODEL_CONFIG.copy()
+
+     def _create_model(self, name: str) -> BaseLLM:
+         """Factory method to create a model instance."""
+         if name not in self._config:
+             raise ValueError(f"Unknown model: {name}")
+
+         config = self._config[name]
+         model_type = config["type"]
+         model_id = config["id"]
+
+         # Special case: local Bielik uses the pre-downloaded path
+         if name == "bielik-1.5b" and os.path.exists(LOCAL_MODEL_PATH):
+             model_id = LOCAL_MODEL_PATH
+
+         if model_type == "local":
+             return HuggingFaceLocal(
+                 name=name,
+                 model_id=model_id,
+                 device="cpu"
+             )
+         elif model_type == "inference_api":
+             return HuggingFaceInferenceAPI(
+                 name=name,
+                 model_id=model_id
+             )
+         else:
+             raise ValueError(f"Unknown model type: {model_type}")
+
+     async def get_model(self, name: str) -> BaseLLM:
+         """Get or create and initialize a model."""
+         if name not in self._models:
+             model = self._create_model(name)
+             await model.initialize()
+             self._models[name] = model
+
+         return self._models[name]
+
+     async def initialize_model(self, name: str) -> None:
+         """Pre-initialize a specific model."""
+         await self.get_model(name)
+
+     async def initialize_local_models(self) -> None:
+         """Initialize all local models at startup."""
+         for name, config in self._config.items():
+             if config["type"] == "local":
+                 await self.initialize_model(name)
+
+     def list_models(self) -> List[Dict[str, Any]]:
+         """List all available models with their info."""
+         models = []
+         for name, config in self._config.items():
+             model_info = {
+                 "name": name,
+                 "model_id": config["id"],
+                 "type": config["type"],
+                 "polish_support": config["polish_support"],
+                 "size": config["size"],
+                 "initialized": name in self._models and self._models[name].is_initialized,
+             }
+             models.append(model_info)
+         return models
+
+     def get_available_model_names(self) -> List[str]:
+         """Get the list of available model names."""
+         return list(self._config.keys())
+
+
+ # Global registry instance
+ registry = ModelRegistry()
app/schemas/schemas.py CHANGED
@@ -1,4 +1,6 @@
  from pydantic import BaseModel
+ from typing import List, Optional, Dict, Any
+

  class EnhancedDescriptionResponse(BaseModel):
      description: str
@@ -6,3 +8,32 @@ class EnhancedDescriptionResponse(BaseModel):
      generation_time: float
      user_email: str

+
+ class ModelInfo(BaseModel):
+     name: str
+     model_id: str
+     type: str
+     polish_support: str
+     size: str
+     initialized: bool
+
+
+ class CompareRequest(BaseModel):
+     domain: str
+     data: Dict[str, Any]
+     models: Optional[List[str]] = None  # If None, use all models
+
+
+ class ModelResult(BaseModel):
+     model: str
+     output: str
+     time: float
+     type: str
+     error: Optional[str] = None
+
+
+ class CompareResponse(BaseModel):
+     domain: str
+     results: List[ModelResult]
+     total_time: float
llm_app_rework.md ADDED
@@ -0,0 +1,141 @@
+ # LLM App Rework Plan
+
+ ## Goal
+ Transform the single-model app → a multi-model comparison platform for A/B testing open-source LLMs on car descriptions.
+
+ ---
+
+ ## Current State
+ - Single model: Bielik-1.5B (local HuggingFace)
+ - Single domain: cars
+ - No comparison capability
+
+ ## Target State
+ - Multiple open-source LLMs via HuggingFace
+ - Same prompt → multiple outputs → compare results
+ - Support compression/decompression testing
+
+ ---
+
+ ## Architecture Changes
+
+ ### 1. Model Registry
+ ```
+ app/models/
+ ├── registry.py           # Model registry + factory
+ ├── base_llm.py           # Abstract base class
+ └── huggingface_local.py  # Refactored current service
+ ```
+
+ ### 2. Base LLM Interface
+ ```python
+ class BaseLLM(ABC):
+     name: str
+     model_id: str
+
+     async def generate(self, prompt: str, **params) -> str: ...
+     async def initialize(self) -> None: ...
+     def is_initialized(self) -> bool: ...
+ ```
+
+ ### 3. Model Registry
+ ```python
+ MODELS = {
+     "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
+     "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
+     "mistral-small-3": {"id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "type": "inference_api"},
+     "gemma-2-9b": {"id": "google/gemma-2-9b-it", "type": "inference_api"},
+ }
+ ```
+
+ ### 4. Two Model Types
+ | Type | Description | Use Case |
+ |------|-------------|----------|
+ | `local` | Loaded in container memory | Bielik-1.5B (small, fits in RAM) |
+ | `inference_api` | HuggingFace Inference API | Larger models (7B+) via API |
+
+ ### 5. New Endpoints
+
+ | Endpoint | Purpose |
+ |----------|---------|
+ | `POST /enhance-description` | Single model (existing) |
+ | `POST /compare` | Multiple models, return all outputs |
+ | `GET /models` | List available models |
+
+ ### 6. Compare Request/Response
+ ```python
+ # Request
+ {
+     "domain": "cars",
+     "data": {...},
+     "models": ["bielik-1.5b", "pllum-12b", "mistral-small-3", "gemma-2-9b"]
+ }
+
+ # Response
+ {
+     "results": [
+         {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
+         {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"},
+         {"model": "mistral-small-3", "output": "...", "time": 0.9, "type": "inference_api"},
+         {"model": "gemma-2-9b", "output": "...", "time": 1.0, "type": "inference_api"}
+     ]
+ }
+ ```
+
+ ---
+
+ ## Implementation Steps
+
+ 1. **Create base_llm.py** - abstract interface
+ 2. **Create huggingface_inference_api.py** - HF Inference API client
+ 3. **Refactor huggingface_service.py** → HuggingFaceLocal (implements BaseLLM)
+ 4. **Create registry.py** - model factory + config
+ 5. **Add /compare endpoint** in main.py
+ 6. **Add /models endpoint** - list available models
+ 7. **Update schemas** - CompareRequest, CompareResponse
+
+ ---
+
+ ## HuggingFace Inference API
+ ```python
+ from huggingface_hub import InferenceClient
+
+ client = InferenceClient(token=HF_TOKEN)
+ response = client.text_generation(
+     model="mistralai/Mistral-7B-Instruct-v0.3",
+     prompt=formatted_prompt,
+     max_new_tokens=150,
+ )
+ ```
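+
+ For instruct-tuned models, the chat-style endpoint of the same client is often more convenient, and it is what `HuggingFaceInferenceAPI.generate` actually uses when chat messages are supplied; a sketch:
+
+ ```python
+ response = client.chat_completion(
+     messages=[{"role": "user", "content": "..."}],  # chat-template style messages
+     model="CYFRAGOVPL/PLLuM-12B-instruct",
+     max_tokens=150,
+ )
+ text = response.choices[0].message.content
+ ```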
+
+ ---
+
+ ## Env Vars (HuggingFace Secrets)
+ ```
+ HF_TOKEN=hf_...  # For Inference API access
+ ```
+
+ ---
+
+ ## Models (Approved)
+
+ | Model | Size | Polish Support | HuggingFace ID |
+ |-------|------|----------------|----------------|
+ | Bielik-1.5B | 1.5B | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
+ | PLLuM-12B | 12B | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |
+ | Mistral-Small-3 | 24B | Good | mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
+ | Gemma-2-9B | 9B | Medium | google/gemma-2-9b-it |
+
+ ---
+
+ ## Priority
+ 1. HuggingFace Inference API integration
+ 2. `/compare` endpoint
+ 3. `/models` endpoint
+
+ ---
+
+ ## Notes
+ - All models are open source via HuggingFace
+ - Local model: Bielik-1.5B (already works)
+ - Larger models: HF Inference API (no local GPU needed)
+ - HF_TOKEN is needed for gated models (Gemma, etc.)
requirements.txt CHANGED
@@ -1,4 +1,5 @@
  fastapi
  uvicorn[standard]
  transformers[torch]
- accelerate
+ accelerate
+ huggingface_hub