Spaces: Running

Patryk Studzinski committed · Commit a7fd202 · 1 Parent(s): a1c0774
first-imrpvement-commit
Browse files:
- PROJECT_CONTEXT.md +0 -107
- README.md +141 -276
- answer.md +0 -213
- app/main.py +115 -57
- app/models/__init__.py +16 -0
- app/models/base_llm.py +54 -0
- app/models/huggingface_inference_api.py +93 -0
- app/models/huggingface_local.py +133 -0
- app/models/registry.py +124 -0
- app/schemas/schemas.py +31 -0
- llm_app_rework.md +141 -0
- requirements.txt +2 -1
PROJECT_CONTEXT.md
DELETED
@@ -1,107 +0,0 @@

# GPT4All Service - Project Context

## Project Overview

This is a **Polish Car Description Enhancement Service** built as a FastAPI microservice that uses a Hugging Face large language model to generate enhanced marketing descriptions for cars in Polish.

## Core Functionality

The service takes basic car information (make, model, year, mileage, features, condition) and generates compelling, marketing-friendly descriptions in Polish using the `speakleash/Bielik-1.5B-v3.0-Instruct` model, a Polish language model from the Bielik series.

## Project Structure

```
gpt4all-service/
├── app/
│   ├── main.py                    # FastAPI application with endpoints
│   ├── models/
│   │   └── huggingface_service.py # Core LLM service wrapper
│   └── schemas/
│       └── schemas.py             # Pydantic data models
├── Dockerfile                     # Multi-stage Docker build
├── download_model.py              # Model download script for Docker
├── requirements.txt               # Python dependencies
├── start_container.ps1            # PowerShell startup script
├── start_container.sh             # Bash startup script
└── README.md                      # Comprehensive documentation
```

## Technical Architecture

### 1. FastAPI Application (`app/main.py`)
- **Framework**: FastAPI with CORS middleware
- **Main Endpoint**: `POST /enhance-description` - takes car data, returns an enhanced description
- **Health Check**: `GET /health` - service status and model initialization check
- **CORS**: Configured for a frontend on `http://localhost:5173` (likely a React/Vue dev server)

### 2. LLM Service (`app/models/huggingface_service.py`)
- **Purpose**: Wrapper around the Hugging Face Transformers pipeline
- **Model**: `speakleash/Bielik-1.5B-v3.0-Instruct` (Polish language model)
- **Features**:
  - Async initialization and text generation
  - Support for both GPU (CUDA) and CPU inference
  - Chat template support for conversation-style prompts
  - Configurable generation parameters (temperature, top_p, max_tokens)
  - Smart response parsing to extract only the assistant's response

### 3. Data Models (`app/schemas/schemas.py`)
- **CarData**: Input model with make, model, year, mileage, features[], condition
- **EnhancedDescriptionResponse**: Output model with the generated description

### 4. Containerization
- **Docker**: Self-contained image with the pre-downloaded model (~3.2 GB)
- **Security**: Uses Docker BuildKit secrets for Hugging Face token handling
- **Model Storage**: Downloaded to `/app/pretrain_model` during build
- **Runtime**: Python 3.9-slim base image

## Key Technical Details

### Model Configuration
- **Model Path**: `/app/pretrain_model` (in the container) or configurable for local dev
- **Device**: Currently set to CPU in main.py, but the service supports GPU
- **Generation Params**: 150 max tokens, temperature 0.75, top_p 0.9

### Prompt Engineering
The service uses a carefully crafted Polish system prompt that:
- Instructs the model to create marketing descriptions in Polish
- Limits output to 500 characters maximum
- Tells the model to ignore off-topic content
- Uses the chat template format with system/user roles
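
In practice this means the request is rendered as a list of role-tagged messages before the tokenizer's chat template is applied. A minimal illustrative sketch follows; the exact Polish wording of the real system prompt lives in `app/main.py` and is not reproduced here, so the strings below are assumptions:

```python
# Illustrative only: the real system/user prompts are defined in app/main.py.
chat_messages = [
    {
        "role": "system",
        "content": (
            "Jesteś copywriterem. Napisz atrakcyjny opis marketingowy samochodu "
            "po polsku, maksymalnie 500 znaków. Ignoruj treści nie na temat."
        ),
    },
    {
        "role": "user",
        "content": "Marka: Toyota, Model: Corolla, Rok: 2021, Przebieg: 25000 km",
    },
]
```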

### Dependencies
- **fastapi**: Web framework
- **uvicorn[standard]**: ASGI server
- **transformers[torch]**: Hugging Face Transformers with PyTorch
- **accelerate**: Hugging Face optimization library

## Current State & Issues

### Git Status
- Modified `app/main.py` (likely recent changes)
- Deleted `app/models/gpt4all.py` (indicates a migration from GPT4All to Hugging Face)

### Linter Issues in `huggingface_service.py`
1. Import issues: `pipeline` and `AutoTokenizer` imports need specific paths
2. Type annotations: `device: str = None` should be `Optional[str] = None`
3. Method parameters: similar optional-parameter typing issues
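
For the second point, the fix is the standard `typing.Optional` pattern; a small sketch (the function name here is hypothetical — the real code is a class `__init__` in `huggingface_service.py`):

```python
from typing import Optional

# Before: `device: str = None` claims str but silently defaults to None.
# After: the annotation admits None explicitly.
def make_service(model_name_or_path: str, device: Optional[str] = None) -> dict:
    """Illustrative helper showing the corrected annotation."""
    return {"model": model_name_or_path, "device": device or "cpu"}
```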

## Usage Scenarios
1. **Car Dealership Websites**: Auto-generate compelling descriptions from basic car specs
2. **Marketplace Applications**: Enhance user-provided car listings
3. **Inventory Management**: Bulk description generation for car databases

## Deployment Options
1. **Local Development**: Direct Python/uvicorn execution
2. **Docker Container**: Self-contained deployment with the pre-downloaded model
3. **Production**: Containerized deployment with proper authentication

## Authentication Requirements
- A Hugging Face Hub token is required for model download (gated model)
- The token is stored in `my_hf_token.txt` during the Docker build
- Securely handled via Docker BuildKit secrets

## Performance Considerations
- Model size: ~3.2 GB (significant memory footprint)
- CPU inference: slower but more accessible
- GPU inference: faster but requires a CUDA setup
- Async design: non-blocking text generation

This service represents a specialized AI application for the Polish automotive market, focusing on generating marketing content using state-of-the-art Polish language models.
README.md
CHANGED
@@ -1,314 +1,179 @@

Removed (old version):

---
title: AI Car Description Enhancer
emoji: 🚗✨
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
---

<div align="center">

# AI Car Description Enhancer "Bielik"

</div>

## Contents

1. [Features](#features)
2. [Prerequisites](#prerequisites)
3. [Project Structure](#project-structure)
4. [Installation (Local Development)](#installation-local-development)
5. [Usage (Local Development)](#usage-local-development)
6. [Docker Usage](#docker-usage)
7. [Quick Start with PowerShell (`start_container.ps1`)](#quick-start-with-powershell-start_containerps1)
8. [API Endpoints](#api-endpoints)
   - [Health Check](#health-check)
   - [Enhance Description](#enhance-description)
9. [Core Service (`app/models/huggingface_service.py`)](#core-service-appmodelshuggingface_servicepy)
10. [Configuration](#configuration)
11. [Schemas (`app/schemas/schemas.py`)](#schemas-appschemasschemaspy)
    - [CarData](#cardata)
    - [EnhancedDescriptionResponse](#enhanceddescriptionresponse)
12. [Contributing](#contributing)
13. [License](#license)

## Features

- Utilizes the `speakleash/Bielik-1.5B-v3.0-Instruct` model via the Hugging Face `transformers` library.
- Health check endpoint.
- Docker support for easy deployment, with the model included in the image.
- Includes a `start_container.sh` script for convenient container startup.

## Prerequisites

- `pip` (Python package installer)
- Docker (for containerized deployment; Docker BuildKit enabled is recommended for secrets)
- Git (for cloning the repository)
- A Hugging Face Hub account and an access token (with `read` permissions) if the chosen model is gated (see the Docker Usage section).
- For using `start_container.sh`: a bash-compatible shell (such as those on Linux, macOS, or Git Bash on Windows).

## Project Structure

```
.
├── app/
│   ├── __init__.py
│   ├── main.py                    # FastAPI application, endpoints
│   ├── models/
│   │   ├── __init__.py
│   │   └── huggingface_service.py # Service for interacting with the LLM
│   └── schemas/
│       ├── __init__.py
│       └── schemas.py             # Pydantic schemas for request/response
├── .gitignore
├── Dockerfile
├── download_model.py              # Script to download model during Docker build
├── my_hf_token.txt                # (Should be created locally) For storing HF token
├── requirements.txt
├── start_container.sh             # Helper script to run the Docker container
└── README.md
```

## Installation (Local Development)

1. **Clone the repository:**
   ```bash
   git clone https://github.com/studzin-sky/llm-description-enhancer.git
   cd llm-description-enhancer
   ```

2. **Create and activate a virtual environment:**
   (Recommended to keep dependencies isolated)
   ```bash
   python -m venv venv
   ```
   * On macOS/Linux:
     ```bash
     source venv/bin/activate
     ```
   * On Windows (PowerShell):
     ```powershell
     .\venv\Scripts\Activate.ps1
     ```
   * On Windows (Command Prompt):
     ```cmd
     venv\Scripts\activate.bat
     ```

3. **Install the required dependencies:**
   Ensure your `requirements.txt` includes `fastapi`, `uvicorn[standard]`, `transformers[torch]`, `torch`, `accelerate`, and `huggingface_hub`.
   ```bash
   pip install -r requirements.txt
   ```
   *Note: The first time you run the application locally (or if the model cache is empty), the Hugging Face model (~3.2 GB) will be downloaded, which can take some time. **If the model (`speakleash/Bielik-1.5B-v3.0-Instruct`, or whichever is configured) is gated or requires authentication, you may need to log in with `huggingface-cli login` before running the application locally.** After logging in, your token is cached by the `huggingface_hub` library.*

## Usage (Local Development)

1. **Start the FastAPI server:**
   From the project root directory:
   ```bash
   uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
   ```
   * `--reload` enables auto-reloading for development.
   * `--host 0.0.0.0` makes the server accessible on your network.

2. **Access the application:**
   * Health Check: http://127.0.0.1:8000/health
   * API Documentation (Swagger UI): http://127.0.0.1:8000/docs
   * Enhance Description: `POST` requests to http://127.0.0.1:8000/enhance-description

## Docker Usage

The included `Dockerfile` builds an image with the application and the pre-downloaded Hugging Face model, making it self-contained. Downloading gated models during the build requires a Hugging Face Hub token.

1. **Prepare a Hugging Face Hub token (for gated models):**
   The `speakleash/Bielik-1.5B-v3.0-Instruct` model may require authentication to download.
   * **Get a token:**
     1. Go to your Hugging Face account settings: https://huggingface.co/settings/tokens
     2. Create a new token (e.g., named "docker-bielik-access") with `read` permissions.
     3. Copy the generated token (it will start with `hf_`).
   * **Create the token file:**
     1. In your project's root directory (next to your `Dockerfile`), create a file named `my_hf_token.txt`.
     2. Paste **only the token string** (e.g., `hf_YourActualTokenValueHere`) into this file. Do not add any other text or variable names.

2. **Build the Docker image:**
   From the project root directory, run:
   ```bash
   DOCKER_BUILDKIT=1 docker build --secret id=huggingface_token,src=my_hf_token.txt -t llm-description-enhancer .
   ```
   * `DOCKER_BUILDKIT=1`: enables BuildKit, which is required for using `--secret`.
   * `--secret id=huggingface_token,src=my_hf_token.txt`: securely provides the content of `my_hf_token.txt` to the build process. The `id=huggingface_token` must match the ID used in the `RUN --mount` directive in your `Dockerfile`.
   * *(This step will take a while, especially the first time, as it downloads the LLM using your token.)*
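
The commit does not reproduce `download_model.py` itself; a minimal sketch of how such a build-time script could consume the BuildKit secret (the mount path follows BuildKit's `/run/secrets/<id>` convention for the `id=huggingface_token` above; everything else is an assumption):

```python
# download_model.py -- hypothetical sketch; the real script is not shown in this commit.
from pathlib import Path

from huggingface_hub import snapshot_download

# BuildKit mounts secrets under /run/secrets/<id> inside the RUN step, e.g.:
#   RUN --mount=type=secret,id=huggingface_token python download_model.py
SECRET_PATH = Path("/run/secrets/huggingface_token")

token = SECRET_PATH.read_text().strip() if SECRET_PATH.exists() else None

# Download the gated model into the path the service loads from at runtime.
snapshot_download(
    repo_id="speakleash/Bielik-1.5B-v3.0-Instruct",
    local_dir="/app/pretrain_model",
    token=token,
)
```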

3. **Run the Docker container using the helper script (`start_container.sh`):**
   The helper script `start_container.sh` included in the repository simplifies starting the Docker container. It typically stops and removes any pre-existing container with the configured name and then starts a new one.

   * **Ensure the script is executable:**
     After cloning the repository, or if the execute permission isn't set, you may need to make the script executable (on Linux, macOS, or Git Bash on Windows):
     ```bash
     chmod +x start_container.sh
     ```

   * **Run the script:**
     From the project root directory:
     ```bash
     ./start_container.sh
     ```

   * **Expected outcome (depends on your script's content):**
     The script will likely:
     * Output messages indicating it is managing the container.
     * Start the container (possibly in detached mode).
     * Inform you that the service is available at `http://127.0.0.1:8000`.
     * Provide commands to view logs or stop the container if it is running in detached mode (e.g., `docker logs <container_name> -f` and `docker stop <container_name>`).

   *(Alternatively, you can run the container manually: `docker run --rm -p 8000:8000 llm-description-enhancer`)*

4. **Test the containerized application:**
   Once the container is running (via the script or manually), send requests to `http://127.0.0.1:8000` as described in the API Endpoints section.

## Quick Start with PowerShell (`start_container.ps1`)

For Windows users, you can automate the Docker build and run process with the provided PowerShell script. The script will:
- Build the Docker image using your Hugging Face token (from `my_hf_token.txt`)
- Stop and remove any existing container named `bielik_app_instance`
- Start a new container and map port 8000

**Steps:**

1. Ensure your Hugging Face token is saved in `my_hf_token.txt` in the project root (see above for details).
2. Open PowerShell in the project directory.
3. (Optional, but recommended) Temporarily allow running unsigned scripts for this session:
   ```powershell
   Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process
   ```
4. Run the script:
   ```powershell
   .\start_container.ps1
   ```

The script will build the image and start the container. Your FastAPI service will be available at http://127.0.0.1:8000.

You can view logs with:
```powershell
docker logs bielik_app_instance -f
```
To stop the container:
```powershell
docker stop bielik_app_instance
```

---

## API Endpoints

### Health Check (`GET /health`)

- **Method:** `GET`
- **Description:** Returns the status of the application and model initialization.
- **Example Response:**
  ```json
  {
    "status": "ok",
    "model_initialized": true,
    "model_path": "/app/pretrain_model"
  }
  ```

### Enhance Description (`POST /enhance-description`)

- **Request (`application/json`):** a JSON object with the [CarData](#cardata) fields:
  ```json
  {
    "make": "…",
    "model": "…",
    "year": …,
    "mileage": …,
    "features": ["…"],
    "condition": "…"
  }
  ```
- **Response (`application/json`):**
  ```json
  {
    "description": "…"
  }
  ```
- **Example `curl` request:**
  ```bash
  curl -X POST "http://127.0.0.1:8000/enhance-description" \
    -H "Content-Type: application/json" \
    -d '{
      "make": "Toyota",
      "model": "Corolla",
      "year": 2021,
      "mileage": 25000,
      "features": ["Kamera cofania", "Apple CarPlay", "Android Auto", "System bezkluczykowy"],
      "condition": "Bardzo dobry"
    }'
  ```

## Core Service (`app/models/huggingface_service.py`)

The `HuggingFaceTextGenerationService` class handles the interaction with the large language model.

- **Key Methods:**
  - `async initialize()`: Loads the pre-trained model and tokenizer from the path specified at service instantiation (e.g., `/app/pretrain_model` in Docker, or from the Hugging Face cache locally).
  - `async generate_text(chat_template_messages: list, max_new_tokens: int, ...)`: Generates text from a structured chat prompt, applying the appropriate chat template and parsing the model's output to return only the assistant's response.

## Configuration

- **Model Used:** `speakleash/Bielik-1.5B-v3.0-Instruct`. This is baked into `/app/pretrain_model` in the Docker image. For local development, it is downloaded to the Hugging Face cache.
- **Language:** The primary focus is on generating descriptions in **Polish**.
- **Prompt Engineering:** The system and user prompts in `app/main.py` are crafted to guide the model towards generating concise and relevant marketing descriptions.

## Schemas (`app/schemas/schemas.py`)

### CarData

- `make`: `str`
- `model`: `str`
- `year`: `int`
- `mileage`: `int`
- `features`: `list[str]`
- `condition`: `str`

### EnhancedDescriptionResponse

- `description`: `str`

## Contributing

## License
Added (new version):

# Bielik App Service

Multi-model LLM service for description enhancement and A/B testing.

## Overview

This service provides an API for generating enhanced descriptions using multiple open-source LLMs. It supports comparing outputs across different models to evaluate quality, speed, and Polish language support.

## Models

| Model | Size | Polish Support | Type |
|-------|------|----------------|------|
| Bielik-1.5B | 1.5B | Excellent | Local |
| PLLuM-12B | 12B | Excellent | API |
| Mistral-Small-3 | 24B | Good | API |
| Gemma-2-9B | 9B | Medium | API |

## API Endpoints

### Health & Info

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Welcome message |
| `GET` | `/health` | API health check and model status |
| `GET` | `/models` | List all available models |

### Generation

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/enhance-description` | Generate a description with a single model |
| `POST` | `/compare` | Compare outputs from multiple models |

---

## Endpoint Details

### `GET /health`

Check API status and model initialization.

**Response:**
```json
{
  "status": "ok",
  "local_models_initialized": true,
  "available_models": 4
}
```

---

### `GET /models`

List all available models with their details.

**Response:**
```json
[
  {
    "name": "bielik-1.5b",
    "model_id": "speakleash/Bielik-1.5B-v3.0-Instruct",
    "type": "local",
    "polish_support": "excellent",
    "size": "1.5B",
    "initialized": true
  }
]
```

---

### `POST /enhance-description`

Generate an enhanced description using a single model.

**Request:**
```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "model": "bielik-1.5b"
}
```

**Response:**
```json
{
  "description": "Generated description text...",
  "model_used": "speakleash/Bielik-1.5B-v3.0-Instruct",
  "generation_time": 2.34,
  "user_email": "anonymous"
}
```

---
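
For reference, the same request from Python (a sketch using the `requests` library; the base URL assumes the local dev server from the Running Locally section):

```python
import requests

# Assumes the service is running locally on port 8000.
payload = {
    "domain": "cars",
    "data": {
        "make": "BMW",
        "model": "320i",
        "year": 2020,
        "mileage": 45000,
        "features": ["nawigacja", "klimatyzacja"],
        "condition": "bardzo dobry",
    },
    "model": "bielik-1.5b",
}

resp = requests.post("http://localhost:8000/enhance-description", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["description"])
```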

### `POST /compare`

Compare outputs from multiple models for the same input.

**Request:**
```json
{
  "domain": "cars",
  "data": {
    "make": "BMW",
    "model": "320i",
    "year": 2020,
    "mileage": 45000,
    "features": ["nawigacja", "klimatyzacja"],
    "condition": "bardzo dobry"
  },
  "models": ["bielik-1.5b", "pllum-12b", "gemma-2-9b"]
}
```

**Response:**
```json
{
  "domain": "cars",
  "results": [
    {
      "model": "bielik-1.5b",
      "output": "Generated text from Bielik...",
      "time": 2.3,
      "type": "local",
      "error": null
    },
    {
      "model": "pllum-12b",
      "output": "Generated text from PLLuM...",
      "time": 1.1,
      "type": "inference_api",
      "error": null
    }
  ],
  "total_time": 5.67
}
```
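
A quick way to eyeball an A/B comparison from Python (a sketch; assumes the same local server and the documented response shape):

```python
import requests

payload = {
    "domain": "cars",
    "data": {
        "make": "BMW",
        "model": "320i",
        "year": 2020,
        "mileage": 45000,
        "features": ["nawigacja", "klimatyzacja"],
        "condition": "bardzo dobry",
    },
    "models": ["bielik-1.5b", "pllum-12b"],
}

resp = requests.post("http://localhost:8000/compare", json=payload, timeout=300)
resp.raise_for_status()

# Print each model's output next to its latency (or error) for a side-by-side look.
for result in resp.json()["results"]:
    status = result["error"] or f"{result['time']}s"
    print(f"[{result['model']} | {status}] {result['output'][:120]}")
```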

---

## Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `HF_TOKEN` | HuggingFace API token for the Inference API | Yes (for API models) |
| `LOCAL_MODEL_PATH` | Path to the pre-downloaded local model | No (default: `/app/pretrain_model`) |
| `FRONTEND_URL` | Frontend URL for CORS | No |

## Running Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Run server
uvicorn app.main:app --reload --port 8000
```

## Docker

```powershell
# Build and run
.\start_container.ps1
```

API available at `http://localhost:8000`

Docs at `http://localhost:8000/docs`
answer.md
DELETED
@@ -1,213 +0,0 @@

Great choice! Creating a separate service for GPT4All is a clean and scalable approach. Let's break this down step by step, from setting up the project structure to deploying the service.

---

### **1. Project Structure**
Here's a suggested folder structure for your GPT4All service:

```
gpt4all-service/
├── app/
│   ├── __init__.py
│   ├── main.py            # FastAPI/Flask app entry point
│   ├── models/            # GPT4All model loading and inference logic
│   │   └── gpt4all.py
│   └── schemas/           # Pydantic models for request/response validation
│       └── schemas.py
├── tests/                 # Unit and integration tests
│   └── test_api.py
├── requirements.txt       # Python dependencies
├── Dockerfile             # For containerization
├── README.md              # Project documentation
└── .env                   # Environment variables (optional)
```

---

### **2. Setting Up the Project**
1. **Create the Project Folder**:
   ```bash
   mkdir gpt4all-service
   cd gpt4all-service
   ```

2. **Initialize a Virtual Environment**:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install Dependencies**:
   Create a `requirements.txt` file:
   ```plaintext
   fastapi
   uvicorn
   gpt4all
   pydantic
   python-dotenv
   ```

   Install the dependencies:
   ```bash
   pip install -r requirements.txt
   ```

---

### **3. Build the GPT4All Service**
#### **Step 1: Create the Model Loading Logic**
- Create `app/models/gpt4all.py`:
  ```python
  from gpt4all import GPT4All

  class GPT4AllService:
      def __init__(self, model_path: str):
          self.model = GPT4All(model_path)

      def generate_description(self, prompt: str) -> str:
          response = self.model.generate(prompt, max_tokens=300)
          return response
  ```

#### **Step 2: Define Request/Response Schemas**
- Create `app/schemas/schemas.py`:
  ```python
  from pydantic import BaseModel

  class CarData(BaseModel):
      make: str
      model: str
      year: int
      mileage: int
      features: list[str]
      condition: str

  class EnhancedDescriptionResponse(BaseModel):
      description: str
  ```

#### **Step 3: Create the FastAPI App**
- Create `app/main.py`:
  ```python
  from fastapi import FastAPI, HTTPException
  from app.models.gpt4all import GPT4AllService
  from app.schemas.schemas import CarData, EnhancedDescriptionResponse

  app = FastAPI()

  # Initialize GPT4All service
  gpt4all_service = GPT4AllService("ggml-model-gpt4all-falcon-q4_0.bin")

  @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
  async def enhance_description(car_data: CarData):
      try:
          # Create a prompt from car data
          prompt = f"""
          Enhance this car description for an auction portal:
          - Make: {car_data.make}
          - Model: {car_data.model}
          - Year: {car_data.year}
          - Mileage: {car_data.mileage}
          - Features: {', '.join(car_data.features)}
          - Condition: {car_data.condition}
          """
          # Generate description
          description = gpt4all_service.generate_description(prompt)
          return {"description": description}
      except Exception as e:
          raise HTTPException(status_code=500, detail=str(e))
  ```

---

### **4. Run the Service**
1. **Start the Service**:
   ```bash
   uvicorn app.main:app --reload --port 8000
   ```

2. **Test the API**:
   Use `curl` or Postman to send a POST request:
   ```bash
   curl -X POST "http://localhost:8000/enhance-description" \
     -H "Content-Type: application/json" \
     -d '{
       "make": "Toyota",
       "model": "Camry",
       "year": 2020,
       "mileage": 45000,
       "features": ["sunroof", "leather seats", "lane assist"],
       "condition": "excellent"
     }'
   ```

   **Expected Response**:
   ```json
   {
     "description": "This 2020 Toyota Camry is a well-maintained vehicle with only 45,000 miles on the odometer. It comes equipped with a sunroof, luxurious leather seats, and advanced lane assist technology. Perfect for families or commuters, this car is in excellent condition and ready to hit the road. Don't miss out on this fantastic deal!"
   }
   ```

---

### **5. Add Logging and Error Handling**
- Add logging to track requests and errors:
  ```python
  import logging

  logging.basicConfig(level=logging.INFO)
  logger = logging.getLogger(__name__)

  @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
  async def enhance_description(car_data: CarData):
      try:
          logger.info(f"Received request for car: {car_data.make} {car_data.model}")
          prompt = f"Enhance this car description: {car_data}"
          description = gpt4all_service.generate_description(prompt)
          return {"description": description}
      except Exception as e:
          logger.error(f"Error generating description: {e}")
          raise HTTPException(status_code=500, detail=str(e))
  ```

---

### **6. Containerize the Service (Optional)**
Create a `Dockerfile` to containerize the service:
```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory
WORKDIR /app

# Copy the requirements file
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run the Docker container:
```bash
docker build -t gpt4all-service .
docker run -p 8000:8000 gpt4all-service
```

---

### **7. Next Steps**
- Add unit tests in the `tests/` folder.
- Add environment variables for configuration (e.g., model path, port).
- Integrate with your Flask backend by calling this service via HTTP.

Let me know if you need help with any specific part (e.g., testing, deployment, or advanced features)!
app/main.py
CHANGED

Old side (removed lines marked `-`; fragments truncated by the diff viewer are kept as-is and marked with `…`):

@@ -1,20 +1,26 @@
 import os
 import time
 import importlib
 from fastapi import FastAPI, HTTPException, Depends, Body
-from typing import Optional
 from pydantic import ValidationError

-from app.models.…
 from fastapi.middleware.cors import CORSMiddleware
-from app.schemas.schemas import …
 from app.auth.placeholder_auth import get_authenticated_user
-# MCP imports removed

 app = FastAPI(
-    title="…
-    description="AI-powered service for enhancing descriptions…
-    version="…
 )

 # CORS configuration

@@ -30,21 +36,15 @@
     allow_headers=["*"],
 )

-# Global service initialization
-MODEL_PATH_IN_CONTAINER = "/app/pretrain_model"
-hf_service = HuggingFaceTextGenerationService(
-    model_name_or_path=MODEL_PATH_IN_CONTAINER,
-    device="cpu"
-)
-
 @app.on_event("startup")
 async def startup_event():
     try:
-        await …
-        print(…
     except Exception as e:
-        print(f"…
         raise

 # --- Helper function to load domain logic ---

@@ -59,95 +59,153 @@

 @app.get("/")
 async def read_root():
-    return {"message": "Welcome to the …

 @app.get("/health")
 async def health_check():
     return {
         "status": "ok",
-        "…
     }

 @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
 async def enhance_description(
     domain: str = Body(..., embed=True),
     data: dict = Body(..., embed=True),
     user: Optional[dict] = Depends(get_authenticated_user)
 ):
     """
-    Generate an enhanced description …
     - **domain**: The name of the domain (e.g., 'cars').
     - **data**: A dictionary with the data for the description.
     """
     start_time = time.time()

-    # …
     domain_config = get_domain_config(domain)
     DomainSchema = domain_config["schema"]
     create_prompt = domain_config["create_prompt"]
-    # mcp_rules removed

-    # …
     try:
         validated_data = DomainSchema(**data)
     except ValidationError as e:
         raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")

-    # …
     chat_messages = create_prompt(validated_data)

-    # …
     try:
-        …
-        …
         max_new_tokens=150,
         temperature=0.75,
         top_p=0.9,
     )
     except Exception as e:
-        print(f"…
-        raise HTTPException(status_code=500, detail=f"…

-    # --- 5. MCP Guardrails & Post-processing removed ---
-    # if not guardrails.check_compliance(generated_description, mcp_rules.get("guardrails", {})):
-    #     raise HTTPException(status_code=400, detail="Generated description failed compliance checks.")

-    # final_description = postprocessor.format_output(generated_description, mcp_rules.get("postprocessor", {}))
-    final_description = generated_description  # No post-processing here

     generation_time = time.time() - start_time
     user_email = user['email'] if user else "anonymous"

     return EnhancedDescriptionResponse(
-        description=…
-        model_used="…
         generation_time=round(generation_time, 2),
         user_email=user_email
     )

-@app.post("/…
-async def …
-    temperature: float = 0.75,
-    top_p: float = 0.9
 ):
     """
     """
     try:
-        …
-        temperature=temperature,
-        top_p=top_p,
-    )
-    return {"generated_text": generated_text}
-    except Exception as e:
-        print(f"Unexpected error during raw text generation: {e}")
-        raise HTTPException(status_code=500, detail=f"An unexpected error occurred during text generation: {str(e)}")

 @app.get("/user/me")
 async def get_user_info(user: dict = Depends(get_authenticated_user)):
New side (added lines marked `+`):

 import os
 import time
+import asyncio
 import importlib
 from fastapi import FastAPI, HTTPException, Depends, Body
+from typing import Optional, List
 from pydantic import ValidationError

+from app.models.registry import registry, MODEL_CONFIG
 from fastapi.middleware.cors import CORSMiddleware
+from app.schemas.schemas import (
+    EnhancedDescriptionResponse,
+    CompareRequest,
+    CompareResponse,
+    ModelResult,
+    ModelInfo,
+)
 from app.auth.placeholder_auth import get_authenticated_user

 app = FastAPI(
+    title="Multi-Model Description Enhancer",
+    description="AI-powered service for enhancing descriptions using multiple LLMs for A/B testing",
+    version="3.0.0"
 )

 # CORS configuration
     allow_headers=["*"],
 )

 @app.on_event("startup")
 async def startup_event():
+    """Initialize local models at startup."""
+    print("Starting up and initializing local models...")
     try:
+        await registry.initialize_local_models()
+        print("Local models initialized successfully.")
     except Exception as e:
+        print(f"Error during model initialization: {e}")
         raise

 # --- Helper function to load domain logic ---

 @app.get("/")
 async def read_root():
+    return {"message": "Welcome to the Multi-Model Description Enhancer API! Go to /docs for documentation."}

 @app.get("/health")
 async def health_check():
+    """Check API health and model status."""
+    models = registry.list_models()
+    local_initialized = any(m["initialized"] for m in models if m["type"] == "local")
     return {
         "status": "ok",
+        "local_models_initialized": local_initialized,
+        "available_models": len(models),
     }

+@app.get("/models", response_model=List[ModelInfo])
+async def list_models():
+    """List all available models."""
+    return registry.list_models()
+
 @app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
 async def enhance_description(
     domain: str = Body(..., embed=True),
     data: dict = Body(..., embed=True),
+    model: str = Body("bielik-1.5b", embed=True),
     user: Optional[dict] = Depends(get_authenticated_user)
 ):
     """
+    Generate an enhanced description using a single model.
     - **domain**: The name of the domain (e.g., 'cars').
     - **data**: A dictionary with the data for the description.
+    - **model**: Model to use (default: bielik-1.5b)
     """
     start_time = time.time()

+    # Validate model
+    if model not in registry.get_available_model_names():
+        raise HTTPException(status_code=400, detail=f"Unknown model: {model}")
+
+    # Load Domain Configuration
     domain_config = get_domain_config(domain)
     DomainSchema = domain_config["schema"]
     create_prompt = domain_config["create_prompt"]

+    # Validate Input Data
     try:
         validated_data = DomainSchema(**data)
     except ValidationError as e:
         raise HTTPException(status_code=422, detail=f"Invalid data for domain '{domain}': {e}")

+    # Prompt Construction
     chat_messages = create_prompt(validated_data)

+    # Text Generation
     try:
+        llm = await registry.get_model(model)
+        generated_description = await llm.generate(
+            chat_messages=chat_messages,
             max_new_tokens=150,
             temperature=0.75,
             top_p=0.9,
         )
     except Exception as e:
+        print(f"Error during text generation with {model}: {e}")
+        raise HTTPException(status_code=500, detail=f"Generation error: {str(e)}")

     generation_time = time.time() - start_time
     user_email = user['email'] if user else "anonymous"

     return EnhancedDescriptionResponse(
+        description=generated_description,
+        model_used=MODEL_CONFIG[model]["id"],
         generation_time=round(generation_time, 2),
         user_email=user_email
     )

+@app.post("/compare", response_model=CompareResponse)
+async def compare_models(
+    request: CompareRequest,
+    user: Optional[dict] = Depends(get_authenticated_user)
 ):
     """
+    Compare outputs from multiple models for the same input.
+    Returns results from all specified models (or all available if not specified).
     """
+    total_start = time.time()
+
+    # Get models to compare
+    available_models = registry.get_available_model_names()
+    models_to_use = request.models if request.models else available_models
+
+    # Validate requested models
+    for model in models_to_use:
+        if model not in available_models:
+            raise HTTPException(status_code=400, detail=f"Unknown model: {model}")
+
+    # Load Domain Configuration
+    domain_config = get_domain_config(request.domain)
+    DomainSchema = domain_config["schema"]
+    create_prompt = domain_config["create_prompt"]
+
+    # Validate Input Data
     try:
+        validated_data = DomainSchema(**request.data)
+    except ValidationError as e:
+        raise HTTPException(status_code=422, detail=f"Invalid data: {e}")

+    # Prompt Construction
+    chat_messages = create_prompt(validated_data)
+
+    # Generate with each model
+    results = []
+
+    async def generate_with_model(model_name: str) -> ModelResult:
+        start_time = time.time()
+        try:
+            llm = await registry.get_model(model_name)
+            output = await llm.generate(
+                chat_messages=chat_messages,
+                max_new_tokens=150,
+                temperature=0.75,
+                top_p=0.9,
+            )
+            return ModelResult(
+                model=model_name,
+                output=output,
+                time=round(time.time() - start_time, 2),
+                type=MODEL_CONFIG[model_name]["type"],
+                error=None
+            )
+        except Exception as e:
+            return ModelResult(
+                model=model_name,
+                output="",
+                time=round(time.time() - start_time, 2),
+                type=MODEL_CONFIG[model_name]["type"],
+                error=str(e)
+            )
+
+    # Run all models (sequentially to avoid memory issues)
+    for model_name in models_to_use:
+        result = await generate_with_model(model_name)
+        results.append(result)
+
+    return CompareResponse(
+        domain=request.domain,
+        results=results,
+        total_time=round(time.time() - total_start, 2)
+    )

 @app.get("/user/me")
 async def get_user_info(user: dict = Depends(get_authenticated_user)):
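
`app/models/registry.py` (+124 lines) is part of this commit but its body is not reproduced above. Based purely on how `app/main.py` calls it, a minimal hypothetical sketch of its surface could look like the following; the `bielik-1.5b` entry and the `id`/`type` config keys are taken from the code above, the PLLuM repo id and everything else are assumptions:

```python
# Hypothetical sketch of app/models/registry.py, inferred from its call sites
# in app/main.py; the real implementation in this commit may differ.
from typing import Dict, List

from app.models.base_llm import BaseLLM
from app.models.huggingface_local import HuggingFaceLocal
from app.models.huggingface_inference_api import HuggingFaceInferenceAPI

MODEL_CONFIG: Dict[str, dict] = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},  # repo id assumed
}

class ModelRegistry:
    def __init__(self) -> None:
        self._models: Dict[str, BaseLLM] = {}
        for name, cfg in MODEL_CONFIG.items():
            cls = HuggingFaceLocal if cfg["type"] == "local" else HuggingFaceInferenceAPI
            self._models[name] = cls(name=name, model_id=cfg["id"])

    def get_available_model_names(self) -> List[str]:
        return list(self._models)

    def list_models(self) -> List[dict]:
        return [m.get_info() for m in self._models.values()]

    async def initialize_local_models(self) -> None:
        # Only local models are loaded eagerly at startup.
        for m in self._models.values():
            if m.get_info()["type"] == "local":
                await m.initialize()

    async def get_model(self, name: str) -> BaseLLM:
        model = self._models[name]
        if not model.is_initialized:
            await model.initialize()  # lazy init for API-backed models (assumption)
        return model

registry = ModelRegistry()
```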
app/models/__init__.py
ADDED
@@ -0,0 +1,16 @@

```python
"""
Models module - LLM implementations and registry.
"""

from app.models.base_llm import BaseLLM
from app.models.huggingface_local import HuggingFaceLocal
from app.models.huggingface_inference_api import HuggingFaceInferenceAPI
from app.models.registry import registry, MODEL_CONFIG

__all__ = [
    "BaseLLM",
    "HuggingFaceLocal",
    "HuggingFaceInferenceAPI",
    "registry",
    "MODEL_CONFIG",
]
```
app/models/base_llm.py
ADDED
@@ -0,0 +1,54 @@

```python
"""
Abstract base class for all LLM implementations.
"""

from abc import ABC, abstractmethod
from typing import Optional, List, Dict, Any


class BaseLLM(ABC):
    """Abstract interface for LLM models."""

    def __init__(self, name: str, model_id: str):
        self.name = name
        self.model_id = model_id
        self._initialized = False

    @property
    def is_initialized(self) -> bool:
        return self._initialized

    @abstractmethod
    async def initialize(self) -> None:
        """Initialize the model. Must be called before generate()."""
        pass

    @abstractmethod
    async def generate(
        self,
        prompt: str = None,
        chat_messages: List[Dict[str, str]] = None,
        max_new_tokens: int = 150,
        temperature: float = 0.7,
        top_p: float = 0.9,
        **kwargs
    ) -> str:
        """
        Generate text from prompt or chat messages.

        Args:
            prompt: Raw text prompt
            chat_messages: List of {"role": "...", "content": "..."} messages
            max_new_tokens: Maximum tokens to generate
            temperature: Sampling temperature
            top_p: Nucleus sampling parameter

        Returns:
            Generated text string
        """
        pass

    @abstractmethod
    def get_info(self) -> Dict[str, Any]:
        """Return model information for /models endpoint."""
        pass
```
app/models/huggingface_inference_api.py
ADDED
@@ -0,0 +1,93 @@

```python
"""
HuggingFace Inference API client for remote model access.
"""

import os
from typing import List, Dict, Any, Optional
from huggingface_hub import InferenceClient

from app.models.base_llm import BaseLLM


class HuggingFaceInferenceAPI(BaseLLM):
    """
    Remote model access via HuggingFace Inference API.
    Best for larger models (7B+) that don't fit in local RAM.
    """

    def __init__(self, name: str, model_id: str, token: str = None):
        super().__init__(name, model_id)
        self.token = token or os.getenv("HF_TOKEN")
        self.client: Optional[InferenceClient] = None

    async def initialize(self) -> None:
        """Initialize the Inference API client."""
        if self._initialized:
            return

        try:
            print(f"[{self.name}] Initializing Inference API for: {self.model_id}")

            self.client = InferenceClient(
                model=self.model_id,
                token=self.token
            )

            self._initialized = True
            print(f"[{self.name}] Inference API ready")

        except Exception as e:
            print(f"[{self.name}] Failed to initialize: {e}")
            raise

    async def generate(
        self,
        prompt: str = None,
        chat_messages: List[Dict[str, str]] = None,
        max_new_tokens: int = 150,
        temperature: float = 0.7,
        top_p: float = 0.9,
        **kwargs
    ) -> str:
        """Generate text using HuggingFace Inference API."""

        if not self._initialized or not self.client:
            raise RuntimeError(f"[{self.name}] Client not initialized")

        try:
            # Use chat completion if chat_messages provided
            if chat_messages:
                response = self.client.chat_completion(
                    messages=chat_messages,
                    max_tokens=max_new_tokens,
                    temperature=temperature,
                    top_p=top_p,
                )
                return response.choices[0].message.content.strip()

            # Otherwise use text generation
            elif prompt:
                response = self.client.text_generation(
                    prompt=prompt,
                    max_new_tokens=max_new_tokens,
                    temperature=temperature,
                    top_p=top_p,
                    do_sample=True,
                )
                return response.strip()

            else:
                raise ValueError("Either prompt or chat_messages required")

        except Exception as e:
            print(f"[{self.name}] Generation error: {e}")
            raise

    def get_info(self) -> Dict[str, Any]:
        """Return model info."""
        return {
            "name": self.name,
            "model_id": self.model_id,
            "type": "inference_api",
            "initialized": self._initialized,
        }
```
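
A quick way to exercise this class on its own (a sketch; assumes `HF_TOKEN` is set in the environment, and the PLLuM repo id is assumed rather than taken from this diff):

```python
import asyncio

from app.models.huggingface_inference_api import HuggingFaceInferenceAPI

async def main() -> None:
    # Repo id assumed for illustration; substitute any Inference API-served model.
    llm = HuggingFaceInferenceAPI(name="pllum-12b", model_id="CYFRAGOVPL/PLLuM-12B-instruct")
    await llm.initialize()
    text = await llm.generate(
        chat_messages=[{"role": "user", "content": "Napisz krótki opis BMW 320i."}],
        max_new_tokens=100,
    )
    print(text)

asyncio.run(main())
```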
app/models/huggingface_local.py
ADDED
@@ -0,0 +1,133 @@
"""
Local HuggingFace model implementation using transformers pipeline.
"""

from typing import List, Dict, Any, Optional
from transformers import pipeline, AutoTokenizer
import torch
import asyncio

from app.models.base_llm import BaseLLM


class HuggingFaceLocal(BaseLLM):
    """
    Local HuggingFace model loaded into container memory.
    Best for smaller models (< 3B parameters) that fit in RAM.
    """

    def __init__(self, name: str, model_id: str, device: str = "cpu"):
        super().__init__(name, model_id)
        self.device = device
        self.pipeline = None
        self.tokenizer = None

        # Determine device index
        if device == "cuda" and torch.cuda.is_available():
            self.device_index = 0
        else:
            self.device_index = -1  # CPU

    async def initialize(self) -> None:
        """Load model into memory."""
        if self._initialized:
            return

        try:
            print(f"[{self.name}] Loading local model: {self.model_id}")

            self.tokenizer = await asyncio.to_thread(
                AutoTokenizer.from_pretrained,
                self.model_id,
                trust_remote_code=True
            )

            self.pipeline = await asyncio.to_thread(
                pipeline,
                "text-generation",
                model=self.model_id,
                tokenizer=self.tokenizer,
                device=self.device_index,
                torch_dtype=torch.float32,
                trust_remote_code=True,
            )

            self._initialized = True
            print(f"[{self.name}] Model loaded successfully")

        except Exception as e:
            print(f"[{self.name}] Failed to load model: {e}")
            raise

    async def generate(
        self,
        prompt: str = None,
        chat_messages: List[Dict[str, str]] = None,
        max_new_tokens: int = 150,
        temperature: float = 0.7,
        top_p: float = 0.9,
        **kwargs
    ) -> str:
        """Generate text using local pipeline."""

        if not self._initialized:
            raise RuntimeError(f"[{self.name}] Model not initialized")

        # Format prompt
        if chat_messages:
            try:
                formatted_prompt = self.tokenizer.apply_chat_template(
                    chat_messages,
                    tokenize=False,
                    add_generation_prompt=True
                )
            except Exception:
                # Fallback to raw prompt
                if prompt:
                    formatted_prompt = prompt
                else:
                    raise ValueError("Cannot generate without valid prompt")
        elif prompt:
            formatted_prompt = prompt
        else:
            raise ValueError("Either prompt or chat_messages required")

        # Generate
        outputs = await asyncio.to_thread(
            self.pipeline,
            formatted_prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            eos_token_id=self.tokenizer.eos_token_id,
            pad_token_id=self.tokenizer.eos_token_id if self.tokenizer.pad_token_id is None else self.tokenizer.pad_token_id,
        )

        # Extract response
        if outputs and isinstance(outputs, list) and "generated_text" in outputs[0]:
            full_text = outputs[0]["generated_text"]

            # Remove prompt from output
            if full_text.startswith(formatted_prompt):
                response = full_text[len(formatted_prompt):]
            else:
                response = full_text

            # Clean up special tokens
            if response.endswith("<|im_end|>"):
                response = response[:-len("<|im_end|>")]

            return response.strip()

        return ""

    def get_info(self) -> Dict[str, Any]:
        """Return model info."""
        return {
            "name": self.name,
            "model_id": self.model_id,
            "type": "local",
            "initialized": self._initialized,
            "device": self.device
        }
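The `apply_chat_template` call above is what turns role/content messages into the model's expected prompt format. A standalone sketch to inspect the result (output tags are model-specific; the ChatML-style `<|im_end|>` cleanup above suggests that family):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("speakleash/Bielik-1.5B-v3.0-Instruct")
messages = [{"role": "user", "content": "Opisz ten samochod w jednym zdaniu."}]

# tokenize=False returns the formatted string instead of token IDs
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```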
app/models/registry.py
ADDED
@@ -0,0 +1,124 @@
"""
Model Registry - Central configuration and factory for all LLM models.
"""

import os
from typing import Dict, List, Any, Optional

from app.models.base_llm import BaseLLM
from app.models.huggingface_local import HuggingFaceLocal
from app.models.huggingface_inference_api import HuggingFaceInferenceAPI


# Model configuration
MODEL_CONFIG = {
    "bielik-1.5b": {
        "id": "speakleash/Bielik-1.5B-v3.0-Instruct",
        "type": "local",
        "polish_support": "excellent",
        "size": "1.5B",
    },
    "pllum-12b": {
        "id": "CYFRAGOVPL/PLLuM-12B-instruct",
        "type": "inference_api",
        "polish_support": "excellent",
        "size": "12B",
    },
    "mistral-small-3": {
        "id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
        "type": "inference_api",
        "polish_support": "good",
        "size": "24B",
    },
    "gemma-2-9b": {
        "id": "google/gemma-2-9b-it",
        "type": "inference_api",
        "polish_support": "medium",
        "size": "9B",
    },
}

# For local model override (when model is pre-downloaded in container)
LOCAL_MODEL_PATH = os.getenv("LOCAL_MODEL_PATH", "/app/pretrain_model")


class ModelRegistry:
    """
    Central registry for managing all LLM models.
    Handles model instantiation, initialization, and access.
    """

    def __init__(self):
        self._models: Dict[str, BaseLLM] = {}
        self._config = MODEL_CONFIG.copy()

    def _create_model(self, name: str) -> BaseLLM:
        """Factory method to create model instance."""

        if name not in self._config:
            raise ValueError(f"Unknown model: {name}")

        config = self._config[name]
        model_type = config["type"]
        model_id = config["id"]

        # Special case: local Bielik uses pre-downloaded path
        if name == "bielik-1.5b" and os.path.exists(LOCAL_MODEL_PATH):
            model_id = LOCAL_MODEL_PATH

        if model_type == "local":
            return HuggingFaceLocal(
                name=name,
                model_id=model_id,
                device="cpu"
            )
        elif model_type == "inference_api":
            return HuggingFaceInferenceAPI(
                name=name,
                model_id=model_id
            )
        else:
            raise ValueError(f"Unknown model type: {model_type}")

    async def get_model(self, name: str) -> BaseLLM:
        """Get or create and initialize a model."""

        if name not in self._models:
            model = self._create_model(name)
            await model.initialize()
            self._models[name] = model

        return self._models[name]

    async def initialize_model(self, name: str) -> None:
        """Pre-initialize a specific model."""
        await self.get_model(name)

    async def initialize_local_models(self) -> None:
        """Initialize all local models at startup."""
        for name, config in self._config.items():
            if config["type"] == "local":
                await self.initialize_model(name)

    def list_models(self) -> List[Dict[str, Any]]:
        """List all available models with their info."""
        models = []
        for name, config in self._config.items():
            model_info = {
                "name": name,
                "model_id": config["id"],
                "type": config["type"],
                "polish_support": config["polish_support"],
                "size": config["size"],
                "initialized": name in self._models and self._models[name].is_initialized,
            }
            models.append(model_info)
        return models

    def get_available_model_names(self) -> List[str]:
        """Get list of available model names."""
        return list(self._config.keys())


# Global registry instance
registry = ModelRegistry()
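A usage sketch for the registry (illustrative; `get_model` lazily creates, initializes, and caches each backend on first access):

```python
from app.models.registry import registry

async def demo():
    # First access creates and initializes the backend, then caches it
    model = await registry.get_model("bielik-1.5b")
    print(model.get_info())

    # Listing does not initialize anything
    for info in registry.list_models():
        print(info["name"], info["type"], info["initialized"])
```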
app/schemas/schemas.py
CHANGED
@@ -1,4 +1,6 @@
 from pydantic import BaseModel
+from typing import List, Optional, Dict, Any
+
 
 class EnhancedDescriptionResponse(BaseModel):
     description: str
@@ -6,3 +8,32 @@ class EnhancedDescriptionResponse(BaseModel):
     generation_time: float
     user_email: str
 
+
+class ModelInfo(BaseModel):
+    name: str
+    model_id: str
+    type: str
+    polish_support: str
+    size: str
+    initialized: bool
+
+
+class CompareRequest(BaseModel):
+    domain: str
+    data: Dict[str, Any]
+    models: Optional[List[str]] = None  # If None, use all models
+
+
+class ModelResult(BaseModel):
+    model: str
+    output: str
+    time: float
+    type: str
+    error: Optional[str] = None
+
+
+class CompareResponse(BaseModel):
+    domain: str
+    results: List[ModelResult]
+    total_time: float
+
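A quick sanity check of the new request schema (illustrative values):

```python
from app.schemas.schemas import CompareRequest

req = CompareRequest(domain="cars", data={"make": "Toyota", "model": "Corolla", "year": 2019})
assert req.models is None  # None means "run all registered models"
print(req)
```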
llm_app_rework.md
ADDED
@@ -0,0 +1,141 @@
# LLM App Rework Plan

## Goal
Transform single-model app → multi-model comparison platform for A/B testing open-source LLMs on car descriptions.

---

## Current State
- Single model: Bielik-1.5B (local HuggingFace)
- Single domain: cars
- No comparison capability

## Target State
- Multiple open-source LLMs via HuggingFace
- Same prompt → multiple outputs → compare results
- Support compression/decompression testing

---

## Architecture Changes

### 1. Model Registry
```
app/models/
├── registry.py            # Model registry + factory
├── base_llm.py            # Abstract base class
└── huggingface_local.py   # Refactored current service
```

### 2. Base LLM Interface
```python
class BaseLLM(ABC):
    name: str
    model_id: str
    async def generate(prompt, **params) -> str
    async def initialize() -> None
    def is_initialized() -> bool
```
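The interface above is shorthand, not valid Python. A minimal runnable sketch of what `base_llm.py` could look like (method bodies and the `get_info` hook are assumptions consistent with the rest of this plan):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class BaseLLM(ABC):
    """Common contract for local and Inference API backends."""

    def __init__(self, name: str, model_id: str):
        self.name = name
        self.model_id = model_id
        self._initialized = False

    @property
    def is_initialized(self) -> bool:
        return self._initialized

    @abstractmethod
    async def initialize(self) -> None:
        """Load the model or connect to the remote endpoint."""

    @abstractmethod
    async def generate(self, prompt: str = None, **params) -> str:
        """Return generated text for a prompt or chat messages."""

    @abstractmethod
    def get_info(self) -> Dict[str, Any]:
        """Return metadata (name, type, initialized) for listings."""
```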

### 3. Model Registry
```python
MODELS = {
    "bielik-1.5b": {"id": "speakleash/Bielik-1.5B-v3.0-Instruct", "type": "local"},
    "pllum-12b": {"id": "CYFRAGOVPL/PLLuM-12B-instruct", "type": "inference_api"},
    "mistral-small-3": {"id": "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "type": "inference_api"},
    "gemma-2-9b": {"id": "google/gemma-2-9b-it", "type": "inference_api"},
}
```

### 4. Two Model Types

| Type | Description | Use Case |
|------|-------------|----------|
| `local` | Loaded in container memory | Bielik-1.5B (small, fits in RAM) |
| `inference_api` | HuggingFace Inference API | Larger models (7B+) via API |

### 5. New Endpoints

| Endpoint | Purpose |
|----------|---------|
| `POST /enhance` | Single model (existing) |
| `POST /compare` | Multiple models, return all outputs (see the sketch after this section) |
| `GET /models` | List available models |

### 6. Compare Request/Response
```python
# Request
{
    "domain": "cars",
    "data": {...},
    "models": ["bielik-1.5b", "pllum-12b", "mistral-small-3", "gemma-2-9b"]
}

# Response
{
    "results": [
        {"model": "bielik-1.5b", "output": "...", "time": 2.3, "type": "local"},
        {"model": "pllum-12b", "output": "...", "time": 1.1, "type": "inference_api"},
        {"model": "mistral-small-3", "output": "...", "time": 0.9, "type": "inference_api"},
        {"model": "gemma-2-9b", "output": "...", "time": 1.0, "type": "inference_api"}
    ]
}
```
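A minimal sketch of the `/compare` handler wired to the registry (the handler body is an assumption; prompt construction from `data` is simplified, and per-model errors are captured rather than failing the whole request):

```python
import time

from fastapi import FastAPI

from app.models.registry import registry
from app.schemas.schemas import CompareRequest, CompareResponse, ModelResult

app = FastAPI()

@app.post("/compare", response_model=CompareResponse)
async def compare(request: CompareRequest) -> CompareResponse:
    names = request.models or registry.get_available_model_names()
    start = time.perf_counter()
    results = []
    for name in names:
        t0 = time.perf_counter()
        try:
            model = await registry.get_model(name)
            output = await model.generate(
                chat_messages=[{"role": "user", "content": str(request.data)}],
            )
            results.append(ModelResult(
                model=name,
                output=output,
                time=time.perf_counter() - t0,
                type=model.get_info()["type"],
            ))
        except Exception as exc:
            # Record the failure so other models' results still come back
            results.append(ModelResult(
                model=name,
                output="",
                time=time.perf_counter() - t0,
                type="unknown",
                error=str(exc),
            ))
    return CompareResponse(
        domain=request.domain,
        results=results,
        total_time=time.perf_counter() - start,
    )
```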

---

## Implementation Steps

1. **Create base_llm.py** - abstract interface
2. **Create huggingface_inference_api.py** - HF Inference API client
3. **Refactor huggingface_service.py** → HuggingFaceLocal (implements BaseLLM)
4. **Create registry.py** - model factory + config
5. **Add /compare endpoint** in main.py
6. **Add /models endpoint** - list available
7. **Update schemas** - CompareRequest, CompareResponse

---

## HuggingFace Inference API
```python
from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)
response = client.text_generation(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    prompt=formatted_prompt,
    max_new_tokens=150
)
```
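The snippet above hand-formats the prompt. For chat-tuned models, `InferenceClient.chat_completion` does the formatting server-side; a sketch of that variant (same illustrative model ID):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)
response = client.chat_completion(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "Napisz opis samochodu."}],
    max_tokens=150,
)
print(response.choices[0].message.content)
```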

---

## Env Vars (HuggingFace Secrets)
```
HF_TOKEN=hf_...  # For Inference API access
```

---

## Models (Approved)

| Model | Size | Polish Support | HuggingFace ID |
|-------|------|----------------|----------------|
| Bielik-1.5B | 1.5B | Excellent | speakleash/Bielik-1.5B-v3.0-Instruct |
| PLLuM-12B | 12B | Excellent | CYFRAGOVPL/PLLuM-12B-instruct |
| Mistral-Small-3 | 24B | Good | mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
| Gemma-2-9B | 9B | Medium | google/gemma-2-9b-it |

---

## Priority
1. HuggingFace Inference API integration
2. /compare endpoint
3. /models endpoint

---

## Notes
- All models = open source via HuggingFace
- Local model = Bielik-1.5B (already works)
- Larger models = HF Inference API (no local GPU needed)
- HF_TOKEN needed for gated models (Gemma, etc.)
requirements.txt
CHANGED
@@ -1,4 +1,5 @@
 fastapi
 uvicorn[standard]
 transformers[torch]
-accelerate
+accelerate
+huggingface_hub