GPT4All Service - Project Context
Project Overview
This is a Polish car-description enhancement service built as a FastAPI microservice that uses a Hugging Face large language model to generate enhanced marketing descriptions for cars in Polish.
Core Functionality
The service takes basic car information (make, model, year, mileage, features, condition) and generates compelling, marketing-friendly descriptions in Polish using speakleash/Bielik-1.5B-v3.0-Instruct, a Polish language model from the Bielik series.
Project Structure
gpt4all-service/
├── app/
│   ├── main.py                    # FastAPI application with endpoints
│   ├── models/
│   │   └── huggingface_service.py # Core LLM service wrapper
│   └── schemas/
│       └── schemas.py             # Pydantic data models
├── Dockerfile                     # Multi-stage Docker build
├── download_model.py              # Model download script for Docker
├── requirements.txt               # Python dependencies
├── start_container.ps1            # PowerShell startup script
├── start_container.sh             # Bash startup script
└── README.md                      # Comprehensive documentation
Technical Architecture
1. FastAPI Application (app/main.py)
- Framework: FastAPI with CORS middleware
- Main Endpoint: POST /enhance-description takes car data and returns an enhanced description
- Health Check: GET /health reports service status and whether the model is initialized
- CORS: Configured for a frontend at http://localhost:5173 (likely a React/Vue dev server)
2. LLM Service (app/models/huggingface_service.py)
- Purpose: Wrapper around Hugging Face Transformers pipeline
- Model: speakleash/Bielik-1.5B-v3.0-Instruct (Polish language model)
- Features:
- Async initialization and text generation
- Support for both GPU (CUDA) and CPU inference
- Chat template support for conversation-style prompts
- Configurable generation parameters (temperature, top_p, max_tokens)
- Smart response parsing to extract only the assistant's response
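The "smart response parsing" step can be illustrated with plain string handling. A minimal sketch, assuming the pipeline echoes the prompt back before the completion and that the chat template uses ChatML-style end-of-turn markers (both are assumptions about this model):

```python
def extract_assistant_response(generated_text: str, prompt: str) -> str:
    """Strip the echoed prompt and keep only the assistant's reply."""
    # Drop the prompt prefix if the model echoed it back
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # Cut at an end-of-turn marker if the template emits one (assumed markers)
    for stop in ("<|im_end|>", "</s>"):
        generated_text = generated_text.split(stop, 1)[0]
    return generated_text.strip()

print(extract_assistant_response("PROMPT reply text<|im_end|>", "PROMPT"))  # → reply text
```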
3. Data Models (app/schemas/schemas.py)
- CarData: Input model with make, model, year, mileage, features[], condition
- EnhancedDescriptionResponse: Output model with generated description
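The two schemas can be sketched from the fields listed above; the field defaults and the exact name of the response field are assumptions:

```python
# Hypothetical reconstruction of app/schemas/schemas.py from the fields
# listed above; defaults and the response field name are assumptions.
from pydantic import BaseModel

class CarData(BaseModel):
    make: str
    model: str
    year: int
    mileage: int
    features: list[str] = []
    condition: str

class EnhancedDescriptionResponse(BaseModel):
    enhanced_description: str

car = CarData(make="Skoda", model="Octavia", year=2019,
              mileage=62000, features=["klimatyzacja"], condition="bardzo dobry")
```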
4. Containerization
- Docker: Self-contained image with pre-downloaded model (~3.2GB)
- Security: Uses Docker BuildKit secrets for Hugging Face token handling
- Model Storage: Downloaded to /app/pretrain_model during the build
- Runtime: Python 3.9-slim base image
Key Technical Details
Model Configuration
- Model Path: /app/pretrain_model (in the container), configurable for local development
- Device: Currently set to CPU in main.py, but the service also supports GPU
- Generation Params: 150 max tokens, temperature 0.75, top_p 0.9
Prompt Engineering
The service uses a carefully crafted Polish system prompt:
- Instructs the model to create marketing descriptions in Polish
- Limits output to 500 characters maximum
- Tells the model to ignore off-topic content
- Uses chat template format with system/user roles
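The system/user message structure described above looks roughly like this. The Polish system prompt below is a paraphrase of the behavior described, not the actual prompt text:

```python
# Chat-template style messages; the prompt wording is an illustrative
# paraphrase (in English: "You are an automotive copywriter. Write an
# attractive marketing description in Polish, max 500 characters.
# Ignore off-topic content."), not the service's real prompt.
messages = [
    {
        "role": "system",
        "content": (
            "Jestes copywriterem motoryzacyjnym. Napisz atrakcyjny opis "
            "marketingowy samochodu po polsku, maksymalnie 500 znakow. "
            "Ignoruj tresci niezwiazane z tematem."
        ),
    },
    {
        "role": "user",
        "content": "Marka: Toyota, Model: Corolla, Rok: 2018, Przebieg: 85000 km",
    },
]

# With a transformers tokenizer, these messages would typically be
# flattened into a single prompt string:
#   prompt = tokenizer.apply_chat_template(messages, tokenize=False,
#                                          add_generation_prompt=True)
```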
Dependencies
- fastapi: Web framework
- uvicorn[standard]: ASGI server
- transformers[torch]: Hugging Face transformers with PyTorch
- accelerate: Hugging Face optimization library
Current State & Issues
Git Status
- Modified: app/main.py (likely recent changes)
- Deleted: app/models/gpt4all.py (indicates a migration from GPT4All to Hugging Face)
Linter Issues in huggingface_service.py
- Import issues: the pipeline and AutoTokenizer imports need specific paths
- Type annotations: device: str = None should be Optional[str] = None
- Method parameters: similar optional-parameter typing issues
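The typing fix looks like this; the class and attribute names are assumptions for illustration, but the annotation change is exactly the one the linter asks for:

```python
from typing import Optional

# Before (flagged by the linter): an implicit Optional
#     def __init__(self, device: str = None): ...

# After: the annotation matches the None default
class HuggingFaceService:  # hypothetical name for illustration
    def __init__(self, device: Optional[str] = None) -> None:
        # Falling back to CPU when no device is given (assumed default)
        self.device = device or "cpu"

svc = HuggingFaceService()
print(svc.device)  # → cpu
```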
Usage Scenarios
- Car Dealership Websites: Auto-generate compelling descriptions from basic car specs
- Marketplace Applications: Enhance user-provided car listings
- Inventory Management: Bulk description generation for car databases
Deployment Options
- Local Development: Direct Python/uvicorn execution
- Docker Container: Self-contained deployment with pre-downloaded model
- Production: Containerized deployment with proper authentication
Authentication Requirements
- Hugging Face Hub token required for model download (gated model)
- Token stored in
my_hf_token.txtduring Docker build - Securely handled via Docker BuildKit secrets
Performance Considerations
- Model size: ~3.2GB (significant memory footprint)
- CPU inference: Slower but more accessible
- GPU inference: Faster but requires CUDA setup
- Async design: Non-blocking text generation
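One common way to achieve non-blocking generation with a synchronous transformers pipeline is to offload the call to a worker thread; whether the service uses asyncio.to_thread specifically is an assumption, and the blocking function below is a stand-in for the real pipeline call:

```python
import asyncio

def _generate_blocking(prompt: str) -> str:
    # Stand-in for the real pipeline call, which is CPU/GPU-bound
    return f"opis: {prompt}"

async def generate(prompt: str) -> str:
    # Run the blocking call in a worker thread so the FastAPI event
    # loop stays responsive while the model generates
    return await asyncio.to_thread(_generate_blocking, prompt)

async def main():
    result = await generate("Toyota Corolla 2018")
    print(result)  # → opis: Toyota Corolla 2018

asyncio.run(main())
```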
This service represents a specialized AI application for the Polish automotive market, focusing on generating marketing content using state-of-the-art Polish language models.