bielik_app_service / PROJECT_CONTEXT.md
Patryk Studzinski

GPT4All Service - Project Context

Project Overview

This is a Polish Car Description Enhancement Service, built as a FastAPI microservice that uses a Hugging Face large language model to generate enhanced, marketing-oriented car descriptions in Polish.

Core Functionality

The service takes basic car information (make, model, year, mileage, features, condition) and generates compelling, marketing-friendly descriptions in Polish using speakleash/Bielik-1.5B-v3.0-Instruct, a Polish-language model from the Bielik series.

Project Structure

gpt4all-service/
├── app/
│   ├── main.py                    # FastAPI application with endpoints
│   ├── models/
│   │   └── huggingface_service.py # Core LLM service wrapper
│   └── schemas/
│       └── schemas.py             # Pydantic data models
├── Dockerfile                     # Multi-stage Docker build
├── download_model.py              # Model download script for Docker
├── requirements.txt               # Python dependencies
├── start_container.ps1            # PowerShell startup script
├── start_container.sh             # Bash startup script
└── README.md                      # Comprehensive documentation

Technical Architecture

1. FastAPI Application (app/main.py)

  • Framework: FastAPI with CORS middleware
  • Main Endpoint: POST /enhance-description - takes car data, returns enhanced description
  • Health Check: GET /health - service status and model initialization check
  • CORS: Configured for a frontend at http://localhost:5173 (the default Vite dev-server port, suggesting a React or Vue frontend)

2. LLM Service (app/models/huggingface_service.py)

  • Purpose: Wrapper around Hugging Face Transformers pipeline
  • Model: speakleash/Bielik-1.5B-v3.0-Instruct (Polish language model)
  • Features:
    • Async initialization and text generation
    • Support for both GPU (CUDA) and CPU inference
    • Chat template support for conversation-style prompts
    • Configurable generation parameters (temperature, top_p, max_tokens)
    • Smart response parsing to extract only the assistant's response
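One plausible implementation of the "smart response parsing" feature, assuming the text-generation pipeline echoes the prompt before the continuation (the helper name is an assumption, not the service's actual code):

```python
def extract_assistant_reply(generated: str, prompt: str) -> str:
    """Strip the echoed prompt so only the assistant's reply remains.

    Hugging Face text-generation pipelines return the prompt followed by
    the model's continuation; for a chat-templated instruct prompt, the
    text after the prompt is the assistant's response.
    """
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    # Fallback: the pipeline was configured not to echo the prompt.
    return generated.strip()
```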

3. Data Models (app/schemas/schemas.py)

  • CarData: Input model with make, model, year, mileage, features[], condition
  • EnhancedDescriptionResponse: Output model with generated description
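The schemas can be sketched directly from the field lists above (the concrete types are assumptions; e.g. mileage might be a float in the real code):

```python
from typing import List

from pydantic import BaseModel

class CarData(BaseModel):
    """Input payload for POST /enhance-description."""
    make: str
    model: str
    year: int
    mileage: int
    features: List[str]
    condition: str

class EnhancedDescriptionResponse(BaseModel):
    """Output payload carrying the generated Polish description."""
    description: str
```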

4. Containerization

  • Docker: Self-contained image with pre-downloaded model (~3.2GB)
  • Security: Uses Docker BuildKit secrets for Hugging Face token handling
  • Model Storage: Downloaded to /app/pretrain_model during build
  • Runtime: Python 3.9-slim base image

Key Technical Details

Model Configuration

  • Model Path: /app/pretrain_model (in container) or configurable for local dev
  • Device: Currently set to CPU in main.py, though the service also supports GPU (CUDA) inference
  • Generation Params: 150 max tokens, temperature 0.75, top_p 0.9
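Collected as keyword arguments for a transformers text-generation call, the stated parameters might look like this (the exact kwarg name for the token limit, e.g. max_new_tokens vs. max_length, is an assumption):

```python
# Generation parameters as stated in this document, in the form they
# would be passed to a transformers text-generation pipeline call.
GENERATION_KWARGS = {
    "max_new_tokens": 150,  # "150 max tokens"
    "temperature": 0.75,
    "top_p": 0.9,
    "do_sample": True,      # sampling must be on for temperature/top_p to apply
}
```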

Prompt Engineering

The service uses a carefully crafted Polish system prompt:

  • Instructs the model to create marketing descriptions in Polish
  • Limits output to 500 characters maximum
  • Tells the model to ignore off-topic content
  • Uses chat template format with system/user roles
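The system/user structure above can be sketched as follows; the Polish wording here is illustrative, paraphrasing the constraints listed in this document, and is not the service's actual prompt:

```python
# Illustrative system prompt paraphrasing the documented constraints:
# Polish marketing copy, 500-character cap, ignore off-topic content.
SYSTEM_PROMPT = (
    "Jesteś copywriterem motoryzacyjnym. Napisz atrakcyjny opis "
    "marketingowy samochodu po polsku, maksymalnie 500 znaków. "
    "Ignoruj treści niezwiązane z tematem."
)

def build_messages(user_prompt: str) -> list:
    """Build the system/user pair fed to tokenizer.apply_chat_template()."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```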

Dependencies

  • fastapi: Web framework
  • uvicorn[standard]: ASGI server
  • transformers[torch]: Hugging Face transformers with PyTorch
  • accelerate: Hugging Face optimization library

Current State & Issues

Git Status

  • Modified app/main.py (likely recent changes)
  • Deleted app/models/gpt4all.py (indicates migration from GPT4All to Hugging Face)

Linter Issues in huggingface_service.py

  1. Import issues: pipeline and AutoTokenizer imports need specific paths
  2. Type annotations: device: str = None should be Optional[str] = None
  3. Method parameters: the same implicit-Optional pattern on other method signatures
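The typing fix for issues 2 and 3 looks like this (resolve_device is a hypothetical helper, shown only to illustrate the Optional annotation):

```python
from typing import Optional

# Before (flagged by the linter): a None default with a non-Optional annotation
#     def __init__(self, device: str = None): ...

# After: the None default is reflected in the annotation
def resolve_device(device: Optional[str] = None) -> str:
    """Fall back to CPU inference when no device is specified."""
    return device if device is not None else "cpu"
```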

Usage Scenarios

  1. Car Dealership Websites: Auto-generate compelling descriptions from basic car specs
  2. Marketplace Applications: Enhance user-provided car listings
  3. Inventory Management: Bulk description generation for car databases

Deployment Options

  1. Local Development: Direct Python/uvicorn execution
  2. Docker Container: Self-contained deployment with pre-downloaded model
  3. Production: Containerized deployment with proper authentication

Authentication Requirements

  • Hugging Face Hub token required for model download (gated model)
  • Token stored in my_hf_token.txt during Docker build
  • Securely handled via Docker BuildKit secrets
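A sketch of what download_model.py likely does, assuming the BuildKit secret is mounted at Docker's conventional /run/secrets/&lt;id&gt; path (the secret id and helper names are assumptions):

```python
from pathlib import Path

MODEL_ID = "speakleash/Bielik-1.5B-v3.0-Instruct"
MODEL_DIR = "/app/pretrain_model"

def read_token(path: str) -> str:
    """Read the Hugging Face token from a file (e.g. a mounted BuildKit secret)."""
    return Path(path).read_text(encoding="utf-8").strip()

def download_model(token_path: str = "/run/secrets/hf_token") -> None:
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Gated model: authentication is required for the download.
    snapshot_download(
        repo_id=MODEL_ID,
        local_dir=MODEL_DIR,
        token=read_token(token_path),
    )

if __name__ == "__main__":
    download_model()
```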

Performance Considerations

  • Model size: ~3.2GB (significant memory footprint)
  • CPU inference: Slower but more accessible
  • GPU inference: Faster but requires CUDA setup
  • Async design: Non-blocking text generation

This service represents a specialized AI application for the Polish automotive market, focusing on generating marketing content using state-of-the-art Polish language models.