GPT4All Service - Project Context
Project Overview
This is a Polish car-description enhancement service built as a FastAPI microservice that uses a Hugging Face large language model to generate enhanced marketing descriptions for cars in Polish.
Core Functionality
The service takes basic car information (make, model, year, mileage, features, condition) and generates compelling, marketing-friendly descriptions in Polish using speakleash/Bielik-1.5B-v3.0-Instruct, a Polish language model from the Bielik series.
Project Structure
gpt4all-service/
├── app/
│   ├── main.py                    # FastAPI application with endpoints
│   ├── models/
│   │   └── huggingface_service.py # Core LLM service wrapper
│   └── schemas/
│       └── schemas.py             # Pydantic data models
├── Dockerfile                     # Multi-stage Docker build
├── download_model.py              # Model download script for Docker
├── requirements.txt               # Python dependencies
├── start_container.ps1            # PowerShell startup script
├── start_container.sh             # Bash startup script
└── README.md                      # Comprehensive documentation
Technical Architecture
1. FastAPI Application (app/main.py)
- Framework: FastAPI with CORS middleware
- Main Endpoint: POST /enhance-description takes car data and returns an enhanced description
- Health Check: GET /health reports service status and whether the model is initialized
- CORS: Configured for a frontend at http://localhost:5173 (likely a React/Vue dev server)
2. LLM Service (app/models/huggingface_service.py)
- Purpose: Wrapper around Hugging Face Transformers pipeline
- Model: speakleash/Bielik-1.5B-v3.0-Instruct (Polish language model)
- Features:
- Async initialization and text generation
- Support for both GPU (CUDA) and CPU inference
- Chat template support for conversation-style prompts
- Configurable generation parameters (temperature, top_p, max_tokens)
- Smart response parsing to extract only the assistant's response
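The "smart response parsing" step can be illustrated with plain string handling. A minimal sketch, assuming the pipeline echoes the prompt back before the completion and that the chat template uses ChatML-style end-of-turn markers (both are assumptions about this model):

```python
def extract_assistant_response(generated_text: str, prompt: str) -> str:
    """Strip the echoed prompt and keep only the assistant's reply."""
    # Drop the prompt prefix if the model echoed it back
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # Cut at an end-of-turn marker if the template emits one (assumed markers)
    for stop in ("<|im_end|>", "</s>"):
        generated_text = generated_text.split(stop, 1)[0]
    return generated_text.strip()

print(extract_assistant_response("PROMPT reply text<|im_end|>", "PROMPT"))  # → reply text
```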
3. Data Models (app/schemas/schemas.py)
- CarData: Input model with make, model, year, mileage, features[], condition
- EnhancedDescriptionResponse: Output model with generated description
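The two schemas can be sketched from the fields listed above; the field defaults and the exact name of the response field are assumptions:

```python
# Hypothetical reconstruction of app/schemas/schemas.py from the fields
# listed above; defaults and the response field name are assumptions.
from pydantic import BaseModel

class CarData(BaseModel):
    make: str
    model: str
    year: int
    mileage: int
    features: list[str] = []
    condition: str

class EnhancedDescriptionResponse(BaseModel):
    enhanced_description: str

car = CarData(make="Skoda", model="Octavia", year=2019,
              mileage=62000, features=["klimatyzacja"], condition="bardzo dobry")
```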
4. Containerization
- Docker: Self-contained image with pre-downloaded model (~3.2GB)
- Security: Uses Docker BuildKit secrets for Hugging Face token handling
- Model Storage: Downloaded to /app/pretrain_model during the build
- Runtime: Python 3.9-slim base image
Key Technical Details
Model Configuration
- Model Path: /app/pretrain_model (in the container), configurable for local development
- Device: Currently set to CPU in main.py, but the service also supports GPU
- Generation Params: 150 max tokens, temperature 0.75, top_p 0.9
Prompt Engineering
The service uses a carefully crafted Polish system prompt:
- Instructs the model to create marketing descriptions in Polish
- Limits output to 500 characters maximum
- Tells the model to ignore off-topic content
- Uses chat template format with system/user roles
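The system/user message structure described above looks roughly like this. The Polish system prompt below is a paraphrase of the behavior described, not the actual prompt text:

```python
# Chat-template style messages; the prompt wording is an illustrative
# paraphrase (in English: "You are an automotive copywriter. Write an
# attractive marketing description in Polish, max 500 characters.
# Ignore off-topic content."), not the service's real prompt.
messages = [
    {
        "role": "system",
        "content": (
            "Jestes copywriterem motoryzacyjnym. Napisz atrakcyjny opis "
            "marketingowy samochodu po polsku, maksymalnie 500 znakow. "
            "Ignoruj tresci niezwiazane z tematem."
        ),
    },
    {
        "role": "user",
        "content": "Marka: Toyota, Model: Corolla, Rok: 2018, Przebieg: 85000 km",
    },
]

# With a transformers tokenizer, these messages would typically be
# flattened into a single prompt string:
#   prompt = tokenizer.apply_chat_template(messages, tokenize=False,
#                                          add_generation_prompt=True)
```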
Dependencies
- fastapi: Web framework
- uvicorn[standard]: ASGI server
- transformers[torch]: Hugging Face transformers with PyTorch
- accelerate: Hugging Face optimization library
Current State & Issues
Git Status
- Modified: app/main.py (likely recent changes)
- Deleted: app/models/gpt4all.py (indicates a migration from GPT4All to Hugging Face)
Linter Issues in huggingface_service.py
- Import issues: the pipeline and AutoTokenizer imports need specific paths
- Type annotations: device: str = None should be Optional[str] = None
- Method parameters: similar optional-parameter typing issues
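The typing fix looks like this; the class and attribute names are assumptions for illustration, but the annotation change is exactly the one the linter asks for:

```python
from typing import Optional

# Before (flagged by the linter): an implicit Optional
#     def __init__(self, device: str = None): ...

# After: the annotation matches the None default
class HuggingFaceService:  # hypothetical name for illustration
    def __init__(self, device: Optional[str] = None) -> None:
        # Falling back to CPU when no device is given (assumed default)
        self.device = device or "cpu"

svc = HuggingFaceService()
print(svc.device)  # → cpu
```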
Usage Scenarios
- Car Dealership Websites: Auto-generate compelling descriptions from basic car specs
- Marketplace Applications: Enhance user-provided car listings
- Inventory Management: Bulk description generation for car databases
Deployment Options
- Local Development: Direct Python/uvicorn execution
- Docker Container: Self-contained deployment with pre-downloaded model
- Production: Containerized deployment with proper authentication
Authentication Requirements
- Hugging Face Hub token required for model download (gated model)
- Token stored in
my_hf_token.txtduring Docker build - Securely handled via Docker BuildKit secrets
Performance Considerations
- Model size: ~3.2GB (significant memory footprint)
- CPU inference: Slower but more accessible
- GPU inference: Faster but requires CUDA setup
- Async design: Non-blocking text generation
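One common way to achieve non-blocking generation with a synchronous transformers pipeline is to offload the call to a worker thread; whether the service uses asyncio.to_thread specifically is an assumption, and the blocking function below is a stand-in for the real pipeline call:

```python
import asyncio

def _generate_blocking(prompt: str) -> str:
    # Stand-in for the real pipeline call, which is CPU/GPU-bound
    return f"opis: {prompt}"

async def generate(prompt: str) -> str:
    # Run the blocking call in a worker thread so the FastAPI event
    # loop stays responsive while the model generates
    return await asyncio.to_thread(_generate_blocking, prompt)

async def main():
    result = await generate("Toyota Corolla 2018")
    print(result)  # → opis: Toyota Corolla 2018

asyncio.run(main())
```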
This service represents a specialized AI application for the Polish automotive market, focusing on generating marketing content using state-of-the-art Polish language models.