---
title: chatassistant_retail
emoji: 📦
colorFrom: indigo
colorTo: pink
sdk: gradio
short_description: Web based conversational AI chatbot designed specifically...
---
# chatassistant_retail
chatassistant_retail is a production-ready conversational AI chatbot designed specifically for the retail industry, providing intelligent assistance for retail inventory management. It features a multi-modal interface (text + images) powered by Azure OpenAI GPT-4o-mini, hybrid RAG search with Azure Cognitive Search, stateful conversation management via LangGraph, and flexible session persistence (Memory/Redis/PostgreSQL). The system includes a Gradio-based web UI, MCP tool integration, and comprehensive observability with LangFuse.
- Python Version: >= 3.10 (tested on 3.10, 3.11, 3.12, and 3.13)
## Table of Contents
- Features
- Architecture
- Project Structure
- Installation
- Usage
- Development
- Scripts Reference
- Session Management
- Multi-Modal Processing
- Observability
- Deployment Options
- Testing
- Contributing
- Credits
## Features

### Core Capabilities
- Conversational Interface: Gradio-based web UI for natural language interactions with retail inventory systems
- Retail Inventory Management: Specialized chatbot for handling inventory queries, stock levels, and purchase orders
- Natural Language Understanding: Powered by Azure OpenAI (GPT-4o-mini) for understanding and responding to retail-related questions
- Agentic Workflow: LangGraph-based state machine for complex multi-step conversations
- Tool Integration: MCP (Model Context Protocol) server for inventory and purchase order tools
### Technical Features
- Multi-Modal Input Processing: Handle both text and images (PNG, JPG, JPEG, WebP) for product analysis and visual queries
- Context Caching System: Smart data reuse across conversation turns reduces redundant I/O operations and improves response times
- Image-Based Product Lookup: Upload product images for AI-powered identification, catalog matching, and automated inventory checking
- Low-Stock Automation: Automatic reorder recommendations when visually identified products are low in stock
- LangGraph Orchestration: Stateful conversation management with persistent session storage
- Hybrid RAG Search: Vector + keyword + semantic search via Azure Cognitive Search with automatic fallback to local data
- Flexible Session Persistence: Three backend options - Memory (fast, ephemeral), Redis (distributed), or PostgreSQL (full persistence)
- Semantic Search: AI-powered relevance ranking with Free tier support (1,000 queries/month)
- Observability: Built-in LangFuse integration for tracing, monitoring, and analytics
- Async Processing: Asynchronous operations for high-performance request handling
- Graceful Fallbacks: Automatic degradation when Azure services unavailable (local data, keyword search)
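The graceful-fallback behaviour above can be sketched as a retriever that tries the remote search service first and silently degrades to a local keyword scan. This is a minimal illustration, not the project's actual API: the names `local_keyword_search`, `search_with_fallback`, and the tiny in-line catalog are all invented here.

```python
from typing import Callable

# Hypothetical local catalog standing in for data/products.json.
LOCAL_PRODUCTS = [
    {"sku": "SKU-10001", "name": "Wireless Optical Mouse"},
    {"sku": "SKU-WJ-001", "name": "Alpine Puffer Jacket"},
]

def local_keyword_search(query: str) -> list[dict]:
    """Naive keyword fallback: match any query term in the product name."""
    terms = query.lower().split()
    return [p for p in LOCAL_PRODUCTS if any(t in p["name"].lower() for t in terms)]

def search_with_fallback(query: str, remote: Callable[[str], list[dict]]) -> list[dict]:
    """Try the remote (Azure) search; fall back to local data on any failure."""
    try:
        return remote(query)
    except Exception:
        return local_keyword_search(query)

def unavailable_remote(query: str) -> list[dict]:
    # Simulates Azure Cognitive Search being unreachable.
    raise ConnectionError("Azure Cognitive Search unreachable")

results = search_with_fallback("winter jacket", unavailable_remote)
```

The same shape works for the LLM and session layers: catch the transport error at the boundary and return a degraded but usable result instead of surfacing a 500 to the chat UI.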
### Retail-Specific Features
- Inventory Queries: Check stock levels, product availability, and warehouse information
- Purchase Order Management: Create, track, and manage purchase orders
- Sample Data Generation: Generate realistic product catalogs and sales history using Faker (500+ products, 6 months sales)
- Product Search: Semantic search across product catalog using Azure Cognitive Search with visual product matching
### Deployment & Development
- Deployment Flexibility: Local development, HuggingFace Spaces, or production deployment (Azure App Service, Docker, K8s)
- Development Tools: Comprehensive test suite, data generation scripts, Azure Search setup automation
- Multi-Environment Configuration: Environment-based settings with validation and graceful fallbacks
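Environment-based settings with validation and graceful fallbacks can be sketched with the stdlib alone. The real project uses Pydantic settings (`config/settings.py`); the `Settings` class below is a simplified stand-in to show the pattern of validating an env value and falling back to a safe default.

```python
import os
from dataclasses import dataclass

VALID_STORES = {"memory", "redis", "postgresql"}

@dataclass
class Settings:
    session_store_type: str = "memory"
    log_level: str = "INFO"

    @classmethod
    def from_env(cls) -> "Settings":
        store = os.environ.get("SESSION_STORE_TYPE", "memory").lower()
        if store not in VALID_STORES:
            # Graceful fallback instead of crashing on a bad value
            store = "memory"
        return cls(
            session_store_type=store,
            log_level=os.environ.get("LOG_LEVEL", "INFO").upper(),
        )

os.environ["SESSION_STORE_TYPE"] = "redis"
settings = Settings.from_env()
```

Pydantic adds type coercion and validation errors on top of this, but the env-first, default-second flow is the same.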
## Architecture
The chatbot follows an agentic architecture pattern using LangGraph for state management and orchestration:
```
┌──────────────────────────────────────────────────────────┐
│                  Gradio Web Interface                    │
│   ┌─────────────┐          ┌────────────────┐            │
│   │   Chat UI   │          │    Session     │            │
│   │             │          │   Management   │            │
│   └─────────────┘          └────────────────┘            │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────┐
│                 LangGraph State Manager                  │
│   ┌──────────────────┐   ┌──────────────────────────┐    │
│   │   State Graph    │   │      Session Store       │    │
│   │   (Workflow)     │   │   (Memory/PostgreSQL)    │    │
│   └──────────────────┘   └──────────────────────────┘    │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────┐
│                  Business Logic Layer                    │
│   ┌────────────┐   ┌─────────────┐   ┌──────────────┐    │
│   │   Azure    │   │  Inventory  │   │   Purchase   │    │
│   │   OpenAI   │   │    Tools    │   │  Order Tools │    │
│   └────────────┘   └─────────────┘   └──────────────┘    │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────┐
│                 Data/Integration Layer                   │
│   ┌────────────┐   ┌─────────────┐   ┌──────────────┐    │
│   │ PostgreSQL │   │    Azure    │   │    Redis     │    │
│   │ (Sessions) │   │   Search    │   │   (Cache)    │    │
│   └────────────┘   └─────────────┘   └──────────────┘    │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────┐
│                  Observability Layer                     │
│   ┌────────────┐   ┌─────────────┐   ┌──────────────┐    │
│   │  LangFuse  │   │   Metrics   │   │    Python    │    │
│   │  Tracing   │   │  Collector  │   │   Logging    │    │
│   └────────────┘   └─────────────┘   └──────────────┘    │
└──────────────────────────────────────────────────────────┘
```
### Key Components
- Gradio UI (`ui/`): Web-based chat interface with multi-modal input (text + images)
- LangGraph State Manager (`state/`): Conversation state management with Memory/Redis/PostgreSQL session stores
- Workflow Orchestration (`workflow/`): Image-based product lookup with multi-step automation (vision → search → inventory → recommendations)
- Context Utilities (`tools/context_utils.py`): Smart caching layer for performance optimization and data coherence
- Azure OpenAI Client (`llm/`): Multi-modal LLM integration (GPT-4o-mini) with prompt templates and response parsing
- Context-Aware Tools (`tools/`): Inventory and purchase order tools with an optional state parameter for intelligent cache reuse
- RAG System (`rag/`): Hybrid search with Azure Cognitive Search (vector + keyword + semantic) and local fallback
- Observability (`observability/`): LangFuse tracing and metrics collection across all components
- Data Models (`data/`): Pydantic models for products, sales, and purchase orders with sample data generation
### Design Principles
- Separation of Concerns: Clear separation between UI, orchestration, business logic, and data layers
- Stateful Conversations: LangGraph manages conversation state with checkpointing
- Tool-Based Architecture: LLM invokes tools (inventory queries, purchase orders) through structured outputs
- Observable by Default: All LLM calls and tool invocations traced with LangFuse
- Error Resilience: Graceful degradation and comprehensive error handling
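The tool-based principle can be illustrated with a minimal registry that dispatches a structured tool call emitted by the LLM. The tool name, argument shape, and stub result below are invented for illustration; the project's real tools live in `tools/` and are wired through the MCP server.

```python
import json
from typing import Callable

TOOLS: dict[str, Callable] = {}

def register(name: str):
    """Decorator that adds a function to the tool registry under `name`."""
    def wrap(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return wrap

@register("check_stock_level")
def check_stock_level(sku: str) -> dict:
    # Stub: a real implementation would query the inventory backend.
    return {"sku": sku, "units": 150}

def dispatch(tool_call_json: str) -> dict:
    """Execute a structured tool call (JSON with `name` and `arguments`)."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])

result = dispatch('{"name": "check_stock_level", "arguments": {"sku": "SKU-12345"}}')
```

Because the LLM only emits structured calls and never executes code directly, each tool remains independently testable and traceable.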
## Project Structure
This project uses the src-layout pattern for better development and testing practices:
```
chatassistant_retail/
├── app.py                              # HuggingFace Spaces entry point
│
├── src/
│   └── chatassistant_retail/
│       ├── __init__.py                 # Package initialization
│       ├── __main__.py                 # Application entry point
│       ├── cli.py                      # CLI entry point
│       ├── chatbot.py                  # Main chatbot orchestrator (multi-modal)
│       │
│       ├── ui/                         # Gradio web interface
│       │   ├── __init__.py
│       │   ├── gradio_app.py           # Main Gradio application
│       │   ├── chat_interface.py       # Chat UI components
│       │   └── metrics_dashboard.py    # Observability dashboard (UI currently disabled)
│       │
│       ├── state/                      # LangGraph state management
│       │   ├── __init__.py
│       │   ├── langgraph_manager.py    # State graph orchestration
│       │   ├── session_store.py        # Abstract session interface
│       │   ├── memory_store.py         # In-memory store (HF Spaces)
│       │   ├── redis_store.py          # Redis store (distributed)
│       │   └── postgresql_store.py     # PostgreSQL store (persistent)
│       │
│       ├── llm/                        # LLM integration
│       │   ├── __init__.py
│       │   ├── azure_openai_client.py  # Azure OpenAI client (GPT-4o-mini)
│       │   ├── prompt_templates.py     # System/user prompts
│       │   └── response_parser.py      # Response parsing
│       │
│       ├── workflow/                   # NEW: Workflow orchestration
│       │   ├── __init__.py
│       │   └── image_processor.py      # Image-based product lookup
│       │
│       ├── tools/                      # Inventory & PO tools
│       │   ├── __init__.py
│       │   ├── context_utils.py        # NEW: Context caching utilities
│       │   ├── inventory_tools.py      # UPDATED: Context-aware inventory operations
│       │   ├── purchase_order_tools.py # UPDATED: Context-aware PO operations
│       │   └── mcp_server.py           # UPDATED: MCP server with state passing
│       │
│       ├── rag/                        # Azure Cognitive Search RAG
│       │   ├── __init__.py
│       │   ├── azure_search_client.py  # Hybrid search client (vector+keyword+semantic)
│       │   ├── retriever.py            # Document retrieval with fallback
│       │   └── embeddings.py           # Embedding generation
│       │
│       ├── data/                       # Data models and generation
│       │   ├── __init__.py
│       │   ├── models.py               # Product, Sale, PurchaseOrder models
│       │   └── generator.py            # Sample data generator (Faker)
│       │
│       ├── observability/              # LangFuse observability
│       │   ├── __init__.py
│       │   ├── langfuse_client.py      # LangFuse wrapper
│       │   ├── decorators.py           # @trace decorator
│       │   └── metrics_collector.py    # Metrics aggregation
│       │
│       └── config/                     # Configuration
│           ├── __init__.py
│           ├── settings.py             # Pydantic settings (env-based)
│           └── deployment.py           # Deployment configs
│
├── data/                               # Sample data files
│   ├── products.json                   # 500+ sample products (216KB)
│   ├── sales_history.json              # 6 months sales data (3.5MB)
│   └── purchase_orders.json            # Sample purchase orders
│
├── scripts/                            # Utility scripts
│   ├── setup_azure_search.py           # Azure Search index setup
│   ├── generate_sample_data.py         # Generate sample product/sales data
│   ├── test_gradio_ui.py               # UI testing script
│   ├── test_phase2.py                  # Integration testing
│   └── test_phase3.py                  # E2E scenario testing
│
├── tests/
│   ├── __init__.py
│   ├── unit/                           # Unit tests
│   │   ├── test_context_utils.py       # NEW: Context caching tests
│   │   ├── test_image_processor.py     # NEW: Image workflow tests
│   │   ├── test_observability.py
│   │   ├── test_inventory_tools.py
│   │   ├── test_session_store.py
│   │   ├── test_retriever.py
│   │   ├── test_azure_search_client.py
│   │   ├── test_azure_openai_client.py
│   │   ├── test_mcp_server.py
│   │   └── test_data_generator.py
│   ├── integration/                    # Integration tests
│   │   ├── test_tool_context_integration.py # NEW: Context-aware tool tests
│   │   └── test_state_manager.py
│   └── test_chatassistant_retail.py    # Main tests
│
├── docs/                               # Sphinx documentation
│   ├── conf.py
│   ├── index.rst
│   └── usage.rst
│
├── .github/
│   └── workflows/
│       └── test.yml                    # CI/CD pipeline
│
├── pyproject.toml                      # Project metadata and dependencies
├── justfile                            # Task automation
├── CLAUDE.md                           # Claude Code guidance
├── README.md                           # This file
├── HISTORY.md                          # Changelog
└── LICENSE                             # MIT License
```
### Key Directories
- app.py: HuggingFace Spaces entry point (sets deployment mode and launches Gradio on 0.0.0.0:7860)
- ui/: Gradio-based web interface with multi-modal chat
- state/: LangGraph state machine with three session backends (Memory/Redis/PostgreSQL)
- llm/: Azure OpenAI integration (GPT-4o-mini) with prompt engineering and multi-modal support
- tools/: Inventory and purchase order tools with MCP server integration
- rag/: Hybrid search with Azure Cognitive Search (vector+keyword+semantic) and local fallback
- data/: Pydantic data models and Faker-based sample data generation
- observability/: LangFuse tracing, metrics collection, and monitoring
- config/: Pydantic settings with environment variable support and validation
- scripts/: Setup scripts (Azure Search index, data generation, testing)
- data/ (root): Sample JSON files (products, sales history, purchase orders)
- tests/: Comprehensive PyTest suite (unit and integration tests)
## Installation

### Prerequisites
- Python >= 3.10
- uv (Rust-based Python package manager) - Required
- just (Command runner for task automation) - Required for development
- Azure OpenAI API access
- Azure Cognitive Search instance (optional, for RAG)
- PostgreSQL database (optional, for persistent sessions)
- Redis instance (optional, for caching)
- LangFuse account (optional, for observability)
### Installing uv and just
Both uv and just need to be installed system-wide (not in a virtual environment):
macOS:

```bash
# Install with Homebrew (recommended)
brew install uv just

# Or install uv via curl
curl -LsSf https://astral.sh/uv/install.sh | sh
# And install just separately
brew install just
```

Linux:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install just
curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash
```

Windows:

```powershell
# Install uv
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Install just with cargo
cargo install just
```

After installation, verify both are available:

```bash
uv --version    # Should show: uv 0.9.x or later
just --version  # Should show: just 1.x.x or later
```
### Install from PyPI (when published)

```bash
pip install chatassistant_retail
```
### Development Installation
1. Clone the repository:

   ```bash
   git clone https://github.com/samir72/chatassistant_retail.git
   cd chatassistant_retail
   ```

2. Install dependencies with uv:

   ```bash
   uv sync
   ```

3. Set up environment variables (create a `.env` file):

   ```bash
   # Azure OpenAI (Required)
   AZURE_OPENAI_API_KEY=your-api-key
   AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
   AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini               # GPT-4o-mini deployment
   AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-ada-002  # For RAG embeddings
   AZURE_OPENAI_API_VERSION=2024-02-15-preview

   # Azure Cognitive Search (Optional - fallback to local data if not configured)
   AZURE_COGNITIVE_SEARCH_ENDPOINT=https://your-search.search.windows.net
   AZURE_COGNITIVE_SEARCH_API_KEY=your-search-key
   AZURE_SEARCH_INDEX_NAME=products
   # Note: Semantic search must be enabled in Azure Portal (Search Service → Semantic ranker → Free)

   # Session Persistence (Optional - defaults to Memory store)
   SESSION_STORE_TYPE=memory  # Options: memory, redis, postgresql
   REDIS_URL=redis://localhost:6379/0  # If using redis
   POSTGRES_CONNECTION_STRING=postgresql://user:password@localhost:5432/chatbot  # If using postgresql

   # Deployment Configuration (Optional)
   DEPLOYMENT_MODE=local  # Options: local, hf_spaces
   LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR

   # LangFuse Observability (Optional - recommended for production)
   LANGFUSE_PUBLIC_KEY=pk-lf-...
   LANGFUSE_SECRET_KEY=sk-lf-...
   LANGFUSE_HOST=https://cloud.langfuse.com
   LANGFUSE_ENABLED=true
   ```

4. Verify installation:

   ```bash
   just test
   ```
### Sample Data
The repository includes sample data files for immediate testing and development:
| File | Size | Description |
|---|---|---|
| `data/products.json` | 220 KB | 500+ sample products across 8 retail categories (Electronics, Clothing, Groceries, etc.) |
| `data/sales_history.json` | 3.6 MB | 6 months of sales transaction history with seasonal patterns |
| `data/purchase_orders.json` | 1.3 KB | Sample purchase orders (pending/fulfilled statuses) |
Total size: ~3.7 MB
#### Why Sample Data is Included
Previously, the data/ directory was excluded from version control to keep the repository lean. As of version 0.1.1, sample data is included by default for:
- ✅ Faster setup for new developers
- ✅ Consistent test data across environments
- ✅ No need to run data generation scripts initially
#### Regenerating Sample Data (Optional)
If you want to regenerate or customize sample data:
```bash
python scripts/generate_sample_data.py
```
This will create fresh sample data with configurable parameters.
## Usage

### Setting Up Azure Cognitive Search
If you're using Azure Cognitive Search for RAG (Retrieval-Augmented Generation), you need to create the search index before running the chatbot.
#### Prerequisites
- Azure Cognitive Search service created
- Environment variables configured:
  - `AZURE_COGNITIVE_SEARCH_ENDPOINT`
  - `AZURE_COGNITIVE_SEARCH_API_KEY`
  - `AZURE_SEARCH_INDEX_NAME` (optional, defaults to "products")
#### Create the Index
Run the setup script:
```bash
python scripts/setup_azure_search.py
```
This will:
- Create the "products" index with the proper schema
- Configure vector search (1536-dimensional embeddings)
- Set up semantic search capabilities
- Optionally load 500 sample products with embeddings
#### Verify Setup
Check index health:
```python
from chatassistant_retail.rag import AzureSearchClient
from chatassistant_retail.config import get_settings

client = AzureSearchClient(get_settings())
health = client.check_index_health()
print(f"Status: {health['overall_status']}")
print(f"Document count: {health['stats']['document_count']}")
```
Note: If the index doesn't exist, the chatbot will still work but will fall back to local product data without vector search capabilities.
### Running the Chatbot

#### Start Gradio Web Interface
```bash
# Using Python module
python -m chatassistant_retail

# Or using the CLI entry point
chatassistant-retail

# With custom port
chatassistant-retail --port 7860
```
The web interface will be available at http://localhost:7860 with:
- Chat Interface: Main conversational UI with multi-modal support (text + images)
#### Command-Line Usage
```bash
# Interactive CLI mode
chatassistant-retail --cli

# Single query
chatassistant-retail --query "What is the stock level for SKU-12345?"
```
### Example Conversations

#### Inventory Query
```
User: What is the current stock level for SKU-12345?

Assistant: Let me check the inventory for SKU-12345...
[Tool: inventory_tools.check_stock_level(sku="SKU-12345")]

The current stock level for SKU-12345 is 150 units across 3 warehouses.
```
#### Purchase Order Management
```
User: Create a purchase order for 500 units of SKU-67890 from Supplier ABC

Assistant: I'll create a purchase order for you...
[Tool: purchase_order_tools.create_po(sku="SKU-67890", quantity=500, supplier="ABC")]

Purchase order PO-2025-001 created successfully for 500 units of SKU-67890.
Expected delivery: 2025-01-15
```
#### Product Search with RAG
```
User: Tell me about our winter jacket inventory

Assistant: Let me search our product catalog...
[RAG: Azure Cognitive Search retrieval with semantic search]

We have 5 winter jacket styles in stock with a total of 1,250 units.
Most popular: "Alpine Puffer Jacket" (SKU-WJ-001) with 450 units available.
```
#### Multi-Modal Query (Image + Text)
```
User: [Uploads product image] Is this the same as SKU-12345? What's the stock level?

Assistant: Analyzing the product image...
[Multi-Modal: GPT-4o-mini processing]
[Tool: inventory_tools.check_stock_level(sku="SKU-12345")]

Yes, this appears to be the same product as SKU-12345 (Blue Athletic Sneaker).
Current stock: 85 units available across 2 warehouses.
```
#### Image-Based Product Workflow (NEW)
```
User: [Uploads product image] "Check inventory for this item"

Assistant: Let me analyze the product image...
[ImageProductProcessor workflow]
- Vision Extraction: Identifies "Wireless Optical Mouse" (Electronics, confidence: 0.89)
- Catalog Search: Finds 2 matching products via RAG
- Inventory Check:
  * SKU-10001: 50 units (adequate stock)
  * SKU-10002: 8 units (below reorder level of 20)
- Reorder Recommendation: Order 50 units of SKU-10002 (HIGH urgency, 5 days until stockout)

Response:
"🔍 Product Identification Results

I identified: Wireless Optical Mouse
Category: Electronics

📦 Matching Products in Inventory:

1. Wireless Optical Mouse (SKU-10001)
   - Price: $24.99
   - Current Stock: 50 units
   - Reorder Level: 10 units
   - Status: OK
   - Supplier: TechSupplies Inc

2. Ergonomic Wireless Mouse (SKU-10002)
   - Price: $34.99
   - Current Stock: 8 units
   - Reorder Level: 20 units
   - Status: LOW STOCK
   - Supplier: ComfortTech

💡 Recommendations:

⚠️ Ergonomic Wireless Mouse (SKU-10002) is running low:
- Days until stockout: 5
- Suggested order quantity: 50 units
- Urgency: HIGH

Would you like me to create a purchase order?"
```
## Development

This project uses `just` for task automation. All development commands are defined in the `justfile`.
### Quick Start
```bash
# List all available commands
just list

# Run full QA pipeline (format, lint, type-check, test)
just qa

# Run tests
just test

# Run tests with debugger on failure
just pdb

# Generate coverage report
just coverage
```
### Development Workflow
- Make changes to the codebase
- Run QA: `just qa` (formats, lints, type-checks, and tests)
- Debug failures: `just pdb` if tests fail
- Check coverage: `just coverage` to ensure adequate test coverage
- Build package: `just build` when ready to release
### Code Quality Tools
- Ruff: Fast Python linter and formatter (line length: 120)
- isort: Import sorting (integrated with Ruff)
- ty: Type checking with all rules enabled
- pytest: Testing framework with async support and coverage reporting
### Local Development Setup
```bash
# Activate virtual environment (if not using uv)
source .venv/bin/activate

# Install pre-commit hooks (optional)
pre-commit install

# Run development server with auto-reload
python -m chatassistant_retail --reload

# Run tests in watch mode
pytest-watch
```
### Data Generation
The project includes tools for generating realistic sample data for development and testing.
#### Generating Sample Data

Use the `generate_sample_data.py` script to create sample products and sales history:
```bash
# Generate default data (500 products, 6 months sales history)
python scripts/generate_sample_data.py

# Custom data generation
python scripts/generate_sample_data.py --count 1000 --months 12
```
#### What Gets Generated
Products (data/products.json):
- 500+ realistic retail products (configurable)
- Categories: Electronics, Clothing, Home & Garden, Sports, Books
- Realistic pricing, descriptions, and metadata
- Auto-generated embeddings for Azure Search (1536 dimensions)
Sales History (data/sales_history.json):
- 6 months of transactional data (configurable)
- Realistic sales patterns (seasonality, trends)
- Multiple warehouses and channels
- Customer demographics
Purchase Orders (data/purchase_orders.json):
- Sample PO data for testing
- Various suppliers and statuses
- Delivery tracking information
#### Integration with Azure Search
The generated data includes pre-computed embeddings for immediate upload to Azure Search:
```bash
# 1. Generate sample data with embeddings
python scripts/generate_sample_data.py

# 2. Upload to Azure Search
python scripts/setup_azure_search.py --load-data
```
#### Customization

The data generator uses Faker for realistic data generation. Customize it by modifying `src/chatassistant_retail/data/generator.py`.
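A stripped-down version of such a generator, using stdlib `random` instead of Faker, looks like the following. The field names and value ranges are a guess at the general shape, not the project's actual `models.py` schema.

```python
import random

CATEGORIES = ["Electronics", "Clothing", "Home & Garden", "Sports", "Books"]
ADJECTIVES = ["Ergonomic", "Wireless", "Compact", "Premium", "Classic"]
NOUNS = ["Mouse", "Jacket", "Lamp", "Racket", "Notebook"]

def generate_products(count: int, seed: int = 42) -> list[dict]:
    """Generate `count` pseudo-random products, deterministic for a given seed."""
    rng = random.Random(seed)  # seeded so test data is reproducible
    products = []
    for i in range(count):
        products.append({
            "sku": f"SKU-{10000 + i}",
            "name": f"{rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}",
            "category": rng.choice(CATEGORIES),
            "price": round(rng.uniform(5, 200), 2),
            "stock": rng.randint(0, 500),
        })
    return products

catalog = generate_products(500)
```

Faker adds realistic names, suppliers, and dates on top of this, but the seeded-RNG approach is what makes regenerated data consistent across environments.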
## Scripts Reference

The `scripts/` directory contains utility scripts for setup, testing, and data management.
### `setup_azure_search.py`
Purpose: Create and configure Azure Cognitive Search index with proper schema for RAG.
Usage:

```bash
# Create index (prompts to load sample data)
python scripts/setup_azure_search.py

# Create index and auto-load data
python scripts/setup_azure_search.py --load-data

# Recreate existing index
python scripts/setup_azure_search.py --recreate
```
What It Does:
- Creates "products" index with hybrid search configuration
- Configures vector search (HNSW algorithm, 1536 dimensions)
- Sets up semantic search capabilities
- Optionally loads 500 sample products with embeddings
- Uploads in batches of 100 for efficiency
- Performs health check and displays statistics
Requirements:

- `AZURE_COGNITIVE_SEARCH_ENDPOINT` environment variable
- `AZURE_COGNITIVE_SEARCH_API_KEY` environment variable
- `AZURE_OPENAI_EMBEDDING_DEPLOYMENT` for embedding generation
### `generate_sample_data.py`
Purpose: Generate realistic retail data for development and testing.
Usage:

```bash
# Default: 500 products, 6 months sales
python scripts/generate_sample_data.py

# Custom counts
python scripts/generate_sample_data.py --count 1000 --months 12

# Dry run (don't save files)
python scripts/generate_sample_data.py --dry-run
```
Output:

- `data/products.json` - Product catalog with embeddings
- `data/sales_history.json` - Sales transactions
- `data/purchase_orders.json` - PO data
### `test_gradio_ui.py`
Purpose: Interactive testing of Gradio UI components.
Usage:

```bash
python scripts/test_gradio_ui.py
```
Launches the Gradio interface for manual testing and validation.
### `test_phase2.py` / `test_phase3.py`
Purpose: Integration and end-to-end testing scripts.
Usage:

```bash
# Integration testing
python scripts/test_phase2.py

# E2E scenario testing
python scripts/test_phase3.py
```
Tests the complete chatbot workflow including LangGraph state management, tool execution, and RAG retrieval.
## Session Management
The chatbot supports three different session storage backends for conversation state persistence, allowing you to choose the right balance between simplicity, performance, and durability.
### Session Store Backends

#### Memory Store (Default)
The in-memory session store is the default and simplest option, ideal for development and HuggingFace Spaces deployment.
Characteristics:

- ✅ Fast: No network latency, instant access
- ✅ Simple: No external dependencies required
- ✅ Auto-configured: Default for HF Spaces deployment
- ❌ Ephemeral: Sessions lost on restart
- ❌ Single-instance: Not shared across multiple app instances
Configuration:

```bash
SESSION_STORE_TYPE=memory  # or omit (default)
```
Use When:
- Developing locally
- Deploying to HuggingFace Spaces
- Session persistence not critical
- Running single instance
#### Redis Store
Redis provides distributed session storage with persistence, ideal for production deployments with multiple instances.
Characteristics:

- ✅ Fast: In-memory with disk persistence
- ✅ Distributed: Shared across multiple app instances
- ✅ Persistent: Survives app restarts (with RDB/AOF)
- ✅ TTL Support: Automatic session expiration
- ⚠️ Requires Redis: External service dependency
Configuration:

```bash
SESSION_STORE_TYPE=redis
REDIS_URL=redis://localhost:6379/0

# Or for production with auth:
REDIS_URL=redis://:password@redis-host:6379/0
```
Use When:
- Running multiple app instances (load balanced)
- Need distributed session sharing
- Want automatic session expiration
- Production deployment with high availability
#### PostgreSQL Store
PostgreSQL provides full persistence with queryable session history, ideal for audit requirements and analytics.
Characteristics:

- ✅ Fully Persistent: Durable storage with ACID guarantees
- ✅ Queryable: SQL access to session data and history
- ✅ Audit Trail: Complete conversation history
- ✅ Backup/Recovery: Standard database backup tools
- ⚠️ Slower: Disk I/O overhead vs in-memory stores
- ⚠️ Requires PostgreSQL: External database dependency
Configuration:

```bash
SESSION_STORE_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://user:password@localhost:5432/chatbot
```
Use When:
- Need complete audit trail
- Compliance/regulatory requirements
- Want to query conversation history
- Long-term session retention needed
- Analytics on conversation patterns
### Configuration Examples
Local Development (Memory):

```bash
# .env
SESSION_STORE_TYPE=memory  # Fast, simple, ephemeral
```

HuggingFace Spaces (Memory):

```bash
# Automatically configured via app.py
DEPLOYMENT_MODE=hf_spaces
SESSION_STORE_TYPE=memory  # Default for HF Spaces
```

Production (Redis):

```bash
# .env
SESSION_STORE_TYPE=redis
REDIS_URL=redis://:your-password@redis.example.com:6379/0
```

Enterprise (PostgreSQL):

```bash
# .env
SESSION_STORE_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://chatbot:password@db.example.com:5432/chatbot
```
### Choosing a Backend

| Criteria | Memory | Redis | PostgreSQL |
|---|---|---|---|
| Speed | ⭐⭐⭐ Fastest | ⭐⭐ Very Fast | ⭐ Fast |
| Persistence | ❌ None | ⭐⭐ Configurable | ⭐⭐⭐ Full |
| Multi-Instance | ❌ No | ✅ Yes | ✅ Yes |
| Setup Complexity | ⭐⭐⭐ None | ⭐⭐ Moderate | ⭐ Complex |
| Cost | Free | $ Low | $$ Moderate |
| Best For | Dev, HF Spaces | Production | Enterprise, Audit |
## Multi-Modal Processing
The chatbot supports multi-modal input, allowing users to send both text and images for visual product analysis, comparison, and identification.
### Overview
Powered by Azure OpenAI GPT-4o-mini, the chatbot can:
- Analyze product images to identify items
- Compare products visually against catalog images
- Extract product details from photos (color, style, features)
- Verify product authenticity and condition
- Assist with visual inventory checks
### Supported Formats
Image Formats:
- PNG (.png)
- JPEG (.jpg, .jpeg)
- WebP (.webp)
Size Limits:
- Maximum file size: 20MB (Azure OpenAI limit)
- Recommended resolution: 2048x2048 pixels or less
- Images automatically resized if too large
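The format and size rules above imply a validation step before any image reaches the model. A minimal sketch of such a check (the `validate_image` helper is illustrative; the 20 MB cap is the Azure OpenAI limit cited above):

```python
import os

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
MAX_BYTES = 20 * 1024 * 1024  # 20 MB Azure OpenAI limit

def validate_image(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for an uploaded image."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported format: {ext or 'none'}"
    if size_bytes > MAX_BYTES:
        return False, "file exceeds 20 MB limit"
    return True, "ok"
```

Resizing oversized images to the recommended 2048x2048 bound would typically be handled with Pillow after this check passes.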
### Usage Examples

#### Text-Only Query
```
# Via Gradio UI: Type in chat box
User: "What is the stock level for SKU-12345?"
```
#### Image + Text Query
```
# Via Gradio UI: Click image upload button, select image, then type query
User: [Uploads product photo] "Is this product in our catalog? Check inventory?"
```
Assistant analyzes the image using GPT-4o-mini, compares it against the catalog, and provides inventory information.
#### Product Image Analysis
```
User: [Uploads warehouse photo showing multiple items] "Identify all products in this image and check stock levels"
```
The chatbot can identify multiple products in a single image and provide bulk inventory information.
### Best Practices
Image Quality:
- Use clear, well-lit photos
- Ensure products are centered and in focus
- Avoid excessive image compression
Query Construction:
- Combine images with specific questions for best results
- Reference SKUs or product names when known
- Ask focused questions (inventory, identification, comparison)
Supported Use Cases:

- ✅ Product identification from photos
- ✅ Visual comparison against catalog
- ✅ Quality/authenticity verification
- ✅ Bulk identification from warehouse photos
- ❌ Image generation or editing (not supported)
## Observability
The chatbot includes comprehensive observability using LangFuse for distributed tracing and monitoring.
### LangFuse Integration
LangFuse is integrated throughout the application for automatic tracing of:
- LLM Calls: All Azure OpenAI requests with prompts, completions, and token usage
- Tool Invocations: Inventory queries and purchase order operations
- State Transitions: LangGraph state changes and workflow steps
- RAG Operations: Document retrieval and embedding generation
### Configuration

Set up LangFuse in your `.env` file:
```bash
LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
LANGFUSE_SECRET_KEY=sk-lf-your-secret-key
LANGFUSE_HOST=https://cloud.langfuse.com  # or self-hosted
LANGFUSE_ENABLED=true
```
### Using the @trace Decorator
Automatically trace any function:
```python
from chatassistant_retail.observability import trace

@trace(name="inventory_check", trace_type="tool")
async def check_inventory(sku: str):
    # Function automatically traced in LangFuse
    return await inventory_service.get_stock(sku)
```
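For intuition, here is a minimal sketch of how a decorator like this can work. It records spans to a local list rather than the real LangFuse client, and the span fields are invented for illustration; the actual implementation lives in `observability/decorators.py`.

```python
import asyncio
import functools
import time

SPANS: list[dict] = []  # stand-in for the LangFuse backend

def trace(name: str, trace_type: str = "span"):
    """Wrap an async function so every call records a timed span."""
    def wrap(fn):
        @functools.wraps(fn)
        async def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                # The finally block runs even when the call raises,
                # so failed calls are traced too.
                SPANS.append({
                    "name": name,
                    "type": trace_type,
                    "duration_s": time.perf_counter() - start,
                })
        return inner
    return wrap

@trace(name="inventory_check", trace_type="tool")
async def check_inventory(sku: str) -> int:
    return 150  # stub result

result = asyncio.run(check_inventory("SKU-12345"))
```

The decorator pattern keeps tracing out of business logic: tools stay plain async functions, and the span lifecycle is managed automatically.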
### Manual Tracing (Advanced Usage)
Note: For most use cases, the @trace decorator is recommended. Use manual tracing only when you need fine-grained control over span lifecycle.
```python
from chatassistant_retail.observability import create_span, log_event

# Create a span (must call .end() when done)
span = create_span(
    name="complex_workflow",
    input_data={"query": "user input"},
    metadata={"user_id": "123"},
)

try:
    # Do work
    result = perform_operation()

    # Log events within the span
    log_event(
        name="operation_milestone",
        level="INFO",
        input_data={"checkpoint": "halfway"},
    )

    # Update span with output
    span.update(output={"result": result})
finally:
    # Always end the span
    span.end()
```
Important: Spans created with create_span() must be explicitly ended with .end() to avoid memory leaks. Use the @trace decorator for automatic lifecycle management.
### Metrics Dashboard
Status: The Gradio UI metrics dashboard is currently disabled.
Access Metrics:

- LangFuse Web Dashboard: https://cloud.langfuse.com (recommended for production monitoring)
- Programmatic Access: use the `MetricsCollector` class directly (see the Metrics Collection section below)
All observability infrastructure remains active. The `MetricsCollector` class continues to aggregate data from LangFuse traces for programmatic access.
### Metrics Collection

The `MetricsCollector` class aggregates data from LangFuse traces:
```python
from chatassistant_retail.observability import MetricsCollector

collector = MetricsCollector()
metrics = collector.get_dashboard_data()

print(f"Total queries: {metrics['total_queries']}")
print(f"Avg response time: {metrics['avg_response_time']:.2f}s")
print(f"Success rate: {metrics['success_rate']:.1f}%")
```
### Logging
Structured logging with Python's logging module:
```python
import logging

logger = logging.getLogger(__name__)
logger.info("Processing user query", extra={
    "session_id": session_id,
    "query_length": len(query),
})
```
Log levels are configurable via environment variables:
```bash
LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL
```
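Wiring `LOG_LEVEL` into Python's `logging` module takes only a few lines; a sketch of the pattern (the project may configure logging elsewhere, so the helper name is illustrative):

```python
import logging
import os

def configure_logging(env=os.environ):
    """Set the root log level from LOG_LEVEL, defaulting to INFO."""
    level_name = env.get("LOG_LEVEL", "INFO").upper()
    # getattr falls back to INFO when the name is not a valid level
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(level=level)
    return level
```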
Monitoring in Production
LangFuse provides:
- Request Tracing: End-to-end visibility of each conversation
- Performance Metrics: Latency, throughput, and error rates
- Cost Tracking: Token usage and API costs per request
- User Analytics: Session duration, query patterns, tool usage
- Error Analysis: Exception tracking and debugging
Access your LangFuse dashboard at https://cloud.langfuse.com to view:
- Real-time trace explorer
- Analytics dashboards
- Cost reports
- User session replays
Deployment Options
The chatbot supports multiple deployment scenarios, from local development to production hosting on cloud platforms.
Local Development
Quick Start:
```bash
# Clone repository
git clone https://github.com/samir72/chatassistant_retail.git
cd chatassistant_retail

# Install dependencies
uv sync

# Set up environment variables
cp .env.example .env
# Edit .env with your Azure credentials

# Run locally
python -m chatassistant_retail
```
Features:
- Hot reload with the `--reload` flag
- Full debugging capabilities
- All features enabled (Azure Search, Redis, PostgreSQL, LangFuse)
- Access at http://localhost:7860
Session Storage: Any (Memory, Redis, PostgreSQL)
HuggingFace Spaces
Deploy directly to HuggingFace Spaces for free hosting with automatic HTTPS and sharing.
Prerequisites
- HuggingFace account (https://huggingface.co)
- Azure OpenAI API credentials
- (Optional) Azure Cognitive Search for RAG
Configuration
The app.py file is pre-configured for HF Spaces deployment:
```python
# app.py sets deployment mode automatically
os.environ["DEPLOYMENT_MODE"] = "hf_spaces"
```
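One consequence of this flag is session-store selection: Spaces compute has no Redis or PostgreSQL, so the app must fall back to the in-memory store. A hypothetical helper illustrating the idea (the project's actual selection logic may differ):

```python
import os

def pick_session_store(env=os.environ):
    """Choose a session store based on deployment mode (illustrative only)."""
    if env.get("DEPLOYMENT_MODE") == "hf_spaces":
        # HF Spaces compute has no Redis/PostgreSQL available
        return "memory"
    return env.get("SESSION_STORE_TYPE", "memory")
```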
Important: src-layout Workaround
Due to HuggingFace Spaces' Docker build process, this project uses a `sys.path` manipulation workaround instead of standard package installation:
- `requirements.txt` installs only dependencies (no package self-installation via `.`)
- `app.py` adds the `src/` directory to the Python path at startup
- Imports work without formal package installation
This is intentional and necessary because HF Spaces' auto-generated Dockerfile mounts `requirements.txt` before copying `pyproject.toml`, preventing a standard `pip install .` from working. For local development, continue using `pip install -e .` as normal.
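The `app.py` workaround amounts to a few lines at the top of the file; a sketch of the pattern (the actual `app.py` may differ in detail):

```python
import sys
from pathlib import Path

# Make src/chatassistant_retail importable without installing the package.
SRC = Path(__file__).resolve().parent / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))
```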
Environment Variables (HF Spaces Secrets):
```bash
# Required
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini

# Recommended
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-ada-002
AZURE_COGNITIVE_SEARCH_ENDPOINT=...
AZURE_COGNITIVE_SEARCH_API_KEY=...

# Optional
LANGFUSE_PUBLIC_KEY=...
LANGFUSE_SECRET_KEY=...
```
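A fail-fast check at startup makes missing secrets obvious before the first API call. A hedged sketch (the variable names match the list above; the helper itself is hypothetical, not project code):

```python
REQUIRED = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]

def missing_required(env, required=REQUIRED):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

# At startup, raise early instead of failing deep inside an API call:
# missing = missing_required(os.environ)
# if missing:
#     raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
```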
Deployment Steps
1. Create Space:
   - Go to https://huggingface.co/new-space
   - Select "Gradio" as SDK
   - Choose "Public" or "Private"
2. Upload Files:
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   cp -r chatassistant_retail/* .
   git add .
   git commit -m "Initial deployment"
   git push
   ```
3. Set Secrets:
   - Go to Space Settings → Repository secrets
   - Add all required environment variables
   - The Space will automatically rebuild and deploy
4. Access:
   - Your app will be live at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
Limitations:
- Uses Memory session store (sessions lost on restart)
- No Redis or PostgreSQL (Spaces compute limitations)
- Limited to 16GB RAM, 8 CPU cores (Free tier)
- Automatic sleep after 48 hours of inactivity
Best For:
- Demos and prototypes
- Sharing with stakeholders
- Testing without infrastructure setup
- Free tier hosting
Production Deployment
Deploy to cloud platforms for scalable, production-grade hosting.
Azure App Service
Requirements:
- Azure subscription
- Azure App Service (B1 or higher)
- Azure OpenAI, Azure Cognitive Search, Azure Database for PostgreSQL
Steps:
```bash
# 1. Log in with the Azure CLI (install it first if needed)
az login

# 2. Create App Service
az webapp create \
  --resource-group retail-chatbot-rg \
  --plan retail-chatbot-plan \
  --name retail-chatbot-app \
  --runtime "PYTHON:3.12"

# 3. Configure environment variables
az webapp config appsettings set \
  --resource-group retail-chatbot-rg \
  --name retail-chatbot-app \
  --settings \
    DEPLOYMENT_MODE=production \
    SESSION_STORE_TYPE=postgresql \
    POSTGRES_CONNECTION_STRING="..." \
    AZURE_OPENAI_API_KEY="..."

# 4. Deploy
az webapp up \
  --resource-group retail-chatbot-rg \
  --name retail-chatbot-app
```
Recommended Configuration:
- Compute: App Service B2 or higher (3.5GB RAM)
- Session Store: Azure Database for PostgreSQL
- Cache: Azure Cache for Redis (optional)
- Monitoring: Azure Application Insights + LangFuse
Docker Deployment
Dockerfile Example:
```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install uv
RUN pip install uv

# Copy project files
COPY . /app

# Install dependencies
RUN uv sync

# Expose port
EXPOSE 7860

# Run application
CMD ["python", "-m", "chatassistant_retail"]
```
Build and Run:
```bash
# Build image
docker build -t retail-chatbot .

# Run container
docker run -p 7860:7860 \
  -e AZURE_OPENAI_API_KEY="..." \
  -e DEPLOYMENT_MODE=production \
  retail-chatbot
```
Kubernetes Deployment
For high-availability, multi-replica deployments:
Key Considerations:
- Use PostgreSQL or Redis for session storage (not Memory)
- Configure horizontal pod autoscaling (HPA)
- Set up ingress with TLS/SSL
- Use Azure Key Vault for secrets management
- Configure health checks and liveness probes
Session Store: Redis or PostgreSQL (required for multi-replica)
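The probe and replica considerations above can be sketched in a Deployment manifest. Names, image tag, and probe settings below are illustrative assumptions (the probe targets the Gradio root path on the default port 7860), not a tested production configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: retail-chatbot
spec:
  replicas: 3                      # multiple replicas require Redis/PostgreSQL sessions
  selector:
    matchLabels:
      app: retail-chatbot
  template:
    metadata:
      labels:
        app: retail-chatbot
    spec:
      containers:
        - name: retail-chatbot
          image: retail-chatbot:latest   # illustrative image tag
          ports:
            - containerPort: 7860
          env:
            - name: SESSION_STORE_TYPE
              value: "redis"
          livenessProbe:
            httpGet:
              path: /
              port: 7860
            initialDelaySeconds: 30
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /
              port: 7860
            initialDelaySeconds: 10
            periodSeconds: 10
```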
Architecture Recommendations
| Deployment | Compute | Session Store | Cost | Best For |
|---|---|---|---|---|
| Local | Developer machine | Memory | Free | Development |
| HF Spaces | Free tier (16GB) | Memory | Free | Demos, prototypes |
| Azure App Service | B2+ (3.5GB+) | PostgreSQL | $$ | Small-medium production |
| Docker | Custom | Redis/PostgreSQL | $ | Flexible hosting |
| Kubernetes | Multi-node cluster | Redis/PostgreSQL | $$$ | Enterprise, high-availability |
Scaling Considerations
Vertical Scaling (Single Instance):
- Increase CPU/RAM allocation
- Use Memory or Redis session store
- Suitable for up to 1000 concurrent users
Horizontal Scaling (Multiple Instances):
- Deploy multiple replicas behind load balancer
- Required: Redis or PostgreSQL session store
- Configure sticky sessions (optional, for performance)
- Use Azure Front Door or Application Gateway
- Suitable for 1000+ concurrent users
Performance Optimization:
- Enable LangFuse for monitoring bottlenecks
- Use Redis for caching frequent queries
- Optimize Azure Search index (partition keys, replicas)
- Consider Azure OpenAI provisioned throughput for high volume
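The "Redis for caching frequent queries" point can be sketched with a small TTL cache; here an in-process dict stands in for Redis, and the class is a hypothetical helper rather than project code:

```python
import hashlib
import time

class QueryCache:
    """TTL cache keyed by query text; a dict stands in for Redis here."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (result, timestamp)

    def _key(self, query):
        # Stable, fixed-length key regardless of query length or characters
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def get(self, query, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(self._key(query))
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]
        return None  # miss or expired

    def put(self, query, result, now=None):
        now = time.time() if now is None else now
        self._store[self._key(query)] = (result, now)
```

With Redis in place of the dict, the same keying scheme maps onto `SETEX`-style expiring keys shared across replicas.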
Testing
The project uses PyTest for comprehensive testing with both unit and integration tests.
Test Structure
```text
tests/
├── unit/                           # Unit tests (isolated components)
│   ├── test_observability.py       # LangFuse client and metrics
│   ├── test_inventory_tools.py     # Inventory tool functions
│   ├── test_session_store.py       # Session persistence
│   ├── test_retriever.py           # RAG retrieval logic
│   └── test_data_generator.py      # Synthetic data generation
├── integration/                    # Integration tests (multiple components)
│   └── test_state_manager.py       # LangGraph state machine
└── test_chatassistant_retail.py    # Main chatbot tests
```
Running Tests
```bash
# Run all tests
just test

# Run with verbose output
just test -v

# Run specific test file
just test tests/unit/test_observability.py

# Run specific test function
just test tests/unit/test_observability.py::TestLangFuseClient::test_get_langfuse_client_disabled

# Run with keyword filter
just test -k "inventory"

# Test on all Python versions (3.10, 3.11, 3.12, 3.13)
just testall
```
Running Tests with Debugger
Use the ipdb debugger on test failures:
```bash
# Drop into debugger on first failure
just pdb

# Debug specific test
just pdb tests/unit/test_inventory_tools.py

# Limit to first 10 failures
pytest --pdb --maxfail=10
```
Coverage Reporting
```bash
# Run tests with coverage
just coverage

# View coverage report in terminal
coverage report

# Generate HTML coverage report
coverage html
# Open htmlcov/index.html in a browser
```
Target coverage: >= 90%
Test Fixtures
PyTest fixtures are used for common test setup:
```python
import pytest
from unittest.mock import MagicMock

@pytest.fixture
def mock_langfuse_client():
    """Provide a mocked LangFuse client."""
    return MagicMock()

@pytest.fixture
def inventory_tool():
    """Provide an inventory tool instance."""
    from chatassistant_retail.tools import InventoryTools
    return InventoryTools()

def test_inventory_query(inventory_tool):
    result = inventory_tool.check_stock("SKU-123")
    assert result["stock_level"] > 0
```
Async Testing
Tests for async functions use pytest-asyncio:
```python
import pytest

@pytest.mark.asyncio
async def test_async_llm_call():
    from chatassistant_retail.llm import AzureOpenAIClient

    client = AzureOpenAIClient()
    response = await client.chat("Test query")
    assert response is not None
```
Mocking External Services
Use pytest-mock for mocking Azure services:
```python
def test_azure_search(mocker):
    # Mock Azure Cognitive Search
    mock_search = mocker.patch("azure.search.documents.SearchClient")
    mock_search.return_value.search.return_value = [
        {"sku": "SKU-123", "name": "Test Product"}
    ]

    # Test retriever
    from chatassistant_retail.rag import Retriever

    retriever = Retriever()
    results = retriever.search("test query")
    assert len(results) == 1
```
CI/CD Testing
GitHub Actions (`.github/workflows/test.yml`) runs tests automatically:
- Run tests on Python 3.12 and 3.13
- Check code formatting with Ruff
- Verify type hints with ty
- Generate coverage report
View test results in GitHub Actions: https://github.com/samir72/chatassistant_retail/actions
Contributing
Contributions are welcome! Please follow these guidelines:
Getting Started
1. Fork the repository on GitHub
2. Clone your fork:
   ```bash
   git clone https://github.com/your-username/chatassistant_retail.git
   cd chatassistant_retail
   ```
3. Install dependencies:
   ```bash
   uv sync
   ```
4. Create a feature branch:
   ```bash
   git checkout -b feature/your-feature-name
   ```
Development Process
1. Make your changes following the code standards below
2. Add tests for new functionality (maintain >= 90% coverage)
3. Run QA checks:
   ```bash
   just qa  # Format, lint, type-check, and test
   ```
4. Update documentation if needed (README, docstrings, CLAUDE.md)
5. Commit your changes:
   ```bash
   git commit -m "Add feature: description"
   ```
6. Push to your fork:
   ```bash
   git push origin feature/your-feature-name
   ```
7. Submit a pull request to the main repository
Code Standards
- Style Guide: PEP 8 (enforced by Ruff)
- Line Length: 120 characters maximum
- Type Hints: Required for all function signatures
- Docstrings: Required for all public functions and classes (Google style)
- Test Coverage: >= 90% for all new code
- Import Sorting: Automatic with Ruff (isort rules)
Example Code Style
```python
from typing import Optional

from chatassistant_retail.observability import trace

@trace(name="example_function", trace_type="function")
async def example_function(param1: str, param2: int = 0) -> Optional[dict]:
    """Brief description of function.

    Args:
        param1: Description of param1
        param2: Description of param2 (default: 0)

    Returns:
        Description of return value

    Raises:
        ValueError: When param2 is negative
    """
    if param2 < 0:
        raise ValueError("param2 must be non-negative")
    return {"param1": param1, "param2": param2}
```
Pull Request Guidelines
- Title: Clear, concise description (e.g., "Add purchase order export feature")
- Description: Explain what changed and why
- Tests: Include test results showing all tests pass
- Coverage: Show coverage hasn't decreased
- Documentation: Update README/docs if needed
- Breaking Changes: Clearly mark any breaking changes
Reporting Bugs
Use GitHub Issues: https://github.com/samir72/chatassistant_retail/issues
Include:
- Python version
- Environment (OS, dependencies)
- Steps to reproduce
- Expected vs actual behavior
- Error messages/stack traces
- Minimal code example
Credits
This package was created with Cookiecutter and the audreyfeldroy/cookiecutter-pypackage project template.
Technologies Used
- LangGraph: Agentic workflow orchestration
- LangChain: LLM framework and integrations
- Azure OpenAI: GPT-4o-mini language model
- Azure Cognitive Search: Vector search and RAG
- LangFuse: Observability and tracing
- Gradio: Web UI framework
- FastMCP: Model Context Protocol server
- PyTest: Testing framework
- Ruff: Python linter and formatter
- uv: Fast Python package manager
License
This project is licensed under the MIT License - see the LICENSE file for details.