---
title: chatassistant_retail
emoji: 📦
colorFrom: indigo
colorTo: pink
sdk: gradio
short_description: Web based conversational AI chatbot designed specifically...
---
# chatassistant_retail
chatassistant_retail is a production-ready conversational AI chatbot designed specifically for the retail industry, providing intelligent assistance for retail inventory management. It features a multi-modal interface (text + images) powered by Azure OpenAI GPT-4o-mini, hybrid RAG search with Azure Cognitive Search, stateful conversation management via LangGraph, and flexible session persistence (Memory/Redis/PostgreSQL). The system includes a Gradio-based web UI, MCP tool integration, and comprehensive observability with LangFuse.
- Python Version: >= 3.10 (tested on 3.10, 3.11, 3.12, and 3.13)
## Table of Contents
- Features
- Architecture
- Project Structure
- Installation
- Usage
- Development
- Scripts Reference
- Session Management
- Multi-Modal Processing
- Observability
- Deployment Options
- Testing
- Contributing
- Credits
## Features

### Core Capabilities
- Conversational Interface: Gradio-based web UI for natural language interactions with retail inventory systems
- Retail Inventory Management: Specialized chatbot for handling inventory queries, stock levels, and purchase orders
- Natural Language Understanding: Powered by Azure OpenAI (GPT-4o-mini) for understanding and responding to retail-related questions
- Agentic Workflow: LangGraph-based state machine for complex multi-step conversations
- Tool Integration: MCP (Model Context Protocol) server for inventory and purchase order tools
### Technical Features
- Multi-Modal Input Processing: Handle both text and images (PNG, JPG, JPEG, WebP) for product analysis and visual queries
- Context Caching System: Smart data reuse across conversation turns reduces redundant I/O operations and improves response times
- Image-Based Product Lookup: Upload product images for AI-powered identification, catalog matching, and automated inventory checking
- Low-Stock Automation: Automatic reorder recommendations when visually identified products are low in stock
- LangGraph Orchestration: Stateful conversation management with persistent session storage
- Hybrid RAG Search: Vector + keyword + semantic search via Azure Cognitive Search with automatic fallback to local data
- Flexible Session Persistence: Three backend options - Memory (fast, ephemeral), Redis (distributed), or PostgreSQL (full persistence)
- Semantic Search: AI-powered relevance ranking with Free tier support (1,000 queries/month)
- Observability: Built-in LangFuse integration for tracing, monitoring, and analytics
- Async Processing: Asynchronous operations for high-performance request handling
- Graceful Fallbacks: Automatic degradation when Azure services unavailable (local data, keyword search)
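The graceful-fallback behaviour above can be sketched as a retriever that tries the remote search service first and silently degrades to a local keyword scan. This is a minimal illustration, not the project's actual API: the names `local_keyword_search`, `search_with_fallback`, and the tiny in-line catalog are all invented here.

```python
from typing import Callable

# Hypothetical local catalog standing in for data/products.json.
LOCAL_PRODUCTS = [
    {"sku": "SKU-10001", "name": "Wireless Optical Mouse"},
    {"sku": "SKU-WJ-001", "name": "Alpine Puffer Jacket"},
]

def local_keyword_search(query: str) -> list[dict]:
    """Naive keyword fallback: match any query term in the product name."""
    terms = query.lower().split()
    return [p for p in LOCAL_PRODUCTS if any(t in p["name"].lower() for t in terms)]

def search_with_fallback(query: str, remote: Callable[[str], list[dict]]) -> list[dict]:
    """Try the remote (Azure) search; fall back to local data on any failure."""
    try:
        return remote(query)
    except Exception:
        return local_keyword_search(query)

def unavailable_remote(query: str) -> list[dict]:
    # Simulates Azure Cognitive Search being unreachable.
    raise ConnectionError("Azure Cognitive Search unreachable")

results = search_with_fallback("winter jacket", unavailable_remote)
```

The same shape works for the LLM and session layers: catch the transport error at the boundary and return a degraded but usable result instead of surfacing a 500 to the chat UI.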
### Retail-Specific Features
- Inventory Queries: Check stock levels, product availability, and warehouse information
- Purchase Order Management: Create, track, and manage purchase orders
- Sample Data Generation: Generate realistic product catalogs and sales history using Faker (500+ products, 6 months sales)
- Product Search: Semantic search across product catalog using Azure Cognitive Search with visual product matching
### Deployment & Development
- Deployment Flexibility: Local development, HuggingFace Spaces, or production deployment (Azure App Service, Docker, K8s)
- Development Tools: Comprehensive test suite, data generation scripts, Azure Search setup automation
- Multi-Environment Configuration: Environment-based settings with validation and graceful fallbacks
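Environment-based settings with validation and graceful fallbacks can be sketched with the stdlib alone. The real project uses Pydantic settings (`config/settings.py`); the `Settings` class below is a simplified stand-in to show the pattern of validating an env value and falling back to a safe default.

```python
import os
from dataclasses import dataclass

VALID_STORES = {"memory", "redis", "postgresql"}

@dataclass
class Settings:
    session_store_type: str = "memory"
    log_level: str = "INFO"

    @classmethod
    def from_env(cls) -> "Settings":
        store = os.environ.get("SESSION_STORE_TYPE", "memory").lower()
        if store not in VALID_STORES:
            # Graceful fallback instead of crashing on a bad value
            store = "memory"
        return cls(
            session_store_type=store,
            log_level=os.environ.get("LOG_LEVEL", "INFO").upper(),
        )

os.environ["SESSION_STORE_TYPE"] = "redis"
settings = Settings.from_env()
```

Pydantic adds type coercion and validation errors on top of this, but the env-first, default-second flow is the same.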
## Architecture
The chatbot follows an agentic architecture pattern using LangGraph for state management and orchestration:
```
┌──────────────────────────────────────────────────────────┐
│                  Gradio Web Interface                    │
│   ┌─────────────┐          ┌────────────────┐            │
│   │   Chat UI   │          │    Session     │            │
│   │             │          │   Management   │            │
│   └─────────────┘          └────────────────┘            │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────┐
│                 LangGraph State Manager                  │
│   ┌──────────────────┐   ┌──────────────────────────┐    │
│   │   State Graph    │   │      Session Store       │    │
│   │   (Workflow)     │   │   (Memory/PostgreSQL)    │    │
│   └──────────────────┘   └──────────────────────────┘    │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────┐
│                  Business Logic Layer                    │
│   ┌────────────┐   ┌─────────────┐   ┌──────────────┐    │
│   │   Azure    │   │  Inventory  │   │   Purchase   │    │
│   │   OpenAI   │   │    Tools    │   │  Order Tools │    │
│   └────────────┘   └─────────────┘   └──────────────┘    │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────┐
│                 Data/Integration Layer                   │
│   ┌────────────┐   ┌─────────────┐   ┌──────────────┐    │
│   │ PostgreSQL │   │    Azure    │   │    Redis     │    │
│   │ (Sessions) │   │   Search    │   │   (Cache)    │    │
│   └────────────┘   └─────────────┘   └──────────────┘    │
└────────────────────────────┬─────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────┐
│                  Observability Layer                     │
│   ┌────────────┐   ┌─────────────┐   ┌──────────────┐    │
│   │  LangFuse  │   │   Metrics   │   │    Python    │    │
│   │  Tracing   │   │  Collector  │   │   Logging    │    │
│   └────────────┘   └─────────────┘   └──────────────┘    │
└──────────────────────────────────────────────────────────┘
```
### Key Components
- Gradio UI (`ui/`): Web-based chat interface with multi-modal input (text + images)
- LangGraph State Manager (`state/`): Conversation state management with Memory/Redis/PostgreSQL session stores
- Workflow Orchestration (`workflow/`): Image-based product lookup with multi-step automation (vision → search → inventory → recommendations)
- Context Utilities (`tools/context_utils.py`): Smart caching layer for performance optimization and data coherence
- Azure OpenAI Client (`llm/`): Multi-modal LLM integration (GPT-4o-mini) with prompt templates and response parsing
- Context-Aware Tools (`tools/`): Inventory and purchase order tools with an optional state parameter for intelligent cache reuse
- RAG System (`rag/`): Hybrid search with Azure Cognitive Search (vector + keyword + semantic) and local fallback
- Observability (`observability/`): LangFuse tracing and metrics collection across all components
- Data Models (`data/`): Pydantic models for products, sales, and purchase orders with sample data generation
### Design Principles
- Separation of Concerns: Clear separation between UI, orchestration, business logic, and data layers
- Stateful Conversations: LangGraph manages conversation state with checkpointing
- Tool-Based Architecture: LLM invokes tools (inventory queries, purchase orders) through structured outputs
- Observable by Default: All LLM calls and tool invocations traced with LangFuse
- Error Resilience: Graceful degradation and comprehensive error handling
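The tool-based principle can be illustrated with a minimal registry that dispatches a structured tool call emitted by the LLM. The tool name, argument shape, and stub result below are invented for illustration; the project's real tools live in `tools/` and are wired through the MCP server.

```python
import json
from typing import Callable

TOOLS: dict[str, Callable] = {}

def register(name: str):
    """Decorator that adds a function to the tool registry under `name`."""
    def wrap(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return wrap

@register("check_stock_level")
def check_stock_level(sku: str) -> dict:
    # Stub: a real implementation would query the inventory backend.
    return {"sku": sku, "units": 150}

def dispatch(tool_call_json: str) -> dict:
    """Execute a structured tool call (JSON with `name` and `arguments`)."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])

result = dispatch('{"name": "check_stock_level", "arguments": {"sku": "SKU-12345"}}')
```

Because the LLM only emits structured calls and never executes code directly, each tool remains independently testable and traceable.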
## Project Structure
This project uses the src-layout pattern for better development and testing practices:
```
chatassistant_retail/
├── app.py                              # HuggingFace Spaces entry point
│
├── src/
│   └── chatassistant_retail/
│       ├── __init__.py                 # Package initialization
│       ├── __main__.py                 # Application entry point
│       ├── cli.py                      # CLI entry point
│       ├── chatbot.py                  # Main chatbot orchestrator (multi-modal)
│       │
│       ├── ui/                         # Gradio web interface
│       │   ├── __init__.py
│       │   ├── gradio_app.py           # Main Gradio application
│       │   ├── chat_interface.py       # Chat UI components
│       │   └── metrics_dashboard.py    # Observability dashboard (UI currently disabled)
│       │
│       ├── state/                      # LangGraph state management
│       │   ├── __init__.py
│       │   ├── langgraph_manager.py    # State graph orchestration
│       │   ├── session_store.py        # Abstract session interface
│       │   ├── memory_store.py         # In-memory store (HF Spaces)
│       │   ├── redis_store.py          # Redis store (distributed)
│       │   └── postgresql_store.py     # PostgreSQL store (persistent)
│       │
│       ├── llm/                        # LLM integration
│       │   ├── __init__.py
│       │   ├── azure_openai_client.py  # Azure OpenAI client (GPT-4o-mini)
│       │   ├── prompt_templates.py     # System/user prompts
│       │   └── response_parser.py      # Response parsing
│       │
│       ├── workflow/                   # NEW: Workflow orchestration
│       │   ├── __init__.py
│       │   └── image_processor.py      # Image-based product lookup
│       │
│       ├── tools/                      # Inventory & PO tools
│       │   ├── __init__.py
│       │   ├── context_utils.py        # NEW: Context caching utilities
│       │   ├── inventory_tools.py      # UPDATED: Context-aware inventory operations
│       │   ├── purchase_order_tools.py # UPDATED: Context-aware PO operations
│       │   └── mcp_server.py           # UPDATED: MCP server with state passing
│       │
│       ├── rag/                        # Azure Cognitive Search RAG
│       │   ├── __init__.py
│       │   ├── azure_search_client.py  # Hybrid search client (vector+keyword+semantic)
│       │   ├── retriever.py            # Document retrieval with fallback
│       │   └── embeddings.py           # Embedding generation
│       │
│       ├── data/                       # Data models and generation
│       │   ├── __init__.py
│       │   ├── models.py               # Product, Sale, PurchaseOrder models
│       │   └── generator.py            # Sample data generator (Faker)
│       │
│       ├── observability/              # LangFuse observability
│       │   ├── __init__.py
│       │   ├── langfuse_client.py      # LangFuse wrapper
│       │   ├── decorators.py           # @trace decorator
│       │   └── metrics_collector.py    # Metrics aggregation
│       │
│       └── config/                     # Configuration
│           ├── __init__.py
│           ├── settings.py             # Pydantic settings (env-based)
│           └── deployment.py           # Deployment configs
│
├── data/                               # Sample data files
│   ├── products.json                   # 500+ sample products (216KB)
│   ├── sales_history.json              # 6 months sales data (3.5MB)
│   └── purchase_orders.json            # Sample purchase orders
│
├── scripts/                            # Utility scripts
│   ├── setup_azure_search.py           # Azure Search index setup
│   ├── generate_sample_data.py         # Generate sample product/sales data
│   ├── test_gradio_ui.py               # UI testing script
│   ├── test_phase2.py                  # Integration testing
│   └── test_phase3.py                  # E2E scenario testing
│
├── tests/
│   ├── __init__.py
│   ├── unit/                           # Unit tests
│   │   ├── test_context_utils.py       # NEW: Context caching tests
│   │   ├── test_image_processor.py     # NEW: Image workflow tests
│   │   ├── test_observability.py
│   │   ├── test_inventory_tools.py
│   │   ├── test_session_store.py
│   │   ├── test_retriever.py
│   │   ├── test_azure_search_client.py
│   │   ├── test_azure_openai_client.py
│   │   ├── test_mcp_server.py
│   │   └── test_data_generator.py
│   ├── integration/                    # Integration tests
│   │   ├── test_tool_context_integration.py # NEW: Context-aware tool tests
│   │   └── test_state_manager.py
│   └── test_chatassistant_retail.py    # Main tests
│
├── docs/                               # Sphinx documentation
│   ├── conf.py
│   ├── index.rst
│   └── usage.rst
│
├── .github/
│   └── workflows/
│       └── test.yml                    # CI/CD pipeline
│
├── pyproject.toml                      # Project metadata and dependencies
├── justfile                            # Task automation
├── CLAUDE.md                           # Claude Code guidance
├── README.md                           # This file
├── HISTORY.md                          # Changelog
└── LICENSE                             # MIT License
```
### Key Directories
- app.py: HuggingFace Spaces entry point (sets deployment mode and launches Gradio on 0.0.0.0:7860)
- ui/: Gradio-based web interface with multi-modal chat
- state/: LangGraph state machine with three session backends (Memory/Redis/PostgreSQL)
- llm/: Azure OpenAI integration (GPT-4o-mini) with prompt engineering and multi-modal support
- tools/: Inventory and purchase order tools with MCP server integration
- rag/: Hybrid search with Azure Cognitive Search (vector+keyword+semantic) and local fallback
- data/: Pydantic data models and Faker-based sample data generation
- observability/: LangFuse tracing, metrics collection, and monitoring
- config/: Pydantic settings with environment variable support and validation
- scripts/: Setup scripts (Azure Search index, data generation, testing)
- data/ (root): Sample JSON files (products, sales history, purchase orders)
- tests/: Comprehensive PyTest suite (unit and integration tests)
## Installation

### Prerequisites
- Python >= 3.10
- uv (Rust-based Python package manager) - Required
- just (Command runner for task automation) - Required for development
- Azure OpenAI API access
- Azure Cognitive Search instance (optional, for RAG)
- PostgreSQL database (optional, for persistent sessions)
- Redis instance (optional, for caching)
- LangFuse account (optional, for observability)
### Installing uv and just
Both uv and just need to be installed system-wide (not in a virtual environment):
macOS:

```bash
# Install with Homebrew (recommended)
brew install uv just

# Or install uv via curl
curl -LsSf https://astral.sh/uv/install.sh | sh
# And install just separately
brew install just
```

Linux:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install just
curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash
```

Windows:

```powershell
# Install uv
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Install just with cargo
cargo install just
```

After installation, verify both are available:

```bash
uv --version    # Should show: uv 0.9.x or later
just --version  # Should show: just 1.x.x or later
```
### Install from PyPI (when published)

```bash
pip install chatassistant_retail
```
### Development Installation
1. Clone the repository:

   ```bash
   git clone https://github.com/samir72/chatassistant_retail.git
   cd chatassistant_retail
   ```

2. Install dependencies with uv:

   ```bash
   uv sync
   ```

3. Set up environment variables (create a `.env` file):

   ```bash
   # Azure OpenAI (Required)
   AZURE_OPENAI_API_KEY=your-api-key
   AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
   AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini               # GPT-4o-mini deployment
   AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-ada-002  # For RAG embeddings
   AZURE_OPENAI_API_VERSION=2024-02-15-preview

   # Azure Cognitive Search (Optional - fallback to local data if not configured)
   AZURE_COGNITIVE_SEARCH_ENDPOINT=https://your-search.search.windows.net
   AZURE_COGNITIVE_SEARCH_API_KEY=your-search-key
   AZURE_SEARCH_INDEX_NAME=products
   # Note: Semantic search must be enabled in Azure Portal (Search Service → Semantic ranker → Free)

   # Session Persistence (Optional - defaults to Memory store)
   SESSION_STORE_TYPE=memory  # Options: memory, redis, postgresql
   REDIS_URL=redis://localhost:6379/0  # If using redis
   POSTGRES_CONNECTION_STRING=postgresql://user:password@localhost:5432/chatbot  # If using postgresql

   # Deployment Configuration (Optional)
   DEPLOYMENT_MODE=local  # Options: local, hf_spaces
   LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR

   # LangFuse Observability (Optional - recommended for production)
   LANGFUSE_PUBLIC_KEY=pk-lf-...
   LANGFUSE_SECRET_KEY=sk-lf-...
   LANGFUSE_HOST=https://cloud.langfuse.com
   LANGFUSE_ENABLED=true
   ```

4. Verify installation:

   ```bash
   just test
   ```
### Sample Data
The repository includes sample data files for immediate testing and development:
| File | Size | Description |
|---|---|---|
| `data/products.json` | 220 KB | 500+ sample products across 8 retail categories (Electronics, Clothing, Groceries, etc.) |
| `data/sales_history.json` | 3.6 MB | 6 months of sales transaction history with seasonal patterns |
| `data/purchase_orders.json` | 1.3 KB | Sample purchase orders (pending/fulfilled statuses) |
Total size: ~3.7 MB
#### Why Sample Data is Included
Previously, the data/ directory was excluded from version control to keep the repository lean. As of version 0.1.1, sample data is included by default for:
- ✅ Faster setup for new developers
- ✅ Consistent test data across environments
- ✅ No need to run data generation scripts initially
#### Regenerating Sample Data (Optional)
If you want to regenerate or customize sample data:
```bash
python scripts/generate_sample_data.py
```
This will create fresh sample data with configurable parameters.
## Usage

### Setting Up Azure Cognitive Search
If you're using Azure Cognitive Search for RAG (Retrieval-Augmented Generation), you need to create the search index before running the chatbot.
#### Prerequisites
- Azure Cognitive Search service created
- Environment variables configured:
  - `AZURE_COGNITIVE_SEARCH_ENDPOINT`
  - `AZURE_COGNITIVE_SEARCH_API_KEY`
  - `AZURE_SEARCH_INDEX_NAME` (optional, defaults to "products")
#### Create the Index
Run the setup script:
```bash
python scripts/setup_azure_search.py
```
This will:
- Create the "products" index with the proper schema
- Configure vector search (1536-dimensional embeddings)
- Set up semantic search capabilities
- Optionally load 500 sample products with embeddings
#### Verify Setup
Check index health:
```python
from chatassistant_retail.rag import AzureSearchClient
from chatassistant_retail.config import get_settings

client = AzureSearchClient(get_settings())
health = client.check_index_health()
print(f"Status: {health['overall_status']}")
print(f"Document count: {health['stats']['document_count']}")
```
Note: If the index doesn't exist, the chatbot will still work but will fall back to local product data without vector search capabilities.
### Running the Chatbot

#### Start Gradio Web Interface
```bash
# Using Python module
python -m chatassistant_retail

# Or using the CLI entry point
chatassistant-retail

# With custom port
chatassistant-retail --port 7860
```
The web interface will be available at http://localhost:7860 with:
- Chat Interface: Main conversational UI with multi-modal support (text + images)
#### Command-Line Usage
```bash
# Interactive CLI mode
chatassistant-retail --cli

# Single query
chatassistant-retail --query "What is the stock level for SKU-12345?"
```
### Example Conversations

#### Inventory Query
```
User: What is the current stock level for SKU-12345?

Assistant: Let me check the inventory for SKU-12345...
[Tool: inventory_tools.check_stock_level(sku="SKU-12345")]

The current stock level for SKU-12345 is 150 units across 3 warehouses.
```
#### Purchase Order Management
```
User: Create a purchase order for 500 units of SKU-67890 from Supplier ABC

Assistant: I'll create a purchase order for you...
[Tool: purchase_order_tools.create_po(sku="SKU-67890", quantity=500, supplier="ABC")]

Purchase order PO-2025-001 created successfully for 500 units of SKU-67890.
Expected delivery: 2025-01-15
```
#### Product Search with RAG
```
User: Tell me about our winter jacket inventory

Assistant: Let me search our product catalog...
[RAG: Azure Cognitive Search retrieval with semantic search]

We have 5 winter jacket styles in stock with a total of 1,250 units.
Most popular: "Alpine Puffer Jacket" (SKU-WJ-001) with 450 units available.
```
#### Multi-Modal Query (Image + Text)
```
User: [Uploads product image] Is this the same as SKU-12345? What's the stock level?

Assistant: Analyzing the product image...
[Multi-Modal: GPT-4o-mini processing]
[Tool: inventory_tools.check_stock_level(sku="SKU-12345")]

Yes, this appears to be the same product as SKU-12345 (Blue Athletic Sneaker).
Current stock: 85 units available across 2 warehouses.
```
#### Image-Based Product Workflow (NEW)
```
User: [Uploads product image] "Check inventory for this item"

Assistant: Let me analyze the product image...
[ImageProductProcessor workflow]
- Vision Extraction: Identifies "Wireless Optical Mouse" (Electronics, confidence: 0.89)
- Catalog Search: Finds 2 matching products via RAG
- Inventory Check:
  * SKU-10001: 50 units (adequate stock)
  * SKU-10002: 8 units (below reorder level of 20)
- Reorder Recommendation: Order 50 units of SKU-10002 (HIGH urgency, 5 days until stockout)

Response:
"🔍 Product Identification Results

I identified: Wireless Optical Mouse
Category: Electronics

📦 Matching Products in Inventory:

1. Wireless Optical Mouse (SKU-10001)
   - Price: $24.99
   - Current Stock: 50 units
   - Reorder Level: 10 units
   - Status: OK
   - Supplier: TechSupplies Inc

2. Ergonomic Wireless Mouse (SKU-10002)
   - Price: $34.99
   - Current Stock: 8 units
   - Reorder Level: 20 units
   - Status: LOW STOCK
   - Supplier: ComfortTech

💡 Recommendations:

⚠️ Ergonomic Wireless Mouse (SKU-10002) is running low:
- Days until stockout: 5
- Suggested order quantity: 50 units
- Urgency: HIGH

Would you like me to create a purchase order?"
```
## Development

This project uses `just` for task automation. All development commands are defined in the `justfile`.
### Quick Start
```bash
# List all available commands
just list

# Run full QA pipeline (format, lint, type-check, test)
just qa

# Run tests
just test

# Run tests with debugger on failure
just pdb

# Generate coverage report
just coverage
```
### Development Workflow
- Make changes to the codebase
- Run QA: `just qa` (formats, lints, type-checks, and tests)
- Debug failures: `just pdb` if tests fail
- Check coverage: `just coverage` to ensure adequate test coverage
- Build package: `just build` when ready to release
### Code Quality Tools
- Ruff: Fast Python linter and formatter (line length: 120)
- isort: Import sorting (integrated with Ruff)
- ty: Type checking with all rules enabled
- pytest: Testing framework with async support and coverage reporting
### Local Development Setup
```bash
# Activate virtual environment (if not using uv)
source .venv/bin/activate

# Install pre-commit hooks (optional)
pre-commit install

# Run development server with auto-reload
python -m chatassistant_retail --reload

# Run tests in watch mode
pytest-watch
```
### Data Generation
The project includes tools for generating realistic sample data for development and testing.
#### Generating Sample Data

Use the `generate_sample_data.py` script to create sample products and sales history:
```bash
# Generate default data (500 products, 6 months sales history)
python scripts/generate_sample_data.py

# Custom data generation
python scripts/generate_sample_data.py --count 1000 --months 12
```
#### What Gets Generated
Products (data/products.json):
- 500+ realistic retail products (configurable)
- Categories: Electronics, Clothing, Home & Garden, Sports, Books
- Realistic pricing, descriptions, and metadata
- Auto-generated embeddings for Azure Search (1536 dimensions)
Sales History (data/sales_history.json):
- 6 months of transactional data (configurable)
- Realistic sales patterns (seasonality, trends)
- Multiple warehouses and channels
- Customer demographics
Purchase Orders (data/purchase_orders.json):
- Sample PO data for testing
- Various suppliers and statuses
- Delivery tracking information
#### Integration with Azure Search
The generated data includes pre-computed embeddings for immediate upload to Azure Search:
```bash
# 1. Generate sample data with embeddings
python scripts/generate_sample_data.py

# 2. Upload to Azure Search
python scripts/setup_azure_search.py --load-data
```
#### Customization

The data generator uses Faker for realistic data generation. Customize it by modifying `src/chatassistant_retail/data/generator.py`.
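A stripped-down version of such a generator, using stdlib `random` instead of Faker, looks like the following. The field names and value ranges are a guess at the general shape, not the project's actual `models.py` schema.

```python
import random

CATEGORIES = ["Electronics", "Clothing", "Home & Garden", "Sports", "Books"]
ADJECTIVES = ["Ergonomic", "Wireless", "Compact", "Premium", "Classic"]
NOUNS = ["Mouse", "Jacket", "Lamp", "Racket", "Notebook"]

def generate_products(count: int, seed: int = 42) -> list[dict]:
    """Generate `count` pseudo-random products, deterministic for a given seed."""
    rng = random.Random(seed)  # seeded so test data is reproducible
    products = []
    for i in range(count):
        products.append({
            "sku": f"SKU-{10000 + i}",
            "name": f"{rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}",
            "category": rng.choice(CATEGORIES),
            "price": round(rng.uniform(5, 200), 2),
            "stock": rng.randint(0, 500),
        })
    return products

catalog = generate_products(500)
```

Faker adds realistic names, suppliers, and dates on top of this, but the seeded-RNG approach is what makes regenerated data consistent across environments.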
## Scripts Reference

The `scripts/` directory contains utility scripts for setup, testing, and data management.
### `setup_azure_search.py`
Purpose: Create and configure Azure Cognitive Search index with proper schema for RAG.
Usage:

```bash
# Create index (prompts to load sample data)
python scripts/setup_azure_search.py

# Create index and auto-load data
python scripts/setup_azure_search.py --load-data

# Recreate existing index
python scripts/setup_azure_search.py --recreate
```
What It Does:
- Creates "products" index with hybrid search configuration
- Configures vector search (HNSW algorithm, 1536 dimensions)
- Sets up semantic search capabilities
- Optionally loads 500 sample products with embeddings
- Uploads in batches of 100 for efficiency
- Performs health check and displays statistics
Requirements:

- `AZURE_COGNITIVE_SEARCH_ENDPOINT` environment variable
- `AZURE_COGNITIVE_SEARCH_API_KEY` environment variable
- `AZURE_OPENAI_EMBEDDING_DEPLOYMENT` for embedding generation
### `generate_sample_data.py`
Purpose: Generate realistic retail data for development and testing.
Usage:

```bash
# Default: 500 products, 6 months sales
python scripts/generate_sample_data.py

# Custom counts
python scripts/generate_sample_data.py --count 1000 --months 12

# Dry run (don't save files)
python scripts/generate_sample_data.py --dry-run
```
Output:

- `data/products.json` - Product catalog with embeddings
- `data/sales_history.json` - Sales transactions
- `data/purchase_orders.json` - PO data
### `test_gradio_ui.py`
Purpose: Interactive testing of Gradio UI components.
Usage:

```bash
python scripts/test_gradio_ui.py
```
Launches the Gradio interface for manual testing and validation.
### `test_phase2.py` / `test_phase3.py`
Purpose: Integration and end-to-end testing scripts.
Usage:

```bash
# Integration testing
python scripts/test_phase2.py

# E2E scenario testing
python scripts/test_phase3.py
```
Tests the complete chatbot workflow including LangGraph state management, tool execution, and RAG retrieval.
## Session Management
The chatbot supports three different session storage backends for conversation state persistence, allowing you to choose the right balance between simplicity, performance, and durability.
### Session Store Backends

#### Memory Store (Default)
The in-memory session store is the default and simplest option, ideal for development and HuggingFace Spaces deployment.
Characteristics:

- ✅ Fast: No network latency, instant access
- ✅ Simple: No external dependencies required
- ✅ Auto-configured: Default for HF Spaces deployment
- ❌ Ephemeral: Sessions lost on restart
- ❌ Single-instance: Not shared across multiple app instances
Configuration:

```bash
SESSION_STORE_TYPE=memory  # or omit (default)
```
Use When:
- Developing locally
- Deploying to HuggingFace Spaces
- Session persistence not critical
- Running single instance
#### Redis Store
Redis provides distributed session storage with persistence, ideal for production deployments with multiple instances.
Characteristics:

- ✅ Fast: In-memory with disk persistence
- ✅ Distributed: Shared across multiple app instances
- ✅ Persistent: Survives app restarts (with RDB/AOF)
- ✅ TTL Support: Automatic session expiration
- ⚠️ Requires Redis: External service dependency
Configuration:

```bash
SESSION_STORE_TYPE=redis
REDIS_URL=redis://localhost:6379/0

# Or for production with auth:
REDIS_URL=redis://:password@redis-host:6379/0
```
Use When:
- Running multiple app instances (load balanced)
- Need distributed session sharing
- Want automatic session expiration
- Production deployment with high availability
#### PostgreSQL Store
PostgreSQL provides full persistence with queryable session history, ideal for audit requirements and analytics.
Characteristics:

- ✅ Fully Persistent: Durable storage with ACID guarantees
- ✅ Queryable: SQL access to session data and history
- ✅ Audit Trail: Complete conversation history
- ✅ Backup/Recovery: Standard database backup tools
- ⚠️ Slower: Disk I/O overhead vs in-memory stores
- ⚠️ Requires PostgreSQL: External database dependency
Configuration:

```bash
SESSION_STORE_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://user:password@localhost:5432/chatbot
```
Use When:
- Need complete audit trail
- Compliance/regulatory requirements
- Want to query conversation history
- Long-term session retention needed
- Analytics on conversation patterns
### Configuration Examples
Local Development (Memory):

```bash
# .env
SESSION_STORE_TYPE=memory  # Fast, simple, ephemeral
```

HuggingFace Spaces (Memory):

```bash
# Automatically configured via app.py
DEPLOYMENT_MODE=hf_spaces
SESSION_STORE_TYPE=memory  # Default for HF Spaces
```

Production (Redis):

```bash
# .env
SESSION_STORE_TYPE=redis
REDIS_URL=redis://:your-password@redis.example.com:6379/0
```

Enterprise (PostgreSQL):

```bash
# .env
SESSION_STORE_TYPE=postgresql
POSTGRES_CONNECTION_STRING=postgresql://chatbot:password@db.example.com:5432/chatbot
```
### Choosing a Backend

| Criteria | Memory | Redis | PostgreSQL |
|---|---|---|---|
| Speed | ⭐⭐⭐ Fastest | ⭐⭐ Very Fast | ⭐ Fast |
| Persistence | ❌ None | ⭐⭐ Configurable | ⭐⭐⭐ Full |
| Multi-Instance | ❌ No | ✅ Yes | ✅ Yes |
| Setup Complexity | ⭐⭐⭐ None | ⭐⭐ Moderate | ⭐ Complex |
| Cost | Free | $ Low | $$ Moderate |
| Best For | Dev, HF Spaces | Production | Enterprise, Audit |
## Multi-Modal Processing
The chatbot supports multi-modal input, allowing users to send both text and images for visual product analysis, comparison, and identification.
### Overview
Powered by Azure OpenAI GPT-4o-mini, the chatbot can:
- Analyze product images to identify items
- Compare products visually against catalog images
- Extract product details from photos (color, style, features)
- Verify product authenticity and condition
- Assist with visual inventory checks
### Supported Formats
Image Formats:
- PNG (.png)
- JPEG (.jpg, .jpeg)
- WebP (.webp)
Size Limits:
- Maximum file size: 20MB (Azure OpenAI limit)
- Recommended resolution: 2048x2048 pixels or less
- Images automatically resized if too large
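The format and size rules above imply a validation step before any image reaches the model. A minimal sketch of such a check (the `validate_image` helper is illustrative; the 20 MB cap is the Azure OpenAI limit cited above):

```python
import os

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
MAX_BYTES = 20 * 1024 * 1024  # 20 MB Azure OpenAI limit

def validate_image(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for an uploaded image."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported format: {ext or 'none'}"
    if size_bytes > MAX_BYTES:
        return False, "file exceeds 20 MB limit"
    return True, "ok"
```

Resizing oversized images to the recommended 2048x2048 bound would typically be handled with Pillow after this check passes.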
### Usage Examples

#### Text-Only Query
```
# Via Gradio UI: Type in chat box
User: "What is the stock level for SKU-12345?"
```
#### Image + Text Query
```
# Via Gradio UI: Click image upload button, select image, then type query
User: [Uploads product photo] "Is this product in our catalog? Check inventory?"
```
Assistant analyzes the image using GPT-4o-mini, compares it against the catalog, and provides inventory information.
#### Product Image Analysis
```
User: [Uploads warehouse photo showing multiple items] "Identify all products in this image and check stock levels"
```
The chatbot can identify multiple products in a single image and provide bulk inventory information.
### Best Practices
Image Quality:
- Use clear, well-lit photos
- Ensure products are centered and in focus
- Avoid excessive image compression
Query Construction:
- Combine images with specific questions for best results
- Reference SKUs or product names when known
- Ask focused questions (inventory, identification, comparison)
Supported Use Cases:

- ✅ Product identification from photos
- ✅ Visual comparison against catalog
- ✅ Quality/authenticity verification
- ✅ Bulk identification from warehouse photos
- ❌ Image generation or editing (not supported)
## Observability
The chatbot includes comprehensive observability using LangFuse for distributed tracing and monitoring.
### LangFuse Integration
LangFuse is integrated throughout the application for automatic tracing of:
- LLM Calls: All Azure OpenAI requests with prompts, completions, and token usage
- Tool Invocations: Inventory queries and purchase order operations
- State Transitions: LangGraph state changes and workflow steps
- RAG Operations: Document retrieval and embedding generation
### Configuration

Set up LangFuse in your `.env` file:
```bash
LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
LANGFUSE_SECRET_KEY=sk-lf-your-secret-key
LANGFUSE_HOST=https://cloud.langfuse.com  # or self-hosted
LANGFUSE_ENABLED=true
```
### Using the @trace Decorator
Automatically trace any function:
```python
from chatassistant_retail.observability import trace

@trace(name="inventory_check", trace_type="tool")
async def check_inventory(sku: str):
    # Function automatically traced in LangFuse
    return await inventory_service.get_stock(sku)
```
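For intuition, here is a minimal sketch of how a decorator like this can work. It records spans to a local list rather than the real LangFuse client, and the span fields are invented for illustration; the actual implementation lives in `observability/decorators.py`.

```python
import asyncio
import functools
import time

SPANS: list[dict] = []  # stand-in for the LangFuse backend

def trace(name: str, trace_type: str = "span"):
    """Wrap an async function so every call records a timed span."""
    def wrap(fn):
        @functools.wraps(fn)
        async def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                # The finally block runs even when the call raises,
                # so failed calls are traced too.
                SPANS.append({
                    "name": name,
                    "type": trace_type,
                    "duration_s": time.perf_counter() - start,
                })
        return inner
    return wrap

@trace(name="inventory_check", trace_type="tool")
async def check_inventory(sku: str) -> int:
    return 150  # stub result

result = asyncio.run(check_inventory("SKU-12345"))
```

The decorator pattern keeps tracing out of business logic: tools stay plain async functions, and the span lifecycle is managed automatically.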
### Manual Tracing (Advanced Usage)
Note: For most use cases, the @trace decorator is recommended. Use manual tracing only when you need fine-grained control over span lifecycle.
```python
from chatassistant_retail.observability import create_span, log_event

# Create a span (must call .end() when done)
span = create_span(
    name="complex_workflow",
    input_data={"query": "user input"},
    metadata={"user_id": "123"},
)

try:
    # Do work
    result = perform_operation()

    # Log events within the span
    log_event(
        name="operation_milestone",
        level="INFO",
        input_data={"checkpoint": "halfway"},
    )

    # Update span with output
    span.update(output={"result": result})
finally:
    # Always end the span
    span.end()
```
Important: Spans created with create_span() must be explicitly ended with .end() to avoid memory leaks. Use the @trace decorator for automatic lifecycle management.
### Metrics Dashboard
Status: The Gradio UI metrics dashboard is currently disabled.
Access Metrics:

- LangFuse Web Dashboard: https://cloud.langfuse.com (recommended for production monitoring)
- Programmatic Access: use the `MetricsCollector` class directly (see the Metrics Collection section below)
All observability infrastructure remains active. The `MetricsCollector` class continues to aggregate data from LangFuse traces for programmatic access.
### Metrics Collection

The `MetricsCollector` class aggregates data from LangFuse traces:
```python
from chatassistant_retail.observability import MetricsCollector

collector = MetricsCollector()
metrics = collector.get_dashboard_data()

print(f"Total queries: {metrics['total_queries']}")
print(f"Avg response time: {metrics['avg_response_time']:.2f}s")
print(f"Success rate: {metrics['success_rate']:.1f}%")
```
### Logging
Structured logging with Python's logging module:
```python
import logging

logger = logging.getLogger(__name__)
logger.info("Processing user query", extra={
    "session_id": session_id,
    "query_length": len(query),
})
```
Log levels are configurable via environment variables:
```bash
LOG_LEVEL=INFO  # DEBUG, INFO, WARNING, ERROR, CRITICAL
```
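Wiring `LOG_LEVEL` into Python's `logging` module takes only a few lines; a sketch of the pattern (the project may configure logging elsewhere, so the helper name is illustrative):

```python
import logging
import os

def configure_logging(env=os.environ):
    """Set the root log level from LOG_LEVEL, defaulting to INFO."""
    level_name = env.get("LOG_LEVEL", "INFO").upper()
    # getattr falls back to INFO when the name is not a valid level
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(level=level)
    return level
```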
Monitoring in Production
LangFuse provides:
- Request Tracing: End-to-end visibility of each conversation
- Performance Metrics: Latency, throughput, and error rates
- Cost Tracking: Token usage and API costs per request
- User Analytics: Session duration, query patterns, tool usage
- Error Analysis: Exception tracking and debugging
Access your LangFuse dashboard at https://cloud.langfuse.com to view:
- Real-time trace explorer
- Analytics dashboards
- Cost reports
- User session replays
Deployment Options
The chatbot supports multiple deployment scenarios, from local development to production hosting on cloud platforms.
Local Development
Quick Start:
```bash
# Clone repository
git clone https://github.com/samir72/chatassistant_retail.git
cd chatassistant_retail

# Install dependencies
uv sync

# Set up environment variables
cp .env.example .env
# Edit .env with your Azure credentials

# Run locally
python -m chatassistant_retail
```
Features:
- Hot reload with the `--reload` flag
- Full debugging capabilities
- All features enabled (Azure Search, Redis, PostgreSQL, LangFuse)
- Access at http://localhost:7860
Session Storage: Any (Memory, Redis, PostgreSQL)
HuggingFace Spaces
Deploy directly to HuggingFace Spaces for free hosting with automatic HTTPS and sharing.
Prerequisites
- HuggingFace account (https://huggingface.co)
- Azure OpenAI API credentials
- (Optional) Azure Cognitive Search for RAG
Configuration
The app.py file is pre-configured for HF Spaces deployment:
```python
# app.py sets deployment mode automatically
os.environ["DEPLOYMENT_MODE"] = "hf_spaces"
```
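One consequence of this flag is session-store selection: Spaces compute has no Redis or PostgreSQL, so the app must fall back to the in-memory store. A hypothetical helper illustrating the idea (the project's actual selection logic may differ):

```python
import os

def pick_session_store(env=os.environ):
    """Choose a session store based on deployment mode (illustrative only)."""
    if env.get("DEPLOYMENT_MODE") == "hf_spaces":
        # HF Spaces compute has no Redis/PostgreSQL available
        return "memory"
    return env.get("SESSION_STORE_TYPE", "memory")
```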
Important: src-layout Workaround
Due to HuggingFace Spaces' Docker build process, this project uses a `sys.path` manipulation workaround instead of standard package installation:
- `requirements.txt` installs only dependencies (no package self-installation via `.`)
- `app.py` adds the `src/` directory to the Python path at startup
- Imports work without formal package installation
This is intentional and necessary because HF Spaces' auto-generated Dockerfile mounts `requirements.txt` before copying `pyproject.toml`, preventing a standard `pip install .` from working. For local development, continue using `pip install -e .` as normal.
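The `app.py` workaround amounts to a few lines at the top of the file; a sketch of the pattern (the actual `app.py` may differ in detail):

```python
import sys
from pathlib import Path

# Make src/chatassistant_retail importable without installing the package.
SRC = Path(__file__).resolve().parent / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))
```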
Environment Variables (HF Spaces Secrets):
```bash
# Required
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=...
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini

# Recommended
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-ada-002
AZURE_COGNITIVE_SEARCH_ENDPOINT=...
AZURE_COGNITIVE_SEARCH_API_KEY=...

# Optional
LANGFUSE_PUBLIC_KEY=...
LANGFUSE_SECRET_KEY=...
```
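A fail-fast check at startup makes missing secrets obvious before the first API call. A hedged sketch (the variable names match the list above; the helper itself is hypothetical, not project code):

```python
REQUIRED = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]

def missing_required(env, required=REQUIRED):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

# At startup, raise early instead of failing deep inside an API call:
# missing = missing_required(os.environ)
# if missing:
#     raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
```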
Deployment Steps
1. Create Space:
   - Go to https://huggingface.co/new-space
   - Select "Gradio" as SDK
   - Choose "Public" or "Private"
2. Upload Files:
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   cp -r chatassistant_retail/* .
   git add .
   git commit -m "Initial deployment"
   git push
   ```
3. Set Secrets:
   - Go to Space Settings → Repository secrets
   - Add all required environment variables
   - The Space will automatically rebuild and deploy
4. Access:
   - Your app will be live at: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
Limitations:
- Uses Memory session store (sessions lost on restart)
- No Redis or PostgreSQL (Spaces compute limitations)
- Limited to 16GB RAM, 8 CPU cores (Free tier)
- Automatic sleep after 48 hours of inactivity
Best For:
- Demos and prototypes
- Sharing with stakeholders
- Testing without infrastructure setup
- Free tier hosting
Production Deployment
Deploy to cloud platforms for scalable, production-grade hosting.
Azure App Service
Requirements:
- Azure subscription
- Azure App Service (B1 or higher)
- Azure OpenAI, Azure Cognitive Search, Azure Database for PostgreSQL
Steps:
```bash
# 1. Log in with the Azure CLI (install it first if needed)
az login

# 2. Create App Service
az webapp create \
  --resource-group retail-chatbot-rg \
  --plan retail-chatbot-plan \
  --name retail-chatbot-app \
  --runtime "PYTHON:3.12"

# 3. Configure environment variables
az webapp config appsettings set \
  --resource-group retail-chatbot-rg \
  --name retail-chatbot-app \
  --settings \
    DEPLOYMENT_MODE=production \
    SESSION_STORE_TYPE=postgresql \
    POSTGRES_CONNECTION_STRING="..." \
    AZURE_OPENAI_API_KEY="..."

# 4. Deploy
az webapp up \
  --resource-group retail-chatbot-rg \
  --name retail-chatbot-app
```
Recommended Configuration:
- Compute: App Service B2 or higher (3.5GB RAM)
- Session Store: Azure Database for PostgreSQL
- Cache: Azure Cache for Redis (optional)
- Monitoring: Azure Application Insights + LangFuse
Docker Deployment
Dockerfile Example:
```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install uv
RUN pip install uv

# Copy project files
COPY . /app

# Install dependencies
RUN uv sync

# Expose port
EXPOSE 7860

# Run application
CMD ["python", "-m", "chatassistant_retail"]
```
Build and Run:
```bash
# Build image
docker build -t retail-chatbot .

# Run container
docker run -p 7860:7860 \
  -e AZURE_OPENAI_API_KEY="..." \
  -e DEPLOYMENT_MODE=production \
  retail-chatbot
```
Kubernetes Deployment
For high-availability, multi-replica deployments:
Key Considerations:
- Use PostgreSQL or Redis for session storage (not Memory)
- Configure horizontal pod autoscaling (HPA)
- Set up ingress with TLS/SSL
- Use Azure Key Vault for secrets management
- Configure health checks and liveness probes
Session Store: Redis or PostgreSQL (required for multi-replica)
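The probe and replica considerations above can be sketched in a Deployment manifest. Names, image tag, and probe settings below are illustrative assumptions (the probe targets the Gradio root path on the default port 7860), not a tested production configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: retail-chatbot
spec:
  replicas: 3                      # multiple replicas require Redis/PostgreSQL sessions
  selector:
    matchLabels:
      app: retail-chatbot
  template:
    metadata:
      labels:
        app: retail-chatbot
    spec:
      containers:
        - name: retail-chatbot
          image: retail-chatbot:latest   # illustrative image tag
          ports:
            - containerPort: 7860
          env:
            - name: SESSION_STORE_TYPE
              value: "redis"
          livenessProbe:
            httpGet:
              path: /
              port: 7860
            initialDelaySeconds: 30
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /
              port: 7860
            initialDelaySeconds: 10
            periodSeconds: 10
```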
Architecture Recommendations
| Deployment | Compute | Session Store | Cost | Best For |
|---|---|---|---|---|
| Local | Developer machine | Memory | Free | Development |
| HF Spaces | Free tier (16GB) | Memory | Free | Demos, prototypes |
| Azure App Service | B2+ (3.5GB+) | PostgreSQL | $$ | Small-medium production |
| Docker | Custom | Redis/PostgreSQL | $ | Flexible hosting |
| Kubernetes | Multi-node cluster | Redis/PostgreSQL | $$$ | Enterprise, high-availability |
Scaling Considerations
Vertical Scaling (Single Instance):
- Increase CPU/RAM allocation
- Use Memory or Redis session store
- Suitable for up to 1000 concurrent users
Horizontal Scaling (Multiple Instances):
- Deploy multiple replicas behind load balancer
- Required: Redis or PostgreSQL session store
- Configure sticky sessions (optional, for performance)
- Use Azure Front Door or Application Gateway
- Suitable for 1000+ concurrent users
Performance Optimization:
- Enable LangFuse for monitoring bottlenecks
- Use Redis for caching frequent queries
- Optimize Azure Search index (partition keys, replicas)
- Consider Azure OpenAI provisioned throughput for high volume
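The "Redis for caching frequent queries" point can be sketched with a small TTL cache; here an in-process dict stands in for Redis, and the class is a hypothetical helper rather than project code:

```python
import hashlib
import time

class QueryCache:
    """TTL cache keyed by query text; a dict stands in for Redis here."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (result, timestamp)

    def _key(self, query):
        # Stable, fixed-length key regardless of query length or characters
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def get(self, query, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(self._key(query))
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]
        return None  # miss or expired

    def put(self, query, result, now=None):
        now = time.time() if now is None else now
        self._store[self._key(query)] = (result, now)
```

With Redis in place of the dict, the same keying scheme maps onto `SETEX`-style expiring keys shared across replicas.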
Testing
The project uses PyTest for comprehensive testing with both unit and integration tests.
Test Structure
```text
tests/
├── unit/                           # Unit tests (isolated components)
│   ├── test_observability.py       # LangFuse client and metrics
│   ├── test_inventory_tools.py     # Inventory tool functions
│   ├── test_session_store.py       # Session persistence
│   ├── test_retriever.py           # RAG retrieval logic
│   └── test_data_generator.py      # Synthetic data generation
├── integration/                    # Integration tests (multiple components)
│   └── test_state_manager.py       # LangGraph state machine
└── test_chatassistant_retail.py    # Main chatbot tests
```
Running Tests
```bash
# Run all tests
just test

# Run with verbose output
just test -v

# Run specific test file
just test tests/unit/test_observability.py

# Run specific test function
just test tests/unit/test_observability.py::TestLangFuseClient::test_get_langfuse_client_disabled

# Run with keyword filter
just test -k "inventory"

# Test on all Python versions (3.10, 3.11, 3.12, 3.13)
just testall
```
Running Tests with Debugger
Use the ipdb debugger on test failures:
```bash
# Drop into debugger on first failure
just pdb

# Debug specific test
just pdb tests/unit/test_inventory_tools.py

# Limit to first 10 failures
pytest --pdb --maxfail=10
```
Coverage Reporting
```bash
# Run tests with coverage
just coverage

# View coverage report in terminal
coverage report

# Generate HTML coverage report
coverage html
# Open htmlcov/index.html in a browser
```
Target coverage: >= 90%
Test Fixtures
PyTest fixtures are used for common test setup:
```python
import pytest
from unittest.mock import MagicMock

@pytest.fixture
def mock_langfuse_client():
    """Provide a mocked LangFuse client."""
    return MagicMock()

@pytest.fixture
def inventory_tool():
    """Provide an inventory tool instance."""
    from chatassistant_retail.tools import InventoryTools
    return InventoryTools()

def test_inventory_query(inventory_tool):
    result = inventory_tool.check_stock("SKU-123")
    assert result["stock_level"] > 0
```
Async Testing
Tests for async functions use pytest-asyncio:
```python
import pytest

@pytest.mark.asyncio
async def test_async_llm_call():
    from chatassistant_retail.llm import AzureOpenAIClient

    client = AzureOpenAIClient()
    response = await client.chat("Test query")
    assert response is not None
```
Mocking External Services
Use pytest-mock for mocking Azure services:
```python
def test_azure_search(mocker):
    # Mock Azure Cognitive Search
    mock_search = mocker.patch("azure.search.documents.SearchClient")
    mock_search.return_value.search.return_value = [
        {"sku": "SKU-123", "name": "Test Product"}
    ]

    # Test retriever
    from chatassistant_retail.rag import Retriever

    retriever = Retriever()
    results = retriever.search("test query")
    assert len(results) == 1
```
CI/CD Testing
GitHub Actions (`.github/workflows/test.yml`) runs tests automatically:
- Run tests on Python 3.12 and 3.13
- Check code formatting with Ruff
- Verify type hints with ty
- Generate coverage report
View test results in GitHub Actions: https://github.com/samir72/chatassistant_retail/actions
Contributing
Contributions are welcome! Please follow these guidelines:
Getting Started
1. Fork the repository on GitHub
2. Clone your fork:
   ```bash
   git clone https://github.com/your-username/chatassistant_retail.git
   cd chatassistant_retail
   ```
3. Install dependencies:
   ```bash
   uv sync
   ```
4. Create a feature branch:
   ```bash
   git checkout -b feature/your-feature-name
   ```
Development Process
1. Make your changes following the code standards below
2. Add tests for new functionality (maintain >= 90% coverage)
3. Run QA checks:
   ```bash
   just qa  # Format, lint, type-check, and test
   ```
4. Update documentation if needed (README, docstrings, CLAUDE.md)
5. Commit your changes:
   ```bash
   git commit -m "Add feature: description"
   ```
6. Push to your fork:
   ```bash
   git push origin feature/your-feature-name
   ```
7. Submit a pull request to the main repository
Code Standards
- Style Guide: PEP 8 (enforced by Ruff)
- Line Length: 120 characters maximum
- Type Hints: Required for all function signatures
- Docstrings: Required for all public functions and classes (Google style)
- Test Coverage: >= 90% for all new code
- Import Sorting: Automatic with Ruff (isort rules)
Example Code Style
```python
from typing import Optional

from chatassistant_retail.observability import trace

@trace(name="example_function", trace_type="function")
async def example_function(param1: str, param2: int = 0) -> Optional[dict]:
    """Brief description of function.

    Args:
        param1: Description of param1
        param2: Description of param2 (default: 0)

    Returns:
        Description of return value

    Raises:
        ValueError: When param2 is negative
    """
    if param2 < 0:
        raise ValueError("param2 must be non-negative")
    return {"param1": param1, "param2": param2}
```
Pull Request Guidelines
- Title: Clear, concise description (e.g., "Add purchase order export feature")
- Description: Explain what changed and why
- Tests: Include test results showing all tests pass
- Coverage: Show coverage hasn't decreased
- Documentation: Update README/docs if needed
- Breaking Changes: Clearly mark any breaking changes
Reporting Bugs
Use GitHub Issues: https://github.com/samir72/chatassistant_retail/issues
Include:
- Python version
- Environment (OS, dependencies)
- Steps to reproduce
- Expected vs actual behavior
- Error messages/stack traces
- Minimal code example
Credits
This package was created with Cookiecutter and the audreyfeldroy/cookiecutter-pypackage project template.
Technologies Used
- LangGraph: Agentic workflow orchestration
- LangChain: LLM framework and integrations
- Azure OpenAI: GPT-4o-mini language model
- Azure Cognitive Search: Vector search and RAG
- LangFuse: Observability and tracing
- Gradio: Web UI framework
- FastMCP: Model Context Protocol server
- PyTest: Testing framework
- Ruff: Python linter and formatter
- uv: Fast Python package manager
License
This project is licensed under the MIT License - see the LICENSE file for details.