---
title: Amazon Multimodal RAG Assistant
emoji: 🛒
colorFrom: yellow
colorTo: blue
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Amazon Multimodal RAG Assistant

An AI-powered e-commerce search assistant that combines multimodal embeddings (CLIP), vector search (ChromaDB), and large language models to provide intelligent product recommendations and natural language responses.

![Project Status](https://img.shields.io/badge/status-active-success.svg)
![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)

## Features

- **Multimodal Search**: Search products using text, images, or both simultaneously
- **Intelligent Retrieval**: CLIP-based embeddings for semantic product matching
- **Dual LLM Support**: Choose between OpenAI GPT-4 or local open-source models
- **Natural Language Responses**: Context-aware answers powered by advanced LLMs
- **Modern Web Interface**: Clean, responsive UI with real-time search
- **Vector Database**: Persistent ChromaDB storage for fast retrieval
- **Prompt Engineering**: Supports zero-shot, few-shot, and multi-shot prompting
- **Chat History**: Multi-turn conversations with context awareness
- **Flexible Configuration**: Environment-based setup for easy customization

## Architecture

```
┌─────────────┐
│  Frontend   │  (HTML/JS/TailwindCSS)
└──────┬──────┘
       │ HTTP/JSON
       ▼
┌─────────────┐
│   FastAPI   │  (REST API Server)
└──────┬──────┘
       │
       ├─────────────────┐
       ▼                 ▼
┌─────────────┐   ┌─────────────┐
│     LLM     │   │     RAG     │
│  (GPT-4 or  │   │  (CLIP +    │
│  Local HF)  │   │  ChromaDB)  │
└─────────────┘   └─────────────┘
```

### Components

1. **rag.py**: Retrieval system with CLIP embeddings and ChromaDB
2. **llm.py**: LLM interface with prompt engineering
3. **api_server.py**: FastAPI backend with singleton LLM pattern
4. **frontend/**: Modern web UI with drag-and-drop support
5. **config.py**: Centralized configuration management

## Requirements

- Python 3.8+
- CUDA-compatible GPU (optional, but recommended for faster inference)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models and data

## Installation

### 1. Clone the Repository

```bash
# After cloning the repository, enter the project directory
cd Multimodel
```

### 2. Create Virtual Environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

**Note**: CLIP installation requires git. If the install fails, install CLIP directly from GitHub:

```bash
pip install git+https://github.com/openai/CLIP.git
```

### 4. Configure Environment

Create a `.env` file in the project root (copy from `.env.example`):

```bash
cp .env.example .env
```

**For OpenAI GPT-4 (Recommended):**

```bash
# .env file
USE_OPENAI=true
OPENAI_API_KEY=sk-proj-your-api-key-here
OPENAI_MODEL=gpt-4o
```

**For Local Models (Free, but requires more compute):**

```bash
# .env file
USE_OPENAI=false
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3
```

See [.env.example](.env.example) for all configuration options.

### 5. Prepare Data

Place your Amazon product CSV file in the project root:

```
amazon_multimodal_clean.csv
```

Expected CSV columns:

- `uniq_id`: Unique product identifier
- `product_name`: Product name
- `product_text`: Product description
- `main_category`: Product category
- `image`: Image URLs (pipe-separated)

## Usage

### Step 1: Build Vector Index

```bash
python rag.py --build --csv amazon_multimodal_clean.csv --max 1000
```

Options:

- `--csv`: Path to your CSV file
- `--max`: Maximum number of products to index (optional; all products are indexed if omitted)
- `--db`: Database directory (default: `chromadb_store`)

This will:

- Download product images
- Generate CLIP embeddings
- Build the ChromaDB vector index
- Save it to `chromadb_store/`

### Step 2: Start API Server

```bash
python api_server.py
```

The server will start on `http://localhost:8000`.

**Startup Notes:**

- **GPT-4 Mode**: The server starts instantly; the first request takes 2-5 seconds (API call)
- **Local Model Mode**: The first request takes 10-60 seconds while the model loads into memory; subsequent requests are fast because the model stays cached

### Step 3: Open Web Interface

Navigate to: `http://localhost:8000`

#### Search Modes:

- **Text Only**: Search using natural language queries
- **Image Only**: Upload a product image to find similar items
- **Multimodal**: Combine text and image for refined search

#### Example Queries:

- "Wireless earbuds with noise cancellation under $150"
- "What is this product and how is it used?" (with image)
- "Compare the top two smartwatches you found"

## 🔧 Configuration

### LLM Backend Selection

The system supports two LLM backends, selected via environment variables:

#### Option 1: OpenAI GPT-4 (Recommended)

**Advantages:**

- Superior response quality
- Faster response times (2-5 seconds)
- No GPU required
- Lower memory footprint

**Requirements:**

- OpenAI API key
- Internet connection
- Cost: ~$0.01-0.03 per query

**Configuration:**

```bash
# .env file
USE_OPENAI=true
OPENAI_API_KEY=sk-proj-your-api-key-here
OPENAI_MODEL=gpt-4o
OPENAI_MAX_TOKENS=512
OPENAI_TEMPERATURE=0.2
```

#### Option 2: Local Open-Source Models

**Advantages:**

- Free (no API costs)
- Complete data privacy
- Works offline
- Customizable (fine-tuning possible)

**Requirements:**

- 16GB+ RAM (32GB+ for Mixtral)
- GPU recommended (CUDA-compatible)

**Supported Models:**

- `mistralai/Mistral-7B-Instruct-v0.3` (7B params, recommended)
- `meta-llama/Meta-Llama-3-8B-Instruct` (8B params)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` (47B params, requires 32GB+ RAM)

**Configuration:**

```bash
# .env file
USE_OPENAI=false
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3
LLM_MAX_TOKENS=512
LLM_TEMPERATURE=0.2
```

### Other Configuration Options

```bash
# Data paths
CSV_PATH=amazon_multimodal_clean.csv
CHROMA_DIR=chromadb_store
IMAGE_DIR=images

# CLIP model
CLIP_MODEL=ViT-B/32  # Options: ViT-B/32, ViT-B/16, ViT-L/14

# API server
API_HOST=0.0.0.0
API_PORT=8000
ALLOWED_ORIGINS=*

# Retrieval settings
TOP_K_PRODUCTS=5
MAX_TEXT_LENGTH=400

# Logging
LOG_LEVEL=INFO  # Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
```

See [.env.example](.env.example) for the complete configuration template.
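
As an illustration of how these variables might be consumed, the sketch below reads a few of them using only the standard library. It is not the project's actual `config.py`: the `Settings` class, the `load_settings` helper, the fallback defaults (copied from the example values above), and the comma-separated parsing of `ALLOWED_ORIGINS` are all assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the settings listed above."""
    use_openai: bool
    openai_api_key: Optional[str]
    model: str
    api_port: int
    top_k_products: int
    allowed_origins: List[str]


def _as_bool(value: str) -> bool:
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env

    use_openai = _as_bool(env.get("USE_OPENAI", "false"))
    api_key = env.get("OPENAI_API_KEY") or None
    if use_openai and not api_key:
        # Fail early instead of on the first LLM call.
        raise ValueError("OPENAI_API_KEY is required when USE_OPENAI=true")

    # Pick the model name that matches the selected backend.
    # The fallback values are assumed defaults, taken from the examples above.
    if use_openai:
        model = env.get("OPENAI_MODEL", "gpt-4o")
    else:
        model = env.get("LLM_MODEL", "mistralai/Mistral-7B-Instruct-v0.3")

    # Assumption: multiple origins are given as a comma-separated list.
    raw_origins = env.get("ALLOWED_ORIGINS", "*")
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]

    return Settings(
        use_openai=use_openai,
        openai_api_key=api_key,
        model=model,
        api_port=int(env.get("API_PORT", "8000")),
        top_k_products=int(env.get("TOP_K_PRODUCTS", "5")),
        allowed_origins=origins,
    )


if __name__ == "__main__":
    # An explicit mapping keeps the demo independent of the real environment.
    demo = load_settings({"USE_OPENAI": "false", "API_PORT": "8080"})
    print(demo.model, demo.api_port)
```

Called without arguments, `load_settings()` reads `os.environ`, which a tool such as `python-dotenv` can populate from the `.env` file beforehand. Accepting an explicit mapping is a design choice that makes the loader easy to test without touching the process environment.
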
## Evaluation

Evaluate retrieval quality:

```bash
python rag.py --eval --csv amazon_multimodal_clean.csv
```

Metrics computed:

- Accuracy@1: Top result category match
- Recall@1, @5, @10: Category match in top K results

## Testing

### Test Retrieval Only

```bash
# Text query
python rag.py --text "wireless headphones" --db chromadb_store

# Image query
python rag.py --image path/to/product.jpg --db chromadb_store
```

### Test LLM Generation

```bash
python llm.py
```

## Project Structure

```
Multimodel/
├── rag.py                          # CLIP + ChromaDB retrieval system
├── llm.py                          # LLM interface with prompt engineering
├── api_server.py                   # FastAPI REST API
├── config.py                       # Configuration management
├── requirements.txt                # Python dependencies
├── README.md                       # This file
├── .gitignore                      # Git ignore rules
├── frontend/
│   ├── index.html                  # Web UI
│   ├── main.js                     # Frontend JavaScript
│   └── amazon-logo.png             # Logo asset
├── chromadb_store/                 # Vector database (generated)
├── images/                         # Downloaded product images (generated)
└── amazon_multimodal_clean.csv     # Your dataset
```

## Troubleshooting

### Issue: "OpenAI API key is required"

**Solution**: Make sure you have created a `.env` file and installed the `python-dotenv` dependency:

```bash
# Install dotenv if missing
pip install python-dotenv

# Create .env file
cp .env.example .env

# Edit .env and add your API key
USE_OPENAI=true
OPENAI_API_KEY=sk-proj-your-actual-api-key-here
```

### Issue: "TypeError: failed to extract enum MetadataValue"

**Solution**: This occurs during index building with ChromaDB. Update to the latest version:

```bash
pip install --upgrade chromadb
```

The code now handles `None` values properly by converting them to empty strings.

### Issue: "CUDA out of memory" (Local Models)

**Solution**: Use CPU mode or reduce the batch size:

```bash
# Force CPU mode
export CUDA_VISIBLE_DEVICES=-1
python api_server.py
```

### Issue: "Model loading takes too long" (Local Models)

**Solution**: This is normal for the first request (10-60s). The model is cached in memory for subsequent requests. Consider using GPT-4 for faster response times.

### Issue: "Image download failures"

**Solution**: Some product URLs may be invalid or expired. This is expected, and each failure is logged. The system uses text-only embeddings for those products.

### Issue: Port 8000 already in use

**Solution**: Change the port via an environment variable:

```bash
export API_PORT=8080
python api_server.py
```

### Issue: Duplicate products after multiple index builds

**Solution**: The index build uses ChromaDB's `add()`, which doesn't prevent duplicates. To rebuild the index, delete the database directory first:

```bash
rm -rf chromadb_store
python rag.py --build --csv amazon_multimodal_clean.csv
```

## Security Notes

- **CORS**: Currently set to `allow_origins=["*"]` for development
  - For production, set `ALLOWED_ORIGINS` to specific domains
- **Error Messages**: Generic errors are returned to clients; detailed logs are server-side only
- **File Uploads**: Images are validated and temporarily stored, then cleaned up

## Performance Optimization

### Implemented Optimizations:

1. **LLM Singleton Pattern**: The model loads once (on the first request in local model mode) and is reused across requests (5-20x speedup)
2. **CLIP Embedding Caching**: The CLIP model stays in memory after first load
3. **ChromaDB HNSW Indexing**: Approximate nearest neighbor search with roughly O(log N) query complexity
4. **L2 Normalized Embeddings**: Cosine similarity computed via efficient dot products
5. **Graceful Error Handling**: Image download failures don't block the indexing process

### Additional Optimizations for Production:

1. **Use GPU**: CUDA-enabled GPU for 10-50x faster CLIP inference (local models)
2. **Use GPT-4**: Cloud-based LLM eliminates model loading overhead
3. **Batch Processing**: Build index in batches for large datasets
4. **CDN for Images**: Serve product images via CDN
5. **Load Balancer**: Use multiple API instances behind a load balancer
6. **Redis Caching**: Cache frequent queries and embeddings

## Future Enhancements

- [ ] Add user authentication
- [ ] Implement product filtering (price, brand, etc.)
- [ ] Add bookmark/favorites functionality
- [ ] Support multilingual queries
- [ ] Integrate with real Amazon API
- [ ] Add A/B testing for different prompts
- [ ] Implement caching layer (Redis)
- [ ] Add monitoring and analytics

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/YourFeature`)
3. Commit changes (`git commit -m 'Add YourFeature'`)
4. Push to branch (`git push origin feature/YourFeature`)
5. Open a Pull Request

## License

This project is for educational and research purposes.

## Acknowledgments

- **OpenAI**: CLIP multimodal embeddings and GPT-4 API
- **ChromaDB**: Vector database with HNSW indexing
- **HuggingFace**: Transformers library and model hosting
- **FastAPI**: Modern web framework
- **Mistral AI / Meta**: Open-source LLM models
- **Tailwind CSS**: Frontend styling framework

---

## Additional Documentation

- **[Research Report](research_report.tex)**: Comprehensive technical report in LaTeX format covering implementation details, challenges, solutions, and future improvements
- **[Quick Start Guide for GPT-4](QUICKSTART_GPT4.md)**: Step-by-step guide for setting up with OpenAI GPT-4

---

**Built with ❤️ using CLIP, ChromaDB, GPT-4, and Open-Source LLMs**