---
title: Amazon Multimodal RAG Assistant
emoji: π
colorFrom: yellow
colorTo: blue
sdk: docker
pinned: false
license: mit
app_port: 7860
---
Amazon Multimodal RAG Assistant
An AI-powered e-commerce search assistant that combines multimodal embeddings (CLIP), vector search (ChromaDB), and large language models to provide intelligent product recommendations and natural language responses.
Features
- Multimodal Search: Search products using text, images, or both simultaneously
- Intelligent Retrieval: CLIP-based embeddings for semantic product matching
- Dual LLM Support: Choose between OpenAI GPT-4 and local open-source models
- Natural Language Responses: Context-aware answers powered by advanced LLMs
- Modern Web Interface: Clean, responsive UI with real-time search
- Vector Database: Persistent ChromaDB storage for fast retrieval
- Prompt Engineering: Supports zero-shot, few-shot, and multi-shot prompting
- Chat History: Multi-turn conversations with context awareness
- Flexible Configuration: Environment-based setup for easy customization
Architecture
┌──────────────┐
│   Frontend   │  (HTML/JS/TailwindCSS)
└──────┬───────┘
       │ HTTP/JSON
       ▼
┌──────────────┐
│   FastAPI    │  (REST API Server)
└──────┬───────┘
       │
   ┌───┴────────────────┐
   ▼                    ▼
┌──────────────┐   ┌──────────────┐
│     LLM      │   │     RAG      │
│  (GPT-4 or   │   │   (CLIP +    │
│   Local HF)  │   │   ChromaDB)  │
└──────────────┘   └──────────────┘
Components
- rag.py: Retrieval system with CLIP embeddings and ChromaDB
- llm.py: LLM interface with prompt engineering
- api_server.py: FastAPI backend with a singleton LLM pattern (see the sketch after this list)
- frontend/: Modern web UI with drag-and-drop support
- config.py: Centralized configuration management
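The singleton pattern keeps one LLM instance in memory across requests. A minimal sketch of the idea (the names here are illustrative; the actual wiring in api_server.py may differ):

```python
# Illustrative sketch of the singleton-LLM pattern; names are assumptions,
# not the actual api_server.py code.
from functools import lru_cache
from fastapi import FastAPI

app = FastAPI()

@lru_cache(maxsize=1)
def get_llm():
    """Load the LLM once; every later call returns the cached instance."""
    from llm import LLM  # hypothetical class exposed by llm.py
    return LLM()

@app.on_event("startup")
def warm_up():
    get_llm()  # pay the load cost at startup instead of on the first request
```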
Requirements
- Python 3.8+
- CUDA-compatible GPU (optional, but recommended for faster inference)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models and data
Installation
1. Clone the Repository
git clone <repository-url>
cd Multimodel
2. Create Virtual Environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
3. Install Dependencies
pip install -r requirements.txt
Note: CLIP installation requires git. If you encounter issues:
pip install git+https://github.com/openai/CLIP.git
4. Configure Environment
Create a .env file in the project root (copy from .env.example):
cp .env.example .env
For OpenAI GPT-4 (Recommended):
# .env file
USE_OPENAI=true
OPENAI_API_KEY=sk-proj-your-api-key-here
OPENAI_MODEL=gpt-4o
For Local Models (Free, but requires more compute):
# .env file
USE_OPENAI=false
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3
See .env.example for all configuration options.
5. Prepare Data
Place your Amazon product CSV file in the project root:
amazon_multimodal_clean.csv
Expected CSV columns:
- uniq_id: Unique product identifier
- product_name: Product name
- product_text: Product description
- main_category: Product category
- image: Image URLs (pipe-separated)
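A quick way to sanity-check the file before indexing (a sketch using pandas; the filename follows the default above):

```python
# Verify the dataset has the columns the indexer expects.
import pandas as pd

REQUIRED = {"uniq_id", "product_name", "product_text", "main_category", "image"}

df = pd.read_csv("amazon_multimodal_clean.csv")
missing = REQUIRED - set(df.columns)
if missing:
    raise ValueError(f"CSV is missing columns: {sorted(missing)}")
# Image URLs are pipe-separated; take the first one per product.
print(len(df), "products; first image URL:", df["image"].iloc[0].split("|")[0])
```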
Usage
Step 1: Build Vector Index
python rag.py --build --csv amazon_multimodal_clean.csv --max 1000
Options:
- --csv: Path to your CSV file
- --max: Maximum number of products to index (optional; omit to index everything)
- --db: Database directory (default: chromadb_store)
This will:
- Download product images
- Generate CLIP embeddings
- Build ChromaDB vector index
- Save the index to chromadb_store/
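Conceptually, each product is embedded with CLIP and stored in ChromaDB. A simplified sketch of one indexing step (rag.py's actual logic, collection name, and embedding strategy may differ):

```python
# Simplified single-product indexing step: CLIP embeddings into ChromaDB.
import chromadb
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

client = chromadb.PersistentClient(path="chromadb_store")
collection = client.get_or_create_collection("products")  # name is an assumption

def embed_product(text, image_path):
    """Fuse L2-normalized CLIP text and image embeddings by averaging."""
    with torch.no_grad():
        t = model.encode_text(clip.tokenize([text], truncate=True).to(device))
        i = model.encode_image(preprocess(Image.open(image_path)).unsqueeze(0).to(device))
    t = t / t.norm(dim=-1, keepdim=True)
    i = i / i.norm(dim=-1, keepdim=True)
    return ((t + i) / 2).squeeze(0).tolist()

collection.add(
    ids=["B0001"],
    embeddings=[embed_product("Wireless earbuds", "images/B0001.jpg")],
    metadatas=[{"product_name": "Wireless earbuds", "main_category": "Electronics"}],
)
```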
Step 2: Start API Server
python api_server.py
The server will start on http://localhost:8000
Startup Notes:
- GPT-4 Mode: Server starts instantly; the first request takes 2-5 seconds (one API round trip)
- Local Model Mode: The first request takes 10-60 seconds while the model loads into memory; subsequent requests are fast (the model stays cached)
Step 3: Open Web Interface
Navigate to: http://localhost:8000
Search Modes:
- Text Only: Search using natural language queries
- Image Only: Upload a product image to find similar items
- Multimodal: Combine text and image for refined search
Example Queries:
- "Wireless earbuds with noise cancellation under $150"
- "What is this product and how is it used?" (with image)
- "Compare the top two smartwatches you found"
Configuration
LLM Backend Selection
The system supports two LLM backends that can be switched via environment variables:
Option 1: OpenAI GPT-4 (Recommended)
Advantages:
- Superior response quality
- Faster response times (2-5 seconds)
- No GPU required
- Lower memory footprint
Requirements:
- OpenAI API key
- Internet connection
- Cost: ~$0.01-0.03 per query
Configuration:
# .env file
USE_OPENAI=true
OPENAI_API_KEY=sk-proj-your-api-key-here
OPENAI_MODEL=gpt-4o
OPENAI_MAX_TOKENS=512
OPENAI_TEMPERATURE=0.2
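For reference, a sketch of how llm.py might translate these settings into an OpenAI call (variable names are illustrative):

```python
# Sketch of the GPT-4 backend call driven by the .env settings above.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
resp = client.chat.completions.create(
    model=os.getenv("OPENAI_MODEL", "gpt-4o"),
    max_tokens=int(os.getenv("OPENAI_MAX_TOKENS", "512")),
    temperature=float(os.getenv("OPENAI_TEMPERATURE", "0.2")),
    messages=[{"role": "user", "content": "Recommend wireless earbuds."}],
)
print(resp.choices[0].message.content)
```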
Option 2: Local Open-Source Models
Advantages:
- Free (no API costs)
- Complete data privacy
- Works offline
- Customizable (fine-tuning possible)
Requirements:
- 16GB+ RAM (32GB+ for Mixtral)
- GPU recommended (CUDA-compatible)
Supported Models:
- mistralai/Mistral-7B-Instruct-v0.3 (7B params, recommended)
- meta-llama/Meta-Llama-3-8B-Instruct (8B params)
- mistralai/Mixtral-8x7B-Instruct-v0.1 (47B params, requires 32GB+ RAM)
Configuration:
# .env file
USE_OPENAI=false
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3
LLM_MAX_TOKENS=512
LLM_TEMPERATURE=0.2
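Locally, llm.py can load the model with Hugging Face transformers. A minimal sketch of the general shape (the actual loading code may differ):

```python
# Sketch of loading a local instruct model with transformers.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.3",
    torch_dtype=torch.float16,  # halves memory; a GPU is needed for real speed
    device_map="auto",          # spreads across GPU(s), falls back to CPU
)
out = generator(
    "Recommend wireless earbuds under $150.",
    max_new_tokens=512,
    temperature=0.2,
    do_sample=True,
)
print(out[0]["generated_text"])
```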
Other Configuration Options
# Data paths
CSV_PATH=amazon_multimodal_clean.csv
CHROMA_DIR=chromadb_store
IMAGE_DIR=images
# CLIP model
CLIP_MODEL=ViT-B/32 # Options: ViT-B/32, ViT-B/16, ViT-L/14
# API server
API_HOST=0.0.0.0
API_PORT=8000
ALLOWED_ORIGINS=*
# Retrieval settings
TOP_K_PRODUCTS=5
MAX_TEXT_LENGTH=400
# Logging
LOG_LEVEL=INFO # Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
See .env.example for the complete configuration template.
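These variables reach the rest of the code through config.py. A minimal sketch of the loading pattern (the defaults shown are assumptions):

```python
# Sketch of centralized env-based configuration with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

CSV_PATH = os.getenv("CSV_PATH", "amazon_multimodal_clean.csv")
CHROMA_DIR = os.getenv("CHROMA_DIR", "chromadb_store")
TOP_K_PRODUCTS = int(os.getenv("TOP_K_PRODUCTS", "5"))
USE_OPENAI = os.getenv("USE_OPENAI", "false").lower() == "true"
```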
Evaluation
Evaluate retrieval quality:
python rag.py --eval --csv amazon_multimodal_clean.csv
Metrics computed:
- Accuracy@1: Top result category match
- Recall@1, @5, @10: Category match in top K results
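Both metrics reduce to a category match within the top-K results (Accuracy@1 equals Recall@1). A small sketch of the computation, assuming each query carries a ground-truth category and an ordered list of retrieved categories:

```python
# Category-match Recall@K over a list of (ground_truth, retrieved) pairs.
def recall_at_k(ground_truth, retrieved, k):
    """1.0 if the true category appears in the top-k retrieved, else 0.0."""
    return float(ground_truth in retrieved[:k])

queries = [
    ("Headphones", ["Headphones", "Speakers", "Cables"]),
    ("Watches", ["Jewelry", "Watches", "Watches"]),
]
for k in (1, 5, 10):
    score = sum(recall_at_k(gt, ret, k) for gt, ret in queries) / len(queries)
    print(f"Recall@{k}: {score:.2f}")
```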
Testing
Test Retrieval Only
# Text query
python rag.py --text "wireless headphones" --db chromadb_store
# Image query
python rag.py --image path/to/product.jpg --db chromadb_store
Test LLM Generation
python llm.py
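llm.py also implements the zero-shot, few-shot, and multi-shot prompting listed under Features. A sketch of what such a prompt builder can look like (templates and names are illustrative, not the actual llm.py code):

```python
# Illustrative few-shot prompt builder; the real templates live in llm.py.
FEW_SHOT_EXAMPLES = [
    ("Wireless mouse under $30",
     "Based on the retrieved products, I recommend ... because ..."),
]

def build_prompt(query, context, shots=0):
    """Compose a zero-shot (shots=0), few-shot, or multi-shot prompt."""
    parts = ["You are a helpful e-commerce shopping assistant."]
    for q, a in FEW_SHOT_EXAMPLES[:shots]:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Retrieved products:\n{context}\n\nQ: {query}\nA:")
    return "\n\n".join(parts)

print(build_prompt("Best budget smartwatch?", "1. Watch A ($49) ...", shots=1))
```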
Project Structure
Multimodel/
├── rag.py                        # CLIP + ChromaDB retrieval system
├── llm.py                        # LLM interface with prompt engineering
├── api_server.py                 # FastAPI REST API
├── config.py                     # Configuration management
├── requirements.txt              # Python dependencies
├── README.md                     # This file
├── .gitignore                    # Git ignore rules
├── frontend/
│   ├── index.html                # Web UI
│   ├── main.js                   # Frontend JavaScript
│   └── amazon-logo.png           # Logo asset
├── chromadb_store/               # Vector database (generated)
├── images/                       # Downloaded product images (generated)
└── amazon_multimodal_clean.csv   # Your dataset
Troubleshooting
Issue: "OpenAI API key is required"
Solution: Ensure you've created a .env file and installed the python-dotenv dependency:
# Install dotenv if missing
pip install python-dotenv
# Create .env file
cp .env.example .env
# Edit .env and add your API key
USE_OPENAI=true
OPENAI_API_KEY=sk-proj-your-actual-api-key-here
Issue: "TypeError: failed to extract enum MetadataValue"
Solution: This occurs during index building with ChromaDB. Update to the latest version:
pip install --upgrade chromadb
The code now handles None values properly by converting them to empty strings.
Issue: "CUDA out of memory" (Local Models)
Solution: Force CPU mode or reduce the batch size:
# Force CPU mode
export CUDA_VISIBLE_DEVICES=-1
python api_server.py
Issue: "Model loading takes too long" (Local Models)
Solution: This is normal for the first request (10-60s); the model is cached in memory, so subsequent requests are fast. Consider using GPT-4 mode if you need faster cold starts.
Issue: "Image download failures"
Solution: Some product URLs may be invalid or expired. This is normal and logged. The system will use text-only embeddings for those products.
Issue: Port 8000 already in use
Solution: Change the port via an environment variable:
export API_PORT=8080
python api_server.py
Issue: Duplicate products after multiple index builds
Solution: The index builder uses ChromaDB's add(), which doesn't deduplicate entries. To rebuild the index cleanly, delete the database directory first:
rm -rf chromadb_store
python rag.py --build --csv amazon_multimodal_clean.csv
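An alternative that avoids the manual delete (a sketch): index with ChromaDB's upsert() and the product's uniq_id as the document ID, so rebuilds overwrite entries instead of duplicating them:

```python
# Idempotent indexing: upsert() with stable IDs replaces existing entries.
import chromadb

client = chromadb.PersistentClient(path="chromadb_store")
collection = client.get_or_create_collection("products")  # name is an assumption
collection.upsert(
    ids=["B0001"],                     # uniq_id from the CSV
    embeddings=[[0.1] * 512],          # ViT-B/32 produces 512-dim vectors
    metadatas=[{"product_name": "Wireless earbuds"}],
)
```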
Security Notes
- CORS: Currently set to allow_origins=["*"] for development; for production, configure ALLOWED_ORIGINS to restrict access to specific domains (see the sketch after this list)
- Error Messages: Generic errors are returned to clients; detailed logs are server-side only
- File Uploads: Images are validated and temporarily stored, then cleaned up
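A production CORS setup driven by ALLOWED_ORIGINS can look like this (a sketch; api_server.py's actual middleware setup may differ):

```python
# Restrict CORS to configured origins instead of "*".
import os
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=os.getenv("ALLOWED_ORIGINS", "*").split(","),
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
```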
Performance Optimization
Implemented Optimizations:
- LLM Singleton Pattern: Model loads once at server startup and is reused across requests (5-20x speedup)
- CLIP Embedding Caching: CLIP model stays in memory after first load
- ChromaDB HNSW Indexing: Approximate nearest neighbor search with O(log N) complexity
- L2 Normalized Embeddings: Cosine similarity computed via efficient dot products (see the sketch after this list)
- Graceful Error Handling: Image download failures don't block indexing process
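To see why L2 normalization pays off: once vectors are unit-length, cosine similarity is just a dot product. A tiny sketch:

```python
# Cosine similarity via dot product on L2-normalized vectors.
import numpy as np

a = np.random.rand(512)
b = np.random.rand(512)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)
print(f"cosine similarity: {a @ b:.4f}")  # equals full cosine on unit vectors
```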
Additional Optimizations for Production:
- Use GPU: CUDA-enabled GPU for 10-50x faster CLIP inference (local models)
- Use GPT-4: Cloud-based LLM eliminates model loading overhead
- Batch Processing: Build index in batches for large datasets
- CDN for Images: Serve product images via CDN
- Load Balancer: Use multiple API instances behind a load balancer
- Redis Caching: Cache frequent queries and embeddings
Future Enhancements
- Add user authentication
- Implement product filtering (price, brand, etc.)
- Add bookmark/favorites functionality
- Support multilingual queries
- Integrate with real Amazon API
- Add A/B testing for different prompts
- Implement caching layer (Redis)
- Add monitoring and analytics
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/YourFeature)
- Commit your changes (git commit -m 'Add YourFeature')
- Push to the branch (git push origin feature/YourFeature)
- Open a Pull Request
License
This project is licensed under the MIT License and is intended for educational and research purposes.
Acknowledgments
- OpenAI: CLIP multimodal embeddings and GPT-4 API
- ChromaDB: Vector database with HNSW indexing
- HuggingFace: Transformers library and model hosting
- FastAPI: Modern web framework
- Mistral AI / Meta: Open-source LLM models
- Tailwind CSS: Frontend styling framework
Additional Documentation
- Research Report: Comprehensive technical report in LaTeX format covering implementation details, challenges, solutions, and future improvements
- Quick Start Guide for GPT-4: Step-by-step guide for setting up with OpenAI GPT-4
Built with ❤️ using CLIP, ChromaDB, GPT-4, and open-source LLMs