---
title: Amazon Multimodal RAG Assistant
emoji: π
colorFrom: yellow
colorTo: blue
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Amazon Multimodal RAG Assistant

An AI-powered e-commerce search assistant that combines multimodal embeddings (CLIP), vector search (ChromaDB), and large language models to provide intelligent product recommendations and natural language responses.



## Features

- **Multimodal Search**: Search products using text, images, or both simultaneously
- **Intelligent Retrieval**: CLIP-based embeddings for semantic product matching
- **Dual LLM Support**: Choose between OpenAI GPT-4 or local open-source models
- **Natural Language Responses**: Context-aware answers powered by advanced LLMs
- **Modern Web Interface**: Clean, responsive UI with real-time search
- **Vector Database**: Persistent ChromaDB storage for fast retrieval
- **Prompt Engineering**: Supports zero-shot, few-shot, and multi-shot prompting
- **Chat History**: Multi-turn conversations with context awareness
- **Flexible Configuration**: Environment-based setup for easy customization

## Architecture

```
┌──────────────┐
│   Frontend   │  (HTML/JS/TailwindCSS)
└──────┬───────┘
       │ HTTP/JSON
       ▼
┌──────────────┐
│   FastAPI    │  (REST API Server)
└──────┬───────┘
       │
   ┌───┴──────────────┐
   ▼                  ▼
┌──────────────┐  ┌──────────────┐
│     LLM      │  │     RAG      │
│  (GPT-4 or   │  │   (CLIP +    │
│   Local HF)  │  │  ChromaDB)   │
└──────────────┘  └──────────────┘
```

### Components

1. **rag.py**: Retrieval system with CLIP embeddings and ChromaDB
2. **llm.py**: LLM interface with prompt engineering
3. **api_server.py**: FastAPI backend with singleton LLM pattern
4. **frontend/**: Modern web UI with drag-and-drop support
5. **config.py**: Centralized configuration management

## Requirements

- Python 3.8+
- CUDA-compatible GPU (optional, but recommended for faster inference)
- 8GB+ RAM (16GB+ recommended)
- 10GB+ disk space for models and data

## Installation

### 1. Clone the Repository

```bash
# clone the repository, then enter the project directory
cd Multimodel
```

### 2. Create Virtual Environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

**Note**: CLIP installation requires git. If you encounter issues, install it directly:

```bash
pip install git+https://github.com/openai/CLIP.git
```

### 4. Configure Environment

Create a `.env` file in the project root (copy from `.env.example`):

```bash
cp .env.example .env
```

**For OpenAI GPT-4 (Recommended):**

```bash
# .env file
USE_OPENAI=true
OPENAI_API_KEY=sk-proj-your-api-key-here
OPENAI_MODEL=gpt-4o
```

**For Local Models (Free, but requires more compute):**

```bash
# .env file
USE_OPENAI=false
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3
```

See [.env.example](.env.example) for all configuration options.

### 5. Prepare Data

Place your Amazon product CSV file in the project root:

```
amazon_multimodal_clean.csv
```

Expected CSV columns:

- `uniq_id`: Unique product identifier
- `product_name`: Product name
- `product_text`: Product description
- `main_category`: Product category
- `image`: Image URLs (pipe-separated); a parsing sketch follows below
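
For reference, here is a minimal sketch of splitting the pipe-separated `image` field into individual URLs. It assumes pandas is installed; the column names follow the list above:

```python
import pandas as pd

df = pd.read_csv("amazon_multimodal_clean.csv")

# Split the pipe-separated image field into a list of URLs per product.
# Products with no images get an empty list instead of NaN.
df["image_urls"] = (
    df["image"]
    .fillna("")
    .apply(lambda s: [u.strip() for u in s.split("|") if u.strip()])
)

print(df[["uniq_id", "product_name", "image_urls"]].head())
```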

## Usage

### Step 1: Build Vector Index

```bash
python rag.py --build --csv amazon_multimodal_clean.csv --max 1000
```

Options:

- `--csv`: Path to your CSV file
- `--max`: Maximum number of products to index (optional; omit to index the full dataset)
- `--db`: Database directory (default: `chromadb_store`)

This will:

- Download product images
- Generate CLIP embeddings
- Build the ChromaDB vector index
- Save it to `chromadb_store/` (the core of this step is sketched below)
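
At its core, indexing is CLIP encoding plus a ChromaDB insert. A minimal sketch under stated assumptions — this is not rag.py's actual code, and the collection name `products` is a guess:

```python
import chromadb
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

client = chromadb.PersistentClient(path="chromadb_store")
collection = client.get_or_create_collection("products")  # collection name is an assumption

def embed_text(text):
    """Encode text with CLIP and L2-normalize so dot product equals cosine similarity."""
    with torch.no_grad():
        feats = model.encode_text(clip.tokenize([text], truncate=True).to(device))
        feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats[0].cpu().tolist()

collection.add(
    ids=["B000123"],  # uniq_id from the CSV
    embeddings=[embed_text("Wireless earbuds with noise cancellation")],
    metadatas=[{"product_name": "Example Earbuds", "main_category": "Electronics"}],
)
```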

### Step 2: Start API Server

```bash
python api_server.py
```

The server starts on `http://localhost:8000`.

**Startup Notes:**

- **GPT-4 Mode**: The server starts instantly; each request takes 2-5 seconds (one API call)
- **Local Model Mode**: The first request takes 10-60 seconds while the model loads into memory; subsequent requests are fast (the model stays cached)
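
You can also exercise the API directly from Python. The example below is purely illustrative — the `/api/search` route and JSON field names are assumptions, not the documented API; check api_server.py for the real routes:

```python
import requests

# Hypothetical request; the /api/search route and field names are assumptions.
resp = requests.post(
    "http://localhost:8000/api/search",
    json={"query": "wireless earbuds with noise cancellation", "top_k": 5},
)
print(resp.json())
```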

### Step 3: Open Web Interface

Navigate to `http://localhost:8000`.

#### Search Modes

- **Text Only**: Search using natural language queries
- **Image Only**: Upload a product image to find similar items
- **Multimodal**: Combine text and image for refined search

#### Example Queries

- "Wireless earbuds with noise cancellation under $150"
- "What is this product and how is it used?" (with image)
- "Compare the top two smartwatches you found"

## Configuration

### LLM Backend Selection

The system supports two LLM backends that can be switched via environment variables:

#### Option 1: OpenAI GPT-4 (Recommended)

**Advantages:**

- Superior response quality
- Faster response times (2-5 seconds)
- No GPU required
- Lower memory footprint

**Requirements:**

- OpenAI API key
- Internet connection
- Cost: ~$0.01-0.03 per query

**Configuration:**

```bash
# .env file
USE_OPENAI=true
OPENAI_API_KEY=sk-proj-your-api-key-here
OPENAI_MODEL=gpt-4o
OPENAI_MAX_TOKENS=512
OPENAI_TEMPERATURE=0.2
```
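
These settings map onto a standard Chat Completions call. A minimal sketch using the official `openai` Python client (v1+); the prompt content is illustrative, and llm.py's actual prompt engineering is richer:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model=os.getenv("OPENAI_MODEL", "gpt-4o"),
    max_tokens=int(os.getenv("OPENAI_MAX_TOKENS", "512")),
    temperature=float(os.getenv("OPENAI_TEMPERATURE", "0.2")),
    messages=[
        {"role": "system", "content": "You are a helpful e-commerce assistant."},
        {"role": "user", "content": "Recommend wireless earbuds under $150."},
    ],
)
print(response.choices[0].message.content)
```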

#### Option 2: Local Open-Source Models

**Advantages:**

- Free (no API costs)
- Complete data privacy
- Works offline
- Customizable (fine-tuning possible)

**Requirements:**

- 16GB+ RAM (32GB+ for Mixtral)
- GPU recommended (CUDA-compatible)

**Supported Models:**

- `mistralai/Mistral-7B-Instruct-v0.3` (7B params, recommended)
- `meta-llama/Meta-Llama-3-8B-Instruct` (8B params)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` (47B params, requires 32GB+ RAM)

**Configuration:**

```bash
# .env file
USE_OPENAI=false
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.3
LLM_MAX_TOKENS=512
LLM_TEMPERATURE=0.2
```
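
Locally, the equivalent call goes through Hugging Face `transformers`. A minimal sketch, assuming a recent transformers version with chat-aware pipelines plus `accelerate` for `device_map`; llm.py may load the model differently:

```python
import torch
from transformers import pipeline

# Load once and keep in memory: the first call is slow, later calls are fast.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.3",
    torch_dtype=torch.float16,
    device_map="auto",  # uses the GPU if available, otherwise CPU
)

messages = [{"role": "user", "content": "Recommend wireless earbuds under $150."}]
output = generator(messages, max_new_tokens=512, temperature=0.2)
print(output[0]["generated_text"][-1]["content"])
```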

### Other Configuration Options

```bash
# Data paths
CSV_PATH=amazon_multimodal_clean.csv
CHROMA_DIR=chromadb_store
IMAGE_DIR=images

# CLIP model
CLIP_MODEL=ViT-B/32  # Options: ViT-B/32, ViT-B/16, ViT-L/14

# API server
API_HOST=0.0.0.0
API_PORT=8000
ALLOWED_ORIGINS=*

# Retrieval settings
TOP_K_PRODUCTS=5
MAX_TEXT_LENGTH=400

# Logging
LOG_LEVEL=INFO  # Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
```

See [.env.example](.env.example) for the complete configuration template.
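
These variables are read through config.py. A minimal sketch of the usual pattern — assuming `python-dotenv`, which the troubleshooting section below also references; config.py's actual structure may differ:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # loads .env from the project root into os.environ

USE_OPENAI = os.getenv("USE_OPENAI", "true").lower() == "true"
CSV_PATH = os.getenv("CSV_PATH", "amazon_multimodal_clean.csv")
CHROMA_DIR = os.getenv("CHROMA_DIR", "chromadb_store")
CLIP_MODEL = os.getenv("CLIP_MODEL", "ViT-B/32")
API_PORT = int(os.getenv("API_PORT", "8000"))
TOP_K_PRODUCTS = int(os.getenv("TOP_K_PRODUCTS", "5"))
```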

## Evaluation

Evaluate retrieval quality:

```bash
python rag.py --eval --csv amazon_multimodal_clean.csv
```

Metrics computed:

- Accuracy@1: Whether the top result's category matches the query's category
- Recall@1, @5, @10: Whether a category match appears in the top K results (see the sketch below)
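
For reference, a minimal sketch of category-match Recall@K over a set of queries; this is a hypothetical helper, not rag.py's actual evaluation code:

```python
def recall_at_k(true_categories, ranked_categories, k):
    """Fraction of queries whose true category appears among the top-k result categories."""
    hits = sum(
        truth in ranked[:k]
        for truth, ranked in zip(true_categories, ranked_categories)
    )
    return hits / len(true_categories)

# Example: the second query's category only appears at rank 2.
truths = ["Electronics", "Toys"]
ranked = [["Electronics", "Home"], ["Home", "Toys"]]
print(recall_at_k(truths, ranked, 1))  # 0.5
print(recall_at_k(truths, ranked, 2))  # 1.0
```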

## Testing

### Test Retrieval Only

```bash
# Text query
python rag.py --text "wireless headphones" --db chromadb_store

# Image query
python rag.py --image path/to/product.jpg --db chromadb_store
```

### Test LLM Generation

```bash
python llm.py
```

## Project Structure

```
Multimodel/
├── rag.py                        # CLIP + ChromaDB retrieval system
├── llm.py                        # LLM interface with prompt engineering
├── api_server.py                 # FastAPI REST API
├── config.py                     # Configuration management
├── requirements.txt              # Python dependencies
├── README.md                     # This file
├── .gitignore                    # Git ignore rules
├── frontend/
│   ├── index.html                # Web UI
│   ├── main.js                   # Frontend JavaScript
│   └── amazon-logo.png           # Logo asset
├── chromadb_store/               # Vector database (generated)
├── images/                       # Downloaded product images (generated)
└── amazon_multimodal_clean.csv   # Your dataset
```

## Troubleshooting

### Issue: "OpenAI API key is required"

**Solution**: Ensure you've created a `.env` file and that the `python-dotenv` dependency is installed:

```bash
# Install dotenv if missing
pip install python-dotenv

# Create the .env file
cp .env.example .env
```

Then edit `.env` and add your API key:

```bash
USE_OPENAI=true
OPENAI_API_KEY=sk-proj-your-actual-api-key-here
```

### Issue: "TypeError: failed to extract enum MetadataValue"

**Solution**: This occurs during index building when ChromaDB receives unsupported metadata values (such as `None`). Update to the latest version:

```bash
pip install --upgrade chromadb
```

The indexing code also guards against this by converting `None` metadata values to empty strings.
### Issue: "CUDA out of memory" (Local Models) |
|
|
|
|
|
**Solution**: Use CPU mode or reduce batch size |
|
|
```bash |
|
|
# Force CPU mode |
|
|
export CUDA_VISIBLE_DEVICES=-1 |
|
|
python api_server.py |
|
|
``` |
|
|
|
|
|
### Issue: "Model loading takes too long" (Local Models) |
|
|
|
|
|
**Solution**: This is normal for first request (10-60s). The model is cached in memory for subsequent requests. Consider using GPT-4 for faster response times. |
|
|
|
|
|
### Issue: "Image download failures" |
|
|
|
|
|
**Solution**: Some product URLs may be invalid or expired. This is normal and logged. The system will use text-only embeddings for those products. |
|
|
|
|
|

### Issue: Port 8000 already in use

**Solution**: Change the port via an environment variable:

```bash
export API_PORT=8080
python api_server.py
```

### Issue: Duplicate products after multiple index builds

**Solution**: Indexing uses ChromaDB's `add()`, which doesn't prevent duplicates. To rebuild the index, delete the database directory first (or see the upsert note below):

```bash
rm -rf chromadb_store
python rag.py --build --csv amazon_multimodal_clean.csv
```
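
Alternatively, ChromaDB collections expose `upsert()`, which overwrites records with matching IDs instead of duplicating them. A minimal sketch; the collection name is an assumption:

```python
import chromadb

client = chromadb.PersistentClient(path="chromadb_store")
collection = client.get_or_create_collection("products")  # name is an assumption

# Re-running this with the same ID replaces the record instead of duplicating it.
collection.upsert(
    ids=["B000123"],
    embeddings=[[0.1] * 512],  # placeholder; real entries use 512-dim CLIP ViT-B/32 embeddings
    metadatas=[{"product_name": "Example Earbuds"}],
)
```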

## Security Notes

- **CORS**: Currently set to `allow_origins=["*"]` for development; for production, restrict `ALLOWED_ORIGINS` to specific domains (see the sketch below)
- **Error Messages**: Generic errors are returned to clients; detailed logs are server-side only
- **File Uploads**: Images are validated and temporarily stored, then cleaned up
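
A minimal sketch of restricting CORS with FastAPI's middleware. It assumes `ALLOWED_ORIGINS` is a comma-separated list, which is an assumption about how config.py parses it:

```python
import os
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# "*" during development; a comma-separated domain list in production.
origins = os.getenv("ALLOWED_ORIGINS", "*").split(",")

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
```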

## Performance Optimization

### Implemented Optimizations

1. **LLM Singleton Pattern**: The model loads once at server startup and is reused across requests (5-20x speedup); a sketch of the pattern follows this list
2. **CLIP Embedding Caching**: The CLIP model stays in memory after the first load
3. **ChromaDB HNSW Indexing**: Approximate nearest-neighbor search with O(log N) complexity
4. **L2-Normalized Embeddings**: Cosine similarity computed via efficient dot products
5. **Graceful Error Handling**: Image download failures don't block the indexing process
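
The singleton pattern boils down to loading the model once and handing out the same instance. A minimal sketch, not api_server.py's actual code:

```python
from functools import lru_cache

from transformers import pipeline

@lru_cache(maxsize=1)
def get_llm():
    """Load the model on first call; every later call returns the cached instance."""
    return pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")

# Request handlers call get_llm(); only the first call pays the 10-60s load cost.
```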

### Additional Optimizations for Production

1. **Use a GPU**: A CUDA-enabled GPU gives 10-50x faster CLIP and local-model inference
2. **Use GPT-4**: A cloud-hosted LLM eliminates model-loading overhead
3. **Batch Processing**: Build the index in batches for large datasets
4. **CDN for Images**: Serve product images via a CDN
5. **Load Balancer**: Run multiple API instances behind a load balancer
6. **Redis Caching**: Cache frequent queries and embeddings

## Future Enhancements

- [ ] Add user authentication
- [ ] Implement product filtering (price, brand, etc.)
- [ ] Add bookmark/favorites functionality
- [ ] Support multilingual queries
- [ ] Integrate with the real Amazon API
- [ ] Add A/B testing for different prompts
- [ ] Implement a caching layer (Redis)
- [ ] Add monitoring and analytics

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/YourFeature`)
3. Commit your changes (`git commit -m 'Add YourFeature'`)
4. Push to the branch (`git push origin feature/YourFeature`)
5. Open a Pull Request

## License

This project is released under the MIT License and is intended for educational and research purposes.

## Acknowledgments

- **OpenAI**: CLIP multimodal embeddings and the GPT-4 API
- **ChromaDB**: Vector database with HNSW indexing
- **HuggingFace**: Transformers library and model hosting
- **FastAPI**: Modern web framework
- **Mistral AI / Meta**: Open-source LLM models
- **Tailwind CSS**: Frontend styling framework

---

## Additional Documentation

- **[Research Report](research_report.tex)**: Comprehensive technical report in LaTeX format covering implementation details, challenges, solutions, and future improvements
- **[Quick Start Guide for GPT-4](QUICKSTART_GPT4.md)**: Step-by-step guide for setting up with OpenAI GPT-4

---

**Built with ❤️ using CLIP, ChromaDB, GPT-4, and Open-Source LLMs**