# Retail Product Knowledge Assistant (RAG Model)
This project builds a Retrieval-Augmented Generation (RAG) model over retail product data (Kindle reviews and product details). Product information is embedded and stored in a vector database; at query time, relevant snippets are retrieved and passed to an LLM, which answers questions in natural language.
## Tech Stack
- **LLM**: Google Gemini (via `langchain-google-genai`)
- **Embeddings**: HuggingFace (`all-MiniLM-L6-v2`)
- **Vector Store**: ChromaDB
- **Framework**: LangChain
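At its core, the retrieval step embeds the user's question and finds the stored product snippets whose vectors are most similar. A minimal, dependency-free sketch of that idea (toy 3-dimensional vectors standing in for the real `all-MiniLM-L6-v2` embeddings, which are 384-dimensional):

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity: dot product divided by the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "vector store": document id -> embedding (illustrative values only).
docs = {
    "kindle_paperwhite_review": [0.9, 0.1, 0.2],
    "kindle_voyage_review": [0.1, 0.8, 0.3],
}

def retrieve(query_vec, k=1):
    # Rank documents by similarity to the query embedding, return top-k ids.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]
```

In the actual pipeline, ChromaDB performs this nearest-neighbor search and LangChain wires the retrieved snippets into the Gemini prompt.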
## Setup
1. **API Key**: Add your Google Gemini API Key to the `.env` file:
```env
GOOGLE_API_KEY=your_actual_key_here
```
2. **Build Knowledge Base**:
Run the following command to process the data and build the vector database:
```bash
python main.py
```
(The script automatically detects whether the vector database still needs to be built; an existing store is reused.)
## Usage
Run `main.py` and ask questions about the products, such as:
- "Which Kindle model has the best resolution?"
- "What do users say about the battery life of the Paperwhite?"
- "Is the Kindle Voyage worth the extra money?"
## Files
- `7817_1.csv`: Raw product data.
- `preprocess.py`: Cleans and formats data into JSON.
- `rag_model.py`: Contains the logic for the RAG pipeline.
- `main.py`: Interactive CLI for user queries.
- `chroma_db/`: Directory where the vector store is persisted.