# Retail Product Knowledge Assistant (RAG Model)
This project builds a Retrieval-Augmented Generation (RAG) pipeline over retail product data (Kindle reviews and product details). Product information is embedded into a vector database, and an LLM answers questions in natural language using the retrieved context.
## Tech Stack

- **LLM**: Google Gemini (via `langchain-google-genai`)
- **Embeddings**: HuggingFace (`all-MiniLM-L6-v2`)
- **Vector Store**: ChromaDB
- **Framework**: LangChain
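If the repository does not pin its own dependencies, the stack above corresponds roughly to these packages (assuming a recent LangChain release, where the Gemini, HuggingFace, and Chroma integrations live in separate packages):

```bash
# Assumed dependency set; prefer the project's requirements.txt if it has one.
pip install langchain langchain-google-genai langchain-huggingface langchain-chroma python-dotenv pandas
```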
## Setup

1. **API Key**: Add your Google Gemini API key to the `.env` file:

   ```env
   GOOGLE_API_KEY=your_actual_key_here
   ```
2. **Build Knowledge Base**: Run the following command to process the data and build the vector database:

   ```bash
   python main.py
   ```

   The script checks whether the vector store in `chroma_db/` already exists and builds it only if needed (see the sketch after this list).
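As a hedged sketch of what that build step looks like with this stack (the JSON filename `products.json` and its `text`/`name` fields are assumptions, not taken from the repo):

```python
# Illustrative build step, not the project's exact code: load preprocessed
# records, embed them, and persist the vectors under chroma_db/.
import json
import os

from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

if not os.path.exists("chroma_db"):
    # products.json is an assumed name for the output of preprocess.py.
    with open("products.json") as f:
        records = json.load(f)
    docs = [
        Document(page_content=r["text"], metadata={"name": r.get("name", "")})
        for r in records
    ]
    # Embeds every record once and writes the index to disk.
    Chroma.from_documents(docs, embeddings, persist_directory="chroma_db")
```

Because the index is persisted, later runs can reopen `chroma_db/` directly instead of re-embedding.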
## Usage

Run `main.py` and ask questions about the products, such as:

- "Which Kindle model has the best resolution?"
- "What do users say about the battery life of the Paperwhite?"
- "Is the Kindle Voyage worth the extra money?"
## Files

- `7817_1.csv`: Raw product data.
- `preprocess.py`: Cleans and formats data into JSON (see the sketch after this list).
- `rag_model.py`: Contains the logic for the RAG pipeline.
- `main.py`: Interactive CLI for user queries.
- `chroma_db/`: Directory where the vector store is persisted.
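For orientation, a preprocessing pass in the spirit of `preprocess.py` might look like the sketch below; the column names (`name`, `reviews.rating`, `reviews.text`) are assumptions about the raw CSV, not verified against `7817_1.csv`:

```python
# Illustrative preprocessing: clean the raw CSV and emit JSON records
# for indexing. Column names are assumed, not verified.
import json

import pandas as pd

df = pd.read_csv("7817_1.csv")
df = df.dropna(subset=["reviews.text"])  # drop rows with no review body

records = [
    {
        "name": row["name"],
        "text": f"{row['name']} (rating {row['reviews.rating']}): {row['reviews.text']}",
    }
    for _, row in df.iterrows()
]

with open("products.json", "w") as f:  # matches the build sketch above
    json.dump(records, f, indent=2)
```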