## Methodology

This project follows a standard **RAG (Retrieval-Augmented Generation)** workflow with conversational memory:

1. **Document Ingestion**
   - Load a fixed manual from `temp_docs/samsung_manual.txt` using `TextLoader` with UTF-8 to avoid encoding issues.
   - If the file is missing, initialization fails early with a clear error.

2. **Preprocessing & Chunking**
   - Split the document with `RecursiveCharacterTextSplitter` (`chunk_size=1000`, `chunk_overlap=200`).
   - The overlap keeps context that straddles a chunk boundary (recall), and the chunk size keeps retrieved passages compact (retrieval speed).

3. **Embedding**
   - Convert each chunk to a dense vector using `sentence-transformers/all-MiniLM-L6-v2` via `HuggingFaceEmbeddings`.
   - This small, fast model offers a good latency/quality trade-off for semantic search.

4. **Vector Store (Persistence)**
   - Store embeddings in **ChromaDB** (`persist_directory=chroma_db`).
   - On startup:
     - If `chroma_db/` is missing or empty → build the index from the document and persist it.
     - If `chroma_db/` already holds a persisted index → load it directly (fast startup).

5. **Retriever**
   - Expose the vector store as a retriever with `k=2` to fetch the two most relevant chunks per query.

6. **LLM Generation**
   - Use `google/flan-t5-base` through a Hugging Face `pipeline("text2text-generation")`:
     - `max_length=512`, `temperature=0.1`, `top_p=0.95`, `repetition_penalty=1.2`.
   - The LLM receives the user question plus retrieved context and generates a grounded answer.

7. **Conversational Orchestration**
   - Wrap everything with `ConversationalRetrievalChain` to:
     - Retrieve relevant chunks for each turn.
     - Generate answers conditioned on both **context** and **chat history**.

8. **Memory**
   - Maintain multi-turn context using `ConversationBufferMemory(return_messages=True)`, enabling follow-ups like “and what about the warranty?” without repeating details.

9. **UI Layer (Gradio)**
   - `gr.Blocks()` app with:
     - Status banner showing whether the DB was built or loaded.
     - `gr.Chatbot` for messages and a `Textbox` + `Button` for input.
   - The `submit` event calls a wrapper that:
     - Appends the user message to `chat_history`.
     - Invokes the chain and appends the assistant’s answer.

10. **Operational Notes**
    - **Force re-indexing**: delete `chroma_db/` and restart.
    - **Swap documents**: replace `temp_docs/samsung_manual.txt` (keep plain text for best results).
    - **Model changes**: update `MODEL_NAME_EMBEDDINGS` or `MODEL_ID_LLM` in `app.py`.

### Quality & Evaluation (Lightweight)

- **Grounding check**: ask questions whose answers are known to be in the manual and verify that the response cites the right details.
- **Follow-up coherence**: ask a sequence of related questions to confirm that memory carries context across turns.
- **Latency tracking**: note first-run time (indexing) vs. warm start (loading the persisted DB).

### Limitations

- Works best with **clean, textual manuals**; PDFs should be converted to text first.
- `flan-t5-base` is compact; for higher fidelity, upgrade to a stronger model (with GPU if available).
- Retrieval uses `k=2`; increase it if answers miss context, or decrease it if they include irrelevant details.
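
To build intuition for how `chunk_size`, `chunk_overlap`, and `k` interact, the sketch below imitates the chunking and retrieval steps using only the Python standard library. It is not the code in `app.py`: `sliding_window_chunks` is a hypothetical, simplified stand-in for `RecursiveCharacterTextSplitter` (it cuts fixed character windows and ignores separators), `bag_of_words` is a toy substitute for the MiniLM embeddings, and the sample sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size character windows that overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter, which
    additionally prefers to break on paragraph and sentence separators.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def bag_of_words(text):
    """Toy 'embedding': lowercase word counts (assumption, not MiniLM)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


# Invented sample sentences, standing in for chunks of the manual.
sample_chunks = [
    "The warranty covers manufacturing defects for one year.",
    "Press and hold the power button to restart the device.",
    "Warranty claims require the original proof of purchase.",
]
for chunk in top_k("What does the warranty cover?", sample_chunks, k=2):
    print(chunk)
```

With the default settings each window advances by `chunk_size - chunk_overlap = 800` characters, so neighbouring chunks share 200 characters. Raising `k` in `top_k` returns more chunks per query, which mirrors the trade-off noted above between missing context and pulling in irrelevant passages.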