## Methodology
This project follows a standard **RAG (Retrieval-Augmented Generation)** workflow with conversational memory:
1. **Document Ingestion**
    - Load a fixed manual from `temp_docs/samsung_manual.txt` using `TextLoader` with UTF-8 to avoid encoding issues.
    - If the file is missing, initialization fails early with a clear error.
2. **Preprocessing & Chunking**
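    - Split the document with `RecursiveCharacterTextSplitter` (`chunk_size=1000`, `chunk_overlap=200`, measured in characters by default). The 200-character overlap keeps text that straddles a chunk boundary retrievable from either neighbouring chunk, while the 1000-character cap keeps each chunk small enough to embed and search quickly.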
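
To make the effect of the two chunking parameters concrete, here is a minimal, dependency-free sketch of overlapping chunking. It is a simplified stand-in, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraphs and lines, whereas this sketch slices fixed character windows. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Slice text into fixed-size character windows that overlap.

    Simplified illustration only; it ignores separators entirely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, each window advances 800 characters, so the last 200 characters of one chunk reappear at the start of the next.
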
3. **Embedding**
    - Convert each chunk to a dense vector using `sentence-transformers/all-MiniLM-L6-v2` via `HuggingFaceEmbeddings`.
    - This small, fast model offers a good latency/quality trade-off for semantic search.
4. **Vector Store (Persistence)**
    - Store embeddings in **ChromaDB** (`persist_directory=chroma_db`).
    - On startup:
        - If `chroma_db/` is missing or empty → build the index from the document and persist it.
        - If `chroma_db/` already contains a persisted index → load it directly (fast startup).
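
The build-or-load decision at startup can be sketched with the standard library alone. This is an assumption about how the check might look, not the code in `app.py`; the helper name `index_needs_build` is hypothetical, and it treats a missing directory the same as an empty one.

```python
from pathlib import Path


def index_needs_build(persist_dir: str = "chroma_db") -> bool:
    """Return True when there is no persisted index to load.

    A directory that does not exist and a directory with no entries
    are both treated as "nothing persisted yet".
    """
    path = Path(persist_dir)
    return not path.is_dir() or not any(path.iterdir())
```

The app would then branch on the result: build and persist the index when it returns `True`, otherwise load the existing one.
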
5. **Retriever**
    - Expose the vector store as a retriever with `k=2` to fetch the two most relevant chunks per query.
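
Conceptually, top-`k` retrieval ranks every stored chunk vector against the query vector and keeps the best `k`. The sketch below illustrates that idea with plain Python lists and cosine similarity. Both function names are hypothetical, and cosine similarity is an assumption made for illustration: the metric the actual vector store uses depends on how it is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

A real vector store avoids this exhaustive scan with an index, but the ranking it returns plays the same role.
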
6. **LLM Generation**
    - Use `google/flan-t5-base` through a Hugging Face `pipeline("text2text-generation")`:
        - `max_length=512`, `temperature=0.1`, `top_p=0.95`, `repetition_penalty=1.2`. Note that `temperature` and `top_p` only affect the output when sampling is enabled (`do_sample=True`).
    - The LLM receives the user question plus retrieved context and generates a grounded answer.
7. **Conversational Orchestration**
    - Wrap everything with `ConversationalRetrievalChain` to:
        - Retrieve relevant chunks for each turn.
        - Generate answers conditioned on both **context** and **chat history**.
8. **Memory**
    - Maintain multi-turn context using `ConversationBufferMemory(return_messages=True)`, enabling follow-ups like “and what about the warranty?” without repeating details.
9. **UI Layer (Gradio)**
    - `gr.Blocks()` app with:
        - Status banner showing whether the DB was built or loaded.
        - `gr.Chatbot` for messages and a `Textbox` + `Button` for input.
    - `submit` event calls a wrapper that:
        - Appends the user message to `chat_history`.
        - Invokes the chain and appends the assistant’s answer.
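
The wrapper described above might look roughly like the following. This is a sketch under stated assumptions rather than the handler in `app.py`: the name `respond` is hypothetical, `chain` is assumed to be any callable that takes `{"question": ...}` and returns a dict with an `"answer"` key, and the history is assumed to be a list of `(user, assistant)` pairs. Returning an empty string as the first value is a common way to clear the input textbox.

```python
from typing import Callable


def respond(
    message: str,
    chat_history: list[tuple[str, str]],
    chain: Callable[[dict], dict],
) -> tuple[str, list[tuple[str, str]]]:
    """Hypothetical submit handler: run the chain and extend the chat history."""
    result = chain({"question": message})  # assumed input/output keys
    # Build a new list instead of mutating the caller's history in place.
    updated_history = chat_history + [(message, result["answer"])]
    return "", updated_history  # "" clears the textbox in the UI
```

Because the chain is passed in as a parameter, the handler can be exercised with a stub before the models are loaded.
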
10. **Operational Notes**
    - **Force re-indexing**: delete `chroma_db/` and restart.
    - **Swap documents**: replace `temp_docs/samsung_manual.txt` (keep plain text for best results).
    - **Model changes**: update `MODEL_NAME_EMBEDDINGS` or `MODEL_ID_LLM` in `app.py`.
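
Forcing a re-index amounts to removing the persisted directory before the next start. A small helper for that could look like the sketch below; the function name `force_reindex` is hypothetical, and the default path simply mirrors the `chroma_db` directory named above.

```python
import shutil
from pathlib import Path


def force_reindex(persist_dir: str = "chroma_db") -> bool:
    """Delete the persisted index so the next startup rebuilds it.

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    path = Path(persist_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)  # removes the directory and everything inside it
    return True
```

Run it while the app is stopped, then restart so the index is rebuilt from the current document.
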
### Quality & Evaluation (Lightweight)
- **Grounding check**: ask questions whose answers are known to be in the manual and verify the response cites the right details.
- **Follow-up coherence**: ask a sequence of related questions to ensure memory works.
- **Latency tracking**: note first-run time (indexing) vs. warm start (loading persisted DB).
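
The grounding and latency checks above can be combined in a small harness. The following is an illustrative sketch, not part of the project: `run_checks` is a hypothetical name, `answer_fn` stands for whatever function turns a question into an answer string, and a case-insensitive substring match is a deliberately crude proxy for "the answer cites the right detail".

```python
import time
from typing import Callable


def run_checks(answer_fn: Callable[[str], str], cases: list[tuple[str, str]]) -> list[dict]:
    """Ask each question, time the call, and test for an expected substring.

    cases is a list of (question, expected_substring) pairs.
    """
    report = []
    for question, expected in cases:
        start = time.perf_counter()
        answer = answer_fn(question)
        elapsed = time.perf_counter() - start
        report.append(
            {
                "question": question,
                "grounded": expected.lower() in answer.lower(),
                "seconds": elapsed,
            }
        )
    return report
```

Running the same cases once on a cold start and once after the index is persisted gives a rough picture of the indexing cost versus warm-start latency.
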
### Limitations
- Works best with **clean, textual manuals**; PDFs should be converted to text first.
- `flan-t5-base` is a compact model; for more accurate or more detailed answers, switch to a larger model (on a GPU if available).
- Retrieval uses `k=2`; raise it if answers miss context, or lower it if they include irrelevant details.