## Methodology

This project follows a standard **RAG (Retrieval-Augmented Generation)** workflow with conversational memory:

1. **Document Ingestion**
   - Load a fixed manual from `temp_docs/samsung_manual.txt` using `TextLoader` with UTF-8 to avoid encoding issues.
   - If the file is missing, initialization fails early with a clear error.
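
The fail-early behavior can be sketched with the standard library alone. This is a simplified stand-in for `TextLoader`, not the project's actual code; the function name `load_manual` is hypothetical.

```python
from pathlib import Path

# Path taken from the ingestion step above.
MANUAL_PATH = Path("temp_docs/samsung_manual.txt")


def load_manual(path: Path = MANUAL_PATH) -> str:
    """Return the manual's text, or raise immediately if the file is absent.

    Hypothetical helper: a plain-Python stand-in for the loader.
    """
    if not path.is_file():
        raise FileNotFoundError(
            f"Manual not found at {path}; add the file before starting the app."
        )
    # Explicit UTF-8 avoids depending on the platform's default encoding.
    return path.read_text(encoding="utf-8")
```

Raising before any model is loaded keeps the error message close to its cause instead of surfacing later as an empty index.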

2. **Preprocessing & Chunking**
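   - Split the document with `RecursiveCharacterTextSplitter` (`chunk_size=1000`, `chunk_overlap=200`, both measured in characters). The overlap helps recall by repeating the text around each chunk boundary in both neighboring chunks; the chunk size keeps each chunk small enough to embed and retrieve quickly.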
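
To make the effect of the two parameters concrete, here is a deliberately simplified sliding-window chunker. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraph and line breaks); `chunk_text` is a hypothetical name used only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: it cuts purely by character count.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With the defaults, each window advances 800 characters, so the last 200 characters of one chunk are repeated at the start of the next.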

3. **Embedding**
   - Convert each chunk to a dense vector using `sentence-transformers/all-MiniLM-L6-v2` via `HuggingFaceEmbeddings`.
   - This small, fast model offers a good latency/quality trade-off for semantic search.

4. **Vector Store (Persistence)**
   - Store embeddings in **ChromaDB** (`persist_directory=chroma_db`).
   - On startup:
     - If `chroma_db/` is missing or empty → build the index from the document and persist it.
     - If `chroma_db/` already holds a persisted index → load it directly (fast startup).
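
The startup decision can be sketched as a small directory check. This is a minimal sketch under the assumption that "a non-empty directory" is a good enough signal for "an index was persisted"; the helper name `index_action` is hypothetical and the actual Chroma calls are omitted.

```python
from pathlib import Path


def index_action(persist_directory: str = "chroma_db") -> str:
    """Return "load" if a persisted index appears to exist, else "build".

    Hypothetical helper: it only inspects the directory, it does not touch Chroma.
    """
    path = Path(persist_directory)
    # A missing directory and an empty directory both mean there is nothing to load.
    if path.is_dir() and any(path.iterdir()):
        return "load"
    return "build"
```

The returned value can also feed the status banner that reports whether the DB was built or loaded.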

5. **Retriever**
   - Expose the vector store as a retriever with `k=2` to fetch the two most relevant chunks per query.
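
Conceptually, retrieval with `k=2` ranks the stored chunk vectors by similarity to the query vector and keeps the best two. The sketch below uses cosine similarity on toy vectors purely for illustration; the distance metric Chroma actually applies depends on how the collection is configured, and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

A real vector store avoids this full scan with an approximate nearest-neighbor index, but the ranking idea is the same.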

6. **LLM Generation**
   - Use `google/flan-t5-base` through a Hugging Face `pipeline("text2text-generation")`:
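     - `max_length=512`, `temperature=0.1`, `top_p=0.95`, `repetition_penalty=1.2`.
     - `temperature` and `top_p` only affect the output when sampling is enabled (`do_sample=True`); with greedy decoding they are ignored.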
   - The LLM receives the user question plus retrieved context and generates a grounded answer.
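
How the question and the retrieved chunks end up in a single model input can be sketched as plain string assembly. The template wording below is an assumption for illustration; the chain ships its own default prompt, and `build_prompt` is a hypothetical helper.

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into one prompt string.

    Hypothetical template: the real chain uses its own prompt wording.
    """
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Placing the context before the question and instructing the model to rely on it is what keeps the answer grounded in the manual rather than in the model's own memory.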

7. **Conversational Orchestration**
   - Wrap everything with `ConversationalRetrievalChain` to:
     - Retrieve relevant chunks for each turn.
     - Generate answers conditioned on both **context** and **chat history**.

8. **Memory**
   - Maintain multi-turn context using `ConversationBufferMemory(return_messages=True)`, enabling follow-ups like “and what about the warranty?” without repeating details.
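
The idea behind a buffer memory is simply to keep every turn and replay it on the next question. The class below is a toy illustration of that idea, not LangChain's implementation; `BufferMemory` and its methods are hypothetical names.

```python
class BufferMemory:
    """Toy conversation buffer: stores every (question, answer) pair in order."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, question: str, answer: str) -> None:
        """Record one completed turn."""
        self.turns.append((question, answer))

    def as_text(self) -> str:
        """Render the full history as text that can accompany the next question."""
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)
```

In the real chain, this history is passed along with a follow-up such as “and what about the warranty?” so the follow-up can be rewritten into a standalone question before retrieval. Because the buffer is never trimmed, long conversations eventually exceed the model's input limit.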

9. **UI Layer (Gradio)**
   - `gr.Blocks()` app with:
     - Status banner showing whether the DB was built or loaded.
     - `gr.Chatbot` for messages and a `gr.Textbox` + `gr.Button` for input.
   - `submit` event calls a wrapper that:
     - Appends the user message to `chat_history`.
     - Invokes the chain and appends the assistant’s answer.
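
The wrapper's job can be sketched independently of Gradio. The version below assumes the tuple-style chat history (a list of `(user, assistant)` pairs) and a chain that accepts `{"question": ...}` and returns a dict with an `"answer"` key; the name `respond` and the explicit `chain` parameter are hypothetical choices made so the function can be exercised without loading a model.

```python
def respond(message: str, chat_history: list[tuple[str, str]], chain) -> tuple[str, list[tuple[str, str]]]:
    """Run one chat turn and return (cleared textbox value, updated history).

    `chain` is any callable that maps {"question": str} to {"answer": str}.
    """
    result = chain({"question": message})
    # Build a new list rather than mutating the caller's history in place.
    updated_history = chat_history + [(message, result["answer"])]
    return "", updated_history
```

Returning an empty string as the first value clears the input box after each submission when the function's outputs are wired to the textbox and the chatbot.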

10. **Operational Notes**
    - **Force re-indexing**: delete `chroma_db/` and restart.
    - **Swap documents**: replace `temp_docs/samsung_manual.txt` (keep plain text for best results).
    - **Model changes**: update `MODEL_NAME_EMBEDDINGS` or `MODEL_ID_LLM` in `app.py`.

### Quality & Evaluation (Lightweight)
- **Grounding check**: ask questions whose answers are known to be in the manual and verify the response cites the right details.
- **Follow-up coherence**: ask a sequence of related questions to ensure memory works.
- **Latency tracking**: note first-run time (indexing) vs. warm start (loading persisted DB).
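
These checks can be partly automated with two small helpers. This is a minimal sketch: `mentions_all` is a crude keyword test rather than a real faithfulness metric, and both function names are hypothetical.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def mentions_all(answer: str, expected_terms: list[str]) -> bool:
    """Grounding smoke test: does the answer contain every expected term?"""
    lowered = answer.lower()
    return all(term.lower() in lowered for term in expected_terms)
```

Wrapping the initialization call in `timed` once with an empty `chroma_db/` and once with a persisted one gives the first-run versus warm-start comparison.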

### Limitations
- Works best with **clean, textual manuals**; PDFs should be converted to text first.
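- `flan-t5-base` is a compact model (roughly 250M parameters), so answers can be terse or imprecise; for better answer quality, switch to a larger model such as `google/flan-t5-large` (with GPU if available).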
- Retrieval uses `k=2`; adjust if answers miss context or include irrelevant details.