Anvit25 mandarmgd-03 committed on
Commit c1fc6b8 · verified · 1 Parent(s): edbb0e4

Create methodology.md (#3)

- Create methodology.md (9a7878e409dd451821a0d11123eff2d82e4efca7)


Co-authored-by: Mandar Garud <mandarmgd-03@users.noreply.huggingface.co>

Files changed (1)
  1. methodology.md +59 -0
methodology.md ADDED
@@ -0,0 +1,59 @@
+ ## Methodology
+
+ This project follows a standard **RAG (Retrieval-Augmented Generation)** workflow with conversational memory:
+
+ 1. **Document Ingestion**
+    - Load a fixed manual from `temp_docs/samsung_manual.txt` using `TextLoader` with UTF-8 encoding to avoid decoding errors.
+    - If the file is missing, initialization fails early with a clear error.
+
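+ 2. **Preprocessing & Chunking**
+    - Split the document with `RecursiveCharacterTextSplitter` (`chunk_size=1000`, `chunk_overlap=200`, measured in characters by default). The overlap keeps text that straddles a chunk boundary retrievable from either neighboring chunk (better recall), while the size cap keeps each chunk focused and quick to retrieve.
+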
+ 3. **Embedding**
+    - Convert each chunk to a dense vector using `sentence-transformers/all-MiniLM-L6-v2` via `HuggingFaceEmbeddings`.
+    - This small, fast model offers a good latency/quality trade-off for semantic search.
+
+ 4. **Vector Store (Persistence)**
+    - Store embeddings in **ChromaDB** (`persist_directory="chroma_db"`).
+    - On startup:
+      - If `chroma_db/` is missing or empty → build the index from the document and persist it.
+      - If `chroma_db/` already holds a persisted index → load it directly (fast startup).
+
+ 5. **Retriever**
+    - Expose the vector store as a retriever with `k=2` to fetch the two most relevant chunks per query.
+
+ 6. **LLM Generation**
+    - Use `google/flan-t5-base` through a Hugging Face `pipeline("text2text-generation")`:
+      - `max_length=512`, `temperature=0.1`, `top_p=0.95`, `repetition_penalty=1.2`.
+      - `temperature` and `top_p` only affect the output when sampling is enabled (`do_sample=True`); otherwise they are ignored.
+    - The LLM receives the user question plus the retrieved context and generates an answer grounded in that context.
+
+ 7. **Conversational Orchestration**
+    - Wrap everything with `ConversationalRetrievalChain` to:
+      - Retrieve relevant chunks for each turn.
+      - Generate answers conditioned on both **context** and **chat history**.
+
+ 8. **Memory**
+    - Maintain multi-turn context using `ConversationBufferMemory(return_messages=True)`, enabling follow-ups like “and what about the warranty?” without repeating details.
+
+ 9. **UI Layer (Gradio)**
+    - `gr.Blocks()` app with:
+      - Status banner showing whether the DB was built or loaded.
+      - `gr.Chatbot` for messages and a `Textbox` + `Button` for input.
+    - `submit` event calls a wrapper that:
+      - Appends the user message to `chat_history`.
+      - Invokes the chain and appends the assistant’s answer.
+
+ 10. **Operational Notes**
+     - **Force re-indexing**: delete `chroma_db/` and restart.
+     - **Swap documents**: replace `temp_docs/samsung_manual.txt` (keep plain text for best results).
+     - **Model changes**: update `MODEL_NAME_EMBEDDINGS` or `MODEL_ID_LLM` in `app.py`.
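
Steps 1 and 2 can be illustrated without any framework. The sketch below is a simplified stand-in, not the project's code: `load_manual` and `split_with_overlap` are hypothetical helpers, and the splitter is a plain fixed-width sliding window, whereas `RecursiveCharacterTextSplitter` also prefers to cut at paragraph and sentence separators. It only shows the fail-early load and how `chunk_size` and `chunk_overlap` interact.

```python
from pathlib import Path
from typing import List


def load_manual(path: str) -> str:
    """Read the manual as UTF-8, failing early if the file is missing."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Manual not found: {path}")
    return file.read_text(encoding="utf-8")


def split_with_overlap(text: str, chunk_size: int = 1000,
                       chunk_overlap: int = 200) -> List[str]:
    """Cut text into windows of up to chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text near a
    boundary appears in both neighboring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each window advances 800 characters, so a 2,400-character document yields three chunks and the last 200 characters of one chunk are the first 200 of the next.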
+
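The build-or-load decision from Step 4 can be sketched as a small helper. This is an illustration under assumptions, not the code in `app.py`: `index_exists` and `build_or_load` are hypothetical names, the `build` and `load` callables stand in for whatever creates or opens the Chroma index, and "a persisted index exists" is approximated as "the directory is present and non-empty".

```python
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def index_exists(persist_dir: str) -> bool:
    """True when persist_dir is a directory containing at least one entry."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())


def build_or_load(persist_dir: str,
                  build: Callable[[], T],
                  load: Callable[[], T]) -> Tuple[T, str]:
    """Return the index plus a status string suitable for a UI banner."""
    if index_exists(persist_dir):
        return load(), "loaded"   # warm start: reuse the persisted index
    return build(), "built"       # cold start: index the document first
```

Returning the status alongside the index makes it straightforward to drive the "built or loaded" banner described in Step 9, and deleting the directory naturally forces the `build` path on the next start.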
+ ### Quality & Evaluation (Lightweight)
+ - **Grounding check**: ask questions whose answers are known to be in the manual and verify that the response reproduces the right details.
+ - **Follow-up coherence**: ask a sequence of related questions to confirm that follow-ups are resolved against earlier turns.
+ - **Latency tracking**: compare first-run time (which includes indexing) with warm-start time (loading the persisted DB).
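
For the latency comparison, a tiny timing helper is usually enough. The sketch below assumes nothing about the project beyond the existence of some startup callable; `timed` is a hypothetical helper name.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

One way to use it is to wrap the startup routine, record the elapsed time once after deleting `chroma_db/` (cold start) and once on the following restart (warm start), and note both numbers.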
+
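The grounding check can be partly automated with a keyword test. This is a rough sketch under assumptions: `missing_facts` is a hypothetical helper, the expected facts are strings you pick by hand from the manual, and substring matching is only a coarse proxy for correctness (it cannot detect a paraphrased or contradicted fact).

```python
from typing import List


def _normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace for lenient matching."""
    return " ".join(text.lower().split())


def missing_facts(answer: str, expected_facts: List[str]) -> List[str]:
    """Return the expected facts that do not appear verbatim in the answer."""
    normalized_answer = _normalize(answer)
    return [fact for fact in expected_facts
            if _normalize(fact) not in normalized_answer]
```

An empty return value means every hand-picked fact was found in the answer; anything else is a prompt to read that answer manually.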
+ ### Limitations
+ - Works best with **clean, textual manuals**; PDFs should be converted to text first.
+ - `flan-t5-base` is compact; for more accurate or more detailed answers, switch to a larger model (on a GPU if available).
+ - Retrieval uses `k=2`; increase it if answers miss context, or decrease it if they include irrelevant details.
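
To see what changing `k` does, here is a minimal top-k retrieval sketch over toy vectors. It is an illustration, not the project's retriever: `cosine` and `top_k` are hypothetical helpers, and cosine similarity is an assumption, since the metric the Chroma collection actually uses depends on how it is configured.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          chunk_vecs: Sequence[Sequence[float]],
          k: int = 2) -> List[int]:
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [i for _, i in scored[:k]]
```

Raising `k` simply extends the returned list further down the ranking, which adds context but also admits lower-scoring, possibly irrelevant chunks.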